Before running your DAGs in Apache Airflow, Amazon MWAA, Astronomer, or Google Cloud Composer, you can run a preflight check DAG in your Apache Airflow instance to perform the necessary technical validations.
The preflight check DAG:
- Neither collects nor transmits any sensitive data during the validation process, ensuring the security of your integration.
- Provides detailed feedback for troubleshooting, including error codes and next steps, if any errors occur.
- Includes a retry mechanism for the API call to handle temporary network issues or server unavailability.
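The retry behavior described above can be sketched as a small wrapper with exponential backoff. This is an illustrative, standard-library-only sketch, not the DAG's actual implementation. The helper name call_with_retries, the number of attempts, and the delay schedule are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on OSError with exponential backoff (hypothetical helper).

    OSError covers connection failures, socket timeouts, and urllib.error.URLError.
    Any other exception propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

In a task, the POST request could be wrapped as call_with_retries(lambda: urllib.request.urlopen(request, timeout=30)). Because urllib.error.HTTPError also subclasses OSError, a stricter version might re-raise 4xx responses immediately instead of retrying them.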
Preflight checks
The preflight check DAG performs the following steps to validate your Atlan and OpenLineage setup:
- Collects environment variables: verifies the OpenLineage-related environment variables set during the configuration of your Apache Airflow, Amazon MWAA, Astronomer, or Google Cloud Composer instance. These variables can vary depending on your Apache Airflow version.
- Validates the OpenLineage library installation: checks whether the openlineage-python library has been installed and identifies its version. This ensures that the library required for sending OpenLineage events to Atlan is in place.
- Sends an API call for validation: with the information collected in the previous steps, the preflight check DAG makes a POST request to Atlan's preflight check endpoint. This confirms that no network issues or configuration errors are obstructing communication.
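The first two steps can be sketched in Python as follows. This is a hypothetical illustration rather than the DAG's source. The helper names, the exact variable list, and the use of importlib.metadata to read the installed version of the openlineage-python distribution are assumptions. Only variable names are recorded, which keeps secrets such as API keys out of the collected data.

```python
import os
from importlib import metadata
from typing import Dict, List, Mapping, Optional

# Candidate OpenLineage variables; which ones are required depends on the Airflow version.
OPENLINEAGE_ENV_VARS = (
    "AIRFLOW__OPENLINEAGE__NAMESPACE",  # Airflow 2.7.0 onward
    "AIRFLOW__OPENLINEAGE__TRANSPORT",  # Airflow 2.7.0 onward
    "OPENLINEAGE_URL",                  # Airflow 2.5.0 up to (not including) 2.7.0
    "OPENLINEAGE_API_KEY",              # Airflow 2.5.0 up to (not including) 2.7.0
    "OPENLINEAGE_NAMESPACE",            # Airflow 2.5.0 up to (not including) 2.7.0
)


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def collect_preflight_inputs(env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Record which OpenLineage variables are set and which library version is installed.

    Only variable names are recorded; values such as API keys are never read.
    """
    set_vars: List[str] = [name for name in OPENLINEAGE_ENV_VARS if name in env]
    return {
        "set_vars": set_vars,
        "ol_version": installed_version("openlineage-python"),
    }
```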
For example, the payload sent for validation looks like the following. Here, version is the Apache Airflow version, ol_namespace and ol_endpoint are read from environment variables, and ol_version is the installed OpenLineage library version:
{
"connector_type": "airflow-mwaa",
"version": "2.5.0",
"ol_namespace": "staging-mwaa",
"ol_endpoint": "https://<host>/events/openlineage/airflow-mwaa/",
"ol_version": "1.8"
}
Atlan then runs additional validations on its server using the submitted data to confirm that the integration is set up correctly. If all validations pass, the DAG run succeeds.
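A minimal sketch of how such a payload could be assembled and sent with the standard library's urllib.request follows. The function names, the Content-Type header, and the 30-second timeout are assumptions. The preflight endpoint URL itself is not given in this document, so it is passed in as a parameter.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict


def build_payload(connector_type: str, airflow_version: str, ol_namespace: str,
                  ol_endpoint: str, ol_version: str) -> Dict[str, str]:
    """Assemble the non-sensitive fields shown in the example payload."""
    return {
        "connector_type": connector_type,  # e.g. "airflow-mwaa"
        "version": airflow_version,        # Apache Airflow version
        "ol_namespace": ol_namespace,      # read from an environment variable
        "ol_endpoint": ol_endpoint,        # read from an environment variable
        "ol_version": ol_version,          # installed OpenLineage library version
    }


def make_preflight_request(preflight_url: str, payload: Dict[str, str]) -> urllib.request.Request:
    """Create (but do not send) a JSON POST request."""
    return urllib.request.Request(
        preflight_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_preflight(request: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and decode the JSON body, including error responses."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read()
    except urllib.error.HTTPError as exc:
        # Assumption: failures may arrive with a non-2xx status but still carry a JSON body.
        body = exc.read()
    return json.loads(body.decode("utf-8"))
```

Splitting request construction from sending keeps the payload logic testable without network access.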
Check Apache Airflow DAG logs
To check Apache Airflow DAG logs:
- Open your Apache Airflow homepage.
- From the homepage, navigate to the AtlanOpenLineageConnectionTest DAG. Under the Runs column, click the failed run, circled in red.
- On the List Dag Run page, under the Run Id column, click the latest (topmost) failed run ID.
- From the corresponding screen, click the run_ol_preflight_check task.
- From the tabs along the top of the Task Instance popup, click Log to view logs.
- On the Log page, scroll down to the Exception section to view the error code and message. Refer to the troubleshooting guide below to make the necessary changes.
For other distributions, refer to the Amazon MWAA, Astronomer, or Google Cloud Composer documentation for details on accessing DAG logs.
Troubleshoot errors
Missing environment variables
Environment variable {var} is missing. Please set it before running the DAG.
Ensure that the required environment variables are set in your environment. These variables can vary depending on your Apache Airflow version:
- For Apache Airflow versions 2.7.0 onward: AIRFLOW__OPENLINEAGE__NAMESPACE and AIRFLOW__OPENLINEAGE__TRANSPORT.
- For Apache Airflow versions 2.5.0 onward and prior to 2.7.0: OPENLINEAGE_URL, OPENLINEAGE_API_KEY, and OPENLINEAGE_NAMESPACE.
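A version-aware check for these variables could look like the following sketch. It reproduces the error message shown above, but the function names, the version parsing, and the decision to treat an empty value as missing are assumptions rather than the DAG's actual code. Versions older than 2.5.0 are not handled here because they are rejected separately as unsupported.

```python
import os
from itertools import takewhile
from typing import List, Mapping, Tuple


def version_tuple(version: str) -> Tuple[int, int, int]:
    """'2.7.0' -> (2, 7, 0); suffixes such as '+astro.1' or 'rc1' are ignored."""
    pieces = version.split("+")[0].split(".")[:3]
    numbers = [int("".join(takewhile(str.isdigit, p)) or 0) for p in pieces]
    numbers += [0] * (3 - len(numbers))
    return (numbers[0], numbers[1], numbers[2])


def required_env_vars(airflow_version: str) -> List[str]:
    """Return the variables listed above for the given Airflow version."""
    if version_tuple(airflow_version) >= (2, 7, 0):
        return ["AIRFLOW__OPENLINEAGE__NAMESPACE", "AIRFLOW__OPENLINEAGE__TRANSPORT"]
    return ["OPENLINEAGE_URL", "OPENLINEAGE_API_KEY", "OPENLINEAGE_NAMESPACE"]


def check_required_env_vars(airflow_version: str, env: Mapping[str, str] = os.environ) -> None:
    """Raise on the first missing variable; an empty value counts as missing (assumption)."""
    for var in required_env_vars(airflow_version):
        if not env.get(var):
            raise RuntimeError(
                f"Environment variable {var} is missing. Please set it before running the DAG."
            )
```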
Invalid transport configuration
The following errors are related to the AIRFLOW__OPENLINEAGE__TRANSPORT configuration for Apache Airflow versions 2.7.0 onward. When setting AIRFLOW__OPENLINEAGE__TRANSPORT, Atlan recommends paying close attention to the JSON structure and adhering to the expected schema: ensure that string values are correctly formatted and that dictionaries contain the necessary keys with the correct types.
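One way to avoid hand-written JSON mistakes is to generate the value with json.dumps, which always emits double-quoted keys and strings and never adds trailing commas. In the sketch below, the key layout follows the OpenLineage HTTP transport (type, url, endpoint, auth). The endpoint path api/v1/lineage is inferred from the URL in the network permission error example in this document. Treat the exact keys Atlan expects, including the apiKey field name, as assumptions. The host and token arguments are placeholders.

```python
import json


def build_transport_config(instance_host: str, connector_type: str, api_token: str) -> str:
    """Return a JSON string for AIRFLOW__OPENLINEAGE__TRANSPORT (assumed layout)."""
    config = {
        "type": "http",
        "url": f"https://{instance_host}/events/openlineage/{connector_type}/",
        "endpoint": "api/v1/lineage",  # inferred from the error example; an assumption
        "auth": {"type": "api_key", "apiKey": api_token},
    }
    # json.dumps guarantees syntactically valid JSON for the environment variable value.
    return json.dumps(config)
```

The returned string can then be pasted as the value of AIRFLOW__OPENLINEAGE__TRANSPORT in your deployment's environment configuration.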
Missing keys in 'transport_info'
"'{key}' is missing, update variable - AIRFLOW__OPENLINEAGE__TRANSPORT."
- Ensure that the specified key is present in the AIRFLOW__OPENLINEAGE__TRANSPORT JSON structure.
Incorrect type for keys in 'transport_info'
"'{key}' must be of type {expected_type.name}, update variable - AIRFLOW__OPENLINEAGE__TRANSPORT."
- Update the type of the specified key in the AIRFLOW__OPENLINEAGE__TRANSPORT JSON structure to match the expected type.
White space in impermissible field
"'{key}' cannot contain whitespace, update variable - AIRFLOW__OPENLINEAGE__TRANSPORT."
- Remove any whitespace from the value of the specified key in the AIRFLOW__OPENLINEAGE__TRANSPORT JSON structure.
Empty or white space in fields
"'{key}' cannot be empty or contain whitespaces. update variable - AIRFLOW__OPENLINEAGE__TRANSPORT."
- Ensure that the value of the specified key in the AIRFLOW__OPENLINEAGE__TRANSPORT JSON structure is not empty and does not contain any whitespace.
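The four checks in this section can be approximated with a small validator that reports every problem at once. This is a hypothetical sketch. The expected schema (type, url, and endpoint as strings; auth as a dictionary) and the choice of which fields reject whitespace are assumptions. The error messages mirror the ones listed above.

```python
import json
from typing import List

SUFFIX = "update variable - AIRFLOW__OPENLINEAGE__TRANSPORT."

# Hypothetical schema: key -> expected Python type after json.loads.
EXPECTED_TYPES = {"type": str, "url": str, "endpoint": str, "auth": dict}
NO_WHITESPACE = {"url", "endpoint"}  # hypothetical: whitespace anywhere is rejected
NOT_BLANK = {"type"}                 # hypothetical: must be non-empty and whitespace-free


def validate_transport(raw: str) -> List[str]:
    """Return every problem found in an AIRFLOW__OPENLINEAGE__TRANSPORT value."""
    transport_info = json.loads(raw)  # raises json.JSONDecodeError for malformed JSON
    if not isinstance(transport_info, dict):
        return ["The transport value must be a JSON object."]  # illustrative message
    errors: List[str] = []
    for key, expected_type in EXPECTED_TYPES.items():
        if key not in transport_info:
            errors.append(f"'{key}' is missing, {SUFFIX}")
            continue
        value = transport_info[key]
        if not isinstance(value, expected_type):
            errors.append(f"'{key}' must be of type {expected_type.__name__}, {SUFFIX}")
            continue
        if key in NOT_BLANK and (not value or any(ch.isspace() for ch in value)):
            errors.append(f"'{key}' cannot be empty or contain whitespaces. {SUFFIX}")
        elif key in NO_WHITESPACE and any(ch.isspace() for ch in value):
            errors.append(f"'{key}' cannot contain whitespace, {SUFFIX}")
    return errors
```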
Network permission error
ERROR - Failed to emit OpenLineage event
HTTPSConnectionPool(host='<instance>.atlan.com', port=443):
Max retries exceeded with url: /events/openlineage/airflow-cloud-composer/api/v1/lineage
- This error may be caused by firewall rules or the VPC configuration of the network hosting your Apache Airflow instance. Contact your network team to update the network permissions so that your Airflow instance can make API calls to the URL in the error message.
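Before involving your network team, you can check whether the Airflow workers can reach Atlan at all. The sketch below only attempts a TCP connection on port 443 with the standard socket module and sends no lineage data. The helper name can_reach is hypothetical.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refused connection, timeout, or firewall drop
        return False
```

For example, run can_reach("<instance>.atlan.com") from a task on the affected environment. A True result only shows that the port is reachable; a proxy or TLS inspection could still block the HTTPS request itself.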
Unsupported Apache Airflow version
{
"status":"fail",
"error_code":"unsupported_airflow_version",
"error_message":"Minimum supported version is 2.5.0, you are using 2.4"
}
- Atlan does not support integrating with Apache Airflow versions older than 2.5.0. Upgrade your Apache Airflow version to 2.5.0 or above. This will allow OpenLineage to push metadata to Atlan.
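The comparison behind this error can be illustrated as follows. Comparing integer tuples avoids the string-comparison trap where "2.10.0" sorts before "2.5.0". Atlan appears to perform the real check on its server; the function names here are hypothetical, and the failure dictionary copies the example response above.

```python
from itertools import takewhile
from typing import Dict, Optional, Tuple

MIN_AIRFLOW = (2, 5, 0)


def version_tuple(version: str) -> Tuple[int, int, int]:
    """'2.10.1' -> (2, 10, 1); '2.4' -> (2, 4, 0); '+local' suffixes are ignored."""
    pieces = version.split("+")[0].split(".")[:3]
    numbers = [int("".join(takewhile(str.isdigit, p)) or 0) for p in pieces]
    numbers += [0] * (3 - len(numbers))
    return (numbers[0], numbers[1], numbers[2])


def check_airflow_version(version: str) -> Optional[Dict[str, str]]:
    """Return a failure response like the example above, or None if supported."""
    if version_tuple(version) < MIN_AIRFLOW:
        return {
            "status": "fail",
            "error_code": "unsupported_airflow_version",
            "error_message": f"Minimum supported version is 2.5.0, you are using {version}",
        }
    return None
```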
Connection not found
{
"status":"fail",
"error_code":"connection_not_found",
"error_message":"<connection_name> is not present on Atlan."
}
- The connection name set in your environment variables does not match any connection created in Atlan. Either create a connection in Atlan with that name, or update the environment variable to use the name of an existing Atlan connection.
Invalid OpenLineage endpoint
{
"status":"fail",
"error_code":"invalid_openlineage_endpoint",
"error_message":"Your OL endpoint should be: https://<instance>.atlan.com/events/openlineage/airflow/ || received: https://<instance>.atlan.com/events/openlineage/airflow/haha please update this."
}
- Update the environment variable containing the OpenLineage URL to the expected URL in the error message.
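Both example URLs in this document follow the pattern https://<instance>.atlan.com/events/openlineage/<connector_type>/ (with connector types such as airflow and airflow-mwaa). Assuming that pattern holds for your connection, a quick local comparison can catch typos before the preflight DAG runs. The helper names are hypothetical. The comparison is exact, including the trailing slash, because how strictly Atlan compares the URL is not documented here.

```python
from typing import Optional


def expected_ol_endpoint(instance_host: str, connector_type: str) -> str:
    """Build the endpoint URL in the pattern used by this document's examples (assumption)."""
    return f"https://{instance_host}/events/openlineage/{connector_type}/"


def endpoint_problem(configured: str, instance_host: str, connector_type: str) -> Optional[str]:
    """Return a message in the style of invalid_openlineage_endpoint, or None if it matches."""
    expected = expected_ol_endpoint(instance_host, connector_type)
    if configured != expected:
        return f"Your OL endpoint should be: {expected} || received: {configured}"
    return None
```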
Unsupported OpenLineage version
{
"status":"fail",
"error_code":"unsupported_ol_version",
"error_message":"Minimum supported version is 1.2.0, you are using 1.1"
}
- Install version 1.8.0 or later of the openlineage-airflow library. The preflight check DAG fails if you're using any OpenLineage version older than 1.2.0.
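This section names two thresholds: versions older than 1.2.0 fail the check, and 1.8.0 or later is what Atlan asks you to install. The sketch below classifies an installed version against both, using importlib.metadata to look it up. The distribution name openlineage-airflow comes from the text above. The function names and the four result labels are assumptions.

```python
from importlib import metadata
from itertools import takewhile
from typing import Optional, Tuple

MINIMUM = (1, 2, 0)
RECOMMENDED = (1, 8, 0)


def version_tuple(version: str) -> Tuple[int, int, int]:
    """'1.8.0' -> (1, 8, 0); '1.1' -> (1, 1, 0)."""
    pieces = version.split("+")[0].split(".")[:3]
    numbers = [int("".join(takewhile(str.isdigit, p)) or 0) for p in pieces]
    numbers += [0] * (3 - len(numbers))
    return (numbers[0], numbers[1], numbers[2])


def classify_ol_version(version: Optional[str]) -> str:
    """Return 'missing', 'unsupported', 'upgrade-recommended', or 'ok'."""
    if version is None:
        return "missing"
    current = version_tuple(version)
    if current < MINIMUM:
        return "unsupported"          # the preflight check fails below 1.2.0
    if current < RECOMMENDED:
        return "upgrade-recommended"  # passes the minimum, but below the recommended 1.8.0
    return "ok"


def installed_ol_version(distribution: str = "openlineage-airflow") -> Optional[str]:
    """Look up the installed version, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None
```

Usage: classify_ol_version(installed_ol_version()) returns one of the four labels for the current environment.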
Connection fetching failed
{
"status":"fail",
"error_code":"connection_fetching_failed",
"error_message":"Atlas Error: Failed to fetch connections - Post \"http://localhost:21000/api/atlas/v2/search/indexsearch\": dial tcp [::1]:21000: connect: connection refused"
}
- Contact Atlan support to help you debug this error.
To help Atlan support troubleshoot, enable debug logging by setting the OPENLINEAGE_CLIENT_LOGGING=DEBUG environment variable. Run your DAGs again and then share the debug log with Atlan support.
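Every failure response shown in this section shares the same three fields: status, error_code, and error_message. The sketch below turns such a response into a one-line summary, for example for a custom task log. The hint text paraphrases the fixes listed above. The helper name explain_failure and the fallback hint for unknown codes are hypothetical.

```python
import json

# Next steps paraphrased from the troubleshooting entries in this section.
NEXT_STEPS = {
    "unsupported_airflow_version": "Upgrade Apache Airflow to 2.5.0 or later.",
    "connection_not_found": "Use the name of a connection that exists in Atlan.",
    "invalid_openlineage_endpoint": "Set the OpenLineage URL to the one in the error message.",
    "unsupported_ol_version": "Install openlineage-airflow 1.8.0 or later.",
    "connection_fetching_failed": "Contact Atlan support.",
}


def explain_failure(body: str) -> str:
    """Summarize a preflight response; returns an empty string if it did not fail."""
    response = json.loads(body)
    if response.get("status") != "fail":
        return ""
    code = response.get("error_code", "unknown")
    hint = NEXT_STEPS.get(code, "Contact Atlan support.")  # fallback for unlisted codes
    return f"[{code}] {response.get('error_message', '')} Next step: {hint}"
```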