Preflight checks for Airflow

Before running your DAGs in Airflow, Amazon MWAA, Astronomer, or Google Cloud Composer, you can run a preflight check DAG in your Airflow instance to perform the necessary technical validations.

The preflight check DAG:

  • Neither collects nor transmits any sensitive data during the validation process, ensuring the security of your integration.
  • Provides detailed feedback for troubleshooting if any errors occur, including error codes and next steps.
  • Includes a retry mechanism for the API call to handle temporary network issues or server unavailability.
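The article does not describe how the retry mechanism is implemented. As a rough illustration only, a retry with exponential backoff around a network call could look like the sketch below. The function name call_with_retry, the number of attempts, and the delays are all assumptions made for this example and are not taken from the preflight check DAG.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on ConnectionError or TimeoutError.

    Hypothetical helper: the real DAG's retry policy is not documented here.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except (ConnectionError, TimeoutError):
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x ... the base delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

Passing the sleep function as a parameter keeps the helper easy to exercise without real waiting.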

Preflight checks

The preflight check DAG performs the following steps to validate your Atlan and OpenLineage setup:

  1. Collects environment variables: verifies the OpenLineage-related environment variables set during the configuration of your Airflow, Amazon MWAA, Astronomer, or Google Cloud Composer instance. These variables can vary depending on your Airflow version.
  2. Validates the OpenLineage library installation: checks whether the openlineage-python library has been installed and identifies its version. This ensures that the necessary library for sending OpenLineage events to Atlan is in place.
  3. Sends API call for validation: with the information collected in the previous steps, the preflight check DAG makes a POST request to Atlan's preflight check endpoint. This confirms that no network issues or configuration errors are obstructing the communication.

For example, the payload sent for validation looks like this:

{
    "connector_type": "airflow-mwaa",
    "version": "2.5.0", // Airflow version
    "ol_namespace": "staging-mwaa", // Environment variable
    "ol_endpoint": "https://<host>/events/openlineage/airflow-mwaa/", // Environment variable
    "ol_version": "1.8" // Installed OpenLineage library version
}

Note that Atlan runs additional server-side validations on the submitted data to confirm that the integration is set up correctly. If all validations pass, the DAG run succeeds.
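To make the shape of the payload concrete, here is a minimal sketch of how such a dictionary could be assembled from the collected values. The function name build_preflight_payload is hypothetical, and reading the namespace and endpoint from OPENLINEAGE_NAMESPACE and OPENLINEAGE_URL is an assumption that only fits the pre-2.7.0 variable names listed later in this article. The actual DAG may collect these values differently.

```python
import os
from typing import Mapping, Optional


def build_preflight_payload(
    connector_type: str,
    airflow_version: str,
    ol_version: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Assemble a payload with the same keys as the example above.

    Hypothetical helper. The environment variable names used here are an
    assumption based on the Airflow 2.5.0 to 2.7.0 variables.
    """
    return {
        "connector_type": connector_type,            # for example "airflow-mwaa"
        "version": airflow_version,                  # Airflow version
        "ol_namespace": env.get("OPENLINEAGE_NAMESPACE"),
        "ol_endpoint": env.get("OPENLINEAGE_URL"),
        "ol_version": ol_version,                    # installed OpenLineage library version
    }
```

A missing variable shows up as None in the payload, which makes gaps easy to spot before anything is sent.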

Check Airflow DAG logs

To check Airflow DAG logs:

  1. Open your Airflow homepage.
  2. From the homepage, navigate to the AtlanOpenLineageConnectionTestV1 DAG. Under the Runs column, click the failed run, circled in red.
  3. On the List Dag Run page, under the Run Id column, click the latest (topmost) failed run ID.
  4. From the corresponding screen, click the run_ol_preflight_check task.
  5. From the tabs along the top of the Task Instance popup, click Log to view logs.
  6. On the Log page, scroll down to the Exception section to view the error code and message. Refer to the troubleshooting guide below to make the necessary changes.

For other distributions, refer to Amazon MWAA, Astronomer, or Google Cloud Composer documentation for more details.

Troubleshoot errors

Missing environment variables

Environment variable {var} is missing. Please set it before running the DAG.

Ensure that the required environment variables are set in your environment. These variables can vary depending on your Airflow version:

  • For Airflow versions 2.7.0 onward: AIRFLOW__OPENLINEAGE__NAMESPACE and AIRFLOW__OPENLINEAGE__TRANSPORT.
  • For Airflow versions 2.5.0 onward and prior to 2.7.0: OPENLINEAGE_URL, OPENLINEAGE_API_KEY, and OPENLINEAGE_NAMESPACE.
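If you want to verify the variables yourself before running the DAG, the version rule above can be sketched as a small check. The helper names parse_version, required_vars, and missing_vars are hypothetical and are not part of Airflow or the preflight check DAG. The variable names come from the list above.

```python
import os
import re
from typing import List, Mapping, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn a version string such as '2.7.0' or '2.5' into a 3-part tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)  # pad so '2.7' compares equal to '2.7.0'
    return (parts[0], parts[1], parts[2])


def required_vars(airflow_version: str) -> List[str]:
    """Return the variable names this article lists for the given version."""
    version = parse_version(airflow_version)
    if version >= (2, 7, 0):
        return ["AIRFLOW__OPENLINEAGE__NAMESPACE", "AIRFLOW__OPENLINEAGE__TRANSPORT"]
    if version >= (2, 5, 0):
        return ["OPENLINEAGE_URL", "OPENLINEAGE_API_KEY", "OPENLINEAGE_NAMESPACE"]
    raise ValueError(f"Airflow {airflow_version} is older than 2.5.0")


def missing_vars(airflow_version: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """List required variables that are unset or empty."""
    return [name for name in required_vars(airflow_version) if not env.get(name)]
```

Calling missing_vars with your Airflow version returns an empty list when every required variable is set.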

Invalid transport configuration

🚨 Careful! For all errors related to the environment variable configuration for AIRFLOW__OPENLINEAGE__TRANSPORT, Atlan recommends paying attention to the JSON structure and adhering to the expected schema. You will need to ensure that the string values are correctly formatted, and the dictionaries contain the correct types and necessary keys.

The following errors are related to the AIRFLOW__OPENLINEAGE__TRANSPORT configuration for Airflow versions 2.7.0 onward:

Missing keys in 'transport_info'

"'{key}' is missing, update variable - AIRFLOW__OPENLINEAGE__TRANSPORT."
  • Ensure that the specified key is present in the AIRFLOW__OPENLINEAGE__TRANSPORT JSON structure.

Incorrect type for keys in 'transport_info'

"'{key}' must be of type {expected_type.name}, update variable - AIRFLOW__OPENLINEAGE__TRANSPORT."
  • Update the type of specified key in the AIRFLOW__OPENLINEAGE__TRANSPORT JSON structure to match the expected type.

White space in impermissible field

"'{key}' cannot contain whitespace, update variable - AIRFLOW__OPENLINEAGE__TRANSPORT."
  • Remove any white space from the specified key value in the AIRFLOW__OPENLINEAGE__TRANSPORT JSON structure.

Empty or white space in fields

"'{key}' cannot be empty or contain whitespaces. update variable - AIRFLOW__OPENLINEAGE__TRANSPORT."
  • Ensure that the specified key in the AIRFLOW__OPENLINEAGE__TRANSPORT JSON structure is neither empty nor does it contain any white space.
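The checks behind these messages can be approximated locally before you rerun the DAG. The sketch below is an illustration under stated assumptions: the function name validate_transport is hypothetical, and the schema of required keys (type, url, endpoint, all strings) is an assumption for this example, since this article does not list the exact keys the DAG expects. Adjust the schema to match your own transport configuration.

```python
import json
from typing import Dict, List

VAR = "AIRFLOW__OPENLINEAGE__TRANSPORT"

# Assumed schema for illustration only: key name -> expected Python type.
ASSUMED_SCHEMA: Dict[str, type] = {"type": str, "url": str, "endpoint": str}


def validate_transport(raw: str, schema: Dict[str, type] = ASSUMED_SCHEMA) -> List[str]:
    """Return a list of problems found in the transport JSON (empty if none)."""
    try:
        transport_info = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"{VAR} is not valid JSON: {exc.msg}"]
    if not isinstance(transport_info, dict):
        return [f"{VAR} must be a JSON object."]

    problems = []
    for key, expected_type in schema.items():
        if key not in transport_info:
            problems.append(f"'{key}' is missing, update variable - {VAR}.")
            continue
        value = transport_info[key]
        if not isinstance(value, expected_type):
            problems.append(
                f"'{key}' must be of type {expected_type.__name__}, update variable - {VAR}."
            )
        elif isinstance(value, str) and (not value or any(ch.isspace() for ch in value)):
            problems.append(
                f"'{key}' cannot be empty or contain whitespaces. update variable - {VAR}."
            )
    return problems
```

An empty list means the value passed these local checks. It does not guarantee that the server-side validation will pass.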

Network permission error

ERROR - Failed to emit OpenLineage event
HTTPSConnectionPool(host='<instance>.atlan.com', port=443):
Max retries exceeded with url: /events/openlineage/airflow-cloud-composer/api/v1/lineage
  • This error may result from the firewalls or VPC on which your Airflow instance is hosted. Contact your network team to update the network permissions. This will allow your Airflow instance to make API calls to the URL mentioned in the error message.
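Before contacting your network team, it can help to confirm whether the host is reachable at all from the machine running Airflow. The sketch below only tests whether a TCP connection can be opened. It does not check TLS, proxies, or HTTP-level rules, so treat the result as a first signal rather than a full diagnosis. The function name can_reach is hypothetical.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run it from the Airflow environment itself, passing the host shown in the error message, since firewall and VPC rules depend on where the call originates.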

Unsupported Airflow version

{
 "status":"fail",
 "error_code":"unsupported_airflow_version",
 "error_message":"Minimum supported version is 2.5.0, you are using 2.4"
}
  • Atlan does not support integrating with Airflow versions older than 2.5.0. Upgrade your Airflow version to 2.5.0 or above. This will allow OpenLineage to push metadata to Atlan.

Connection not found

{
 "status":"fail",
 "error_code":"connection_not_found",
 "error_message":"<connection_name> is not present on Atlan."
}
  • The connection name set in your environment variables does not match the connection name created in Atlan. Create a new connection or use an existing connection name.

Invalid OpenLineage endpoint

{
 "status":"fail",
 "error_code":"invalid_openlineage_endpoint",
 "error_message":"Your OL endpoint should be: https://<instance>.atlan.com/events/openlineage/airflow/ || received: https://<instance>.atlan.com/events/openlineage/airflow/haha please update this."
}
  • Update the environment variable containing the OpenLineage URL to the expected URL in the error message.

Unsupported OpenLineage version

{
 "status":"fail",
 "error_code":"unsupported_ol_version",
 "error_message":"Minimum supported version is 1.2.0, you are using 1.1"
}
  • Install the latest version of the openlineage-airflow library, 1.8.0 or above. The preflight check DAG will fail if you're using any OpenLineage version older than 1.2.0.
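To see which version is installed before rerunning the DAG, you can query the package metadata from the Python standard library. This is a minimal sketch: the helper names installed_version and meets_minimum are hypothetical, and the package name you pass in is an assumption that depends on how OpenLineage was installed in your environment.

```python
import re
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(version: str, minimum: str = "1.2.0") -> bool:
    """Compare the numeric parts of two versions, padding to three parts."""
    def as_tuple(text: str) -> tuple:
        numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
        return tuple(numbers + [0] * (3 - len(numbers)))

    return as_tuple(version) >= as_tuple(minimum)
```

For example, installed_version("openlineage-airflow") returns None when that package is not installed, and meets_minimum tells you whether a reported version clears the 1.2.0 minimum from the error message.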

Connection fetching failed

{
 "status":"fail",
 "error_code":"connection_fetching_failed",
 "error_message":"Atlas Error: Failed to fetch connections - Post \"http://localhost:21000/api/atlas/v2/search/indexsearch\": dial tcp [::1]:21000: connect: connection refused"
}
💪 Did you know? If you continue to encounter any issues, Atlan recommends enabling debug logging for OpenLineage using the OPENLINEAGE_CLIENT_LOGGING=DEBUG environment variable. Run your DAGs again and then share the debug log with Atlan support for troubleshooting.
