How to integrate Airflow/OpenLineage

To integrate Airflow/OpenLineage with Atlan, complete the following steps.

💪 Did you know? For the Airflow operators supported for OpenLineage extraction, refer to Airflow's Supported operators documentation. To learn how to extract lineage through OpenLineage methods, custom extractors, or manually annotated lineage, see Implementing OpenLineage in Operators.

Create an API token in Atlan

Before running the workflow, you will need to create an API token in Atlan.

Configure the integration in Atlan

Select the source

To select Airflow/OpenLineage as your source, from within Atlan:

  1. In the top right of any screen, click New and then click New workflow.
  2. From the filters along the top, click Orchestrator.
  3. From the list of packages, select Airflow Assets and then click Setup Workflow.

Create the connection

To configure the Airflow/OpenLineage connection, from within Atlan:

  1. For Connection Name, provide a connection name that represents your source environment. For example, you might use values like production, development, gold, or analytics.
  2. (Optional) To change the users who are able to manage this connection, change the users or groups listed under Connection Admins.
    🚨 Careful! If you do not specify any user or group, no one will be able to manage the connection, not even admins.
  3. (Optional) For Host and Port, enter the URL and port number of your Airflow UI, respectively. This lets you open your assets directly in Airflow from their asset profiles in Atlan.
  4. For Enable OpenLineage Events, click Yes to enable the processing of OpenLineage events or click No to disable it. If disabled, new events will not be processed in Atlan.
  5. To create a connection, at the bottom of the screen, click the Create connection button.

Configure the integration in Airflow/OpenLineage

💪 Did you know? You will need the Atlan API token and connection name to configure the integration in Airflow/OpenLineage. These allow Airflow to connect with the OpenLineage API and send events to Atlan.
🚨 Careful! Atlan does not support integrating with Airflow versions older than 2.5.0.
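The minimum version above, together with the package choice described in step 1 of the procedure, can be summarized in a short Python sketch. This is a hypothetical helper, not part of Airflow, OpenLineage, or Atlan. Given an Airflow version string (inside an Airflow environment, `airflow.__version__`), it rejects versions older than 2.5.0 and returns the line to add to requirements.txt.

```python
import re


def parse_airflow_version(version: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '2.7.3' or '2.6.3+composer'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized Airflow version: {version!r}")
    # A missing patch component (e.g. '2.8') defaults to 0
    major, minor, patch = match.groups("0")
    return int(major), int(minor), int(patch)


def openlineage_requirement(version: str) -> str:
    """Return the requirements.txt line for the given Airflow version (hypothetical helper)."""
    v = parse_airflow_version(version)
    if v < (2, 5, 0):
        raise ValueError(f"Airflow {version} is older than 2.5.0 and is not supported by Atlan")
    if v >= (2, 7, 0):
        # Airflow 2.7.0 onward: the OpenLineage provider package
        return "apache-airflow-providers-openlineage"
    # Airflow 2.5.0 onward and prior to 2.7.0: the standalone integration library
    return "openlineage-airflow"


if __name__ == "__main__":
    # Illustrative version strings; in Airflow you could pass airflow.__version__ instead
    for candidate in ("2.6.3+composer", "2.7.0", "2.10.2"):
        print(candidate, "->", openlineage_requirement(candidate))
```

The sketch compares numeric tuples rather than raw strings so that a version such as 2.10.2 correctly sorts after 2.7.0.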

To configure Airflow to send OpenLineage events to Atlan:

  1. Depending on your Airflow version, install the appropriate OpenLineage package:
    • For Airflow versions 2.7.0 onward, install the latest apache-airflow-providers-openlineage package and add it to the requirements.txt file of your Airflow instance:
      apache-airflow-providers-openlineage
    • For Airflow versions 2.5.0 onward and prior to 2.7.0, install the latest openlineage-airflow library and add it to the requirements.txt file of your Airflow instance:
      openlineage-airflow
  2. Add the following environment variables to your project's .env file:
    • For Airflow versions 2.7.0 onward:
      • AIRFLOW__OPENLINEAGE__NAMESPACE: set to the connection name, exactly as configured in Atlan.
      • AIRFLOW__OPENLINEAGE__TRANSPORT: specify details of where and how to send OpenLineage events in the following JSON string format:
        {
          "type": "http",
          "url": "https://<instance>.atlan.com/events/openlineage/airflow/",
          "auth": {
            "type": "api_key",
            "api_key": "<API_token>"
          }
        }
        • Replace <instance> with the name of your Atlan instance.
        • Replace <API_token> with the API token generated in Atlan.
    • For Airflow versions 2.5.0 onward and prior to 2.7.0:
      • OPENLINEAGE_URL: set to the service that will consume OpenLineage events, in this case https://<instance>.atlan.com/events/openlineage/airflow/ (replace <instance> with the name of your Atlan instance).
      • OPENLINEAGE_API_KEY: set to the API token generated in Atlan.
      • OPENLINEAGE_NAMESPACE: set to the connection name, exactly as configured in Atlan.
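The environment variables above can also be generated rather than typed by hand. The Python sketch below is a hypothetical helper: the function names and the `acme` instance name are illustrative only. It builds the variables for either Airflow version range and uses `json.dumps` so that the AIRFLOW__OPENLINEAGE__TRANSPORT value is valid single-line JSON. Values are written unquoted, assuming your .env loader reads each line literally. Adjust the quoting if your deployment tool expects something different.

```python
import json


def openlineage_env(instance: str, api_token: str, connection_name: str,
                    airflow_2_7_or_later: bool = True) -> dict[str, str]:
    """Build the OpenLineage environment variables described above (hypothetical helper)."""
    # Atlan's OpenLineage endpoint for Airflow, as given in this article
    url = f"https://{instance}.atlan.com/events/openlineage/airflow/"
    if airflow_2_7_or_later:
        transport = {
            "type": "http",
            "url": url,
            "auth": {"type": "api_key", "api_key": api_token},
        }
        return {
            # Must match the Atlan connection name exactly
            "AIRFLOW__OPENLINEAGE__NAMESPACE": connection_name,
            # Compact separators keep the JSON on one line without spaces
            "AIRFLOW__OPENLINEAGE__TRANSPORT": json.dumps(transport, separators=(",", ":")),
        }
    # Airflow 2.5.0 onward and prior to 2.7.0 uses the openlineage-airflow variables
    return {
        "OPENLINEAGE_URL": url,
        "OPENLINEAGE_API_KEY": api_token,
        "OPENLINEAGE_NAMESPACE": connection_name,
    }


def to_dotenv(env: dict[str, str]) -> str:
    """Render KEY=value lines; values are left unquoted (assumes a literal .env loader)."""
    return "\n".join(f"{key}={value}" for key, value in env.items())


if __name__ == "__main__":
    # Illustrative values only: replace with your instance, API token, and connection name
    print(to_dotenv(openlineage_env("acme", "<API_token>", "production")))
```

Paste the printed lines into your project's .env file, substituting the real API token for the placeholder.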

Verify the Atlan connection in Airflow

To verify connectivity between Airflow and Atlan:

  1. For Verify connection with Airflow, click the clipboard icon to copy the preflight check DAG, then run it on your Airflow instance to test connectivity with Atlan. If the DAG reports any errors, refer to the preflight checks documentation.
  2. Click Done to complete setup.

Once your DAGs have finished running in Airflow, you will see Airflow DAGs and tasks, along with lineage from OpenLineage events, in Atlan! 🎉

💪 Did you know? Atlan also supports other Airflow distributions, including Amazon MWAA, Astronomer, and Google Cloud Composer.
