How to integrate Amazon MWAA/OpenLineage


To integrate Amazon Managed Workflows for Apache Airflow (MWAA) with Atlan, review the order of operations and then complete the following steps. (Alternatively, you can store the environment variables in AWS Secrets Manager and fetch them from the plugin; follow the steps here to do so.)

Create an API token in Atlan

Before running the workflow, you will need to create an API token in Atlan.

Select the source in Atlan

To select Amazon MWAA/OpenLineage as your source, from within Atlan:

  1. In the top right of any screen, click New and then click New workflow.
  2. From the filters along the top, click Orchestrator.
  3. From the list of packages, select Amazon MWAA Airflow Assets and then click Setup Workflow.

Configure the integration in Atlan

To configure the Amazon MWAA/OpenLineage connection, from within Atlan:

  1. For Host, enter the URL of your Airflow UI. Do not include any extra paths, such as /home, in the URL.
  2. (Optional) For Port, enter the port number for your Airflow UI.
  3. For Enable OpenLineage Events, click Yes to enable processing of OpenLineage events or No to disable it. If disabled, Atlan will not process new events.
  4. For Connection Name, provide a connection name that represents your source environment. For example, you might use values like production, development, gold, or analytics.
  5. (Optional) To change the users who are able to manage this connection, change the users or groups listed under Connection Admins.
    🚨 Careful! If you do not specify any user or group, no one will be able to manage the connection, not even admins.
  6. To run the workflow, at the bottom of the screen, click the Run button.
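
The Host and Port fields expect only the base address of the Airflow UI. If you copy the URL from your browser, a minimal Python sketch like the following can strip any path such as /home and separate out an explicit port. The helper name split_airflow_url is hypothetical and is not part of Atlan or Airflow.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_airflow_url(url: str) -> Tuple[str, Optional[int]]:
    """Split a browser URL for the Airflow UI into (host, port).

    Hypothetical helper: the host keeps only the scheme and hostname, so paths
    such as /home, query strings, and fragments are dropped. The port is None
    when the URL does not state one explicitly.
    """
    parts = urlsplit(url.strip())
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"expected an absolute URL such as https://host/home, got {url!r}")
    # Rebuild the base URL from the scheme and hostname only.
    host = f"{parts.scheme}://{parts.hostname}"
    return host, parts.port
```

For example, a URL like https://airflow.example.com:8443/home (example.com is a placeholder domain) would give https://airflow.example.com for Host and 8443 for Port.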

Configure the integration in Amazon MWAA

💪 Did you know? You will need the Atlan API token and connection name to configure the integration in Amazon MWAA. This will allow Amazon MWAA to connect with the OpenLineage API and send events to Atlan.

To configure Amazon MWAA to send OpenLineage events to Atlan:

  1. Depending on your Airflow version on Amazon MWAA, there may be additional prerequisites for using OpenLineage:
    • For Airflow 2.3 and later, install the latest openlineage-airflow library by adding the following line to the requirements.txt file of your Airflow instance:
      openlineage-airflow
    • For Airflow versions earlier than 2.3:
      1. Install the latest openlineage-airflow library by adding the following line to the requirements.txt file of your Airflow instance:
        openlineage-airflow
      2. Set the lineage backend, either in airflow.cfg or through the AIRFLOW__LINEAGE__BACKEND environment variable (for example, in the env_var_plugin.py file described in the next step), to:
        openlineage.lineage_backend.OpenLineageBackend
  2. Deploy a custom environment-variable plugin to Amazon MWAA. Create an env_var_plugin.py file containing the following Python code:
    
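        import os

        from airflow.plugins_manager import AirflowPlugin

        # These variables are set when Airflow imports this plugin module.
        # Replace <instance> with your Atlan instance name.
        os.environ["OPENLINEAGE_URL"] = "https://<instance>.atlan.com/events/openlineage/airflow-mwaa/"
        # Connection name, exactly as configured in Atlan.
        os.environ["OPENLINEAGE_NAMESPACE"] = ""
        # API token generated in Atlan.
        os.environ["OPENLINEAGE_API_KEY"] = ""


        class EnvVarPlugin(AirflowPlugin):
            # Every Airflow plugin must define a name.
            name = "env_var_plugin"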
        
    • OPENLINEAGE_URL: the endpoint of the service that consumes OpenLineage events, for example, https://<instance>.atlan.com/events/openlineage/airflow-mwaa/.
    • OPENLINEAGE_NAMESPACE: the connection name, exactly as configured in Atlan.
    • OPENLINEAGE_API_KEY: the API token generated in Atlan.
  3. Amazon MWAA allows you to install a plugin through a zip archive. You can either:
    • Run the following command to zip your env_var_plugin.py file:
      zip plugins.zip env_var_plugin.py
    • If you already have a plugins.zip file, add env_var_plugin.py to it.
  4. Upload the plugins.zip and requirements.txt files to the S3 bucket connected to your Amazon MWAA environment. Amazon MWAA requires your DAGs, plugins, and requirements.txt file to be in the same S3 bucket, which serves as the source location for your environment.
  5. You will need to specify the path for the latest versions of the plugins.zip and requirements.txt files in Amazon MWAA. To specify the path:
    1. Open the Environments page on the Amazon MWAA console.
    2. Select an environment and then click Edit.
    3. In the DAG code in Amazon S3 section, configure the following:
      1. For Plugins file - optional, select the plugins.zip file in the S3 bucket connected to your Amazon MWAA environment or choose the latest plugins.zip version from the dropdown list.
      2. For Requirements file - optional, select the latest requirements.txt file version from the dropdown list.
    4. Click Next, and then click Update environment to save your configuration.
  6. (Optional) Test a sample data pipeline from the Airflow UI to confirm connectivity.
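
Before pasting values into env_var_plugin.py, it can help to assemble and sanity-check them in one place. The sketch below is a hypothetical local helper (build_openlineage_env is not part of any Atlan or OpenLineage API); it assumes your Atlan URL follows the https://<instance>.atlan.com pattern given for OPENLINEAGE_URL, and it refuses empty values, which would otherwise leave the plugin without a namespace or token.

```python
from typing import Dict


def build_openlineage_env(instance: str, connection_name: str, api_token: str) -> Dict[str, str]:
    """Return the three variables that env_var_plugin.py sets.

    Hypothetical pre-flight helper: instance is the <instance> part of your
    Atlan URL, connection_name must match the Atlan connection name exactly,
    and api_token is the API token created in Atlan.
    """
    provided = {
        "instance": instance,
        "connection_name": connection_name,
        "api_token": api_token,
    }
    # Reject missing or whitespace-only values up front.
    empty = [name for name, value in provided.items() if not value.strip()]
    if empty:
        raise ValueError("empty value(s): " + ", ".join(empty))
    return {
        "OPENLINEAGE_URL": f"https://{instance.strip()}.atlan.com/events/openlineage/airflow-mwaa/",
        # Passed through unchanged: the namespace must equal the Atlan connection name.
        "OPENLINEAGE_NAMESPACE": connection_name,
        "OPENLINEAGE_API_KEY": api_token.strip(),
    }
```

You could then copy each returned value into the matching os.environ line of the plugin.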
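
If you prefer to script the packaging step, for example in a deployment pipeline, Python's standard zipfile module can do the same job as the zip command. This is a minimal sketch that assumes plugins.zip should keep env_var_plugin.py at the top level of the archive, as the zip command in step 3 does; the function name add_plugin_to_zip is hypothetical. Unlike appending with zipfile's "a" mode, it rebuilds the archive so that re-running it replaces the old copy of the plugin instead of adding a duplicate entry.

```python
import os
import tempfile
import zipfile


def add_plugin_to_zip(plugin_path, zip_path="plugins.zip"):
    """Place plugin_path at the root of zip_path, creating or updating the archive.

    Hypothetical helper: existing entries are copied over unchanged, except an
    entry with the same file name as the plugin, which is replaced.
    """
    arcname = os.path.basename(plugin_path)
    # Build the new archive next to the target so os.replace stays on one filesystem.
    target_dir = os.path.dirname(os.path.abspath(zip_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=target_dir)
    os.close(fd)
    try:
        with zipfile.ZipFile(tmp_path, "w", compression=zipfile.ZIP_DEFLATED) as out:
            if os.path.exists(zip_path):
                with zipfile.ZipFile(zip_path) as existing:
                    for info in existing.infolist():
                        if info.filename != arcname:
                            # Copy every other entry unchanged.
                            out.writestr(info, existing.read(info.filename))
            # Add the plugin at the top level, like `zip plugins.zip env_var_plugin.py`.
            out.write(plugin_path, arcname=arcname)
        # Swap the finished archive into place.
        os.replace(tmp_path, zip_path)
    except BaseException:
        # Clean up the temporary archive if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Uploading the resulting plugins.zip to S3 and selecting its new version in the Amazon MWAA console (steps 4 and 5) still applies.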

Once the orchestrator has completed running, you will see Airflow DAGs and tasks along with lineage from OpenLineage events in Atlan! 🎉
