To integrate Amazon Managed Workflows for Apache Airflow (MWAA) with Atlan, complete the following steps. (Alternatively, you can store the environment variables in AWS Secrets Manager and fetch them using the plugin; follow the steps here to do so.)
To learn more about OpenLineage, refer to OpenLineage configuration and facets.
Create an API token in Atlan
Before running the workflow, you will need to create an API token in Atlan.
Configure the integration in Atlan
Select the source
To select Amazon MWAA/OpenLineage as your source, from within Atlan:
- In the top right of any screen, click New and then click New workflow.
- From the filters along the top, click Orchestrator.
- From the list of packages, select Amazon MWAA Airflow Assets and then click Setup Workflow.
Create the connection
You will only need to create a connection once to enable Atlan to receive incoming OpenLineage events. Once you have set up the connection, you neither have to rerun the workflow nor schedule it. Atlan will process the OpenLineage events as and when your DAGs run to catalog your Apache Airflow assets.
To configure the Amazon MWAA/OpenLineage connection, from within Atlan:
- For Connection Name, provide a connection name that represents your source environment. For example, you might use values like `production`, `development`, `gold`, or `analytics`.
- (Optional) To change the users who are able to manage this connection, change the users or groups listed under Connection Admins.
🚨 Careful! If you do not specify any user or group, no one will be able to manage the connection — not even admins.
- (Optional) For Host, enter the URL of your Apache Airflow UI. Do not include any extra paths such as `/home` in the URL. This will allow Atlan to help you view your assets directly in Amazon MWAA from the asset profile.
- (Optional) For Port, enter the port number for your Apache Airflow UI.
- For Enable OpenLineage Events, click Yes to enable the processing of OpenLineage events or click No to disable it. If disabled, new events will not be processed in Atlan.
- To create a connection, at the bottom of the screen, click the Create connection button.
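The Host and Port fields above expect the base address of the Airflow UI without paths such as `/home`. As a minimal sketch, the helper below (a hypothetical function, not part of Atlan or Airflow) shows one way to derive both values from an address copied out of a browser. The example hostname is made up.

```python
from urllib.parse import urlsplit


def airflow_host(ui_url):
    """Return (base URL without any path, port or None) for an Airflow UI address.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(ui_url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"expected a full URL including the scheme, got {ui_url!r}")
    # Keep only the scheme and hostname; drop /home, query strings, and fragments.
    return f"{parts.scheme}://{parts.hostname}", parts.port


if __name__ == "__main__":
    # Made-up address; substitute the URL of your own Airflow UI.
    print(airflow_host("https://airflow.example.com/home"))
```

If the address carries an explicit port, it is returned separately so it can be entered in the Port field.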
Configure the integration in Amazon MWAA
To configure Amazon MWAA to send OpenLineage events to Atlan:
- Based on your Apache Airflow version on Amazon MWAA, there may be additional prerequisites for using OpenLineage:
  - For Apache Airflow versions 2.7.0 onward, update the `requirements.txt` file of your Apache Airflow instance with `apache-airflow-providers-openlineage`
  - For Apache Airflow versions 2.5.0 onward and prior to 2.7.0, update the `requirements.txt` file of your Apache Airflow instance with `openlineage-airflow`
- To set environment variables, you will need to deploy a custom plugin to Amazon MWAA. Create an `env_var_plugin.py` file and add the following Python code in the plugin:
  - For Apache Airflow versions 2.7.0 onward:

        from airflow.plugins_manager import AirflowPlugin
        import os

        os.environ["AIRFLOW__OPENLINEAGE__NAMESPACE"] = "<connection_name>"
        os.environ["AIRFLOW__OPENLINEAGE__TRANSPORT"] = '''{
            "type": "http",
            "url": "https://<instance>.atlan.com/events/openlineage/airflow-mwaa/",
            "auth": {
                "type": "api_key",
                "api_key": "<API_token>"
            }
        }'''
        os.environ["AIRFLOW__OPENLINEAGE__CONFIG_PATH"] = ""
        os.environ["AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS"] = ""

        class EnvVarPlugin(AirflowPlugin):
            name = "env_var_plugin"

    - `AIRFLOW__OPENLINEAGE__NAMESPACE`: replace `<connection_name>` with the connection name exactly as configured in Atlan.
    - `AIRFLOW__OPENLINEAGE__TRANSPORT`: specify details of where and how to send OpenLineage events.
      - Replace `<instance>` with the name of your Atlan instance.
      - Replace `<API_token>` with the API token generated in Atlan.
    - `AIRFLOW__OPENLINEAGE__CONFIG_PATH`: left empty so that the `apache-airflow-providers-openlineage` package reads the OpenLineage config from environment variables instead of a config file.
    - `AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS`: left empty so that OpenLineage sends events for all operators. Only required for the `apache-airflow-providers-openlineage` package.
  - For Apache Airflow versions 2.5.0 onward and prior to 2.7.0:

        from airflow.plugins_manager import AirflowPlugin
        import os

        os.environ["OPENLINEAGE_URL"] = "https://<instance>.atlan.com/events/openlineage/airflow-mwaa/"
        os.environ["OPENLINEAGE_NAMESPACE"] = "<connection_name>"
        os.environ["OPENLINEAGE_API_KEY"] = "<API_token>"

        class EnvVarPlugin(AirflowPlugin):
            name = "env_var_plugin"

    - `OPENLINEAGE_URL`: points to the service that will consume OpenLineage events, for example `https://<instance>.atlan.com/events/openlineage/airflow-mwaa/`.
    - `OPENLINEAGE_NAMESPACE`: set the connection name exactly as configured in Atlan.
    - `OPENLINEAGE_API_KEY`: set the API token generated in Atlan.
- Amazon MWAA allows you to install a plugin through a zip archive. You can either:
  - Use the following command to zip your `env_var_plugin.py` file: `zip plugins.zip env_var_plugin.py`
  - If you already have a `plugins.zip` file, add the `env_var_plugin.py` file to your zip file.
- Upload the `plugins.zip` and `requirements.txt` files to the S3 bucket connected to your Amazon MWAA environment. Amazon MWAA requires your DAGs, plugins, and `requirements.txt` file to be in the same S3 bucket, which serves as the source location for your environment.
- You will need to specify the path for the latest versions of the `plugins.zip` and `requirements.txt` files in Amazon MWAA. To specify the path:
  - Open the Environments page on the Amazon MWAA console.
  - Select an environment and then click Edit.
  - In the DAG code in Amazon S3 section, configure the following:
    - For Plugins file - optional, select the `plugins.zip` file in the S3 bucket connected to your Amazon MWAA environment or choose the latest `plugins.zip` version from the dropdown list.
    - For Requirements file - optional, select the latest `requirements.txt` file version from the dropdown list.
  - Click Next, Update environment, or Next to save your configurations.
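The version-dependent choices in the steps above (which package goes into `requirements.txt`, which environment variables the plugin sets) can be sketched as a small script that also renders `env_var_plugin.py` and packages it into `plugins.zip`. This is an illustrative sketch rather than an official tool: the function names and the `ENDPOINT` constant are hypothetical, plain numeric version strings such as `2.7.3` are assumed, and versions earlier than 2.5.0 are rejected because the steps above do not cover them. Building the transport value with `json.dumps` is a swapped-in convenience that avoids hand-editing JSON inside a string.

```python
import json
import zipfile

# URL pattern taken from the steps above; <instance> is your Atlan instance name.
ENDPOINT = "https://{instance}.atlan.com/events/openlineage/airflow-mwaa/"


def parse_version(version):
    """Turn '2.7.3' or '2.7' into (2, 7, 3) or (2, 7, 0) for numeric comparison."""
    parts = [int(part) for part in version.split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def openlineage_settings(airflow_version, connection_name, instance, api_token):
    """Return (requirements.txt package, environment variables) for a version."""
    url = ENDPOINT.format(instance=instance)
    version = parse_version(airflow_version)
    if version >= (2, 7, 0):
        transport = json.dumps(
            {"type": "http", "url": url,
             "auth": {"type": "api_key", "api_key": api_token}}
        )
        return "apache-airflow-providers-openlineage", {
            "AIRFLOW__OPENLINEAGE__NAMESPACE": connection_name,
            "AIRFLOW__OPENLINEAGE__TRANSPORT": transport,
            "AIRFLOW__OPENLINEAGE__CONFIG_PATH": "",
            "AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS": "",
        }
    if version >= (2, 5, 0):
        return "openlineage-airflow", {
            "OPENLINEAGE_URL": url,
            "OPENLINEAGE_NAMESPACE": connection_name,
            "OPENLINEAGE_API_KEY": api_token,
        }
    # Assumption: the steps above only describe Airflow 2.5.0 onward.
    raise ValueError(f"Airflow {airflow_version} is not covered by these steps")


def render_plugin(env):
    """Render env_var_plugin.py source that sets each variable when loaded."""
    lines = ["from airflow.plugins_manager import AirflowPlugin", "import os", ""]
    lines += [f"os.environ[{name!r}] = {value!r}" for name, value in env.items()]
    lines += ["", "", "class EnvVarPlugin(AirflowPlugin):",
              '    name = "env_var_plugin"', ""]
    return "\n".join(lines)


def write_plugins_zip(plugin_source, zip_path="plugins.zip"):
    """Write the rendered plugin into a new archive (overwrites zip_path)."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("env_var_plugin.py", plugin_source)
    return zip_path


if __name__ == "__main__":
    # Placeholder values; replace them with your own connection details.
    package, env = openlineage_settings(
        "2.7.3", "<connection_name>", "<instance>", "<API_token>"
    )
    print("requirements.txt entry:", package)
    write_plugins_zip(render_plugin(env))
```

Note that `write_plugins_zip` creates a fresh archive. If you already maintain a `plugins.zip` with other plugins, add the rendered file to that archive instead of overwriting it.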
Verify the Atlan connection in Amazon MWAA
To verify connectivity to Amazon MWAA:
- For Verify connection with MWAA, click the clipboard icon to copy and run the preflight check DAG on your Amazon MWAA instance to test connectivity with Atlan. If you encounter any errors after running the DAG, refer to the preflight checks documentation.
- Click Done to complete setup.
Once your DAGs have completed running in Apache Airflow, you will see Apache Airflow DAGs and tasks along with lineage from OpenLineage events in Atlan! 🎉
You can also view event logs in Atlan to track and debug events received from OpenLineage.