What does Atlan crawl from Apache Airflow/OpenLineage?

Once you have integrated Apache Airflow/OpenLineage, you can use connector-specific filters for quick asset discovery. The following filters are currently supported:

  • Status filter: last run status for an asset
  • Duration filter: last run duration for an asset

Atlan maps the following assets and properties from Apache Airflow/OpenLineage. Asset lineage support depends on the list of operators supported by OpenLineage.

DAGs

Atlan maps DAGs (directed acyclic graphs) from Apache Airflow/OpenLineage to its AirflowDAG asset type.

Source property | Atlan property | Description
job.name | name | Name of the Airflow DAG
- | qualifiedName | Unique identifier for the DAG in Atlan
description | description | Description of the DAG from Airflow
owner | sourceOwners | Original owner information from Airflow
- | ownerUsers | Validated Atlan usernames (mapped from source owners)
schedule_interval | airflowDagSchedule | DAG's schedule interval (cron expression or preset)
delta | airflowDagScheduleDelta | Schedule interval in seconds
tags | airflowTags | Tags assigned to the DAG
run_id | airflowRunName | Unique identifier for the DAG run
run_type | airflowRunType | Type of run (scheduled, manual, backfill)
eventTime (start) | airflowRunStartTime | Timestamp when the DAG run started
eventTime (end) | airflowRunEndTime | Timestamp when the DAG run completed
eventType | airflowRunOpenLineageState | Final status of the DAG run
version | airflowRunVersion | Airflow version
openlineageAdapterVersion | airflowRunOpenLineageVersion | OpenLineage adapter version
- | sourceURL | Direct link to the DAG in Airflow UI
- | connectionName | Name of the connector instance
- | connectionQualifiedName | Unique identifier for the connector instance
- | connectorName | Name of the connector type
💪 Did you know? If a DAG has more than 10 valid owner email addresses (comma-separated), only the first 10 are captured and published.
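
The owner limit described above can be illustrated with a short Python sketch. This is not Atlan's implementation: the function name capture_owners, the constant MAX_OWNERS, and the email pattern are all hypothetical, and the rule Atlan uses to decide whether an address is valid is not documented here.

```python
import re

MAX_OWNERS = 10  # limit stated in the note above

# Assumption: a deliberately simple email pattern, used only for illustration.
_EMAIL = re.compile(r"^[^@\s,]+@[^@\s,]+\.[^@\s,]+$")


def capture_owners(owner_field: str, limit: int = MAX_OWNERS) -> list[str]:
    """Split a comma-separated owner string and keep the first `limit` valid emails."""
    candidates = (part.strip() for part in owner_field.split(","))
    valid = [c for c in candidates if _EMAIL.match(c)]
    return valid[:limit]


# Twelve owners on one DAG: only the first ten would be kept.
owners = ", ".join(f"user{i}@example.com" for i in range(1, 13))
kept = capture_owners(owners)
print(len(kept), kept[-1])
```

In this sketch, entries that are not email addresses (for example the default owner string "airflow") are dropped before the limit is applied, so they do not count toward the 10.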

Tasks

Atlan maps tasks from Apache Airflow/OpenLineage to its AirflowTask asset type.

Source property | Atlan property | Description
job.name (partial) | name | Name of the task (extracted from full job name)
- | qualifiedName | Unique identifier for the task in Atlan
- | airflowDagName | Name of the parent DAG
- | airflowDagQualifiedName | Unique identifier for the parent DAG in Atlan
operator_class | airflowTaskOperatorClass | Type of operator used for the task
conn_id | airflowTaskConnectionId | Connection ID used by the task
sql | airflowTaskSql | SQL query (for SQL-based operators)
owner | sourceOwners | Owner information from the task definition
eventTime (start) | airflowRunStartTime | Timestamp when the task started
eventTime (end) | airflowRunEndTime | Timestamp when the task completed
eventType | airflowRunOpenLineageState | Final status of the task run
run_id | airflowRunName | Unique identifier for the task run
run_type | airflowRunType | Type of run (from parent DAG)
pool | airflowTaskPool | Worker pool assigned to the task
pool_slots | airflowTaskPoolSlots | Number of pool slots used by the task
priority_weight | airflowTaskPriorityWeight | Priority weight for execution order
queue | airflowTaskQueue | Worker queue assigned to the task
retries | airflowTaskRetryNumber | Number of retry attempts configured
trigger_rule | airflowTaskTriggerRule | Rule that determines when the task should run
group_id | airflowTaskGroupName | Task group the task belongs to
version | airflowRunVersion | Airflow version
openlineageAdapterVersion | airflowRunOpenLineageVersion | OpenLineage adapter version
- | sourceURL | Direct link to the task run in Airflow UI
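
The task name is listed above as extracted from the full job name. A minimal Python sketch of one way this could work follows. It assumes the job name has the form <dag_id>.<task_id> and that the DAG ID itself contains no dots. Both are assumptions for illustration, and split_job_name is a hypothetical helper, not an Atlan or OpenLineage function.

```python
def split_job_name(job_name: str) -> tuple[str, str]:
    """Return (dag_name, task_name) from a '<dag_id>.<task_id>' job name.

    Assumption: the DAG ID contains no dots, so the first dot is the boundary.
    """
    dag_name, sep, task_name = job_name.partition(".")
    if not sep or not dag_name or not task_name:
        raise ValueError(f"not a task-level job name: {job_name!r}")
    return dag_name, task_name


dag_name, task_name = split_job_name("etl_daily.load_orders")

# Property names on the left are taken from the table above.
task_asset = {"name": task_name, "airflowDagName": dag_name}
print(task_asset)
```

Splitting at the first dot keeps a task group prefix with the task, so "etl_daily.staging.load_orders" would yield the task name "staging.load_orders" under the same assumption.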
