What is lineage?
Data lineage captures how data moves across your data landscape. This information is useful to:
- Trace data's origins, to assist with root cause analysis
- Trace data's destinations, to assist with impact analysis
- Automate the propagation of metadata to derived assets
Root cause analysis
Root cause analysis is about identifying the underlying causes of a data problem. You want to know where the data came from and what happened to it before it got to you. With root cause analysis, your focus is on these upstream sources and transformations.
Impact analysis
Impact analysis is about identifying potential consequences of changes. You want to know where the data is going and what could happen to others if you change it. With impact analysis, the primary focus is on these downstream systems and consumers.
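As a minimal sketch of the two directions described above, root cause analysis walks lineage upstream and impact analysis walks it downstream. The edge list and asset names below are hypothetical, and the traversal is a plain breadth-first search, not Atlan's implementation.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source asset, derived asset).
EDGES = [
    ("raw.orders", "staging.orders"),
    ("raw.customers", "staging.customers"),
    ("staging.orders", "marts.revenue"),
    ("staging.customers", "marts.revenue"),
    ("marts.revenue", "dashboard.sales"),
]


def _reachable(start, neighbors):
    """Breadth-first walk returning every asset reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def upstream(asset, edges=EDGES):
    """Root cause analysis: everything the asset was derived from."""
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)
    return _reachable(asset, parents)


def downstream(asset, edges=EDGES):
    """Impact analysis: everything that depends on the asset."""
    children = defaultdict(set)
    for src, dst in edges:
        children[src].add(dst)
    return _reachable(asset, children)
```

Here `upstream("marts.revenue")` would surface the staging and raw tables to inspect when the revenue figures look wrong, while `downstream("raw.orders")` would list the assets at risk if that table changes.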
How does it work?
Atlan constructs lineage by combining assets and processes:
- Assets represent the inputs and outputs of processes, such as databases and dashboards.
- Processes represent the activities that move or transform data between the assets. (Processes are the lines between the assets in Atlan's graphical view.)
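The asset and process model above can be sketched as a small data structure. This is an illustrative assumption about the shape of the model, not Atlan's internal representation; the class names, fields, and sample assets are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """A lineage node: a table, view, dashboard, and so on."""
    name: str
    kind: str


@dataclass(frozen=True)
class Process:
    """An activity that reads its inputs and writes its outputs."""
    name: str
    inputs: tuple
    outputs: tuple


def lineage_edges(processes):
    """Expand each process into (input, process, output) name triples."""
    return [
        (i.name, p.name, o.name)
        for p in processes
        for i in p.inputs
        for o in p.outputs
    ]


# Hypothetical assets and processes forming a two-step flow.
orders = Asset("staging.orders", "table")
revenue = Asset("marts.revenue", "view")
sales = Asset("dashboard.sales", "dashboard")

processes = [
    Process("build_revenue", inputs=(orders,), outputs=(revenue,)),
    Process("refresh_sales", inputs=(revenue,), outputs=(sales,)),
]

edges = lineage_edges(processes)
```

Because the output of one process is the input of the next, the resulting triples link up into a continuous flow from `staging.orders` to `dashboard.sales`.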
Atlan chains these together into a flow of data, drawing lineage information from several sources:
SQL parsing
Atlan parses SQL queries to determine how data stores have created or transformed assets. Examples of this include:
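Atlan's own parser is not shown here. As a toy illustration of the general idea, the sketch below pulls the tables a statement reads and the table it writes out of a simple SQL string. It relies on regular expressions and assumes straightforward `CREATE TABLE ... AS SELECT` or `INSERT INTO ... SELECT` statements; it would misread CTEs, quoted identifiers, and clauses such as `IF NOT EXISTS`, which is why a production parser works from a full SQL grammar. The function name and the sample query are hypothetical.

```python
import re

# Statement forms that write to a table or view.
_TARGET = re.compile(
    r"\b(?:insert\s+into|create\s+(?:or\s+replace\s+)?(?:table|view))\s+([\w.]+)",
    re.IGNORECASE,
)
# Clauses that read from a table or view.
_SOURCE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(sql):
    """Return the input and output table names found in one SQL statement."""
    targets = set(_TARGET.findall(sql))
    sources = set(_SOURCE.findall(sql)) - targets
    return {"inputs": sorted(sources), "outputs": sorted(targets)}


sql = (
    "CREATE TABLE marts.revenue AS "
    "SELECT o.id, c.region FROM staging.orders o "
    "JOIN staging.customers c ON o.customer_id = c.id"
)
result = extract_lineage(sql)
```

The returned inputs and outputs correspond to the assets on either side of a process, with the SQL statement itself playing the role of the process.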
API crawling
Atlan also retrieves lineage information for assets from APIs. Examples of this include:
API ingestion
Atlan provides built-in lineage extraction for the tools above. You can also extend lineage with your own information using Atlan's open APIs, for example to integrate lineage from home-grown tools or from orchestration suites like Apache Airflow and Dagster.
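Before calling any API, a home-grown tool or orchestrator task needs to describe what it read and what it wrote. The sketch below shows one way to collect that information as a JSON document. The field names (`process`, `inputs`, `outputs`), the function name, and the sample task are hypothetical and do not reflect the request schema of Atlan's APIs; consult the API reference for the actual format.

```python
import json


def build_lineage_record(task_name, inputs, outputs):
    """Describe one pipeline task as a process with input and output assets."""
    if not inputs or not outputs:
        raise ValueError("a lineage process needs at least one input and one output")
    return {
        "process": task_name,      # hypothetical field name
        "inputs": sorted(inputs),  # assets the task reads
        "outputs": sorted(outputs),  # assets the task writes
    }


# Hypothetical task taken from an orchestration pipeline.
record = build_lineage_record(
    "load_revenue",
    inputs=["staging.orders", "staging.customers"],
    outputs=["marts.revenue"],
)
payload = json.dumps(record, indent=2)
```

Requiring at least one input and one output mirrors the model described earlier, where a process only makes sense as a link between assets.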