So you've crawled your source, and mined the queries, but lineage is missing. Why?
Where to look first?
Views
- Check the SQL attribute of the view data asset. It must contain SQL for view lineage to appear.
- The crawler workflows populate the SQL attribute. If it's empty on the view asset, the crawler is the suspect.
Tables
- The miner workflows populate table lineage. If it's missing, the miner is the suspect.
- Check the SQL picked up by the miner (for example, in S3).
- If the miner picks up the necessary SQL but lineage is not produced, check if any of the assets involved are missing.
Data stores to BI assets
- For Atlan to link these assets, the upstream assets (data stores) must first exist.
- If they are only created after the downstream assets, lineage will stay unlinked.
- Or if some of the assets are missing, lineage may have gaps preventing linkage.
Show more menu
- Lineage may appear missing if the linked asset is hidden in the Show more menu. Although it will still appear in the list of upstream or downstream assets in the Lineage tab in the side profile, it will not appear visually in the lineage graph. Click Show more columns to see the rest of the assets and their lineage.
Miner logic
When setting up the miner for the first time, you will need to provide a start date, ranging from the last two days up to the past two weeks of query history. If an asset has not been queried during the selected time period, data lineage will be unavailable for it.
For subsequent runs, the miner will fetch query history based on the following logic:
START_TIME <= CURRENT_DATE - INTERVAL '1 DAY'
For example, for a query that started at 5 p.m. on January 22, the miner logic on January 23 will be:
- Jan 22 5 p.m. <= Jan 23 00:00 - 1 day
- Jan 22 5 p.m. <= Jan 22 00:00
The miner will not fetch the data for the previous day (January 22) on the current day (January 23). Atlan requires a minimum lag of 24 to 48 hours to capture all the relevant transformations that were part of a session.
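The cutoff above can be sketched in a few lines of Python. This is only an illustration of the comparison, not the miner's actual implementation; the function name is_fetched and the year used in the dates are assumptions made for the example.

```python
from datetime import date, datetime, time, timedelta


def is_fetched(query_start: datetime, run_date: date) -> bool:
    """Evaluate START_TIME <= CURRENT_DATE - INTERVAL '1 DAY' for one query."""
    # CURRENT_DATE is treated as midnight at the start of the run date.
    cutoff = datetime.combine(run_date, time.min) - timedelta(days=1)
    return query_start <= cutoff


# A query that started on Jan 22 at 5 p.m. (the year is arbitrary).
query_start = datetime(2024, 1, 22, 17, 0)

print(is_fetched(query_start, date(2024, 1, 23)))  # False: cutoff is Jan 22 00:00
print(is_fetched(query_start, date(2024, 1, 24)))  # True: cutoff is Jan 23 00:00
```

Under this reading, the January 22 query only becomes eligible in the run on January 24, which is consistent with the lag described above.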
Causes of missing lineage
There are several reasons why lineage may be missing:
Workflow ordering
The order of operations you run in Atlan is important. To have lineage across tools, you need to:
- Crawl data stores first.
- Mine query logs (and dbt) second.
- Crawl BI tools last.
If you've used a different order, the upstream assets (data stores) may not yet exist when you load the BI metadata. Then you can have lineage within the BI metadata, but not between the BI metadata and the data sources.
If that's the case for you, don't worry. Re-run your existing workflows in the order above and Atlan should resolve it.
Crawling filters
Another reason lineage may have gaps is that linking assets do not exist, even after re-running the crawler.
When crawling a source, you can specify filters on which metadata to include and exclude. If you've excluded metadata needed to link assets into lineage, then end-to-end lineage will have gaps.
Check that you have not excluded any of the assets you're expecting to be in lineage. (And remember that using an include filter means that not all metadata is being crawled: anything outside the filter is excluded.)
If in doubt, try running your workflows without include or exclude filters.
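One way to reason about filters is to list the assets you expect in lineage and test each against your include and exclude patterns. The sketch below is hypothetical: it assumes filters are regular expressions matched against a database.schema.table name, which may not match how your crawler is configured, and the asset names are invented for illustration.

```python
import re


def is_crawled(name, include=(), exclude=()):
    """Assumed semantics: crawled if it matches an include pattern
    (or no include patterns are set) and matches no exclude pattern."""
    included = not include or any(re.fullmatch(p, name) for p in include)
    excluded = any(re.fullmatch(p, name) for p in exclude)
    return included and not excluded


# Hypothetical assets expected to take part in lineage.
expected = [
    "ANALYTICS.SALES.ORDERS",
    "ANALYTICS.RAW.EVENTS",
    "FINANCE.LEDGER.ENTRIES",
]
include = [r"ANALYTICS\..*"]  # only the ANALYTICS database
exclude = [r".*\.RAW\..*"]    # skip any RAW schema

missing = [n for n in expected if not is_crawled(n, include, exclude)]
print(missing)  # ['ANALYTICS.RAW.EVENTS', 'FINANCE.LEDGER.ENTRIES']
```

Note that FINANCE.LEDGER.ENTRIES is dropped without appearing in any exclude pattern, simply because the include filter does not cover it.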
Source permissions
Atlan is not the only place where you can filter metadata.
Atlan accesses your sources through credentials you provide. Those credentials have assigned permissions controlling what (meta)data they can access in the source. If those permissions prevent access to some (meta)data, then Atlan cannot crawl that metadata.
So if ordering and filters don't fix the problem, check your source permissions. Do they provide access to all the data assets you need for lineage?
Different connections, same source
We currently do not resolve lineage across different connections for the same source. You need to crawl (and mine) all assets from a given source through the same connection to generate lineage.
Check that the qualifiedName of the asset matches exactly what lineage expects.
Temporary tables
If your data processing tool uses temporary tables, Atlan can still support generating lineage accurately.
For example:
- Table A → temporary table → table B
- Lineage will be represented as table A → table B in Atlan.
In this case, Atlan assumes that tables A and B are present in Atlan. However, if table A is missing, then Atlan will not be able to generate lineage.
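The collapsing of temporary tables can be pictured as bypassing intermediate nodes in a graph. The following is a minimal sketch of that idea, not Atlan's implementation; the function collapse_temporary and the table names are hypothetical.

```python
def collapse_temporary(edges, temporary):
    """Bypass temporary tables so that links connect only permanent tables."""
    # Map each table to the set of tables that feed it directly.
    upstream = {}
    for src, dst in edges:
        upstream.setdefault(dst, set()).add(src)

    def permanent_sources(table, seen):
        # Walk upstream through temporary tables until permanent ones are found.
        result = set()
        for src in upstream.get(table, ()):
            if src in seen:
                continue  # guard against cycles
            if src in temporary:
                result |= permanent_sources(src, seen | {src})
            else:
                result.add(src)
        return result

    return sorted(
        (src, dst)
        for dst in upstream
        if dst not in temporary
        for src in permanent_sources(dst, {dst})
    )


edges = [("table_a", "tmp_stage"), ("tmp_stage", "table_b")]
print(collapse_temporary(edges, {"tmp_stage"}))  # [('table_a', 'table_b')]
```

If table_a were not known at all, there would be no first edge to follow and no link could be produced for table_b, which mirrors the gap described above.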
Cross-connection links
If the combination of database, schema, and table name for an asset is the same across different connections, it is possible that Atlan may create unexpected links for these assets.
For example, if your Production environment has the same set of databases, schemas, and tables as your Staging environment and both these source systems are crawled, Atlan may connect BI reports to either of these assets due to the name-match algorithm.
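To find candidates for such unexpected links, you could group your assets by database, schema, and table name and look for names that occur in more than one connection. The sketch below assumes you have exported the assets as simple tuples; the connection and asset names are invented for illustration, and this is not the name-match algorithm itself.

```python
from collections import defaultdict


def ambiguous_names(assets):
    """Return names (database, schema, table) that exist in several connections."""
    by_name = defaultdict(set)
    for connection, database, schema, table in assets:
        by_name[(database, schema, table)].add(connection)
    return {
        name: sorted(connections)
        for name, connections in by_name.items()
        if len(connections) > 1
    }


# Hypothetical export: (connection, database, schema, table)
assets = [
    ("production", "ANALYTICS", "SALES", "ORDERS"),
    ("staging", "ANALYTICS", "SALES", "ORDERS"),
    ("production", "ANALYTICS", "SALES", "CUSTOMERS"),
]
print(ambiguous_names(assets))
# {('ANALYTICS', 'SALES', 'ORDERS'): ['production', 'staging']}
```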
Indirect data flow
Atlan currently only processes and visualizes direct data flow on the lineage graph. However, assets can be related through other means such as control flow or conditional statements, in which case there is no data movement between them. Atlan currently neither processes nor visualizes such relationships on the lineage graph.
For example, when processing the following query:
insert into
  tgt_tab (col_x, col_y)
select
  col_x,
  case
    when col_y > 100 then 'High'
    else 'Low'
  end
from
  src_tab
Atlan will display the following links in the lineage graph:
- src_tab → tgt_tab (table-level data flow)
- src_tab.col_x → tgt_tab.col_x (column-level data flow)
Note that there is no lineage generated for col_y. This is because the data present in src_tab.col_y does not actually flow or get transferred to tgt_tab.col_y.
Lineage persistence
Lineage in Atlan is reflective of the last valid set of transformations performed for a particular target table in the external (source) system. Atlan retains these transformations as lineage and does not auto-delete or sunset the process entities (links).
The exception to this rule is when new information pertaining to the same target table is inferred in the latest job run. In this case, Atlan will replace the previous links with new ones.
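The retention rule can be pictured as a mapping from each target table to its latest known set of source links. The sketch below is only a conceptual model under that assumption; apply_run and the table names are hypothetical and do not reflect how Atlan stores process entities.

```python
def apply_run(lineage, inferred):
    """Merge a job run into stored lineage.

    Targets seen in the run get their links replaced;
    targets not seen in the run keep their previous links.
    """
    updated = dict(lineage)
    updated.update(inferred)
    return updated


stored = {
    "tgt_orders": {"src_orders_v1"},
    "tgt_customers": {"src_customers"},
}
latest_run = {"tgt_orders": {"src_orders_v2"}}

print(apply_run(stored, latest_run))
# {'tgt_orders': {'src_orders_v2'}, 'tgt_customers': {'src_customers'}}
```

In this model, tgt_customers keeps its links even though the latest run said nothing about it, while the links for tgt_orders are replaced rather than merged.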