Troubleshooting lineage

So you've crawled your source and mined the queries, but lineage is missing. Why?

Where to look first?

Views

  • Check the SQL attribute of the view data asset. It must contain SQL for view lineage to appear.
  • The crawler workflows populate the SQL attribute. If it's empty on the view asset, the crawler is the suspect.
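If you have exported view assets for a quick audit, the check above can be sketched in Python. This is a minimal sketch, not an Atlan API: it assumes each view is a plain dictionary with hypothetical 'name' and 'sql' keys.

```python
from typing import Dict, List, Optional


def views_missing_sql(views: List[Dict[str, Optional[str]]]) -> List[str]:
    """Return names of views whose SQL attribute is empty.

    Each view is a hypothetical dict with 'name' and 'sql' keys.
    An empty SQL attribute points at the crawler, not the miner.
    """
    missing = []
    for view in views:
        sql = view.get("sql")
        # Treat None, "" and whitespace-only strings as "no SQL captured".
        if sql is None or not sql.strip():
            missing.append(view["name"])
    return missing


views = [
    {"name": "SALES_V", "sql": "SELECT * FROM SALES"},
    {"name": "ORDERS_V", "sql": ""},
    {"name": "USERS_V", "sql": None},
]
print(views_missing_sql(views))  # ['ORDERS_V', 'USERS_V']
```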

Tables

  • The miner workflows populate table lineage. If it's missing, the miner is the suspect.
  • Check the SQL picked up by the miner (for example, in S3).
  • If the miner picks up the necessary SQL but lineage is not produced, check if any of the assets involved are missing.
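The triage order for tables can be written as a small decision function. This is a sketch under stated assumptions: the SQL list stands in for what the miner captured (for example, files read from S3), and the two name sets are hypothetical inputs you would assemble yourself.

```python
from typing import Iterable, Set


def triage_table_lineage(
    miner_sql: Iterable[str], referenced: Set[str], existing: Set[str]
) -> str:
    """Suggest where to look when table lineage is missing.

    miner_sql:  SQL statements the miner picked up (hypothetical input).
    referenced: asset names the lineage should connect (hypothetical input).
    existing:   asset names that currently exist in Atlan (hypothetical input).
    """
    # No SQL at all: the miner is the suspect.
    if not any(stmt.strip() for stmt in miner_sql):
        return "miner: no SQL captured"
    # SQL is there, but some referenced assets are not in Atlan.
    missing = sorted(referenced - existing)
    if missing:
        return "missing assets: " + ", ".join(missing)
    return "SQL and assets present: investigate further"


sql = ["INSERT INTO B SELECT * FROM A"]
print(triage_table_lineage(sql, {"A", "B"}, {"B"}))  # missing assets: A
```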

Data stores to BI assets

  • For Atlan to link these assets, the upstream assets (data stores) must first exist.
  • If they are only created after the downstream assets, lineage will stay unlinked.
  • If some of the assets are missing, lineage may have gaps that prevent linking.

Show more menu

  • Lineage may appear missing if the linked asset is hidden in the Show more menu. Although it will still appear in the list of upstream or downstream assets in the Lineage tab in the side profile, it will not appear visually in the lineage graph. Click Show more columns to see the rest of the assets and their lineage.

Miner logic

When setting up the miner for the first time, you need to provide a start date between two days and two weeks in the past. If an asset was not queried during the selected period, data lineage will be unavailable for it.
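To see which assets a first run would leave without lineage, you can compare each asset's most recent query time with the chosen start date. This is a minimal sketch: the input mapping is hypothetical, and the 2-day to 2-week bounds are an assumed reading of the range described above.

```python
from datetime import date, datetime, timedelta
from typing import Dict, List


def assets_without_lineage(
    last_queried: Dict[str, datetime], start_date: date, today: date
) -> List[str]:
    """Return assets with no queries between start_date and today.

    last_queried maps a hypothetical asset name to the most recent time it
    appeared in query history. The miner only sees queries inside the
    selected window, so these assets will have no mined lineage.
    """
    # Assumed bounds, taken from the 2-day to 2-week range described above.
    lookback = today - start_date
    if not timedelta(days=2) <= lookback <= timedelta(weeks=2):
        raise ValueError("start date must be between 2 days and 2 weeks ago")
    window_start = datetime.combine(start_date, datetime.min.time())
    return sorted(name for name, ts in last_queried.items() if ts < window_start)


today = date(2024, 1, 23)
usage = {
    "ORDERS": datetime(2024, 1, 20, 9, 0),
    "LEGACY_ORDERS": datetime(2023, 12, 1, 8, 0),
}
print(assets_without_lineage(usage, date(2024, 1, 16), today))  # ['LEGACY_ORDERS']
```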

For subsequent runs, the miner will fetch query history based on the following logic:

START_TIME <= CURRENT_DATE - INTERVAL '1 DAY'

For example, for a query that started at 5 p.m. on January 22, the miner run on January 23 evaluates:

  • Jan 22 5 p.m. <= Jan 23 00:00 - 1 day
  • Jan 22 5 p.m. <= Jan 22 00:00

This condition is false, so the miner will not fetch the previous day's (January 22) queries on the current day (January 23); they only become eligible on the following day's run. Atlan requires a minimum lag of 24 to 48 hours to capture all the relevant transformations that were part of a session.
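The cutoff condition can be reproduced in a few lines to check whether a given query falls inside a particular run. This sketch only mirrors the single condition shown above and treats CURRENT_DATE as midnight on the run day; the year in the example is chosen arbitrarily.

```python
from datetime import date, datetime, timedelta


def included_in_run(query_start: datetime, run_date: date) -> bool:
    """Evaluate START_TIME <= CURRENT_DATE - INTERVAL '1 DAY' for one query.

    CURRENT_DATE is treated as midnight (00:00) on the day the miner runs.
    """
    cutoff = datetime.combine(run_date, datetime.min.time()) - timedelta(days=1)
    return query_start <= cutoff


query = datetime(2024, 1, 22, 17, 0)  # Jan 22, 5 p.m. (year chosen arbitrarily)
print(included_in_run(query, date(2024, 1, 23)))  # False: skipped on Jan 23
print(included_in_run(query, date(2024, 1, 24)))  # True: eligible on Jan 24
```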

Causes of missing assets

There are several reasons why assets may be missing:

Workflow ordering

The order of operations you run in Atlan is important. To have lineage across tools, you need to:

  1. Crawl data stores first.
  2. Mine query logs (and dbt) second.
  3. Crawl BI tools last.

If you've used a different order, the upstream assets (data stores) may not yet exist when you load the BI metadata. Then you can have lineage within the BI metadata, but not between the BI metadata and the data sources.

If that's the case for you, don't worry. Re-run your existing workflows in the order above and Atlan should resolve it.
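If you keep a record of when each workflow type first ran, a quick check can flag ordering problems. The stage labels below are hypothetical names for the three steps above, not Atlan identifiers; each label is assumed to appear once in the history.

```python
from typing import List

# Required order from the steps above; the stage labels are hypothetical.
REQUIRED_ORDER = ["crawl_data_stores", "mine_query_logs", "crawl_bi_tools"]


def ordering_problems(run_history: List[str]) -> List[str]:
    """Report stages that ran before a stage that should precede them.

    run_history lists stage labels in the order they first ran.
    """
    rank = {stage: i for i, stage in enumerate(REQUIRED_ORDER)}
    problems = []
    for i, later in enumerate(run_history):
        for earlier in run_history[:i]:
            # An earlier run with a higher rank means the order was inverted.
            if rank[earlier] > rank[later]:
                problems.append(f"{earlier} ran before {later}")
    return problems


print(ordering_problems(["crawl_bi_tools", "crawl_data_stores", "mine_query_logs"]))
```

An empty result means the history already follows the required order; otherwise, re-running the workflows in REQUIRED_ORDER is the fix described above.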

Crawling filters

Another reason lineage may have gaps is that the assets needed to link it do not exist, even after you re-run the crawler.

When crawling a source, you can specify filters on which metadata to include and exclude. If you've excluded metadata needed to link assets into lineage, then end-to-end lineage will have gaps.

Check that you have not excluded any of the asset(s) you're expecting to be in lineage. (And remember that using an include filter means that not all metadata is being crawled: some is being excluded.)

If in doubt, try running your workflows without include or exclude filters.
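The effect of include and exclude filters on lineage can be modeled as set operations. This is a simplified sketch over hypothetical asset names; Atlan's actual filters work on its own filter settings and may behave differently, so treat it only as an illustration of why an include filter implicitly excludes everything else.

```python
from typing import Optional, Set


def crawled_assets(
    all_assets: Set[str],
    include: Optional[Set[str]] = None,
    exclude: Optional[Set[str]] = None,
) -> Set[str]:
    """Approximate which assets a crawl keeps under include/exclude filters.

    An include filter keeps only the listed assets, so everything not listed
    is implicitly excluded. Asset names here are hypothetical.
    """
    kept = set(all_assets) if include is None else all_assets & include
    if exclude:
        kept -= exclude
    return kept


all_assets = {"RAW.ORDERS", "MART.ORDERS", "TMP.SCRATCH"}
needed_for_lineage = {"RAW.ORDERS", "MART.ORDERS"}
kept = crawled_assets(all_assets, include={"MART.ORDERS"})
print(sorted(needed_for_lineage - kept))  # ['RAW.ORDERS']: a lineage gap
```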

Source permissions

Atlan is not the only place where you can filter metadata.

Atlan accesses your sources through credentials you provide. Those credentials have assigned permissions controlling what (meta)data they can access in the source. If those permissions prevent access to some (meta)data, then Atlan cannot crawl that metadata.

So if fixing the ordering and filters doesn't solve the problem, check your source permissions. Do they grant access to all the data assets you need for lineage?

Different connections, same source

We currently do not resolve lineage across different connections for the same source. You need to crawl (and mine) all assets from a given source through the same connection to generate lineage.

🚨 Careful! This is the most subtle of the causes. The assets may even appear to be in the environment in this case. Check that the qualifiedName of the asset exactly matches what lineage expects.
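Comparing two qualifiedNames part by part makes this kind of mismatch easier to spot. The sketch assumes a table qualifiedName shaped like default/<connector>/<connection-id>/<database>/<schema>/<table>; that shape and the connection IDs below are assumptions for illustration, so adjust the split if your names differ.

```python
from typing import Dict

# Assumed qualifiedName layout for a table; verify against your own assets.
PARTS = ["tenant", "connector", "connection_id", "database", "schema", "table"]


def split_qualified_name(qn: str) -> Dict[str, str]:
    """Split an assumed six-part table qualifiedName into named parts."""
    values = qn.split("/")
    if len(values) != len(PARTS):
        raise ValueError(f"unexpected qualifiedName shape: {qn}")
    return dict(zip(PARTS, values))


def explain_mismatch(expected: str, actual: str) -> str:
    """Name the parts that differ between two qualifiedNames."""
    if expected == actual:
        return "exact match"
    e, a = split_qualified_name(expected), split_qualified_name(actual)
    return "differs in: " + ", ".join(k for k in PARTS if e[k] != a[k])


expected = "default/snowflake/1111111111/ANALYTICS/PUBLIC/ORDERS"
actual = "default/snowflake/2222222222/ANALYTICS/PUBLIC/ORDERS"
print(explain_mismatch(expected, actual))  # differs in: connection_id
```

A difference only in the connection part is the "same source, different connections" case described above.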

Temporary tables

If your data processing tool uses temporary tables, Atlan can still support generating lineage accurately.

For example:

  • Table A -> temporary table -> table B
  • Lineage will be represented as table A -> table B in Atlan.

In this case, Atlan assumes that tables A and B are present in Atlan. However, if table A is missing, then Atlan will not be able to generate lineage.
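The collapsing of temporary tables can be pictured as a walk over a lineage graph that skips temporary nodes. This is an illustrative sketch of the idea, not Atlan's implementation: edges, temporary tables, and existing assets are hypothetical inputs, and an edge is kept only when both real endpoints exist.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (source, target)


def collapse_temporary(
    edges: List[Edge], temporary: Set[str], existing: Set[str]
) -> Set[Edge]:
    """Replace paths through temporary tables with direct edges.

    A -> TMP -> B becomes A -> B. Edges whose endpoints are not both
    present in Atlan (existing) are dropped, as when table A is missing.
    """
    downstream: Dict[str, List[str]] = {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)

    result: Set[Edge] = set()
    for src in downstream:
        if src in temporary:
            continue
        # Walk forward through any chain of temporary tables.
        stack = list(downstream[src])
        seen: Set[str] = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in temporary:
                stack.extend(downstream.get(node, []))
            elif src in existing and node in existing:
                result.add((src, node))
    return result


edges = [("A", "TMP"), ("TMP", "B")]
print(collapse_temporary(edges, {"TMP"}, {"A", "B"}))  # {('A', 'B')}
```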

Cross-connection links

If the combination of database, schema, and table name for an asset is the same across different connections, Atlan may create unexpected links for these assets.

For example, if your Production environment has the same set of databases, schemas, and tables as your Staging environment and both these source systems are crawled, Atlan may connect BI reports to either of these assets due to the name-match algorithm.
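To find names that a match on database, schema, and table alone cannot disambiguate, you can group assets by that combination. This sketch uses hypothetical asset dictionaries and only illustrates the ambiguity; it is not Atlan's matching algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (database, schema, table)


def ambiguous_matches(assets: List[Dict[str, str]]) -> Dict[Key, List[str]]:
    """Find (database, schema, table) names present in more than one connection.

    Each asset is a hypothetical dict with 'connection', 'database',
    'schema' and 'table' keys.
    """
    by_name: Dict[Key, List[str]] = defaultdict(list)
    for a in assets:
        by_name[(a["database"], a["schema"], a["table"])].append(a["connection"])
    return {k: sorted(v) for k, v in by_name.items() if len(set(v)) > 1}


assets = [
    {"connection": "production", "database": "ANALYTICS", "schema": "PUBLIC", "table": "ORDERS"},
    {"connection": "staging", "database": "ANALYTICS", "schema": "PUBLIC", "table": "ORDERS"},
    {"connection": "production", "database": "ANALYTICS", "schema": "PUBLIC", "table": "USERS"},
]
print(ambiguous_matches(assets))
```

Here ANALYTICS.PUBLIC.ORDERS is reported for both production and staging, so a BI report referencing it could be linked to either.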

Lineage persistence

Lineage in Atlan reflects the last valid set of transformations performed for a particular target table in the external (source) system. Atlan retains these transformations as lineage and does not auto-delete or sunset the process entities (links).

The exception to this rule is when new information pertaining to the same target table is inferred in the latest job run. In this case, Atlan will replace the previous links with new ones.
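The retention rule can be modeled as a merge keyed by target table. This is a minimal sketch of the behavior described above, using hypothetical table names: targets the latest run does not mention keep their links, and targets it does infer have their links replaced rather than merged.

```python
from typing import Dict, Set


def merge_lineage(
    current: Dict[str, Set[str]], latest_run: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Apply the latest run's lineage on top of the existing lineage.

    Keys are target tables; values are their upstream sources.
    """
    # Keep everything already known (nothing is auto-deleted).
    merged = {target: set(sources) for target, sources in current.items()}
    # Targets with new information are replaced wholesale.
    for target, sources in latest_run.items():
        merged[target] = set(sources)
    return merged


current = {"B": {"A"}, "D": {"C"}}
latest = {"B": {"X"}}
print(merge_lineage(current, latest))
```

Target B now has only X upstream, while D keeps its link to C because the latest run said nothing about it.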
