The order in which you run workflows in Atlan matters. When crawling data tools, follow the workflow sequence outlined below. Running workflows in the right order ensures that lineage is constructed without needing to rerun crawlers.
Order of operations
To build lineage across tools, run your workflows in this order:
- Crawl data stores first — for example, SQL data sources, NoSQL data sources, event buses, and schema registries.
- Run data quality tools — for example, Monte Carlo and Soda.
- Mine query logs — mine queries through S3 or run miner packages for supported sources.
- Run extract-load tools — for example, Fivetran, Airflow/OpenLineage and other supported distributions, and data processing tools like Apache Spark/OpenLineage.
- Run transformation tools — for example, dbt and Matillion.
- Crawl business intelligence tools last — for example, supported BI tools like Looker, Microsoft Power BI, Tableau, and more.
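The sequence above is effectively a ranked list of stages, which makes it easy to sanity-check a planned run order before scheduling anything. The following is a minimal Python sketch, assuming you label each workflow yourself with one of six hypothetical stage names (`data_store`, `data_quality`, and so on). These labels and the helpers `order_workflows` and `out_of_order` are illustrative only; they are not part of any Atlan API.

```python
from typing import Dict, List, Tuple

# Hypothetical stage labels for the six steps above, in the recommended order.
# These are NOT Atlan identifiers; they only encode the documented sequence.
STAGE_ORDER: List[str] = [
    "data_store",      # SQL/NoSQL sources, event buses, schema registries
    "data_quality",    # e.g. Monte Carlo, Soda
    "query_mining",    # S3 query logs or miner packages
    "extract_load",    # e.g. Fivetran, Airflow/OpenLineage, Spark/OpenLineage
    "transformation",  # e.g. dbt, Matillion
    "bi",              # e.g. Looker, Microsoft Power BI, Tableau
]
RANK: Dict[str, int] = {stage: i for i, stage in enumerate(STAGE_ORDER)}


def order_workflows(workflows: List[Tuple[str, str]]) -> List[str]:
    """Sort (workflow_name, stage) pairs into the recommended run order.

    sorted() is stable, so workflows in the same stage keep their input order.
    """
    ranked = sorted(workflows, key=lambda wf: RANK[wf[1]])
    return [name for name, _ in ranked]


def out_of_order(workflows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return adjacent (earlier, later) pairs where a downstream stage runs
    before an upstream one, i.e. where cross-tool lineage may be missing."""
    problems = []
    for (a, stage_a), (b, stage_b) in zip(workflows, workflows[1:]):
        if RANK[stage_a] > RANK[stage_b]:
            problems.append((a, b))
    return problems


# Example: a BI crawler planned before the data store it reads from.
planned = [
    ("tableau-crawler", "bi"),
    ("snowflake-crawler", "data_store"),
    ("dbt-job", "transformation"),
]
print(order_workflows(planned))  # ['snowflake-crawler', 'dbt-job', 'tableau-crawler']
print(out_of_order(planned))     # [('tableau-crawler', 'snowflake-crawler')]
```

Checking only adjacent pairs is enough to detect a problem: any sequence that violates the order contains at least one adjacent inversion.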
If you use a different order, upstream assets such as data stores may not yet exist in Atlan when the BI metadata is loaded. You may then see lineage within the BI metadata, but not between the BI assets and their data sources. If this happens, rerun your existing workflows in the order above and Atlan should resolve the missing lineage.
Let's review some general guidelines and best practices for running workflows in Atlan:
- Schedule your workflows based on how often you want your metadata in Atlan to be updated, for example weekly or monthly. To configure custom cron schedules, see the Atlan documentation on cron schedules.
- Avoid any overlaps between workflow schedules to ensure consistent workflow run times.
- The first workflow run typically takes much longer than subsequent runs. The first run establishes the connection, queries the source, extracts and transforms the metadata, and then publishes your assets in Atlan for the first time.
- If you're running a miner for the first time, set its start date to about 3 days before the current date, then schedule it to run daily until it has built up two weeks of query history. Mining two weeks of query history in the first run may cause the workflow to time out or hit resource consumption errors.
- For all subsequent miner runs, Atlan requires a minimum lag of 24 to 48 hours to capture all the relevant transformations that were part of a session. For details, see the Atlan documentation on miner logic.
- Run preflight checks before running the crawler to check for any permissions or other configuration issues, including testing authentication.
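Two of these guidelines come down to simple date and time arithmetic: picking a first-run start date for a miner, and keeping workflow windows from overlapping. The Python sketch below assumes you supply each workflow's planned start time and an estimated duration yourself. The helper names `first_miner_start_date` and `find_overlaps` are hypothetical, and the code neither parses cron expressions nor calls Atlan.

```python
from datetime import date, datetime, timedelta
from itertools import combinations
from typing import Dict, List, Tuple

# A workflow window: (planned start, estimated duration). Durations are your own
# estimates, e.g. from past run times; Atlan does not provide them here.
Window = Tuple[datetime, timedelta]


def first_miner_start_date(today: date, days_back: int = 3) -> date:
    """Start date for a first miner run, roughly 3 days before today, so the
    first run does not try to mine two weeks of query history at once."""
    return today - timedelta(days=days_back)


def find_overlaps(schedule: Dict[str, Window]) -> List[Tuple[str, str]]:
    """Return pairs of workflows whose [start, start + duration) windows overlap."""
    overlaps = []
    for (name_a, (start_a, dur_a)), (name_b, (start_b, dur_b)) in combinations(
        schedule.items(), 2
    ):
        # Two half-open intervals overlap when each starts before the other ends.
        if start_a < start_b + dur_b and start_b < start_a + dur_a:
            overlaps.append((name_a, name_b))
    return overlaps


# Example with hypothetical workflow names and timings.
print(first_miner_start_date(date(2024, 5, 10)))  # 2024-05-07

day = datetime(2024, 5, 10)
schedule = {
    "snowflake-crawler": (day.replace(hour=1), timedelta(hours=2)),
    "snowflake-miner": (day.replace(hour=2), timedelta(hours=1)),
    "tableau-crawler": (day.replace(hour=4), timedelta(hours=1)),
}
print(find_overlaps(schedule))  # [('snowflake-crawler', 'snowflake-miner')]
```

In this example, moving the miner to start at 03:00 or later would remove the overlap, since the crawler's estimated window ends at 03:00.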
Here are a few tips to help you troubleshoot workflow failures in Atlan:
- If test authentication or preflight checks fail, check the source to make sure that your credentials are correct and that you have the access required to crawl the metadata.
- If you're connecting to Atlan via private link and experience any network-related errors or timeouts during test authentication, it may mean that there is a network connectivity issue between the source and Atlan. Reach out to Atlan support to help you investigate further.
- If test authentication and preflight checks alternate between failing and succeeding when you retry them, your cluster may be in an unstable state and need to be restarted. Notify Atlan support to restart your cluster.
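Taken together, these tips form a small decision table. As a hedged illustration, the Python sketch below maps the outcomes of repeated test-authentication and preflight attempts (`True` for a pass) to the matching tip. The function `triage`, its parameters, and its return codes are hypothetical and only mirror the guidance in this section; they do not correspond to any Atlan tooling.

```python
from typing import List


def triage(
    auth_results: List[bool],
    preflight_results: List[bool],
    private_link: bool = False,
    network_error: bool = False,
) -> str:
    """Map repeated check outcomes to a troubleshooting tip (hypothetical codes).

    Returns one of: "ok", "network-issue", "restart-cluster", "check-credentials".
    """
    if not auth_results or not preflight_results:
        raise ValueError("need at least one attempt for each check")
    # Private link plus network errors or timeouts: likely connectivity between
    # the source and Atlan, so contact Atlan support.
    if private_link and network_error:
        return "network-issue"
    # A check is "flaky" if retries produced both passes and failures.
    auth_flaky = len(set(auth_results)) > 1
    preflight_flaky = len(set(preflight_results)) > 1
    if auth_flaky and preflight_flaky:
        return "restart-cluster"  # notify Atlan support to restart the cluster
    if not all(auth_results) or not all(preflight_results):
        return "check-credentials"  # verify credentials and access at the source
    return "ok"


print(triage([True, False, True], [False, True]))  # restart-cluster
print(triage([False, False], [False]))              # check-credentials
```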