💪 Did you know? Below, we refer to both the SQL endpoint and the interactive cluster as the compute engine.
Why does the workflow take longer than usual in the extraction step?
- Certain Databricks runtime versions do not have an easy way to extract some metadata (for example partitioning, table_type, and format). Thus we must perform extra operations to retrieve these, resulting in slower performance.
- If you are not already using it, consider trying the Unity Catalog (REST API) extraction method.
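To illustrate the kind of extra operation involved, here is a minimal Python sketch that recovers the table type, format (provider), and partition columns by parsing the rows returned by Spark SQL's `DESCRIBE TABLE EXTENDED` command. This is not Atlan's implementation. The function name `parse_describe_extended` is hypothetical, and the section markers it looks for (`# Partition Information`, `# Detailed Table Information`, `Type`, `Provider`) are assumed from typical Spark output and may vary across runtime versions.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def parse_describe_extended(rows: Iterable[Sequence[Optional[str]]]) -> Dict[str, object]:
    """Pull table type, provider (format), and partition columns from
    DESCRIBE TABLE EXTENDED rows of the form (col_name, data_type, comment).

    Assumes the typical Spark layout: column rows first, then an optional
    "# Partition Information" section, then "# Detailed Table Information".
    """
    table_type: Optional[str] = None
    provider: Optional[str] = None
    partition_columns: List[str] = []
    section = "columns"

    for row in rows:
        name = (row[0] or "").strip()
        value = row[1]
        if name == "# Partition Information":
            section = "partitions"
        elif name == "# Detailed Table Information":
            section = "details"
        elif not name or name.startswith("#"):
            # Skip blank separator rows and sub-headers such as "# col_name".
            continue
        elif section == "partitions":
            partition_columns.append(name)
        elif section == "details" and name == "Type":
            table_type = value
        elif section == "details" and name == "Provider":
            provider = value

    return {"type": table_type, "provider": provider, "partition_columns": partition_columns}
```

In a Databricks notebook, `spark.sql("DESCRIBE TABLE EXTENDED my_db.my_table").collect()` returns `Row` objects, which are tuples, so they can be passed in directly (`my_db.my_table` is a placeholder). Issuing a query like this for every table is one plausible reason this kind of extraction runs slower.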
Why is some metadata missing?
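- We currently cannot extract some metadata from Databricks. Availability depends on the extraction method (✅ extracted, ❌ not extracted):

| Metadata | JDBC | REST API |
| --- | --- | --- |
| RowCount (on tables and views) | ❌ | ❌ |
| TABLE_KIND (on tables and views) | ❌ | ❌ |
| PARTITION_STRATEGY (on tables and views) | ❌ | ❌ |
| Partition key (on columns) | ❌ | ✅ |
| Table partitioning information | ✅ | ❌ |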
- We will explore ways to bring this metadata into Atlan if Databricks adds support for extracting it.
Why doesn't my SQL work when querying Databricks?
- We currently support SparkSQL on Databricks runtime 7.x and above.
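If you are unsure which runtime a cluster uses, the Databricks Clusters API reports it in the `spark_version` field as a runtime key such as `7.3.x-scala2.12`. The minimal sketch below checks such a key against the 7.x minimum stated above. The function name is hypothetical, and it assumes the key always starts with the major version number.

```python
import re


def runtime_supports_sparksql(spark_version: str, minimum_major: int = 7) -> bool:
    """Return True if a runtime key like '7.3.x-scala2.12' is at least minimum_major.x."""
    match = re.match(r"(\d+)\.", spark_version)
    if match is None:
        raise ValueError(f"Unrecognized runtime version: {spark_version!r}")
    # Compare the major version numerically so that 13.x ranks above 7.x.
    return int(match.group(1)) >= minimum_major
```

For example, `runtime_supports_sparksql("6.4.x-scala2.11")` returns `False`, while `runtime_supports_sparksql("13.3.x-scala2.12")` returns `True`.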
Can I use Atlan when the Databricks compute engine is not running?
- Atlan needs the Databricks compute engine to be running for two activities:
  - Crawling assets (normal and scheduled runs)
  - Querying assets (including data previews)
- If you do not need to perform the activities above, your experience shouldn't be affected.
- If you do need to perform them, your experience on Atlan will be degraded while the compute engine is not running: queries won't work as expected, and a scheduled workflow might fail after a couple of retries.
- To avoid these problems, we recommend turning off the Terminate after x minutes of inactivity option on your cluster. If you leave it turned on, any of the activities above should trigger the cluster to come back online within about 30 seconds.
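For an interactive cluster, the Terminate after x minutes of inactivity option corresponds to the `autotermination_minutes` field of the cluster specification returned by the Databricks Clusters API (`GET /api/2.0/clusters/get`); a missing value or `0` means auto-termination is off. The minimal sketch below reads that field from an already-fetched response. The helper name and the sample cluster ID are hypothetical placeholders, and SQL endpoints use their own auto-stop setting, which this does not cover.

```python
import json
from typing import Any, Dict


def auto_termination_minutes(cluster_info: Dict[str, Any]) -> int:
    """Return the inactivity timeout in minutes, or 0 if auto-termination is off."""
    # The field is omitted or set to 0 when auto-termination is disabled.
    return int(cluster_info.get("autotermination_minutes") or 0)


# Example: a trimmed-down response body from GET /api/2.0/clusters/get (placeholder values).
response_body = '{"cluster_id": "1234-567890-abcde123", "autotermination_minutes": 120}'
minutes = auto_termination_minutes(json.loads(response_body))
if minutes:
    print(f"Cluster terminates after {minutes} minutes of inactivity.")
else:
    print("Auto-termination is off.")
```

Running it prints `Cluster terminates after 120 minutes of inactivity.` for the sample response.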
Why can't I see all the assets on Atlan that are available in Databricks?
- Have you excluded the database or schema when crawling?
- Does the Databricks user you configured for crawling have access to these other assets?
Why is the authentication test taking so long?
- Please check the state of the compute engine. It must be in a running state for all operations, including authentication.
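One way to check the state yourself is the Databricks Clusters API, whose `GET /api/2.0/clusters/get` response includes a `state` field (for example `PENDING`, `RUNNING`, or `TERMINATED`). The minimal Python sketch below fetches that field and polls until the cluster is running. The workspace URL, token, and cluster ID are placeholders you supply, the function names are hypothetical, the timeout and polling interval are arbitrary defaults, and it covers interactive clusters only.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable


def fetch_cluster_state(host: str, token: str, cluster_id: str) -> str:
    """Return the 'state' field from GET /api/2.0/clusters/get."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/clusters/get?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["state"]


def wait_until_running(
    get_state: Callable[[], str], timeout_s: float = 600, interval_s: float = 15
) -> bool:
    """Poll get_state() until it reports RUNNING; return False on a terminal state or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "RUNNING":
            return True
        if state in ("TERMINATING", "TERMINATED", "ERROR"):
            # The cluster will not come up on its own; it needs to be started.
            return False
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)
```

A call such as `wait_until_running(lambda: fetch_cluster_state("https://<workspace-url>", "<PAT>", "<cluster-id>"))` returns `True` once the cluster reports `RUNNING`. Passing the state lookup as a callable keeps the polling logic separate from the HTTP call.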
What limitations are there with the REST API (Unity Catalog) extraction method?
- The REST API method currently does not support schema-level filtering or retrieving table partitioning information.
Why has my workflow started to fail when it worked before?
- This can happen if the personal access token (PAT) you configured the workflow with has since expired.
- You will need to create a new PAT in Databricks and then update the workflow configuration in Atlan with it.
- If you are unable to update the PAT, pause the workflow and reach out to us.
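To spot an expiring token before a workflow fails, the Databricks Token API (`GET /api/2.0/token/list`) returns the calling user's tokens in a `token_infos` list, where each entry has an `expiry_time` in epoch milliseconds, or `-1` if the token never expires. The minimal sketch below flags tokens that expire within a given number of days, starting from an already-fetched list. The function name and the seven-day default are assumptions for illustration.

```python
import time
from typing import Any, Dict, List, Optional

MS_PER_DAY = 24 * 60 * 60 * 1000


def expiring_tokens(
    token_infos: List[Dict[str, Any]],
    within_days: int = 7,
    now_ms: Optional[int] = None,
) -> List[str]:
    """Return the IDs of tokens that have expired or will expire within `within_days`."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    horizon = now_ms + within_days * MS_PER_DAY
    flagged = []
    for info in token_infos:
        expiry = info.get("expiry_time", -1)
        # An expiry_time of -1 means the token has no expiration date.
        if expiry != -1 and expiry <= horizon:
            flagged.append(info["token_id"])
    return flagged
```

Pass it the `token_infos` list from the API response and compare the returned IDs with the token your workflow uses. The `comment` field, if you set one when creating the token, can help identify it.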
How do I migrate to Unity Catalog?
- Unity Catalog is currently in public preview.
- The Databricks team is working on an automated migration to Unity Catalog.
- Until it is available, you must migrate tables manually, one at a time.
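As one example of a manual approach, a managed table can be copied from the legacy Hive metastore (exposed as the `hive_metastore` catalog in Unity Catalog-enabled workspaces) into a Unity Catalog catalog with a `CREATE TABLE ... AS SELECT` (CTAS) statement. The minimal sketch below only generates those statements. The target catalog name `main` and the function names are hypothetical. Check Databricks' own upgrade guidance for your table types, since CTAS copies the data but not grants, partitioning, or table properties.

```python
from typing import Iterable, List, Tuple


def quote_ident(name: str) -> str:
    """Backtick-quote a Spark SQL identifier, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def ctas_statements(
    tables: Iterable[Tuple[str, str]],
    target_catalog: str,
    source_catalog: str = "hive_metastore",
) -> List[str]:
    """Build one CTAS statement per (schema, table) pair."""
    statements = []
    for schema, table in tables:
        source = ".".join(quote_ident(p) for p in (source_catalog, schema, table))
        target = ".".join(quote_ident(p) for p in (target_catalog, schema, table))
        statements.append(f"CREATE TABLE IF NOT EXISTS {target} AS SELECT * FROM {source}")
    return statements


for statement in ctas_statements([("sales", "orders")], target_catalog="main"):
    print(statement)
```

Each generated statement would be run in a notebook or SQL editor, and the target schema must already exist in the target catalog.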