Troubleshooting Databricks connectivity

πŸ’ͺ Did you know? Below, we refer to both SQL endpoints and interactive clusters as compute engines.

Why does the workflow take longer than usual in the extraction step?

  • Certain Databricks runtime versions do not provide an easy way to extract some metadata (for example, partitioning, table_type, and format). We must therefore perform extra operations to retrieve it, resulting in slower extraction.
  • If you are not already using it, you may want to try the Unity Catalog extraction method.
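
As an illustration of the kind of extra work involved, the sketch below parses the output of Spark SQL's DESCRIBE TABLE EXTENDED to recover partition columns, table type, and format. This is a hypothetical helper, not Atlan's actual extraction code; the row layout it assumes follows standard Spark SQL output, where each row is a (col_name, data_type, comment) triple and section markers like "# Partition Information" separate the blocks.

```python
def parse_describe_extended(rows):
    """Recover partition columns, table type, and format from
    DESCRIBE TABLE EXTENDED output (rows of (col_name, data_type, comment))."""
    partition_cols, table_type, fmt = [], None, None
    section = "columns"
    for name, value, _comment in rows:
        name = name.strip()
        if name == "# Partition Information":
            section = "partitions"
            continue
        if name == "# Detailed Table Information":
            section = "details"
            continue
        if not name or name == "# col_name":
            continue  # skip blank separators and the repeated header row
        if section == "partitions":
            partition_cols.append(name)
        elif section == "details":
            if name == "Type":        # e.g. MANAGED or EXTERNAL
                table_type = value
            elif name == "Provider":  # e.g. delta, parquet
                fmt = value
    return {"partition_cols": partition_cols, "type": table_type, "format": fmt}
```

Running one such statement and one parse per table, rather than a single bulk query, is what makes this path slower on runtimes that lack a direct metadata interface.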

Why is some metadata missing?

  • We currently cannot extract some metadata from Databricks:
    Metadata                                    JDBC    REST API
    ViewCount and TableCount (on schemas)        ❌        βœ…
    RowCount (on tables and views)               ❌        ❌
    TABLE_KIND (on tables and views)             ❌        ❌
    PARTITION_STRATEGY (on tables and views)     ❌        ❌
    CONSTRAINT_TYPE (on columns)                 ❌        ❌
    Partition key (on columns)                   ❌        βœ…
    Table partitioning information               βœ…        ❌
  • We'll explore ways to bring this metadata into Atlan if Databricks adds support for extracting it.

Why doesn't my SQL work when querying Databricks?

Can I use Atlan when the Databricks compute engine is not running?

  • Atlan needs the Databricks compute engine to be running for two activities:
    • Crawling assets (normal and scheduled run)
    • Querying assets (including data previews)
  • If you do not need to perform these activities, your experience shouldn't be affected.
  • Otherwise, you'll get a degraded experience on Atlan while the compute engine is not running: queries won't work as expected, and a scheduled workflow may fail after a few retries.
  • To avoid these problems, we recommend turning off the Terminate after x minutes of inactivity option on your cluster. If you leave it turned on, any of the above activities should trigger the cluster to come back online within about 30 seconds.
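
To make the lifecycle concrete, here is a minimal sketch of how a caller might react to the cluster states Databricks reports (RUNNING, PENDING, TERMINATED, and so on) before crawling or querying. The next_action helper and its return values are illustrative assumptions, not part of Atlan or the Databricks API.

```python
# Cluster states as reported by the Databricks Clusters API.
STARTABLE = {"TERMINATED"}                         # auto-terminated clusters can be restarted
TRANSITIONAL = {"PENDING", "RESTARTING", "RESIZING"}

def next_action(state: str) -> str:
    """Decide what to do before crawling or querying, given the cluster state."""
    if state == "RUNNING":
        return "proceed"       # compute engine is ready
    if state in TRANSITIONAL:
        return "wait"          # poll again until the cluster settles
    if state in STARTABLE:
        return "start"         # trigger a restart, then wait
    return "fail"              # ERROR, TERMINATING, UNKNOWN, ...
```

A scheduled workflow that retries while the state is transitional, rather than failing immediately, is what gives an auto-terminated cluster time to come back online.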

Why can't I see all the assets on Atlan that are available in Databricks?

Why is the test authentication taking so long?

  • Please check the state of the compute engine. It must be in a running state for all operations, including authentication.
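
For a quick manual check, the Databricks REST API exposes the cluster state via GET /api/2.0/clusters/get. The helper below only builds the request pieces (URL, headers, query parameters) so it stays self-contained; the host, cluster ID, and token values shown in the docstring are placeholders.

```python
def cluster_status_request(host: str, cluster_id: str, token: str):
    """Build (url, headers, params) for a Databricks cluster-state check.

    Pass the result to any HTTP client, e.g.
    requests.get(url, headers=headers, params=params),
    and read the "state" field of the JSON response.
    """
    url = f"https://{host}/api/2.0/clusters/get"
    headers = {"Authorization": f"Bearer {token}"}  # personal access token
    params = {"cluster_id": cluster_id}
    return url, headers, params
```

If the returned state is anything other than RUNNING, authentication tests and other operations will hang or fail until the cluster is started.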

What limitations are there with the REST API (Unity Catalog) extraction method?

  • We currently do not support schema-level filtering or retrieval of table partitioning information.

Why has my workflow started to fail when it worked before?

How do I migrate to Unity Catalog?

  • Unity Catalog is currently in public preview.
  • The Databricks team is working on an automated migration to Unity Catalog.
  • Until then, you must migrate individual tables manually.
