Troubleshooting Databricks connectivity

💪 Did you know? We will refer to both SQL endpoint and interactive cluster as compute engine below.

How do I debug test authentication and preflight check errors?

Hostname resolution error

Provided Host name cannot be resolved via DNS, please check and try again.

  • The hostname you have provided cannot be resolved through DNS. Ensure that the hostname is correct.
  • Verify that the DNS settings have been configured properly.
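
To confirm whether the hostname resolves from the machine or network where the connection originates, a quick check along the following lines may help. This is a minimal sketch using only the Python standard library; the function name can_resolve is illustrative and is not part of Atlan or Databricks.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the system resolver finds at least one address for hostname."""
    try:
        # getaddrinfo consults the same resolver configuration (hosts file, DNS)
        # that most client libraries on this machine would use.
        return len(socket.getaddrinfo(hostname, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the name could not be resolved.
        # UnicodeError: the name is malformed (for example, an over-long label).
        return False
```

Call can_resolve with the exact hostname entered in Atlan, without the https:// prefix or any path. A False result points to a typo in the hostname or to a DNS configuration problem on the network where the check is run.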

Invalid client ID or secret

Provided Client ID is invalid, please check and try again.

  • The client ID or secret you have provided is either invalid or no longer working. Follow the steps for AWS or Azure setup to generate new credentials.

Invalid tenant ID

Provided tenant ID is invalid, please check and try again.

  • The tenant ID you have provided is incorrect.
  • Ensure that the tenant ID you have provided corresponds to the one in your Microsoft Entra ID application.

Unity Catalog not linked

Configured Databricks instance doesn't have Unity Catalog linked. Please choose JDBC extraction instead of REST API in Atlan.

Connection timeout

Failed to connect to Databricks (connection timed out). Please check your host and port and try again.

  • The connection to the Databricks instance has timed out.
  • Verify that the host and port are correct.
  • Ensure that no firewall rules or network issues are blocking the connection.
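
To tell a wrong host or port apart from a blocked network path, you can attempt a plain TCP connection from the same network. The sketch below uses only the Python standard library; the function name can_connect, the default port of 443, and the 5-second timeout are assumptions chosen for illustration.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the host and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and resolution failures.
        return False
```

If this returns False for a hostname that does resolve, a firewall rule, allowlist, or private networking setup is a likely cause. A True result only shows that the port is reachable; it does not validate credentials or the HTTP path.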

Invalid HTTP path

Provided HTTP path is invalid, please check and try again.

  • The HTTP path you have provided is invalid.
  • Ensure that the endpoint is properly configured and accessible, and the warehouse ID in the HTTP path is correct.
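
A quick format check can catch a mistyped HTTP path before you retry. The sketch below assumes the two path shapes commonly shown in the Databricks connection details, /sql/1.0/warehouses/<warehouse-id> (or the older /sql/1.0/endpoints/<id>) for SQL endpoints and sql/protocolv1/o/<workspace-id>/<cluster-id> for interactive clusters. The patterns and the function name classify_http_path are illustrative assumptions, so compare the result with the value shown in your Databricks workspace.

```python
import re
from typing import Optional

# Assumed path shapes; verify them against the connection details in Databricks.
_WAREHOUSE_PATH = re.compile(r"^/?sql/1\.0/(?:warehouses|endpoints)/[0-9A-Za-z]+$")
_CLUSTER_PATH = re.compile(r"^/?sql/protocolv1/o/\d+/[\w-]+$")


def classify_http_path(http_path: str) -> Optional[str]:
    """Return the compute engine type the path appears to target, or None."""
    path = http_path.strip()  # stray whitespace is a common copy-paste error
    if _WAREHOUSE_PATH.match(path):
        return "sql_warehouse"
    if _CLUSTER_PATH.match(path):
        return "interactive_cluster"
    return None
```

A None result suggests the path was truncated or copied from the wrong field. A matching format does not prove that the warehouse or cluster ID exists, so still confirm the ID in Databricks.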

Invalid personal access token

PAT token is invalid, please check and try again.

  • The personal access token used for authentication is invalid.
  • Ensure that the token is valid and neither deleted nor expired.
  • You can also generate a new personal access token, if needed.
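
To check a token outside Atlan, you can send one authenticated request to the Databricks REST API and inspect the response. The sketch below uses only the Python standard library and assumes the clusters list endpoint (/api/2.0/clusters/list) as the probe; the function names are illustrative, and any lightweight endpoint the token is allowed to call would serve the same purpose.

```python
import urllib.error
import urllib.request


def build_token_check_request(host: str, token: str) -> urllib.request.Request:
    """Build a GET request that authenticates with the personal access token."""
    url = f"https://{host}/api/2.0/clusters/list"  # assumed probe endpoint
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def token_is_accepted(host: str, token: str, timeout: float = 10.0) -> bool:
    """Return True on HTTP 200 and False on 401 or 403; re-raise other HTTP errors."""
    request = build_token_check_request(host, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            # Treated here as a rejected token; a 403 can also mean the token
            # is valid but lacks permission for this particular endpoint.
            return False
        raise
```

Network failures surface as urllib.error.URLError rather than as a False result, which keeps connectivity problems distinct from token problems. Avoid hard-coding the token in scripts; read it from an environment variable or a secrets manager instead.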

General connection failure

Unable to connect to the configured Databricks instance, please check your credentials and configs and then try again. If the problem persists, contact support@atlan.com.

  • Ensure that you have entered the host and port correctly.
  • Verify that the credentials for the connection are correct.
  • Ensure that your Databricks instance is properly configured and accessible.
  • If the problem still persists after verifying all of the above, contact Atlan support.

Why does the workflow take longer than usual in the extraction step?

  • Certain Databricks runtime versions do not have an easy way to extract some metadata (for example partitioning, table_type, and format). Thus we must perform extra operations to retrieve these, resulting in slower performance.
  • If you are not already using it, try the Unity Catalog extraction method.

Why is some metadata missing?

  • We currently cannot extract some metadata from Databricks:
    Metadata                                      JDBC   REST API
    ViewCount and TableCount (on schemas)         ❌     ✅
    RowCount (on tables and views)                ❌     ❌
    TABLE_KIND (on tables and views)              ❌     ❌
    PARTITION_STRATEGY (on tables and views)      ❌     ❌
    CONSTRAINT_TYPE (on columns)                  ❌     ❌
    Partition key (on columns)                    ❌     ✅
    Table partitioning information                ✅     ❌
    BYTES, SIZEINBYTES (table size)               ❌     ❌
  • We'll be exploring ways to bring this metadata into Atlan if Databricks supports extraction of the metadata.

Why doesn't my SQL work when querying Databricks?

Can I use Atlan when the Databricks compute engine is not running?

  • Atlan needs the Databricks compute engine to be running for two activities:
    • Crawling assets (normal and scheduled run)
    • Querying assets (including data previews)
  • If you do not need to perform the activities above, your experience shouldn't be affected.
  • In any other case, you'll get a degraded experience on Atlan if the compute engine is not running. Queries won't work as expected and a scheduled workflow might fail after a couple of retries.
  • We recommend turning off the Terminate after x minutes of inactivity option in your cluster to avoid these problems. If you have this turned on, any of the above activities should trigger the cluster to come back online within about 30 seconds.

Why can't I see all the assets on Atlan that are available in Databricks?

Why is the test authentication taking so long?

  • Please check the state of the compute engine. It must be in a running state for all operations, including authentication.

What limitations are there with the REST API (Unity Catalog) extraction method?

  • We currently do not support schema-level filtering or retrieving table partitioning information.

Why has my workflow started to fail when it worked before?

How do I migrate to Unity Catalog?

  • Currently Unity Catalog is in a public preview state.
  • The Databricks team is working on an automated migration to Unity Catalog.
  • Currently you must migrate individual tables manually.
