How to crawl dbt

Once you have configured a dbt Cloud service token or uploaded your dbt Core project files to S3, you can crawl dbt metadata into Atlan.

To enrich metadata in Atlan from dbt, review the order of operations and then complete the following steps.

Select the source

To select dbt as your source:

  1. In the top right of any screen, navigate to New and then click New Workflow.
  2. From the list of packages, select dbt Assets and then click Setup Workflow.

Provide your credentials

dbt Core

To enter your dbt Core credentials:

  1. For Extraction method, click Core.
  2. Enter the details for the S3 location of your project files:
    • For S3 bucket name, enter the name of your S3 bucket or Atlan's bucket containing your project files. Do not include the s3:// prefix.
    • For S3 prefix, enter the path of the prefix in the S3 bucket up to, but not including, the project name.
    • (Optional) For S3 region, enter the name of the S3 region in which the bucket exists.
      🚨 Careful! Your S3 bucket must be in the same region as Atlan. S3 buckets with VPC endpoints currently do not support cross-region requests.
  3. (Optional) To specify the metadata to include or exclude, for Advanced options, select Yes:
    1. For Exclude filter pattern, enter an AWS filter pattern to exclude folders from being crawled. This will be evaluated first.
    2. For Include filter pattern, enter an AWS filter pattern to include folders to crawl.
  4. Navigate to the bottom of the screen and click Next.
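The split between S3 bucket name and S3 prefix can be sketched as follows. This is a minimal illustration, not part of Atlan: the function name split_s3_uri, the S3Location type, and the example bucket and project names are all hypothetical. It assumes you start from a full S3 URI that points at the project folder.

```python
from typing import NamedTuple


class S3Location(NamedTuple):
    """Values for the form fields (hypothetical helper type)."""
    bucket: str   # goes in "S3 bucket name", without the s3:// scheme
    prefix: str   # goes in "S3 prefix", stops before the project name
    project: str  # the project folder itself, not part of the prefix


def split_s3_uri(uri: str) -> S3Location:
    """Split a full S3 URI to a dbt project folder into form-field values."""
    # Drop the scheme: the bucket name field should not include "s3://".
    path = uri[len("s3://"):] if uri.startswith("s3://") else uri
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected at least a bucket and a project folder")
    bucket, *middle, project = parts
    # The prefix is everything between the bucket and the project name.
    return S3Location(bucket=bucket, prefix="/".join(middle), project=project)


# Example with made-up names:
location = split_s3_uri("s3://my-bucket/dbt/artifacts/jaffle_shop")
print(location.bucket)  # my-bucket
print(location.prefix)  # dbt/artifacts
```

If the project folder sits directly under the bucket, the prefix in this sketch is an empty string.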

dbt Cloud

To enter your dbt Cloud credentials:

  1. For Extraction method, click Cloud.
  2. For Host Name, enter the domain name of your dbt Cloud instance, if not the default. Include the https://. For more information on access URLs, refer to dbt documentation.
  3. For Authentication Type, Service Account is selected by default and expects a service account token. To use a personal access token (PAT) instead, change the selection to PAT.
  4. For Token, enter the dbt Cloud token you generated.
  5. Click the Test Authentication button to confirm connectivity to dbt Cloud using these details.
  6. Once authentication is successful, navigate to the bottom of the screen and click Next.
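If you want to check a token outside Atlan before entering it, the sketch below builds a request to the dbt Cloud Administrative API account listing endpoint. It illustrates a manual check and does not necessarily match what the Test Authentication button does. The helper name build_accounts_request is hypothetical, and the default host https://cloud.getdbt.com is an assumption; substitute your own access URL if your instance uses a different one.

```python
import urllib.request

# Assumed default host; replace with your dbt Cloud access URL if it differs.
DEFAULT_HOST = "https://cloud.getdbt.com"


def build_accounts_request(token: str, host: str = DEFAULT_HOST) -> urllib.request.Request:
    """Build (but do not send) a request that lists accounts visible to the token."""
    if not host.startswith("https://"):
        # Mirrors the form's requirement to include the https:// scheme.
        raise ValueError("host must include the https:// scheme")
    url = host.rstrip("/") + "/api/v2/accounts/"
    headers = {
        "Authorization": f"Token {token}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


# Building the request makes no network call.
request = build_accounts_request("example-token")
print(request.full_url)
```

To send the request, pass it to urllib.request.urlopen(request, timeout=10). A 200 response suggests the host and token are valid, while an HTTPError with status 401 suggests the token was rejected.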
💪 Did you know? If a project appears in both the include and exclude filters, the exclude filter takes precedence.

Configure the connection

To complete the dbt connection configuration:

  1. Provide a Connection name that represents your source environment. For example, you might use values like production, development, gold, or analytics.
  2. (Optional) To change the users who are able to manage this connection, change the users or groups listed under Connection Admins.
    🚨 Careful! If you do not specify any user or group, no one will be able to manage the connection — not even admins.
  3. Navigate to the bottom of the screen and click Next to proceed.

Configure the crawler

Before running the dbt crawler, you can further configure it.

dbt Core

On the Configuration page for dbt Core, you can override the defaults for any of these options:

  • To limit the enrichment to a particular connection with materialized assets, click Connection and select the relevant option. (This will default to all connections, if none are specified.)
  • To import existing tags from dbt to Atlan, for Import Tags, click Yes.
  • For Advanced options, click Yes to configure the crawler further:
    • For Enrich Metadata in Materialized Assets, click Yes to allow enrichment for both dbt and materialized assets or No for dbt assets only.

dbt Cloud

On the Configuration page for dbt Cloud, you can override the defaults for any of these options:

  • To select the dbt projects and environments you want to exclude from crawling, click Exclude Metadata. (This will default to no projects, if none are specified.)
  • To select the dbt projects and environments you want to include in crawling, click Include Metadata. (This will default to all projects, if none are specified.)
  • To limit the enrichment to a particular connection with materialized assets, click Connection and select the relevant option. (This will default to all connections, if none are specified.)
  • To import existing tags from dbt to Atlan, for Import Tags, click Yes.
  • For Advanced options, click Yes to configure the crawler further:
    • For Enrich Metadata in Materialized Assets, click Yes to allow enrichment for both dbt and materialized assets or No for dbt assets only.
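The interaction between Include Metadata and Exclude Metadata can be modeled in a few lines. This is only a sketch of the documented behavior (include defaults to all projects, exclude defaults to none, and exclude takes precedence when a project appears in both). It is not Atlan's implementation, and the function and project names are hypothetical.

```python
from typing import Iterable, List, Optional


def projects_to_crawl(
    all_projects: Iterable[str],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the projects that would be crawled under the documented defaults."""
    available = set(all_projects)
    # An empty or missing include filter means "all projects".
    included = available & set(include) if include else available
    # An empty or missing exclude filter means "no projects".
    excluded = set(exclude or [])
    # Exclusion is applied last, so it wins over inclusion.
    return sorted(included - excluded)


projects = ["finance", "marketing", "sandbox"]
print(projects_to_crawl(projects, include=["finance", "sandbox"], exclude=["sandbox"]))
```

In this example, sandbox appears in both filters, so only finance remains.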

Run the crawler

To run the dbt crawler, after completing the steps above:

  1. To check for any permissions or other configuration issues before running the crawler, click Preflight checks. This is currently supported only for dbt Core when using an S3 bucket. If you're using a Google Cloud Storage or Azure Data Lake Storage bucket, skip to step 2.
  2. Choose one of the following:
    • To run the crawler once immediately, at the bottom of the screen, click the Run button.
    • To schedule the crawler to run hourly, daily, weekly, or monthly, at the bottom of the screen, click the Schedule Run button.

Once the crawler has completed running, you will see the assets on Atlan's asset page! 🎉
