How to set up dbt Core

Option 1: Use the Atlan S3 bucket

To avoid access issues, we recommend uploading the required files (manifest.json and run_results.json) to the S3 bucket provided with your Atlan tenant. Raise a support request to get the details of your Atlan bucket, and include the ARN of the IAM user or IAM role to which we can provision access.

Create IAM policy

You will need to create an IAM policy and attach it to the IAM user or role that will upload the required files to your Atlan bucket. To create an IAM policy with the necessary permissions, follow the steps in the AWS Identity and Access Management User Guide.

Create the policy using the following JSON:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>/*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>"
      ],
      "Effect": "Allow"
    }
  ]
}
  • Replace <bucket_name> with the name of your Atlan bucket.
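If you prefer to generate the policy document programmatically, the JSON above can be built and sanity-checked locally before you attach it. A minimal Python sketch; the bucket name passed in below is a placeholder for the name you receive from Atlan support:

```python
import json

def atlan_upload_policy(bucket_name: str) -> str:
    """Build the IAM policy JSON for uploading dbt artifacts to the Atlan bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "s3:GetObject",
                    "s3:DeleteObject",
                    "s3:PutObject",
                    "s3:AbortMultipartUpload",
                    "s3:ListMultipartUploadParts",
                    "s3:GetBucketLocation",
                ],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
                "Effect": "Allow",
            },
            {
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
                "Effect": "Allow",
            },
        ],
    }
    return json.dumps(policy, indent=2)

# "atlan-tenant-bucket" is a placeholder; use the bucket name from Atlan support.
print(atlan_upload_policy("atlan-tenant-bucket"))
```

Passing the output of this script to the AWS console (or the AWS CLI) guarantees the JSON is syntactically valid.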

Option 2: Use your own S3 bucket

If you opt to use your own S3 bucket instead, you will need to complete the following steps:

🚨 Careful! S3 buckets with VPC endpoints currently do not support cross-region requests. This may result in workflows not picking up objects from your bucket. Atlan also recommends disabling ACLs on your S3 bucket when using this method. Having ACLs enabled may prevent the bucket owner from accessing the stored objects.

You'll first need to create a cross-account bucket policy giving Atlan's IAM role access to your bucket. A cross-account bucket policy is required because your Atlan tenant and your S3 bucket may not be deployed in the same AWS account. The permissions required on the S3 bucket are GetBucketLocation, ListBucket, and GetObject.

To create a cross-account bucket policy:

  1. Raise a support ticket to get the ARN of the Node Instance Role for your Atlan EKS cluster.
  2. Create a new policy to allow access by this ARN and update your bucket policy with the following:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "VisualEditor0",
          "Effect": "Allow",
          "Principal": {
            "AWS": "<role-arn>"
          },
          "Action": [
            "s3:GetBucketLocation",
            "s3:ListBucket",
            "s3:GetObject"
          ],
          "Resource": [
            "arn:aws:s3:::<bucket-name>",
            "arn:aws:s3:::<bucket-name>/<prefix>/*"
          ]
        }
      ]
    }
    • Replace <role-arn> with the role ARN of Atlan's node instance role.
    • Replace <bucket-name> with the name of the bucket you are creating.
    • Replace <prefix> with the name of the prefix (directory) within that bucket where you will upload the files.
  3. Once the new policy has been set up, please notify the support team. Your request should include the S3 bucket name and prefix. This should be done prior to setting up the workflow so that we can create and attach an IAM policy for your bucket to Atlan's IAM role.
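The substitutions in step 2 can also be scripted, which avoids typos in the ARNs. A sketch; the role ARN, bucket, and prefix below are placeholders for your actual values:

```python
import json

def cross_account_bucket_policy(role_arn: str, bucket: str, prefix: str) -> str:
    """Build the cross-account bucket policy granting Atlan's node instance
    role read access to the bucket and to objects under the given prefix."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": ["s3:GetBucketLocation", "s3:ListBucket", "s3:GetObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/{prefix}/*",
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)

# All values below are placeholders for illustration only.
print(cross_account_bucket_policy(
    "arn:aws:iam::123456789012:role/atlan-node-instance-role",
    "my-dbt-artifacts",
    "dbt",
))
```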

(Optional) Update KMS policy

If your S3 bucket is encrypted, you will need to update your KMS policy. This will allow Atlan to decrypt the objects in your S3 bucket.

  1. Provide the KMS key ARN and KMS key alias ARN to the Atlan support team.
  2. To whitelist the ARN of Atlan's node instance role, update the KMS key policy with the following:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "Decrypt Cross Account",
          "Effect": "Allow",
          "Principal": {
            "AWS": "<role-arn>"
          },
          "Action": [
            "kms:Decrypt",
            "kms:DescribeKey"
          ],
          "Resource": "*"
        }
      ]
    }
    • Replace <role-arn> with the role ARN of Atlan's node instance role.
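Because a KMS key has a single key policy document, the statement above has to be merged into your existing policy rather than replacing it. A sketch of that merge, using a placeholder role ARN:

```python
import json

def add_decrypt_statement(key_policy_json: str, role_arn: str) -> str:
    """Append the cross-account decrypt statement to an existing KMS key policy,
    leaving the existing statements untouched."""
    policy = json.loads(key_policy_json)
    policy.setdefault("Statement", []).append({
        "Sid": "Decrypt Cross Account",
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": ["kms:Decrypt", "kms:DescribeKey"],
        "Resource": "*",
    })
    return json.dumps(policy, indent=2)

# Placeholder existing policy and role ARN, for illustration only.
existing = json.dumps({"Version": "2012-10-17", "Statement": []})
print(add_decrypt_statement(
    existing, "arn:aws:iam::123456789012:role/atlan-node-instance-role"))
```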

Structure the bucket

Multiple projects

Atlan supports extracting dbt metadata from multiple dbt projects. You need to use one of the following structures:

Environment-inclusive:

main-prefix
  • environment1
    • project1
  • environment2
    • project2
    • project3
  • environment3
    • project4
    • project5

Without an environment:

main-prefix
  • project1
  • project2
  • project3
  • project4
  • project5

Both examples will be processed as five different dbt projects. The base folder name (for example, project2) will be stored as Project Name in the dbt metadata.
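Since the project name is taken from the base folder of each upload path, you can sanity-check your keys before uploading. A sketch that derives the project name from an S3 object key for both multi-project layouts (main-prefix is a placeholder):

```python
def project_name(key: str, main_prefix: str) -> str:
    """Derive the dbt project name from an S3 object key: the folder that
    directly contains the uploaded file, regardless of environment folders."""
    relative = key[len(main_prefix):].strip("/")
    parts = relative.split("/")
    # The last element is the file name; the folder just before it is the project.
    return parts[-2]

# Environment-inclusive layout
assert project_name("main-prefix/environment2/project3/manifest.json", "main-prefix") == "project3"
# Without an environment
assert project_name("main-prefix/project5/manifest.json", "main-prefix") == "project5"
```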

Single project

For a single dbt project, you can upload the files directly into the main S3 prefix or into a folder named after the dbt project.

Project-inclusive:

main-prefix
  • project1

Without a project:

main-prefix

Upload project files

Upload the following files from the target directory of the dbt project into one of the bucket structures outlined above:

  • manifest.json, which you can generate by running:
    dbt compile --full-refresh
    • This single file contains a full representation of your dbt project's resources, including models, tests, macros, node configurations, resource properties, and more.
  • catalog.json, which you can generate by running:
    dbt docs generate
    • This file contains metadata about the tables and views produced by the models in your dbt project — for example, column data types and table statistics.
  • run_results.json, which you can generate by running:
    dbt test
    • This file contains information about a completed invocation of dbt, including timing and status details for each node — such as model, test, and more — that was executed.
🚨 Careful! To crawl dbt tests, you need to upload the run_results.json file. We recommend uploading the file to the same folder as the manifest.json file.
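Before uploading, you can verify that the dbt target directory actually contains all three artifacts. A small sketch, assuming dbt's default target/ output path:

```python
from pathlib import Path

REQUIRED = ("manifest.json", "catalog.json", "run_results.json")

def missing_artifacts(target_dir: str = "target") -> list:
    """Return the names of required dbt artifacts missing from target_dir."""
    target = Path(target_dir)
    return [name for name in REQUIRED if not (target / name).is_file()]

# Example: a temporary directory standing in for target/, with only
# manifest.json present, so the other two files are reported as missing.
import tempfile
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "manifest.json").write_text("{}")
    print(missing_artifacts(tmp))
```

Running this check after `dbt test` and before the upload catches the common mistake of forgetting run_results.json, which is required for crawling dbt tests.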
