How to crawl Microsoft Azure Cosmos DB

Once you have configured the Microsoft Azure Cosmos DB permissions, you can establish a connection between Atlan and Microsoft Azure Cosmos DB.

To crawl metadata from Microsoft Azure Cosmos DB, review the order of operations and then complete the following steps.

Select the source

To select Microsoft Azure Cosmos DB as your source:

  1. In the top right of any screen in Atlan, navigate to +New and click New workflow.
  2. From the Marketplace page, click Cosmos DB Assets.
  3. In the right panel, click Setup Workflow.

Provide your credentials

Choose your authentication method:

SCRAM-SHA authentication

To enter your Microsoft Azure Cosmos DB credentials:

  1. For Database API, MongoDB is the default selection.
  2. For Extraction method, Direct is the default selection.
  3. For Connection String, enter the primary connection string you copied from your Microsoft Azure Cosmos DB deployment.
  4. Click the Test Authentication button to confirm connectivity to Microsoft Azure Cosmos DB.
  5. Once authentication is successful, navigate to the bottom of the screen and click Next.
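Before pasting the connection string into Atlan, it can help to confirm that you copied the whole string and not just the key. The following is a minimal sketch, not part of Atlan, that splits a MongoDB-style connection string into its non-secret parts. The function name describe_connection_string and the example string are hypothetical placeholders; substitute the primary connection string from your own deployment.

```python
def describe_connection_string(conn: str) -> dict:
    """Return the non-secret parts of a MongoDB-style connection string.

    Parsing is done by hand because an unencoded key may contain "/" or "+",
    and option values in the query string may contain "@".
    """
    scheme, sep, rest = conn.strip().partition("://")
    if not sep or scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError("expected a string starting with mongodb:// or mongodb+srv://")

    # Drop the query string first so "@" inside option values is ignored.
    before_query, _, _query = rest.partition("?")

    # Everything before the last "@" is "username:password".
    userinfo, at, host_part = before_query.rpartition("@")
    if not at:
        raise ValueError("no credentials found before the host")
    username, colon, password = userinfo.partition(":")
    if not username or not colon or not password:
        raise ValueError("username or password is missing")

    # Host and optional port come before any trailing path.
    host_port = host_part.split("/", 1)[0]
    host, _, port = host_port.partition(":")
    if not host:
        raise ValueError("host is missing")

    # The password is deliberately not returned.
    return {
        "scheme": scheme,
        "username": username,
        "host": host,
        "port": int(port) if port else None,
    }


# Placeholder string for illustration only; the key is not real.
example = (
    "mongodb://example-account:not/a+real=key==@"
    "example-account.mongo.cosmos.azure.com:10255/"
    "?ssl=true&appName=@example-account@"
)
info = describe_connection_string(example)
print(info)
```

If the function raises an error, the string was likely truncated or copied from the wrong field. The Test Authentication button in Atlan remains the actual connectivity check.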

Service principal authentication

To enter your Microsoft Azure Cosmos DB credentials:

  1. For Database API, MongoDB is the default selection.
  2. For Extraction method, Direct is the default selection.
  3. For Host, enter the publicly accessible URL of your Microsoft Azure Cosmos DB deployment.
  4. For Port, enter the port number where your Microsoft Azure Cosmos DB deployment is publicly accessible.
  5. For Client ID, enter the application (client) ID you copied for your service principal.
  6. For Client Secret, enter the client secret you copied for your service principal.
  7. For Tenant ID, enter the directory (tenant) ID you copied for your service principal.
  8. For Resource Group, enter the resource group you copied from your Microsoft Azure Cosmos DB deployment.
  9. For Subscription ID, enter the subscription ID you copied from your Microsoft Azure Cosmos DB deployment.
  10. For Cosmos DB Account Name, enter the Cosmos DB account name you copied from your Microsoft Azure Cosmos DB deployment.
  11. Click the Test Authentication button to confirm connectivity to Microsoft Azure Cosmos DB.
  12. Once authentication is successful, navigate to the bottom of the screen and click Next.
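Service principal setup involves several copied identifiers, and a common mistake is pasting a value into the wrong field. Azure application (client), directory (tenant), and subscription IDs are GUIDs, so a quick format check can catch a swapped or truncated value before you test authentication. The sketch below is an illustration only; the field names and the find_malformed_guids helper are hypothetical and not part of Atlan or Azure tooling.

```python
import uuid

# Hypothetical field names for the three GUID-shaped values in the form above.
GUID_FIELDS = ("client_id", "tenant_id", "subscription_id")


def find_malformed_guids(values: dict) -> list:
    """Return the names of GUID fields that are missing or badly formatted."""
    bad = []
    for field in GUID_FIELDS:
        raw = str(values.get(field, "")).strip()
        try:
            parsed = uuid.UUID(raw)
        except ValueError:
            bad.append(field)
            continue
        # uuid.UUID also accepts braces and unhyphenated hex; require the
        # canonical 8-4-4-4-12 form that the Azure portal displays.
        if str(parsed) != raw.lower():
            bad.append(field)
    return bad


# Placeholder values for illustration only.
form_values = {
    "client_id": "11111111-2222-3333-4444-555555555555",
    "tenant_id": "not-a-guid",
    "subscription_id": "AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE",
}
problems = find_malformed_guids(form_values)
print(problems)
```

The client secret, resource group, and account name are free-form strings, so this check does not cover them.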

Configure the connection

To complete the Microsoft Azure Cosmos DB connection configuration:

  1. Provide a Connection Name that represents your source environment. For example, you might use values like production, development, gold, or analytics.
  2. (Optional) To change the users who are able to manage this connection, change the users or groups listed under Connection Admins.
    🚨 Careful! If you do not specify any user or group, no one will be able to manage the connection, not even admins.
  3. Navigate to the bottom of the screen and click Next to proceed.

Configure the crawler

Before running the Microsoft Azure Cosmos DB crawler, you can further configure it.

On the Metadata page, you can override the defaults for any of these options:

  • To select the assets you want to exclude from crawling, click Exclude Metadata. (If none are specified, no assets are excluded.)
  • To select the assets you want to include in crawling, click Include Metadata. (If none are specified, all assets are included.)
  • To have the crawler ignore collections based on a naming convention, specify a regular expression in the Exclude regex for collections field.
  • For Extract Collection Schemas, change to Yes to have Atlan extract collection schemas by reading a subset of the documents in each collection. For Schema extraction sample size, set the number of documents to read for schema analysis, up to a maximum of 1,000.
    💪 Did you know? If an asset appears in both the include and exclude filters, the exclude filter takes precedence.
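To preview how these options interact, here is a minimal sketch of the selection rules described above: an empty include filter means all assets, the exclude filter wins over the include filter, and collections matching the exclusion regular expression are skipped. It is an illustration under stated assumptions, not Atlan's implementation. In particular, it treats assets as a flat list of collection names and assumes the regular expression must match the whole name; the function names are hypothetical.

```python
import re

MAX_SAMPLE_SIZE = 1000  # upper limit for the schema extraction sample size


def select_collections(names, include=None, exclude=None, exclude_regex=None):
    """Return the collection names that would be crawled, in input order."""
    include = set(include or [])
    exclude = set(exclude or [])
    pattern = re.compile(exclude_regex) if exclude_regex else None

    selected = []
    for name in names:
        # An empty include filter means "include everything".
        if include and name not in include:
            continue
        # Exclude takes precedence over include.
        if name in exclude:
            continue
        # Assumption: the regex must match the entire collection name.
        if pattern and pattern.fullmatch(name):
            continue
        selected.append(name)
    return selected


def clamp_sample_size(requested: int) -> int:
    """Keep a requested sample size within 1 and the documented maximum."""
    return max(1, min(requested, MAX_SAMPLE_SIZE))


# Placeholder collection names for illustration only.
collections = ["orders", "orders_tmp", "users", "audit_log"]
crawled = select_collections(
    collections,
    include=["orders", "orders_tmp", "users"],
    exclude=["users"],
    exclude_regex=r".*_tmp",
)
print(crawled)
print(clamp_sample_size(5000))
```

Testing a regular expression against a list of your real collection names in this way can reveal an overly broad pattern before the crawler runs.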

Run the crawler

To run the Microsoft Azure Cosmos DB crawler, after completing the steps above:

  • To run the crawler once immediately, at the bottom of the screen, click the Run button.
  • To schedule the crawler to run hourly, daily, weekly, or monthly, at the bottom of the screen, click the Schedule & Run button.

Once the crawler has finished running, you will see the assets on Atlan's asset page! 🎉
