How to crawl Microsoft Azure Cosmos DB

Once you have configured the Microsoft Azure Cosmos DB permissions, you can establish a connection between Atlan and Microsoft Azure Cosmos DB.

To crawl metadata from Microsoft Azure Cosmos DB, review the order of operations and then complete the following steps.

Select the source

To select Microsoft Azure Cosmos DB as your source:

  1. In the top right of any screen in Atlan, navigate to +New and click New workflow.
  2. From the Marketplace page, click Cosmos DB Assets.
  3. In the right panel, click Setup Workflow.

Provide your credentials

Follow the steps for the deployment type(s) you want to crawl:

vCore deployment

To enter your Microsoft Azure Cosmos DB credentials:

  1. For Database API, MongoDB is the default selection.
  2. For Extraction method, Direct is the default selection.
  3. For Select the deployment types to crawl, click vCore.
  4. For Connection Strings, enter the primary connection string(s) you copied from your Microsoft Azure Cosmos DB account(s).
  5. Click the Test Authentication button to confirm connectivity to Microsoft Azure Cosmos DB.
  6. Once authentication is successful, navigate to the bottom of the screen and click Next.
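You may want a quick local sanity check before pasting a connection string. The following is a minimal sketch in Python. It assumes a MongoDB-style URI that starts with mongodb:// or mongodb+srv:// and embeds a username and password. The function name check_vcore_connection_string is hypothetical. It does not contact Azure and is not part of Atlan, so Test Authentication remains the authoritative check.

```python
from urllib.parse import urlsplit


def check_vcore_connection_string(conn_str: str) -> list[str]:
    """Return problems found in a MongoDB-style connection string.

    Hypothetical local pre-check: it only parses the string and never
    opens a network connection or calls any Atlan or Azure API.
    """
    problems: list[str] = []
    conn_str = conn_str.strip()

    # Template placeholders such as <password> must be replaced first.
    if "<" in conn_str or ">" in conn_str:
        problems.append("contains an unreplaced <placeholder>")

    parts = urlsplit(conn_str)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        problems.append(f"unexpected scheme {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host")
    # Assumption: credentials are embedded as user:password@host.
    if not parts.username or not parts.password:
        problems.append("missing username or password")
    return problems
```

An empty list only means no obvious formatting problems were found. It does not mean the credentials are valid.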

RU deployment

To enter your Microsoft Azure Cosmos DB credentials:

  1. For Database API, MongoDB is the default selection.
  2. For Extraction method, Direct is the default selection.
  3. For Select the deployment types to crawl, click RU.
  4. For Client ID, enter the application (client) ID you copied for your service principal.
  5. For Client Secret, enter the client secret you copied for your service principal.
  6. For Tenant ID, enter the directory (tenant) ID you copied for your service principal.
  7. Click the Test Authentication button to confirm connectivity to Microsoft Azure Cosmos DB.
  8. Once authentication is successful, navigate to the bottom of the screen and click Next.
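Azure application (client) IDs and directory (tenant) IDs are GUIDs, while a client secret value is an opaque string. One easy slip is copying the secret's ID, which is itself a GUID, instead of its value. The sketch below flags both cases locally before you click Test Authentication. The helper name check_service_principal is hypothetical. It does not request a token or contact Microsoft Entra ID.

```python
import uuid


def is_guid(value: str) -> bool:
    """True if value parses as a GUID/UUID, e.g. 8-4-4-4-12 hex digits."""
    try:
        uuid.UUID(value.strip())
        return True
    except ValueError:
        return False


def check_service_principal(client_id: str, client_secret: str,
                            tenant_id: str) -> list[str]:
    """Hypothetical local pre-check of the three RU credential fields."""
    problems: list[str] = []
    if not is_guid(client_id):
        problems.append("Client ID is not a GUID")
    if not is_guid(tenant_id):
        problems.append("Tenant ID is not a GUID")
    if not client_secret.strip():
        problems.append("Client Secret is empty")
    elif is_guid(client_secret):
        # The secret's ID is a GUID; the secret value normally is not.
        problems.append("Client Secret looks like a GUID; it may be the "
                        "secret ID rather than the secret value")
    return problems
```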

vCore and RU deployment

To enter your Microsoft Azure Cosmos DB credentials:

  1. For Database API, MongoDB is the default selection.
  2. For Extraction method, Direct is the default selection.
  3. For Select the deployment types to crawl, click vCore and RU.
  4. For Client ID, enter the application (client) ID you copied from the service principal for your RU-based account.
  5. For Client Secret, enter the client secret you copied from the service principal for your RU-based account.
  6. For Tenant ID, enter the directory (tenant) ID you copied from the service principal for your RU-based account.
  7. For Connection Strings, enter the primary connection string(s) you copied from your vCore-based account(s).
  8. Click the Test Authentication button to confirm connectivity to Microsoft Azure Cosmos DB.
  9. Once authentication is successful, navigate to the bottom of the screen and click Next.
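The three procedures differ only in which credential fields they require. The mapping below restates that rule as data. It is a sketch: the dictionary, its keys, and the helper names required_fields and missing_fields are hypothetical and are not an Atlan API.

```python
# Credential fields each deployment type needs, taken from the steps above.
REQUIRED_FIELDS: dict[str, frozenset[str]] = {
    "vCore": frozenset({"Connection Strings"}),
    "RU": frozenset({"Client ID", "Client Secret", "Tenant ID"}),
}


def required_fields(deployment_types: list[str]) -> set[str]:
    """Union of the fields needed for every selected deployment type.

    Raises KeyError for an unknown deployment type.
    """
    fields: set[str] = set()
    for deployment_type in deployment_types:
        fields |= REQUIRED_FIELDS[deployment_type]
    return fields


def missing_fields(deployment_types: list[str],
                   provided: dict[str, str]) -> list[str]:
    """Required fields that are absent or blank, in sorted order."""
    return sorted(
        field for field in required_fields(deployment_types)
        if not provided.get(field, "").strip()
    )
```

Selecting vCore and RU therefore requires all four fields, which matches the combined procedure.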

Configure the connection

To complete the Microsoft Azure Cosmos DB connection configuration:

  1. Provide a Connection Name that represents your source environment. For example, you might use values like production, development, gold, or analytics.
  2. (Optional) To change the users who are able to manage this connection, change the users or groups listed under Connection Admins.
    🚨 Careful! If you do not specify any user or group, no one will be able to manage the connection — not even admins.
  3. Navigate to the bottom of the screen and click Next to proceed.

Configure the crawler

Before running the Microsoft Azure Cosmos DB crawler, you can further configure it.

On the Metadata page, you can override the defaults for the following:

  • For Extract Collection Schemas, change to Yes to have Atlan extract collection schemas by reading a subset of the documents in each collection and mapping the fields it finds to column assets. For Schema extraction sample size, you can set the number of documents to read for schema analysis, up to a maximum of 1,000.
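Schema extraction works by sampling: the crawler reads a subset of documents and maps what it finds to column assets. The sketch below illustrates the general technique of sampling-based schema inference in Python. It is not Atlan's implementation. It assumes documents arrive as Python dictionaries (as a MongoDB driver would return them), takes the first documents the iterator yields (a real crawler might sample differently), caps the sample at the 1,000-document limit described above, and records the set of value types seen for each dotted field path without descending into arrays. The function name infer_schema and the type labels are hypothetical.

```python
from collections import defaultdict
from itertools import islice
from typing import Any, Iterable

MAX_SAMPLE_SIZE = 1_000  # Upper limit for Schema extraction sample size.


def _type_name(value: Any) -> str:
    """Map a Python value to a simple, illustrative type label."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # Check bool before int: bool subclasses int.
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__  # e.g. ObjectId, datetime


def _walk(doc: dict, prefix: str, observed: defaultdict) -> None:
    """Record the type of every field, recursing into nested objects."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        observed[path].add(_type_name(value))
        if isinstance(value, dict):
            _walk(value, path, observed)


def infer_schema(documents: Iterable[dict],
                 sample_size: int = MAX_SAMPLE_SIZE) -> dict[str, list[str]]:
    """Infer field paths and observed types from the first sample_size docs."""
    if not 1 <= sample_size <= MAX_SAMPLE_SIZE:
        raise ValueError(f"sample_size must be between 1 and {MAX_SAMPLE_SIZE}")
    observed: defaultdict = defaultdict(set)
    for doc in islice(documents, sample_size):
        _walk(doc, "", observed)
    # Each path would become one column; mixed types show up as several labels.
    return {path: sorted(types) for path, types in sorted(observed.items())}
```

A larger sample makes rarely populated fields more likely to be detected, at the cost of reading more documents.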

Run the crawler

After completing the steps above, run the Microsoft Azure Cosmos DB crawler:

  • To run the crawler once immediately, click the Run button at the bottom of the screen.
  • To schedule the crawler to run hourly, daily, weekly, or monthly, click the Schedule & Run button at the bottom of the screen.

Once the crawler has completed running, you will see the assets on Atlan's asset page! 🎉
