How to automatically classify PII assets in Atlan

🏷️ What is automated PII classification?

Classifications (also known as “tags”) help you group data assets under the same access policies. One example of a classification is PII, or Personally Identifiable Information. This includes information like names, addresses, and credit card numbers.

It’s important to identify and control access to PII data in your organization. However, it can be challenging and time-consuming to tag PII data assets appropriately! Instead of your team having to manually tag thousands of data assets as PII, Atlan supports automated PII classification.

Atlan’s automated PII classification identifies data assets that contain PII, attaches the PII classification to those assets, and applies any access policies tied to that classification.

🤖 How does auto-PII classification work?

Atlan reads and automatically tags columns with PII data.

It first checks the column metadata (like column headers) against Atlan’s master database of PII terms, such as name, email address, age, and bank account number. A matching score is computed between the column header and each PII term using Levenshtein distance. If the score for any term is above a certain threshold value, the column is auto-tagged with the PII classification.
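To make the matching step more concrete, here is a minimal Python sketch of threshold-based header matching with Levenshtein distance. It is an illustration only, not Atlan’s actual implementation: the term list, the 0.8 threshold, the header normalization, and the way the score is scaled to a 0 to 1 range are all assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to a score in [0, 1], where 1.0 is an exact match."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def normalize(header: str) -> str:
    """Lowercase the header and treat underscores and hyphens as spaces."""
    return header.strip().lower().replace("_", " ").replace("-", " ")


# Hypothetical values for illustration; the real term list and threshold
# are internal to Atlan and are not given in this article.
PII_TERMS = ["name", "email address", "age", "bank account number"]
THRESHOLD = 0.8


def is_pii(header: str, terms=PII_TERMS, threshold: float = THRESHOLD) -> bool:
    """Return True if the header scores above the threshold for any PII term."""
    normalized = normalize(header)
    return any(similarity(normalized, term) > threshold for term in terms)


if __name__ == "__main__":
    for header in ["Email_Address", "email_adress", "order_total"]:
        print(header, is_pii(header))
```

In this sketch, a misspelled header such as `email_adress` still matches “email address” because only one edit separates them, while an unrelated header such as `order_total` falls well below the threshold. Raising the threshold reduces false positives at the cost of missing more loosely named columns.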

Access policies can be built out on top of this classification, allowing organizations to restrict access to and maintain controls over all PII data.

🛠️ How to add PII classification while setting up new workflows

As one of the final steps when setting up a connection in Atlan to a source (such as Snowflake or Databricks), you’ll need to specify how to crawl metadata from the selected data source. At this point, you can choose to turn on “Auto-Classification”.

If you select “Auto-Classification”, Atlan will automatically classify and tag any PII data it finds when it crawls assets from this source. 🥳


🎉 Congratulations! You’ve now set up auto-classification of PII data.

