What is Data Discovery & Classification?

How can databases be classified?

What are the limitations of this process?

What are the solutions?

Classification is done, now what next?

What is Data Discovery & Classification?

Data Discovery & Classification is a feature built into Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics. It provides basic capabilities for discovering, classifying, labeling, and reporting the sensitive data in your databases at no additional cost.

Your most sensitive data might include business, financial, healthcare, or personal information. It can serve as infrastructure for:

•Helping to meet standards for data privacy and requirements for regulatory compliance.

•Various security scenarios, such as monitoring (auditing) access to sensitive data.

•Controlling access to and hardening the security of databases that contain highly sensitive data.

Data Discovery & Classification currently supports the following capabilities:

Discovery and recommendations: The classification engine scans your database and identifies columns that contain potentially sensitive data. It then provides you with an easy way to review and apply recommended classification via the Azure portal.

Labeling: You can apply sensitivity-classification labels persistently to columns by using new metadata attributes that have been added to the SQL Server database engine. This metadata can then be used for sensitivity-based auditing scenarios.

Query result-set sensitivity: The sensitivity of a query result set is calculated in real time for auditing purposes.

Visibility: You can view the database-classification state in a detailed dashboard in the Azure portal. Also, you can download a report in Excel format to use for compliance and auditing purposes and other needs.

Default Classification Labels and Types.

The classification includes two metadata attributes:

•Labels: The main classification attributes, used to define the sensitivity level of the data stored in the column.

1.Public

2.General

3.Confidential

4.Confidential – GDPR

5.Highly Confidential

6.Highly Confidential – GDPR

•Information types: Attributes that provide more granular information about the type of data stored in the column.

Banking                                                Health

•National ID                                          Credentials  

•Networking                                          Credit card

•SSN                                                      Date of birth

•Financial                                               Other

•Name                                              Contact Info

How can databases be classified?

The default data classification information types and labels are stored in SQL Information Protection Policy file. When we check for recommendations for classifications or want to apply them, it refers to this file. Microsoft does provide the ability for database administrators to modify this policy (labels and information types) and any changes done will reflect in the data classifications. But we will talk about this a little later and its use-cases.

Classifications can be managed via 2 methods:

T-SQL
Powershell cmdlets

From an automation perspective, when we need to apply the classifications for hundreds of databases or more the Azure PowerShell cmdlets come very handy, and I will focus on that.

We can use PowerShell Cmdlet to add or remove column classifications, as well as retrieve all classifications and get recommendations for the entire database.

Get-SqlSensitivityClassification
Get-SqlSensitivityRecommendations
Set-SqlSensitivityClassification
Remove-SqlSensitivityClassification

Here is a code snippet which loops through all subscriptions under a tenant, finds all resources of type- SQL Server and then loops through all databases (based on certain filters) and then first gets the recommended classifications and then applies them via the above Az PowerShell cmdlets.

$AzureSQLClassStatus1 = @()

$subscriptions = Get-AzSubscription
foreach ($sub in $subscriptions)
{
    $sub | Set-AzContext | Out-Null
    $AzureSQLServers =Get-AzResource |Where-Object ResourceType -EQ Microsoft.SQL/servers

foreach ($AzureSQLServer in $AzureSQLServers)
{
    $AzureSQLServerDataBases = Get-AzSqlDatabase -ServerName $AzureSQLServer.Name -ResourceGroupName $AzureSQLServer.ResourceGroupName | Where-Object {($_.DatabaseName -ne "master") -and ($_.DatabaseName -like "*PROD")} | Where-Object {$_.SecondaryType -ne "Geo"}
        foreach ($AzureSQLServerDataBase in $AzureSQLServerDataBases) 
        {
                $svName =$AzureSQLServerDataBase.ServerName
                $DBName =$AzureSQLServerDataBase.DatabaseName
                $ResourceGroupName= $AzureSQLServer.ResourceGroupName

Write-Output "Applying Classifications for database:" $DBName
$classifications=Get-AzSqlDatabaseSensitivityRecommendation -ResourceGroupName $ResourceGroupName -ServerName $svName -DatabaseName $DBName |Set-AzSqlDatabaseSensitivityClassification
$appliedclassifications=Get-AzSqlDatabaseSensitivityClassification -ResourceGroupName $ResourceGroupName -ServerName $svName -DatabaseName $DBName

foreach  ($class in $appliedclassifications) 
       {  
            $AzureSQLClassStatus = [PSCustomObject]@{

                'ResourceGroupName' = $ResourceGroupName
                'Database Server' = $svName
                'Database Name' = $DBName
                'Sensitivity Labels'=(@($class.SensitivityLabels) -join ',')
            }
            
 $AzureSQLClassStatus1 += $AzureSQLClassStatus
        }
           
      }
}

https://learn.microsoft.com/en-us/azure/defender-for-cloud/sql-information-protection-policy?tabs=sqlip-tenant#customize-your-information-types

Get-AzSqlDatabaseSensitivityClassification -ResourceGroupName resourceGroup -ServerName server -DatabaseName database | Remove-AzSqlDatabaseSensitivityClassification

DROP SENSITIVITY CLASSIFICATION FROM schema.Tablename.Columnname

<sensitivity_attributes max_rank="20" max_rank_desc="Medium"><sensitivity_attribute label="Confidential" label_id="bac40f0b-cd20-4e1b-8bb5-3a6dc74a7d85" information_type="Credentials" information_type_id="c64aba7b-3a3e-95b6-535d-3bc535da5a59" rank="20" rank_desc="Medium"/></sensitivity_attributes>

<sensitivity_attributes max_rank="20" max_rank_desc="Medium"><sensitivity_attribute label="Confidential - GDPR" label_id="7097d922-0870-4afb-b33b-1db32f3e6eed" information_type="Name" information_type_id="57845286-7598-22f5-9659-15b24aeb125e" rank="20" rank_desc="Medium"/></sensitivity_attributes>