The General Data Protection Regulation (GDPR) replaces the European Union’s Data Protection Directive and applies to all organizations that collect or process the personal data of any EU resident (regardless of whether the organization has a presence in the EU). The Regulation goes into effect on May 25, 2018, and companies are scrambling to come into compliance.

The GDPR lays out a large number of requirements in regard to the data to which it applies. These include (but are not limited to):

  • Providing information to the data subjects regarding collection of their data
  • Giving data subjects access to the data and a copy of the data in a commonly used format
  • Rectifying inaccurate or incomplete data, restricting processing and/or erasing data (under certain circumstances) upon request by data subjects
  • Securing the personal data and protecting its ongoing confidentiality, integrity, availability, and resilience

To do any of the above, you must first locate the data covered by the GDPR that your organization has collected, stored, or is processing. That means identifying which data is personal data as defined by the Regulation and locating it in storage – including any copies of it. You also need the means to export that data in a common file format, all while keeping it secure.
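
As a rough illustration of the export requirement, the following Python sketch serializes the records held about one data subject to JSON or CSV. The function name export_subject_data and the record fields are hypothetical, and the sketch assumes the records have already been located and loaded as dictionaries.

```python
import csv
import io
import json


def export_subject_data(records, fmt="json"):
    """Serialize a list of dict records to a commonly used text format."""
    if fmt == "json":
        return json.dumps(records, indent=2, ensure_ascii=False)
    if fmt == "csv":
        # Use the union of all keys so records with differing fields still fit
        fieldnames = sorted({key for row in records for key in row})
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


# Hypothetical record for a single data subject
records = [{"name": "Jane Doe", "email": "jane@example.com"}]
print(export_subject_data(records, "csv"))
```

The sketch covers only the format conversion; keeping the export secure (access control, encrypted transfer) would have to be handled separately.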

Organizations are realizing that this may not be as easy as it would seem at first glance. The key is to have a good data identification and classification system and to use available tools to help you implement it.

What is personal data/sensitive personal data?

Before you can proceed with data classification, you need to understand the types of data to which the GDPR applies. As defined in Article 4 of the Regulation, personal data means any information relating to an identified or identifiable natural person (data subject); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.

As you can see, with the phrase “directly or indirectly,” this is a very broad definition. It covers information we might not normally think of as personal data, such as an IP address, a mobile device ID, or a web cookie.

But that’s not all. To further complicate matters, not all forms of personal data are treated equally. Article 9 of the GDPR addresses a special category of personal data that is usually referred to as sensitive personal data. This type of data requires extra protection, and consists of data relating to racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic and biometric data, health, and sex life or sexual orientation. Data relating to criminal offenses also gets special treatment under a separate provision in Article 10.

What isn’t personal data?

A new concept – and to some, a new word – that appears in the GDPR is pseudonymisation. This refers to applying encryption, hashing or other technological means to personal data to render it incapable of identifying a particular person without using additional information. The GDPR suggests pseudonymisation (and encryption) as specific recommended ways to protect personal data.

It is very important to understand that pseudonymized data is still classified as personal data under the GDPR, which means it’s still subject to most of the same requirements. However, some requirements are relaxed for pseudonymized data, notably in regard to data breach notifications, and there is more flexibility to conduct data profiling without the consent of the data subject.
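
One way to picture hashing-based pseudonymisation is a keyed hash, sketched below in Python. This is a minimal example under stated assumptions, not a method prescribed by the Regulation: the function name pseudonymise, the sample record, and the key are all hypothetical. The secret key plays the role of the “additional information” and would need to be stored apart from the data.

```python
import hashlib
import hmac


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest (HMAC) of a personal identifier."""
    # Normalize first so the same identifier always maps to the same token
    normalised = value.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would be kept separately from the data
SECRET_KEY = b"example-key-kept-apart-from-the-data"

record = {"email": "Jane.Doe@example.com", "country": "DE"}
pseudonymised = {**record, "email": pseudonymise(record["email"], SECRET_KEY)}
print(pseudonymised)
```

Because anyone holding the key can recompute the token for a known email address and so link it back to a person, the output is pseudonymized rather than anonymous, which is why it remains personal data.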

The only data that is not considered personal data under the Regulation is data that cannot in any way be tied to a specific individual. For the purposes of the GDPR, companies and organizations are not considered “persons” even though such entities may have that status under some laws, so information about such organizations probably would not fall under the “personal data” definition – but that’s true only if the organizations are large enough that you can’t infer any information about individuals within the organizations.

Where is the personal data?

Once you have a clear understanding of what personal data is (and isn’t), you can start to round it up. That may be easier said than done. Digitized personal data can be found in many different places within your systems’ storage locations. Databases are the obvious location for such data, but personal information can also be found in documents, spreadsheets, email messages, and other types of files.

How do you find this data when it may be hidden in a sea of other, non-personal data? It’s certainly not feasible for a human to read every file to check it for personal data. You need a way to automate the process with algorithms that are capable of recognizing personal data and differentiating it from other data.
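
To make the idea of automation concrete, here is a minimal Python sketch that walks a directory tree and counts email-like strings in plain-text files. The function name scan_tree, the list of file suffixes, and the simplified email pattern are assumptions made for illustration; a real scanner would also need to handle office documents, mail stores, and many more data types.

```python
import re
from pathlib import Path

# Simplified, assumed pattern for email-like strings (not a full validator)
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed set of plain-text file types worth reading directly
TEXT_SUFFIXES = {".txt", ".csv", ".log", ".md"}


def scan_tree(root):
    """Map each text file under root to the number of email-like strings in it."""
    findings = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        count = len(EMAIL_PATTERN.findall(text))
        if count:
            findings[str(path)] = count
    return findings
```

Calling scan_tree on a folder returns a dictionary of file paths and hit counts, which can serve as a starting list of files to review and classify.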

How can you identify personal data?

Luckily, many types of personal data follow identifiable patterns. For instance, passport numbers, phone numbers, credit card numbers and so on will have a designated number of digits or characters. In the U.S., social security numbers follow the pattern of three digits, a dash, two digits, another dash, and four digits. However, we’re talking about the European Union, and that means we’re dealing with many different countries, and the patterns followed by some of the identifiers are different for each country.

These identifiers can include driver’s licenses, license plate numbers, VAT codes, healthcare identification numbers, and various other national ID numbers. Searching for all of these different patterns can present quite a challenge.

A powerful, comprehensive search technology is needed to ferret out all the personal data collected, stored and processed by your organization. Multiple search types will be necessary. Regular expressions can be used to locate sequences that follow the known patterns of various types of personal data (for example, numbers that match the number of digits for EU passports). Other algorithms can scan for key words that, combined with the number sequence, are indicative of a particular type of personal data, such as a country code or name.
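
The two search types described above can be sketched in Python as follows. The first pattern is the U.S. social security number layout mentioned earlier. The second is a deliberately generic, assumed pattern (nine uppercase letters or digits) that is reported only when a keyword appears nearby; the keyword list and the 40-character window are likewise illustrative choices, not actual national formats.

```python
import re

# U.S. social security number: three digits, dash, two digits, dash, four digits
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Assumed, generic candidate: a run of nine uppercase letters or digits
CANDIDATE_ID = re.compile(r"\b[A-Z0-9]{9}\b")
# Illustrative keywords that suggest the candidate is a passport number
KEYWORDS = ("passport", "passeport", "reisepass")
WINDOW = 40  # characters of context checked on each side of a candidate


def find_ssns(text):
    """Return all substrings that follow the SSN layout."""
    return SSN_PATTERN.findall(text)


def find_keyword_backed_ids(text):
    """Return candidate IDs that have one of the keywords close by."""
    hits = []
    for match in CANDIDATE_ID.finditer(text):
        start = max(0, match.start() - WINDOW)
        context = text[start:match.end() + WINDOW].lower()
        if any(keyword in context for keyword in KEYWORDS):
            hits.append(match.group())
    return hits


print(find_ssns("SSN 123-45-6789 on file"))
print(find_keyword_backed_ids("Passport no. X1234567Z issued 2015"))
```

Requiring a nearby keyword trades some missed matches for fewer false positives, since a bare nine-character code could just as easily be an order reference.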

Using built-in software tools and features

The software that you’re using to store and process your data most likely contains some tools that can be used to search for personal data. For example, Microsoft’s cloud services – Azure, Office 365, Enterprise Mobility + Security (EMS), Dynamics 365, Azure SQL Database and SharePoint – all include various search functions that can help you find personal data.

To find personal data in Microsoft Azure, you can use Azure Active Directory, Azure Data Catalog, Power Query (for Hadoop clusters in Azure HDInsight), Azure Search, and other related tools.

Once you’ve found the personal data, you can use other tools such as Azure Information Protection to help implement data classification and apply persistent labels to the personal data. You can also use the REST API or annotate personal data that is registered with the Azure Data Catalog.

Office 365 customers can use content search in Advanced eDiscovery to track down and identify personal data across Exchange Online, SharePoint Online, OneDrive for Business and Skype for Business. Office 365 Labels can be used to classify personal data and apply encryption and access restrictions. Administrators can also use the Advanced Data Governance tool to automate data retention and deletion policies.

To find personal data on local computers and on-premises servers, you can use Windows Search, PowerShell, and other operating system features and functions.

Finding personal data in SQL databases can be done using the SQL language to query databases and to customize tools or services.
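
As a simple illustration of querying a database for likely personal data, the Python sketch below inspects column names for telltale words. It uses an in-memory SQLite database purely so the example is self-contained; that choice, the function name suspect_columns, the hint list, and the sample tables are all assumptions. On SQL Server or Azure SQL Database, the same idea would typically query INFORMATION_SCHEMA.COLUMNS instead.

```python
import sqlite3

# Assumed substrings that suggest a column holds personal data
NAME_HINTS = ("name", "email", "phone", "address", "birth", "passport")


def suspect_columns(conn):
    """Return (table, column) pairs whose column name hints at personal data."""
    suspects = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name
        for column in conn.execute(f'PRAGMA table_info("{table}")'):
            if any(hint in column[1].lower() for hint in NAME_HINTS):
                suspects.append((table, column[1]))
    return suspects


# Hypothetical schema for demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, full_name TEXT, email TEXT)")
conn.execute("CREATE TABLE products (sku TEXT, price REAL)")
print(suspect_columns(conn))
```

Column names are only a first filter: a hint such as "name" will also flag harmless columns like a file name, and free-text columns still need their contents sampled and pattern-matched.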

Microsoft Compliance Manager, in preview at the time of this writing, will help cloud services customers manage compliance from one interface.

Solutions to manage and protect personal data

Once personal data has been identified and classified, it must be managed and protected in accordance with the GDPR requirements. That’s where monitoring, management, and security software and services come in. GFI’s family of security solutions includes GFI LanGuard, for keeping the network secure and systems patched to prevent data breaches caused by vulnerability exploits; GFI EndPoint Security, to reduce the risk of personal data leakage related to BYOD systems and portable storage devices; and GFI EventsManager, to help you detect any suspicious activity that could signal a breach of personal data and nip it in the bud or make timely notifications as required by the GDPR.

Summary

Compliance with the GDPR and other government and industry regulations that are aimed at protecting personal data begins with locating and classifying that data – but that’s only the first step. Once the data has been identified, you must implement controls to protect its confidentiality and integrity, and ensure that the proper notifications are made in case of any breach.

Features and functionalities built into operating systems and cloud services can help you locate the data that needs protection, and the combination of built-in security features, appropriately configured, with advanced management and security solutions, can help you in your efforts to achieve compliance.