Almost every enterprise has some kind of legacy file server. You know, the one that stores all the information that has to be accessed by the various business groups. Great swaths of spreadsheets, presentations, photos, accounts, client info and (possibly!) some illegally shared media.
This unstructured data grows rapidly and organically. Many larger enterprises have sensitive data, which may contain confidential and/or personal information, residing on file stores where there is insufficient understanding of exactly what this data is and who is accessing it. The over-permissive nature of global directory groups such as the “everyone” group means that there is little control over exactly where in the enterprise this sensitive data is written.
User entitlements to view certain groups and folders evolve over time. These entitlements are rarely reduced, yet regular reviews of user entitlement by manual methods are time-consuming and therefore generally ignored.
I usually recommend a phased approach to tackling each of the issues identified. The first phase is to identify sensitive data, data owners, and data that can be archived or deleted. The second phase is the more strategic process of applying policies to the classified data and controlling and auditing access to it.
Phase 1: Identification
The initial task of locating this sensitive data may appear overwhelming given the size of a typical enterprise file server and the realisation that sensitive data could reside literally anywhere within it. Key to making this task more manageable is reducing the overall amount of data in which there MIGHT be sensitive information by identifying the data which can be clearly classified as NOT sensitive. This can be extended further by identifying data which may be archived off potentially expensive enterprise storage, as well as data which no one is accessing at all. By categorising information in this way we gradually narrow down the portion of the data set which may possibly contain our sensitive information.
Categorisation by File Type
An important early step is to gain a high level overview of the file types that constitute the unstructured data set. This provides two important benefits:
- Quickly identifies files that would not contain sensitive information. Typically you could include PowerPoint and audio/video files in this category.
- Locates data that the business has no requirement to store centrally. Personal MP3s might fall into this category.
A data governance solution typically does this in two ways: first by reporting on the file types being accessed and the number of events on each, and second by locating files based on their file extension.
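As a rough sketch of that first step, a short script can walk a share and summarise it by extension. The function name and the idea of counting both files and bytes are my own illustration, not a feature of any particular product:

```python
import os
from collections import Counter

def file_type_summary(root):
    """Summarise a directory tree by file extension: file count and total bytes."""
    counts, sizes = Counter(), Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Normalise case so ".MP3" and ".mp3" are counted together
            ext = os.path.splitext(name)[1].lower() or "<none>"
            counts[ext] += 1
            try:
                sizes[ext] += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file removed or unreadable mid-scan; skip it
    return counts, sizes
```

Sorting the resulting counters by size quickly surfaces both the never-sensitive bulk (media files) and the data that has no business being on the share at all.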
Classification of known sensitive data
There are some basic criteria which can be used for data classification:
- Time criteria are the simplest and most commonly used: data is evaluated by time of creation, time of last access, time of last update, and so on.
- Metadata criteria such as type, name, owner and location can be used to create more advanced classification policies.
- Content criteria, which involve the use of content classification algorithms, are the most advanced form of unstructured data classification.
Use of an automated classification framework (like Varonis DCF) provides visibility into the content of data across file systems and can be utilised to locate data which is easily classified as sensitive. For example, you could decide that any file containing a personal ID number constitutes personal information and should be protected in line with the appropriate guidelines.
Equally, any documents previously classified by their file properties or by keywords (e.g. “Company Confidential”) can be quickly located. These are examples of “quick wins” in the reduction of data with unknown classification. Other examples of easily classified data include files containing:
- Policy numbers
- Phone numbers
- Postal Codes
- Bank account numbers
- Credit card numbers
- Passport numbers
- Personal (out of domain) email addresses
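A minimal sketch of content-based classification with regular expressions is below. The patterns are deliberately simplified illustrations; a real classification framework adds validation such as Luhn checksums and surrounding context keywords to cut false positives:

```python
import re

# Illustrative patterns only -- far looser than production classifiers.
PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),   # 16 digits, optional separators
    "email":       re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "uk_postcode": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b"),
}

def classify_text(text):
    """Return the set of sensitive-data categories matched in the text."""
    return {name for name, rx in PATTERNS.items() if rx.search(text)}
```

Running rules like these across the remaining unclassified files gives each file a provisional category, which is exactly the “quick win” described above.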
Identification of inactive directories
Enterprise file stores typically contain vast amounts of data that is no longer in use and is therefore stale. It’s very difficult to determine where that data resides, so it remains on expensive file systems, possibly exposed to risk through excessive permissions. This effort can be greatly reduced with an automated solution that identifies and reports on inactive directories, which may then be archived pending deletion. In many cases the amount of server space reclaimed during this stage will, in itself, pay for the capital outlay of such a solution.
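As an illustration, an inactive-directory scan can be sketched with nothing more than file modification times. A commercial solution would also draw on access times and audit events, and the 365-day threshold here is an assumption, not a recommendation:

```python
import os
import time

def inactive_dirs(root, days=365):
    """Yield directories whose files were all last modified more than
    `days` ago -- candidates for archiving pending deletion."""
    cutoff = time.time() - days * 86400
    for dirpath, _dirs, files in os.walk(root):
        mtimes = []
        for f in files:
            try:
                mtimes.append(os.path.getmtime(os.path.join(dirpath, f)))
            except OSError:
                continue  # file vanished mid-scan
        # Only flag directories that actually hold files, all of them stale
        if mtimes and max(mtimes) < cutoff:
            yield dirpath
```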
Identify data owner by folder
Having been through the processes above, the subset of data that remains unclassified is substantially reduced. The next step is to classify the remaining data by involving the data owners, working at strategic levels within the folder hierarchy. Organisational data owners could be your biggest asset in the battle to identify which data is sensitive and which is not.
Data owners should make decisions, take responsibility, and correctly classify their data. Without a data owner who understands its sensitivity, importance and organisational context, data cannot be managed and protected by the right people.
By analysis of permissions and directory services it is possible to identify folders closest to the top of the hierarchy where permissions for business users have been explicitly applied. These folders should have assigned data owners. An audit trail of every open, create, move, modify and delete on the file system should be kept. By analyzing this data over time, it is possible to provide actionable business intelligence on the probable data owner of any folder.
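The last idea can be sketched very simply: aggregate audit events per folder and surface the most active user as the probable owner. The `(user, folder)` event format here is a hypothetical simplification of a real audit trail:

```python
from collections import Counter, defaultdict

def probable_owners(events):
    """Given audit events as (user, folder) pairs, return the most active
    user per folder -- a hint for assigning ownership, not proof of it."""
    activity = defaultdict(Counter)
    for user, folder in events:
        activity[folder][user] += 1
    # most_common(1) gives the single busiest user for each folder
    return {folder: users.most_common(1)[0][0]
            for folder, users in activity.items()}
```

The output is a candidate list to put in front of the business, not a final answer; the nominated user confirms or redirects ownership.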
Identify data stored in other locations
Managing the enterprise file servers is not the final solution for protecting unstructured data. There are other locations onto which data can be stored which will need to be identified and managed. Mail servers and content management solutions (like SharePoint) can be managed using similar technology.
However, arguably a more difficult problem to solve is that of sensitive documents residing on the local disks of your network endpoints. Laptop users are particularly liable to drag documents onto the desktop so they can access them offline. The business likely has little or no insight into this unsecured data and is exposed to the risk of faulty business processes. With the data classification rules now largely understood, they can be utilised to locate this locally stored information.
DLP and similar endpoint protection solutions will map and locate sensitive data stored on workstations and laptops. This can run in the background with minimal impact on productivity, saving valuable time and improving the efficiency of the data discovery process. Typically such tools allow logging and reporting on the use of this locally stored data, and as such they should be implemented as part of the data protection program.
Phase 2: Control of data by sensitivity level
With the majority of the data set classified, it is time to work again with the data owners to assign a sensitivity level to the data. The level of sensitivity should differentiate between valuable information that carries a high level of risk and other information that may be sensitive but carries less risk if exposed or lost. Common practice stipulates the following levels:
- Confidential – Requires significant protection as disclosure may seriously harm the business
- Private – Associated with an individual, for whom disclosure might not be in their best interests
- Sensitive – Requires protection due to regulatory conditions
- Public – Information that is already public knowledge
Policy based on Sensitivity level
Policies should be designed to provide details on how to protect information at varying sensitivity levels. Consideration should be given to the following issues:
- Access control requirements
- Marking/meta-tagging of files
- Electronic distribution/transmission
- Storage requirements
- Retirement and disposal of outdated information
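One way to make such a policy enforceable is to encode it as a lookup table that tooling can consult. The controls below are placeholder assumptions for illustration, not a standard:

```python
# Hypothetical policy table: each sensitivity level maps to the handling
# considerations listed above. Values are illustrative placeholders.
POLICY = {
    "confidential": {"access": "named individuals only",
                     "transmission": "encrypted only", "retention_days": 2555},
    "private":      {"access": "data subject and HR only",
                     "transmission": "encrypted only", "retention_days": 2190},
    "sensitive":    {"access": "business group",
                     "transmission": "internal only", "retention_days": 2555},
    "public":       {"access": "everyone",
                     "transmission": "unrestricted", "retention_days": 365},
}

def controls_for(level):
    """Look up the handling controls for a given sensitivity level."""
    return POLICY[level.lower()]
```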
Organise & Restructure
With sensitivity level in mind the file stores may be organised and restructured. Data owners should be assigned at strategic levels in the folder hierarchy. These data owners will be the custodians of the information going forward and should be ultimately accountable that data residing under their jurisdiction is managed in line with current policy.
A continued program of data protection should be adhered to. Central to this is the periodic re-scanning, auditing and reporting of information residing in the unstructured data environments. Further consideration could be given to data residing on users’ local hard disks. An endpoint protection solution can be implemented to periodically discover locally stored, sensitive information based on the data classifications that are now defined.