Anonymise, encrypt, control access and assess risk - a security pro's four-point checklist for big data

MWR InfoSecurity's Guillermo Lafuente describes the main steps organisations need to take in order to secure their big data systems

Big data presents a huge opportunity for organisations, but storing that data securely can be a challenge. Many know that they have an obligation to customers and clients to keep their data safe and protected from a breach, but may not know exactly what steps they should be taking.

When preparing information for big data systems, organisations have to strike the right balance between the usefulness of the data and privacy. That means anonymising the data, encrypting it, putting proper access control and security monitoring in place, carrying out a risk assessment and making sure storage complies with local regulations.

Anonymise

Before the data is stored, it should be adequately anonymised. This involves removing any unique identifiers that tie a piece of data to a user. That can itself be a security challenge, as removing unique identifiers may not be enough to guarantee that the data remains anonymous: anonymised data can be cross-referenced with other available data using de-anonymisation techniques. Therefore, it should also be encrypted.
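To illustrate why stripping identifiers may not be enough, here is a minimal Python sketch. The field names and sample records are hypothetical, chosen only for illustration. It removes direct identifiers and then counts how many records share the same combination of remaining attributes, a simple check in the spirit of k-anonymity rather than a complete anonymisation method.

```python
from collections import Counter

# Hypothetical field names, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "email"}
QUASI_IDENTIFIERS = ("postcode", "birth_year")


def strip_identifiers(record):
    """Return a copy of the record without its direct identifiers."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def smallest_group(records, keys=QUASI_IDENTIFIERS):
    """Size of the smallest group of records sharing the same values for `keys`.

    A result of 1 means at least one person is unique on those fields and
    could be re-identified by cross-referencing another data set.
    """
    groups = Counter(tuple(r[k] for k in keys) for r in records)
    return min(groups.values())


# Made-up sample data.
records = [
    {"name": "A", "email": "a@example.com", "postcode": "SW1", "birth_year": 1980, "spend": 120},
    {"name": "B", "email": "b@example.com", "postcode": "SW1", "birth_year": 1980, "spend": 75},
    {"name": "C", "email": "c@example.com", "postcode": "N7", "birth_year": 1992, "spend": 310},
]

anonymised = [strip_identifiers(r) for r in records]
k = smallest_group(anonymised)
print(k)
```

In this sample the third record is the only one with its postcode and birth year, so it remains unique even though its name and email address have been removed.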

Encrypt

Both the raw data and the output of analytics should be sufficiently protected with encryption. However, in the case of cloud services, users cannot send conventionally encrypted data if the cloud needs to perform operations on it. One solution is "fully homomorphic encryption" (FHE), which allows the cloud to perform operations directly on the encrypted data, producing new encrypted results without ever decrypting it.
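The idea of computing on encrypted data can be shown with a much simpler scheme than FHE. The Python sketch below is a toy version of the Paillier cryptosystem, which is only additively homomorphic: it supports adding encrypted values, not arbitrary operations. The primes are far too small for real use and the function names are illustrative assumptions, so treat it as a demonstration of the principle, not as FHE and not as production cryptography.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under the public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


# Tiny primes, for illustration only.
n, private_key = keygen(1009, 1013)

c1 = encrypt(n, 1200)
c2 = encrypt(n, 345)

# The "cloud" multiplies ciphertexts without seeing the plaintexts;
# the result decrypts to the sum of the original values.
c_sum = (c1 * c2) % (n * n)
total = decrypt(n, private_key, c_sum)
print(total)
```

Only the holder of the private key learns that the total is 1545; the party doing the multiplication sees ciphertexts throughout.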

In addition, protect communications: data in transit should be adequately protected to ensure its confidentiality and integrity.
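One common way to protect data in transit is TLS. As a minimal sketch, assuming a Python client and using only the standard library's ssl module, the following builds a context that verifies the server's certificate and refuses protocol versions older than TLS 1.2. The helper names are hypothetical, and the host and port are whatever the caller supplies.

```python
import socket
import ssl


def strict_client_context():
    """Client-side TLS context that verifies the server and refuses old protocols."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def open_secure_connection(host, port, timeout=10.0):
    """Open a TLS-protected socket to host:port (both supplied by the caller)."""
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return strict_client_context().wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise


ctx = strict_client_context()
```

TLS gives both of the properties mentioned above: encryption keeps the traffic confidential, and message authentication detects tampering in transit.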

Access control and security monitoring

Satisfactory access-control mechanisms will also be key to protecting the data. Access control has traditionally been provided by operating systems or applications restricting access to the information, which typically exposes all the information if the system or application is hacked. A better approach is to protect the information using encryption that only allows decryption if the person trying to access the information is authorised by an access-control policy.

One problem that may need to be overcome is that software commonly used to store big data, such as Hadoop, doesn't always come with user authentication enabled by default. This makes access control trickier, as a default installation would leave the information open to unauthenticated users. Real-time security monitoring helps here: access to the data is monitored and threat intelligence is applied to prevent unauthorised access.
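As a rough illustration of monitoring access in real time, the Python sketch below flags any user who accumulates too many denied requests within a sliding time window. The class name, thresholds and event format are assumptions made for this example. A real deployment would read events from the platform's audit logs and combine them with threat intelligence, which this sketch does not attempt.

```python
from collections import defaultdict, deque


class AccessMonitor:
    """Flag users with too many denied requests inside a sliding time window."""

    def __init__(self, max_denied=3, window_seconds=60):
        self.max_denied = max_denied
        self.window_seconds = window_seconds
        self._denied = defaultdict(deque)  # user -> timestamps of denied requests

    def record(self, user, allowed, timestamp):
        """Record one access attempt; return True if it should raise an alert."""
        if allowed:
            return False
        times = self._denied[user]
        times.append(timestamp)
        # Discard denials that have fallen out of the window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        return len(times) >= self.max_denied


# Made-up events: (user, access allowed?, time in seconds).
events = [
    ("mallory", False, 0),
    ("alice", True, 5),
    ("mallory", False, 10),
    ("mallory", False, 20),
]

monitor = AccessMonitor(max_denied=3, window_seconds=60)
alerts = [user for user, allowed, ts in events if monitor.record(user, allowed, ts)]
print(alerts)
```

Here the third denied request from the same user within a minute triggers the alert, while the legitimate access is ignored.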

Risk assessment and compliance

Organisations should run a risk assessment over collected data. If they collect customer information that should be kept private, they need to establish adequate policies that protect the data and client privacy. They should also carefully account for regional laws around handling customer data, such as the EU Data Protection Directive, bearing in mind differing interpretations of EU directives in local laws.

If the data is shared with other organisations, how this is done needs careful consideration. Deliberately released data that turns out to infringe on privacy can have a huge impact on an organisation from a reputational and economic point of view. Anyone using third-party cloud providers to store or process data needs to ensure that those providers also comply with regulations.

The main challenge introduced by big data is how to identify sensitive pieces of information stored within an unstructured data set. It is crucial to bear in mind that security is a process, not a product. Organisations using big data will therefore need to introduce adequate processes and apply traditional information-lifecycle management that helps them manage and protect the data effectively while safeguarding their customers' privacy.
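Finding sensitive information in unstructured data usually starts with pattern matching. The Python sketch below is a simplified example with hypothetical patterns and sample text: it looks for email addresses and for digit sequences that pass the Luhn checksum used by payment card numbers. Real data sets would need many more patterns and would still produce false positives and misses.

```python
import re

# Deliberately simple patterns, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text):
    """Return (kind, value) pairs for likely sensitive items found in text."""
    findings = [("email", m.group()) for m in EMAIL_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):  # filters out most ordinary long numbers
            findings.append(("card", digits))
    return findings


sample = "Contact jane@example.com, card 4111 1111 1111 1111, order 1234567890123."
findings = find_sensitive(sample)
print(findings)
```

The order number in the sample is long enough to look like a card number but fails the checksum, so only the email address and the test card number are reported.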

Guillermo Lafuente is a security consultant at MWR InfoSecurity