Data Lakes Security: What, Why, How?

In today’s digital-first world, data is the new oil. The stream of incoming data has become unending and more businesses are sourcing data to boost their targeting. That’s why data repositories have now gained the status of being prized possessions.

As these repositories grow in size and demand, many different types of repository have become available for storing Big Data.

As exabytes of sensitive business data are generated daily across the globe, organizations find it difficult to store that data centrally in an organized manner. Until they find a sustainable way to structure such large volumes meaningfully, many organizations opt to store the raw data as-is in a centralized repository commonly known as a ‘Data Lake’.

Demystifying Data Lakes

A Data Lake is a form of data repository that stores Big Data in its raw, native format. It holds relational data, such as data from operational databases and line-of-business applications, alongside non-relational data, such as data sourced from mobile apps, IoT devices and social media. Through crawling, cataloging, and indexing, a Data Lake lets an organization see exactly what kind of data it contains at any given point in time.
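To make crawling, cataloging, and indexing more concrete, here is a minimal sketch in Python. It treats a local directory as a stand-in for the lake, which is an assumption made purely for illustration; the names KNOWN_FORMATS, crawl, build_index and demo are hypothetical and do not come from any particular Data Lake product.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Hypothetical mapping from file suffix to a coarse data category.
KNOWN_FORMATS = {".csv": "tabular", ".json": "semi-structured", ".log": "text"}


def crawl(root: Path) -> list[dict]:
    """Walk the lake and describe every file without modifying it."""
    entries = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Reading the whole file is a simplification; large objects
        # would be hashed in chunks.
        data = path.read_bytes()
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "category": KNOWN_FORMATS.get(path.suffix.lower(), "unknown"),
            "size_bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        })
    return entries


def build_index(entries: list[dict]) -> dict[str, list[str]]:
    """Index catalog entries by category for quick lookup."""
    index: dict[str, list[str]] = {}
    for entry in entries:
        index.setdefault(entry["category"], []).append(entry["path"])
    return index


def demo() -> dict[str, list[str]]:
    """Build a tiny throwaway lake and return its category index."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "crm").mkdir()
        (root / "crm" / "customers.csv").write_text("id,name\n1,Ada\n")
        (root / "iot").mkdir()
        (root / "iot" / "sensor.json").write_text(json.dumps({"temp": 21.5}))
        (root / "iot" / "dump.bin").write_bytes(b"\x00\x01")
        return build_index(crawl(root))


if __name__ == "__main__":
    print(demo())
```

Files that land in the "unknown" category are the ones worth reviewing first, since nobody has yet stated what they hold or how sensitive they are.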

Data Lakes work just like the body of water they are named after: they contain a resource that can be processed and packaged to quench a thirst. And just as water has become a precious resource, so has the data stored in Data Lakes. That is why safeguards are needed to protect them.

Top Data Protection Challenges of Data Lakes

Just the way the exposed water in a lake is vulnerable to contamination and misuse, the unfiltered sensitive raw data in Data Lakes faces imminent threats of leaks, breaches and thefts in the absence of strong protective mechanisms.

When it comes to protecting sensitive data in Data Lakes, ‘Unauthorized Access’ and ‘Data Theft’ are widely considered the top two security challenges.


  1. Unauthorized Access


If access to a Data Lake is not properly controlled, anyone can reach the data inside it and misuse it. Since this data includes sensitive personal information, corporate financial data, technical trade secrets and more, strong access control systems are pivotal to data security.
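One common way to approach this challenge is a deny-by-default, role-based check in front of every read. The sketch below is illustrative only: the POLICY table, the zone names "raw" and "curated", and the functions is_allowed and read_object are assumptions for this example, not part of any specific product.

```python
from dataclasses import dataclass

# Hypothetical policy: role -> set of (zone, action) pairs that are allowed.
POLICY = {
    "data_engineer": {("raw", "read"), ("raw", "write")},
    "analyst": {("curated", "read")},
}


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset


def is_allowed(user: User, zone: str, action: str) -> bool:
    """Deny by default; allow only when some role grants the exact pair."""
    return any((zone, action) in POLICY.get(role, set()) for role in user.roles)


def read_object(user: User, zone: str, key: str) -> str:
    """Refuse the read unless the policy explicitly permits it."""
    if not is_allowed(user, zone, "read"):
        raise PermissionError(f"{user.name} may not read {zone}/{key}")
    return f"contents of {zone}/{key}"  # placeholder for a real storage call
```

Because unknown roles and unlisted zones fall through to a refusal, a gap in the policy table fails closed rather than open.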

  2. Data Theft

Even with proper access control mechanisms in place, data theft cannot be ruled out, especially when the attempt comes from company insiders who already hold legitimate access. That is why multiple regulatory bodies across the world mandate the use of encryption for safeguarding sensitive data: encrypted data is of little use to a thief who does not also hold the keys.

When a Data Lake is compromised and its data is leaked or exploited in a ransomware attack, the brand’s reputation suffers and the organization faces regulatory penalties. Both can erode the trust of the brand’s customers.

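Take recent data leaks for example. In July 2019, the UK’s Information Commissioner’s Office (ICO) announced its intention to fine British Airways £183 million for a 2018 data breach in which hackers stole the personal data of about half a million customers. The same month, the ICO announced its intention to fine Marriott about £99 million (roughly $123 million) under GDPR for a major security breach that exposed around 339 million guest records. Both penalties were later reduced, to £20 million and £18.4 million respectively.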

Similarly, Equifax agreed to pay at least $650 million, in what was reported at the time as the largest-ever data breach settlement, to resolve most claims stemming from a 2017 data breach that exposed sensitive information of more than 147 million consumers.


How Thales Secures Data Lakes

With its portfolio of encryption, centralized key management, and identity management with access controls, Thales offers end-to-end protection for sensitive data in Data Lakes.

Thales’s Vormetric Transparent Encryption addresses Data Lake use cases by encrypting sensitive data at the OS level. Further, with strongly defined access controls for individual user groups, it maintains audit trail logs to meet regulatory compliance mandates such as PCI DSS, GDPR and RBI’s data protection mandates.
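As a generic illustration of why audit trail logs matter for compliance, the sketch below chains each log record to the previous one with a SHA-256 hash, so that a later edit to any record becomes detectable. It is not a description of how Vormetric Transparent Encryption works; the record fields and the functions append_event and verify are hypothetical, and timestamps are left out to keep the example short.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(record: dict) -> str:
    """Hash a record deterministically by serializing with sorted keys."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_event(log: list, user: str, action: str, resource: str,
                 allowed: bool) -> None:
    """Append an access event that commits to the previous record's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"user": user, "action": action, "resource": resource,
            "allowed": allowed, "prev": prev}
    log.append({**body, "hash": _digest(body)})


def verify(log: list) -> bool:
    """Recompute every hash and check that each record links to the last."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True
```

A chain like this only reveals edits to someone who holds a trusted copy of the latest hash; an insider able to rewrite the entire log could recompute it, so the latest hash would need to be stored somewhere that insider cannot reach.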


With Vormetric Transparent Encryption, organizations can consistently protect their sensitive data wherever it resides: on-premises, across multiple clouds, within Data Lakes and in container environments.

Designed to meet global data security compliance mandates, Vormetric Transparent Encryption seamlessly delivers data-at-rest encryption with centralized key management, privileged user access control and comprehensive data access audit logging.

To know more about Thales’s Vormetric Transparent Encryption, please visit


To Sum It Up

When it comes to data breaches, the question is not “if”, but “when”. With an ever-increasing volume of sensitive data streaming into Data Lakes, perimeter defenses alone are not enough; last-line-of-defense mechanisms such as data encryption keep the data protected even after the perimeter has been breached.