Sitting on troves of sensitive data? It’s not enough to just protect against external attackers—insider threats, configuration mistakes, human error, and data-scraping apps can all result in that information ending up where it shouldn’t.
Data loss prevention (DLP) refers to the tools and processes that organizations use to ensure information stays within its intended guardrails at potential egress points. That could mean ensuring that sensitive or protected data is only accessible to authorized parties, or isn’t inadvertently mixed into other data sets—or it could mean preventing a user from copying proprietary information onto a USB drive, or sending it to an external party via email.
DLP software is important for compliance with regulations concerning personal, health, and financial data, as well as for shielding intellectual property. It also helps IT and compliance staff gain insight into how data circulates through organizations. To operate properly, DLP tools need to be able to classify, monitor, and control data, as well as cover all the places it could be stored or transmitted. Today, that doesn’t just mean endpoints and networks, but SaaS and the cloud.
For example, DLP software might intervene and prohibit a user from copying strings of protected data from one file to an unauthorized destination. To do so, DLP software must be able to hook into operating systems, network monitoring tools, and other software. While some vendors rely on distributed computation to handle the associated workloads, others offer cloud-based tools.
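To picture how that intervention works: at its simplest, a content-inspection rule scans outbound text for known patterns and blocks the transfer if the destination isn’t sanctioned. The Python sketch below is a toy illustration, not any vendor’s engine; the patterns, allow-list, and destination names are invented for the example.

```python
import re

# Illustrative patterns only: real DLP engines combine regexes with exact
# data matching, document fingerprints, and ML classifiers.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def find_protected(text: str) -> list[str]:
    """Return the names of any protected-data patterns found in the text."""
    return [name for name, rx in PATTERNS.items() if rx.search(text)]

def allow_egress(text: str, destination: str) -> bool:
    """Block the action if protected data is headed somewhere unapproved."""
    approved = {"corporate_share"}  # hypothetical allow-list
    hits = find_protected(text)
    if hits and destination not in approved:
        print(f"DLP block: {hits} detected en route to {destination}")
        return False
    return True

print(allow_egress("SSN 123-45-6789", "usb_drive"))  # -> False (blocked)
```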
Jaimen Hoopes, security firm Forcepoint’s VP of product development and GM of data security, told IT Brew that virtually every industry has some use case for DLP software, especially regulated industries that can be audited for data privacy compliance.
“No matter how small or large you are, even from the little veterinary clinics up to the largest enterprises in the world, you have some amount of data that you have to protect,” Hoopes said.
Must-have features
There are five phases to DLP, according to Hoopes: discovery, classification, protection, monitoring, and training.
Discovery and classification are the processes of inventorying data across the organization and its devices, ranging from databases to PCs and mobile devices, and categorizing it with technologies like exact data matching (EDM), indexed document matching (IDM), optical character recognition, and machine learning. Protection and monitoring concern creating policies around data access and scanning to ensure those policies aren’t violated. Staff also have to be trained on those policies and the restrictions introduced by software tools.
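To make EDM concrete: the technique fingerprints known sensitive values (say, from a customer database) and matches scanned content against those fingerprints rather than the raw values. The Python sketch below is a bare-bones illustration of the idea; production EDM uses salted, partial-match, and multi-field schemes, and the sample values here are invented.

```python
import hashlib

def fingerprint(value: str) -> str:
    # Normalize, then hash, so raw sensitive values never sit in the index.
    return hashlib.sha256(value.strip().lower().encode()).hexdigest()

# Index built once from an authoritative source, e.g. a customer database.
SENSITIVE = {fingerprint(v) for v in ["4111111111111111", "jane.doe@example.com"]}

def edm_scan(document_text: str) -> bool:
    """Flag the document if any token matches a fingerprinted value."""
    return any(fingerprint(tok) in SENSITIVE for tok in document_text.split())

print(edm_scan("card on file: 4111111111111111"))  # -> True
```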
DLP software is just one component of a larger data loss prevention strategy, according to Salah Nassar, cloud security firm Zscaler’s senior director of product marketing.
Nassar told IT Brew organizations have to protect their data across at least five vectors: cloud, bring your own device (BYOD), endpoints, SaaS, and data in transit. Effective programs involve more than just detecting sensitive data; they also require understanding the context in which data is restricted, accessed, and used throughout an organization.
DLP is “a technology, but it’s only as good as what it can see, and where it’s been deployed,” Nassar said. While DLP technology initially relied on detecting files marked as sensitive, or which had certain keywords, it has now been complicated by technologies like encryption and the proliferation of data in cloud services.
“Your world went from [an] on-prem data center network to, my data is in the cloud, multiple clouds—Azure, AWS, Google, and then a ridiculous amount of SaaS platforms that also include AI applications now,” Nassar said. As a result, DLP software now has to be able to detect and control data spat out by automated processes rather than users, as well as in multimodal formats like images, video, or machine-generated databases.
“You really cannot apply data protection without a layered approach,” Nassar said.
“You can’t hire an army of 100 people to look at tens of thousands of incidents a day,” he added. “You need an intelligent solution that correlates all these events from a behavioral perspective, from a threat perspective, and lets you know as an admin the general posture of your organization is X, and these are the top issues that you should be dealing with.”
Guarding against AI
Large language models (LLMs) like OpenAI’s ChatGPT, Google Gemini, and Microsoft Copilot pose new challenges, according to Hoopes, because these apps can sometimes bypass role-based access controls (RBAC) or allow access to sensitive data stored in cloud services.
Storage services like OneDrive or collaboration tools like SharePoint are a “well-vetted, mostly secure area where you can have data partitioned by department and user,” Hoopes said. “You can set up these access controls, and if you want to share a file, you explicitly share a file—great RBAC in place.”
“As soon as that file is shared with an LLM…that RBAC is gone,” Hoopes added. “Even inside Copilot, which is a Microsoft product, anybody could ask Copilot about that file, and they’d get an answer back.”
DLP is one way to prevent that from happening, but users could also rely on complementary tools like a data security posture management (DSPM) solution to fix missing labels, Hoopes advised.
“The ability to look at and secure communication between a user and an application has dramatically changed, because now it’s conversational,” Nassar said, warning users should assume all SaaS now has integrated AI that learns from user behavior.
“Organizations, from a data loss perspective, have to have the right type of contracts with your SaaS vendors to understand what they’re learning about your environment and what you can police going to their environment,” Nassar added.
What is a DLP audit?
DLP tools also often include auditing functions to assess effectiveness, whether concerning industry standards like payment card industry (PCI) compliance, or regulations like the European Union’s GDPR.
According to Hoopes, some Forcepoint customers are required to pass annual audits. That involves running discovery and classification tools across data repositories to create indexes of where sensitive data is located. For example, an auditor might ask a credit card company where all card data is supposed to be saved, then have them run a query to identify whether that data is in any other locations and track potential policy violations.
“Then you can run an incident report about that data,” Hoopes added. “Who tried to move it? Was it okay? Was it not? If not, what did you do? How did you escalate it, and did you follow all the proper procedures?”
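A discovery query of the kind Hoopes describes can be approximated in a few lines: walk the repositories, flag plausible card numbers (validated with the standard Luhn checksum), and report any that live outside sanctioned locations. The paths and file types in this Python sketch are hypothetical, and a real scanner would cover far more formats and stores.

```python
import re
from pathlib import Path

CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")
APPROVED = [Path("/data/pci_vault")]  # hypothetical sanctioned store

def luhn_ok(candidate: str) -> bool:
    """Standard Luhn checksum; filters out most random digit strings."""
    digits = [int(d) for d in re.sub(r"\D", "", candidate)]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def discover_violations(root: Path) -> list[Path]:
    """Report files holding plausible card numbers outside approved stores."""
    violations = []
    for path in root.rglob("*.txt"):  # real scanners cover many more formats
        if any(path.is_relative_to(a) for a in APPROVED):
            continue
        text = path.read_text(errors="ignore")
        if any(luhn_ok(m.group()) for m in CARD_RE.finditer(text)):
            violations.append(path)
    return violations
```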
Nassar said that Zscaler’s audit dashboard, for example, can scan files to identify whether users are properly tagging documents, which users are accessing the most sensitive data, and where data ends up when it’s transferred elsewhere. Other automated audits include environmental posture and data governance assessments, as well as checks for compliance with data privacy regulations.
“Some of these audits, in a way, are kind of executive reports,” Nassar said. “And some of them are meant for the admin so they can respond.”