Understanding and Selecting Data Masking: Use Cases

As we approach the end of this series, it has become clear that I should really have started with use cases. Not only because they are the primary driver of interest in masking products, but also because many advanced features and deployment models really only make sense in terms of particular use cases. The critical importance of clustered servers, and the necessity for post-masking validation for some applications, are really only clear in light of particular usage scenarios. I will sort this out in the final paper, putting use cases first, which will help with the more complex later discussions. But here they are.

Use Cases

Test Data Management: This is, by far, the most important reason customers gave for masking. When polled, most customers say their #1 use for masking technologies is to produce test data. They want to make sure employees don’t do something stupid with corporate data, like making private data sets public, or moving production data to insecure test environments. That is technically true as far as it goes, but fails to capture the essence of what customers look for in masking products. In actuality, masking data for testing and sharing is almost a trivial subset of the full customer requirement; tactical production of test data is just a feature. The real goal is administration of the entire data security lifecycle – including locating, moving, managing, and masking data. The mature version of today’s simpler use case is a set of enterprise data management capabilities which control the flow of data to and from hundreds of different databases. This capability answers many of the most basic security questions we hear customers ask, such as “Where is my sensitive data?” “Who is using it?” and “How can we effectively reduce the risks to that information?”

Companies understand that good data makes employees’ jobs easier. And employees are really crafty at procuring data to help with their day jobs, even if it’s against the rules. If salespeople can get the entire customer database to help meet their quotas, or quality assurance personnel think they need production data to test web applications, they usually find ways to get it. The same goes for decentralized organizations where regional offices need to be self-sufficient, or companies need to share data with partners. The mental shift we see in enterprise environments is to stop fighting these internal user requirements, and instead find a way to satisfy this demand safely. In some cases this means automated production of test data on a regular schedule, or self-service interfaces to produce masked content on demand. These platforms are effectively implementing a data security strategy for fast and efficient production of test data.

Compliance: Compliance is the second major reason customers cite for buying masking products. Unlike most of today’s emerging security technologies, it’s not just the Payment Card Industry’s Data Security Standard (PCI-DSS) driving sales – many different regulatory controls, across various industry verticals, are driving broad interest in masking. Early customers came specifically from finance, but adoption is now well distributed across different segments, particularly retail, telecom, health care, energy, education, and government. The diversity of customer requirements makes it difficult to pinpoint any one regulatory concern that stands out from the rest. During discussions we hear about all the usual suspects – including PCI, NERC, GLBA, FERPA, HIPAA, and in some cases multiple requirements at the same time. These days we hear about masking being deployed as a more generic control – customers cite protection of Personally Identifiable Information (PII), health records, and general customer records, among other concerns; but we no longer see every customer focused on one specific regulation or requirement. Now masking is perceived as addressing a general need to avoid unwanted data access, or to reduce exposure as part of an overall compliance posture.

For compliance, masking is used to protect data with minimal modification to the systems or processes which use the (now masked) data. Masking provides consistent coverage across files and databases with very little adjustment. Many customers layer masking and encryption, using encryption to secure data at rest and masking to secure data in use. Customers find masking better at maintaining relationships within databases; they also appreciate that it can be applied dynamically and causes fewer application side effects. In some cases encryption is deployed as part of the infrastructure, while others employ encryption as part of the data masking process – particularly to satisfy regulations that prescribe encryption. But the key difference is that masking offers full control over the data lifecycle from discovery to archival, whereas encryption is used in a more focused manner, often at multiple different points, to address specific risks. Masking platforms manage the compliance controls, including which columns of data are to be protected, how they are protected, and where the data resides.
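To make "maintaining relationships within databases" concrete, here is a minimal Python sketch of one way a masking step can keep joins intact: the same input always maps to the same masked value, so a key masked in one table still matches the same key masked in another. The keyed-hash approach, the `MASKING_KEY` value, the table contents, and the function names are all assumptions for illustration, not a description of how any particular masking product works.

```python
import hashlib
import hmac

# Hypothetical masking key; a real deployment would keep this in a secrets store.
MASKING_KEY = b"example-key-not-for-production"


def mask_value(value: str, length: int = 12) -> str:
    """Deterministically replace a value with a keyed-hash pseudonym."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncation keeps values short; it also raises collision odds on very large sets.
    return digest[:length]


def mask_rows(rows, columns):
    """Return copies of the rows with the named columns masked."""
    return [
        {key: mask_value(val) if key in columns else val for key, val in row.items()}
        for row in rows
    ]


# Illustrative tables that share a sensitive key column.
customers = [{"ssn": "123-45-6789", "name": "Alice Smith"}]
orders = [{"order_id": "1001", "ssn": "123-45-6789"}]

masked_customers = mask_rows(customers, {"ssn", "name"})
masked_orders = mask_rows(orders, {"ssn"})
```

Because the mapping is deterministic, `masked_customers[0]["ssn"]` and `masked_orders[0]["ssn"]` are identical, so a join on that column still works even though the original value is gone. The sketch does not preserve the original format of the data, which many real masking techniques do.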

Production Database Protection: The first two use cases drive the vast majority of market demand for masking. While replacement of sensitive data – specifically through ETL-style deployments – is by far the dominant model, it is not the only way to protect data in a database. At some firms protection of the production database is the primary goal for masking, with test data secondary. Masking can do both, which makes it attractive in these scenarios. Production data generally cannot be fully removed, so this model redirects requests to masked data where possible. This use case centers on protecting information with finer control over user access, and on dynamically determining whether or not to provide access – something roles and credentials are not designed to support. Dynamic masking effectively redirects suspect queries to a masked view of the real data, in a handful of cases in combination with reverse proxy servers. These customers appreciate the dual benefits of dynamically detecting misuse while also monitoring database usage; they find it useful to have a log of which view of information has been presented to which users, and when.
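The dynamic model described above can be sketched in a few lines of Python: each request is evaluated, the caller receives either the real or the masked view, and the decision is logged. Everything here is hypothetical, including the `TRUSTED_USERS` policy, the `card_number` field, and the function names. The simple user lookup is only a stand-in for whatever misuse detection a real product applies to queries.

```python
from datetime import datetime, timezone

# Hypothetical policy: users who may see unmasked values.
TRUSTED_USERS = {"dba_alice"}

# Record of which view was presented to which user, and when.
audit_log = []


def mask_card(number: str) -> str:
    """Keep the last four characters and replace the rest with 'X'."""
    return "X" * (len(number) - 4) + number[-4:]


def fetch_record(user: str, record: dict) -> dict:
    """Return the real or masked view of a record and log the decision."""
    view = "real" if user in TRUSTED_USERS else "masked"
    result = dict(record)  # copy, so the stored record is never altered
    if view == "masked":
        result["card_number"] = mask_card(record["card_number"])
    audit_log.append(
        {
            "user": user,
            "view": view,
            "time": datetime.now(timezone.utc).isoformat(),
        }
    )
    return result


# Illustrative production record.
record = {"customer": "Alice Smith", "card_number": "4111111111111111"}
```

Calling `fetch_record("intern_bob", record)` returns a copy with the card number masked down to its last four digits, while `fetch_record("dba_alice", record)` returns the real values. Both calls append an entry to `audit_log`, and the underlying record is left unchanged.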

It is worth mentioning a few use cases I expected to drive customer adoption, but which failed to generate significant interest – at least among the customers we contacted during our research. One model that has gotten some attention over the last couple of years is masking data for data warehousing and analytics. When I asked about this, several companies complained that the common security strategy of “walling off” data warehouses through network segmentation, firewalls, and access control systems was seriously flawed at best; these users were looking for better ways to secure the data instead. Masking solutions led these evaluations over encryption and tokenization solutions, primarily because they can scale to very large data volumes and are better at securing data while maintaining complex relationships within the database. But this is still unusual. And it is simply too early to lump data warehouse protection demand with the need to secure “Big Data”, or to claim this segment is driven by NoSQL platforms like Hadoop. Customer demand is just not there – at least so far. Vendors are just now releasing tailored solutions to the market, leading customer demand in anticipation that NoSQL security issues will match or be more severe than those on a typical large analytical system.

Concerns about running databases in “the cloud”, or pushing data into multi-tenant environments, are not driven by “Big Data” or traditional warehouse applications at all. We spoke with a handful of enterprise masking customers who adopted masking for cloud databases; but they only moved test systems, mostly in IaaS (Infrastructure as a Service) deployments. In other cases customers decided to encrypt data prior to moving it into the cloud, or to leverage their SaaS vendor’s identity management and encryption capabilities. We were surprised by one use case: masking to secure streams of data – specifically digitized call records for XML streams, which came up a number of times, but not enough to constitute a major trend. Finally, we expected more customers to mention HIPAA as a requirement, and to mask in order to secure complex data sets. In actuality only one firm cited HIPAA as driving masking.

Next we will wrap up with a buying guide on what to look for, how to evaluate solutions, and pitfalls to watch out for.