    The 7 Most Effective Data Masking Techniques

    Amitai Richman

    Product Marketing Director

    For many companies, data masking is the best way to ensure data security and compliance, but which techniques are best for your business?


    Table of Contents


    What are Data Masking Techniques?
    What Data Should be Masked?
    The 7 Most Effective Data Masking Techniques
    The 5 Top Data Masking Challenges
    A Business Entity Approach Does it All

    What are Data Masking Techniques?

    Data masking is a form of data obfuscation that creates a version of the data that is structurally similar to the original but hides the sensitive information. Data masking techniques are the different methods of obscuring sensitive data, such as pseudonymization, anonymization, and scrambling, among others.

    The objective of every data masking technique is to protect sensitive data while providing a functional substitute to serve various purposes, such as user training, software testing, or sales demos.

    In effect, masked data is worthless to an attacker in the event of a breach, yet it remains consistent and usable across multiple databases and analytics platforms. For most techniques, the process is irreversible to ensure full protection; pseudonymization, described below, is a reversible exception.

    Keep reading to learn more about data masking techniques, top challenges, and how a business entity approach addresses both.

    What Data Should be Masked?

    Different industries and sectors are compelled to comply with different data privacy regulations. Therefore, every company has different priorities when evaluating data masking solutions, depending on the type of sensitive data they need to protect.

    • Personally Identifiable Information (PII)
      Just as it sounds, PII is data that can be used to identify an individual, such as full names, driver's license numbers, Social Security numbers, and passport numbers.

    • Protected Health Information (PHI)
      PHI refers to medical and insurance-related data, collected by healthcare service providers, including demographic information, test and laboratory results, medical conditions, and prior claims.

    • Payment Card Information (PCI)
      PCI, as it relates to the Payment Card Industry Data Security Standard (PCI DSS), refers to credit and debit cardholder data. According to the standard, companies that process or accept credit and debit card payments must secure this data.

    Data masking is an effective solution for securing sensitive data found in both structured and unstructured data stores. Unstructured sources include images, PDF contracts and agreements, scanned driver's licenses, XML, and more. For example, if you store medical files as PDFs, you can ensure all sensitive information in them is adequately protected.

    The 7 Most Effective Data Masking Techniques

    1. Data anonymization
      Data anonymization is a method of information sanitization that involves removing or encrypting the sensitive data found in a dataset. It minimizes the risk of a breach when data is in transit, and also maintains the structure of the data to support analytics.

    2. Data pseudonymization
      Pseudonymization swaps sensitive information, such as a name or driver's license number, with a fictional alias or random figures. This is a reversible process, and can also be applied to unstructured data, like a photocopy of a passport.

    3. Encrypted lookup substitution
      Another technique for masking production data is creating a lookup table that provides realistic alternative values to sensitive data. These tables must be encrypted to prevent a breach.

    4. Redaction
      Redaction involves replacing sensitive data with generic values in development and testing environments. This technique is useful when the sensitive data itself isn’t necessary for QA or development, and when test data can differ from the original datasets.

    5. Shuffling
      Instead of substituting data with generic values, shuffling randomly rearranges the existing values within a column. For example, instead of replacing employee names with fake ones, it scrambles the real names in a dataset across multiple records.

    6. Data aging
      If data includes confidential dates, you can apply a policy to each date field to conceal the true date. For example, you can set each date back by a randomly chosen offset, such as 100 or 1,000 days, to maximize concealment.

    7. Nulling out
      This data masking technique protects sensitive data by applying a null value to a data column, so unauthorized users won’t be able to see it.
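
    Several of the techniques above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular masking product works: the function names, the "PERSON" alias prefix, the "REDACTED" placeholder, and the sample data are all assumptions made for the example. Each function operates on one column of values.

```python
import random
from datetime import date, timedelta


def pseudonymize(values, prefix="PERSON"):
    """Swap each distinct value for a fictional alias; keep a reverse map."""
    forward = {}
    for value in values:
        if value not in forward:
            forward[value] = f"{prefix}_{len(forward) + 1:04d}"
    # The reverse map is what makes this technique reversible, so it must be
    # stored and protected separately from the masked data.
    reverse = {alias: original for original, alias in forward.items()}
    return [forward[value] for value in values], reverse


def redact(values, placeholder="REDACTED"):
    """Replace every value with the same generic placeholder."""
    return [placeholder for _ in values]


def shuffle(values, rng):
    """Reorder the real values across records without inventing new ones."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def age_dates(dates, rng, min_days=100, max_days=1000):
    """Set each date back by a random number of days in [min_days, max_days]."""
    return [d - timedelta(days=rng.randint(min_days, max_days)) for d in dates]


def null_out(values):
    """Replace every value in the column with a null."""
    return [None for _ in values]


# Hypothetical sample columns, invented for this sketch.
names = ["Alice Smith", "Bob Jones", "Alice Smith", "Carol White"]
hire_dates = [date(2020, 3, 15), date(2018, 7, 1), date(2021, 11, 30), date(2019, 1, 9)]

# A fixed seed keeps the demo repeatable; random.Random is not a
# cryptographically secure generator, so treat this as illustration only.
rng = random.Random(42)

aliases, reverse_map = pseudonymize(names)
print(aliases)
print([reverse_map[alias] for alias in aliases] == names)  # reversibility check
print(redact(names))
print(shuffle(names, rng))
print(age_dates(hire_dates, rng))
print(null_out(names))
```

    Note the trade-offs the sketch makes visible: pseudonymization keeps repeated values aligned (the same name always gets the same alias) and can be undone with the reverse map, while redaction and nulling out discard the information entirely. Shuffling keeps the real values in the dataset, so on a small column some values may land back in their original rows.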

    The 5 Top Data Masking Challenges

    1. Format preservation
      The data masking solution must understand what the data represents; otherwise, it will be difficult to preserve the format of the original data. This requirement is especially important for data that follows a specific format or order, such as dates, ID numbers, or telephone numbers.

    2. Referential integrity
      Referential integrity refers to masking each type of sensitive data with the same algorithm, so that the same original value always maps to the same masked value across databases. Data masking tools and processes must be synchronized across the organization for each data type, to keep the data functional for analytics and other use cases.

    3. Semantic integrity
      Semantic integrity refers to masking the data in a way that preserves its meaning. For example, if you mask a set of ages, the new values should fall within the original range: a 69-year-old man might have his age masked as somewhere between 66 and 74.

    4. Gender preservation
      When obfuscating names that need to remain private, the data masking system must be able to infer the gender associated with each name and substitute a name of the same gender. If names are changed randomly, the gender distribution in a table will be altered.

    5. Data uniqueness
      If the sensitive data in a dataset is unique, as in the case of a Social Security Number, the data masking solution should apply unique values for each data element. The solution should also support referential integrity and collision avoidance functionality.
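
    Three of these challenges (format preservation, referential integrity, and data uniqueness) can be illustrated together with a keyed hash. The sketch below is one possible approach under stated assumptions, not a standard algorithm and not the method of any specific product: `mask_digits`, `mask_unique`, and `DEMO_KEY` are hypothetical names, and the scheme is one-way rather than a true format-preserving encryption mode.

```python
import hashlib
import hmac

# Hypothetical demo key; a real key would come from a secrets manager.
DEMO_KEY = b"demo-key-not-for-production"


def mask_digits(value, key, salt=0):
    """Replace each ASCII digit with a keyed-hash-derived digit, keeping layout.

    The same (value, key, salt) always gives the same output, so a value
    masked in two databases stays joinable. SHA-256 yields roughly 77 decimal
    digits, enough for IDs and phone numbers but not for very long strings.
    """
    message = f"{salt}:{value}".encode("utf-8")
    number = int.from_bytes(hmac.new(key, message, hashlib.sha256).digest(), "big")
    masked = []
    for ch in value:
        if ch in "0123456789":
            number, digit = divmod(number, 10)
            masked.append(str(digit))
        else:
            masked.append(ch)  # separators such as "-" keep their position
    return "".join(masked)


def mask_unique(values, key, max_attempts=1000):
    """Mask distinct values to distinct outputs, re-salting on a collision."""
    mapping = {}  # original -> masked
    used = set()  # masked values already handed out
    for value in values:
        if value in mapping:
            continue  # repeated originals reuse the same masked value
        for salt in range(max_attempts):
            candidate = mask_digits(value, key, salt)
            if candidate not in used:
                break
        else:
            raise ValueError("no unique masked value found within max_attempts")
        mapping[value] = candidate
        used.add(candidate)
    return mapping


# Hypothetical Social Security Numbers, invented for this sketch.
ssns = ["123-45-6789", "987-65-4320", "123-45-6789"]
mapping = mask_unique(ssns, DEMO_KEY)
print([mapping[ssn] for ssn in ssns])
```

    One design consequence worth noting: as long as no collision occurs, every system holding the same key produces the same masked value independently. Once a collision forces a re-salt, the result depends on processing order, so the mapping would have to be shared between systems to keep referential integrity.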

    A Business Entity Approach Does it All

    The entity-based data masking technology enables data masking best practices, while resolving common data masking challenges. It delivers all of the data related to a specific business entity – such as customers, payments, orders, and devices – to authorized data consumers.

    Unlike many other data protection solutions that centralize sensitive information, entity-based data masking persists and manages every instance of a business entity in its own individually encrypted Micro-Database™.

    With business entities, it's easy to protect data at rest, in use, and in transit, for production, testing, and analytics environments. The approach performs both dynamic and static masking of structured data, as well as unstructured data masking, while maintaining referential integrity. Images, PDFs, text files, and other data fields that might contain sensitive data are protected, while analytical and operational workloads can continue running without interference.

    For companies that require a range of data masking techniques, and want to avoid the vulnerabilities associated with conventional solutions, a business entity approach is the way to go.
