Exact Data Matching
Understanding Exact Data Matching
Rather than relying on pattern matching to identify sensitive data, Exact Data Matching compares scanned data against actual values that are known to be sensitive. In other words, it looks for copies of the sensitive data itself rather than for a pattern that describes it. The method is geared towards structured data organized into fields, such as database records. When used properly, it nearly eliminates false positives, freeing your team to focus on real security events.
Exact Data Match (EDM):
- Optimized for Large Datasets: Skyhigh’s Enhanced EDM can match against billions of cells in under a second.
- Data Precision: EDM matches against specific known values rather than general patterns, reducing the risk of false positives.
- Supports Traditional DLP Logic: Policies can combine EDM with traditional logic such as proximity and keyword validation, where a data field must appear within a certain distance of a keyword or another data field.
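The combination of exact matching and proximity described above can be sketched as follows. This is a minimal illustration, not Skyhigh's implementation: the record fields, the `edm_proximity_match` function, and the `max_distance` parameter are all hypothetical names chosen for the example, and distance is measured in tokens as an assumption.

```python
import re

# Hypothetical known-sensitive records; in practice these would come from
# a database export. Field names are illustrative only.
KNOWN_RECORDS = [
    {"hic": "HIC10011", "last_name": "Turner"},
    {"hic": "HIC10012", "last_name": "White"},
]


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def edm_proximity_match(text, records, max_distance=5):
    """Return records whose HIC and last name both occur in the text
    within max_distance tokens of each other."""
    tokens = tokenize(text)

    # Map each token to every position where it occurs.
    positions = {}
    for index, token in enumerate(tokens):
        positions.setdefault(token, []).append(index)

    matches = []
    for record in records:
        hic_hits = positions.get(record["hic"].lower(), [])
        name_hits = positions.get(record["last_name"].lower(), [])
        # Require both exact values, close together, from the same record.
        if any(abs(i - j) <= max_distance for i in hic_hits for j in name_hits):
            matches.append(record)
    return matches


sample = "Patient Turner, member ID HIC10011, was seen on Monday."
print(edm_proximity_match(sample, KNOWN_RECORDS))
```

Requiring two values from the same record to appear near each other is what keeps a common word such as "White" from triggering a match on its own.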
Protects Sensitive Data with One-Way Hashes
A prudent observer might note that providing sensitive data to a third party such as Skyhigh is a risk in itself. To mitigate this risk, only one-way hashed values of your sensitive data are sent to the Skyhigh cloud, which means that Skyhigh does not store your data in any original or reversible form.
Sample Original Data
HIC | First Name | Middle Name | Last Name | Birthdate | Gender | Diagnosis Codes | Primary Physician | Company | Address | City | State | Zip Code | Email |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HIC10011 | Rachel | Marie | Turner | 1983-09-25 | F | L30.9 | Dr. Christopher Lee | Tech Innovators | 789 Elm St | TechCity | CA | 90123 | rachel.turner@emai |
HIC10012 | Benjamin | Thomas | White | 1978-03-12 | M | I10 | Dr. Melissa Harris | Health Solutions | 567 Oak St | Healthtown | TX | 34567 | benjamin.white@ema |
HIC10013 | Grace | Elizabeth | Taylor | 1990-06-30 | F | F41.1 | Dr. Andrew Smith | Wellness Corp | 123 Pine St | Wellsville | NY | 56789 | grace.taylor@email |
HIC10014 | William | James | Adams | 1986-11-18 | M | G82.9 | Dr. Sophia Brown | Med Solutions | 456 Birch St | Medville | FL | 23456 | william.adams@emai |
HIC10015 | Emma | Rose | Johnson | 1997-02-14 | F | N23.9 | Dr. Daniel Miller | HealthTech | 101 Cedar St | Techville | CA | 67890 | emma.johnson@emai |
Hashed Version of the Same Data
HIC | First Name | Middle Name | Last Name | Birthdate | Gender | Diagnosis Codes | Primary Physician | Company | Address | City | State | Zip Code | Email |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
b6af09d8f5a90e72 | a2e5c4d9c163fd0e | a94a8fe5ccb19ba6 | 78abfb18d8281455 | 7c979a7e785c62bf | a3f390d88e4c41f2 | 1a3968efea0b59e2 | 44cc9e24e5eb3dd8 | 579cbaca35f204a4 | 02e4a515a77a61cd | 0c6457b1f4978f97 | d907a5b29e78c55e | 84a516841ba77a5b | 9e69d3654a2e772c |
d7834f61b132b334 | 5b1a7c762408b330 | 4a0679561e37b2d8 | 5c4e150b9b0e3c41 | e5502a7c68d847d0 | 4a0679561e37b2d8 | 7dd8d9d5e42f2e7b | 0467079ec39ce477 | 189c8726ad7a0a1f | f249c0e6b9750200 | 8fc7c72b649350bf | 6cf12ebd85f90e9e | d596f4bfe8a6d0bc | 10a86c3a4c776f0e |
56e8d8a5d2b1e5b6 | 0fcdfd31b3992a9f | a764ef4a8e6cb0e2 | 71fdda2e8b3e47c5 | b4dd4ee006d97b0d | c59d4083ccceccee | 4df1c97c1fc2b839 | 6797fbd7c3e25be9 | f3ff8e793a092bb6 | a4655d55ea6f78de | 9d8c99252e86fe8c | 5715a29c338d3d0e | 9b8f0f12548cdd8e | 7d04f11b993a93d0 |
b6af09d8f5a90e72 | a2e5c4d9c163fd0e | a94a8fe5ccb19ba6 | 78abfb18d8281455 | 7c979a7e785c62bf | a3f390d88e4c41f2 | 1a3968efea0b59e2 | 44cc9e24e5eb3dd8 | 579cbaca35f204a4 | 02e4a515a77a61cd | 0c6457b1f4978f97 | d907a5b29e78c55e | 84a516841ba77a5b | 9e69d3654a2e772c |
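To make the idea of per-cell one-way hashing concrete, here is a minimal sketch. The source does not state which hash algorithm, normalization, or output length Skyhigh uses, so SHA-256, the trim-and-lowercase step, and the 16-character truncation are all assumptions made to resemble the table above; `hash_cell` and `hash_record` are hypothetical helper names.

```python
import hashlib


def hash_cell(value, length=16):
    """Return a one-way hash of a normalized cell value.

    Assumptions for illustration: SHA-256, whitespace trimming,
    lowercasing, and truncation to `length` hex characters.
    """
    normalized = value.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return digest[:length]


def hash_record(record):
    """Hash every cell of a record, keeping the field names."""
    return {field: hash_cell(value) for field, value in record.items()}


record = {"HIC": "HIC10011", "First Name": "Rachel", "Last Name": "Turner"}
hashed = hash_record(record)
for field, value in hashed.items():
    print(f"{field}: {value}")

# A value found during scanning is hashed the same way and compared
# against the stored hashes, so the original value is never needed.
print(hash_cell(" rachel ") == hashed["First Name"])
```

Because identical inputs always produce identical hashes, matching can be performed entirely on hashed values.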
Understanding Intelligent Data Matching
Skyhigh also supports Intelligent Data Matching (IDM), which works in a way similar to EDM except that it is geared towards unstructured data such as documents. Rather than requiring an exact match, IDM allows you to set a percentage similarity. For example, you can configure it to match a scanned file if the content is 90% the same as a file that is known to be sensitive. This method works best where the organization relies heavily on templated documents such as health, finance, or tax documents.
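The percentage threshold can be illustrated with a short sketch. The source does not describe how Skyhigh computes similarity, so Python's standard `difflib.SequenceMatcher` is used here purely as a stand-in; `similarity`, `idm_match`, and the sample texts are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(known_text, scanned_text):
    """Return a similarity ratio between 0.0 and 1.0 for two texts."""
    return SequenceMatcher(None, known_text, scanned_text).ratio()


def idm_match(known_text, scanned_text, threshold=0.9):
    """Report a match when the texts are at least `threshold` similar."""
    return similarity(known_text, scanned_text) >= threshold


# Illustrative texts only: a known template, a filled-in copy, and an
# unrelated document.
template = "Patient intake form. Name: ____ Date of birth: ____ Diagnosis: ____"
filled = "Patient intake form. Name: Emma Date of birth: 1997 Diagnosis: N23.9"
unrelated = "Quarterly cafeteria menu and parking updates."

print(f"filled vs template:    {similarity(template, filled):.2f}")
print(f"unrelated vs template: {similarity(template, unrelated):.2f}")
print("unrelated matches:", idm_match(template, unrelated))
```

Lowering the threshold catches more heavily edited copies of a template at the cost of more false positives, which is why the percentage is configurable.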
Intelligent Data Match (IDM):
IDM uses advanced algorithms and contextual analysis to identify sensitive data. Its advantages in the context of DLP include:
- Contextual Analysis: IDM considers data context, enabling it to identify sensitive information even when it’s not an exact match.
- Adaptive Protection: It can adapt to evolving data threats and dynamically adjust data protection measures.
Due to time constraints, we will not be configuring IDM in today’s lab. However, the process is similar to EDM, and you should feel free to try it out after the completion of the guided portion of today’s lab.
EDM and IDM are Complementary Technologies
By integrating Exact Data Match (EDM) and Intelligent Data Match (IDM) into a holistic DLP strategy, you can enhance data protection, reduce false positives, and improve your organization’s overall security posture.
Let’s Get Started with EDM