Study: HIPAA Data De-identification Improvements Are Needed

Apr 29, 2015

According to HIPAA Rules, healthcare providers and other covered entities (CEs) are allowed to use the Protected Health Information (PHI) of patients – and share this data with others – provided that the data has been de-identified. It must not be possible to link the de-identified data to any individual.

CEs are allowed to share the data if it can be demonstrated that the risk of that data being associated with a particular patient is minimal. They have two options for de-identifying healthcare data before sharing it with a Business Associate:

They can de-identify data using a model such as k-anonymity, or they can set a rule-based policy – the Safe Harbor model – that alters data values; for example, changing dates of birth to the following or preceding year, or taking out days and dates to just provide a patient’s age. However, while the latter method is normally used, it is far from ideal.
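As a rough illustration of the k-anonymity idea mentioned above, the sketch below measures the k of a small dataset: the size of the smallest group of records that share the same quasi-identifier values. The records, the field names (birth_year, zip3, diagnosis) and the choice of quasi-identifiers are hypothetical and are not taken from the study.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    quasi-identifier values. Returns 0 for an empty dataset."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0

# Hypothetical records; birth_year and zip3 are treated as quasi-identifiers.
records = [
    {"birth_year": 1980, "zip3": "372", "diagnosis": "asthma"},
    {"birth_year": 1980, "zip3": "372", "diagnosis": "diabetes"},
    {"birth_year": 1975, "zip3": "372", "diagnosis": "asthma"},
    {"birth_year": 1975, "zip3": "372", "diagnosis": "flu"},
    {"birth_year": 1975, "zip3": "372", "diagnosis": "asthma"},
]

k = k_anonymity(records, ["birth_year", "zip3"])
print(f"Dataset is {k}-anonymous on (birth_year, zip3)")
```

A dataset is usually described as k-anonymous when every record is indistinguishable from at least k - 1 others on the chosen quasi-identifiers, so a larger k generally means a lower re-identification risk.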

A recent study published in the Journal of the American Medical Informatics Association (JAMIA) reported that the rule-based approach does not tailor protections to the capabilities of the recipient. The study also stated that “Rule-based policies can be mapped to a utility (U) and re-identification risk (R) space, which can be searched for a collection, or frontier, of policies that systematically tradeoff between these goals.”

Under the HIPAA Safe Harbor model, there are 18 different rules in place to de-identify data and remove explicit identifiers such as patient names. The removal of quasi-identifiers – such as dates of birth and appointment or treatment dates – is also covered in this model. In these cases, dates are substituted with years, and ages are changed or grouped (18-24, 25-30, 90+). The problem, as identified by the researchers, is that these rules are inflexible and are not tailored to the intended recipient.
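The kind of rule-based generalization described above can be sketched in a few lines. This is a simplified illustration, not a complete Safe Harbor implementation: the field names, the set of explicit identifiers and the helper functions are hypothetical, and the age bands are just the examples quoted in this article.

```python
from datetime import date

# Illustrative assumptions, not the full list of Safe Harbor identifiers.
EXPLICIT_IDENTIFIERS = {"name"}
AGE_BANDS = [(18, 24), (25, 30)]

def generalize_age(age):
    """Map an age to a band; ages of 90 and over are grouped as '90+'."""
    if age >= 90:
        return "90+"
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return str(age)  # outside the illustrative bands: left unchanged here

def apply_policy(record):
    """Drop explicit identifiers, reduce the date to a year, and band the age."""
    out = {k: v for k, v in record.items() if k not in EXPLICIT_IDENTIFIERS}
    out["admission_year"] = out.pop("admission_date").year
    out["age"] = generalize_age(out["age"])
    return out

patient = {"name": "Jane Doe", "age": 27, "admission_date": date(2014, 3, 9)}
deidentified = apply_policy(patient)
print(deidentified)
```

Because the rules are fixed in code, every recipient gets the same level of generalization, which is the inflexibility the researchers point to.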

To address this issue, the researchers have suggested an alternative model for protecting PHI. The paper proposes that the Sublattice Heuristic Search (SHS) algorithm could be well suited for the healthcare industry to adopt and use in data de-identification policies.

The researchers showed that an efficient and effective mechanism can be used to find alternatives to rule-based de-identification policies for patient-level datasets. This was achieved by formulating an “algorithm designed to search a collection of de-identification policies that compose a frontier that optimally balances risk (R) and utility (U).” The researchers stated that “this approach allows for guidance, interpretation, and justification of rule-based policies, as opposed to relying on a predefined standard in terms of the re-identification risk and data utility or formal models.”

The research paper said: “Formally, a frontier is a set of policies that are not strictly dominated by other policies.” It went on: “Intuitively, a policy pA strictly dominates a policy pB when both risk and utility loss values of pA are no greater than the corresponding values of pB and at least one value is strictly less [than] that of pB.”
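The dominance definition quoted above translates directly into code. The sketch below is a brute-force filter that compares every pair of policies; it is not the SHS algorithm, which searches the policy space heuristically. The policy names and the risk and utility loss values are made up for illustration.

```python
from typing import NamedTuple

class Policy(NamedTuple):
    name: str
    risk: float          # re-identification risk (lower is better)
    utility_loss: float  # loss of data utility (lower is better)

def strictly_dominates(a, b):
    """True if a is no worse than b on both measures and better on at least one."""
    no_worse = a.risk <= b.risk and a.utility_loss <= b.utility_loss
    better = a.risk < b.risk or a.utility_loss < b.utility_loss
    return no_worse and better

def frontier(policies):
    """Keep only the policies that no other policy strictly dominates."""
    return [
        p for p in policies
        if not any(strictly_dominates(q, p) for q in policies)
    ]

# Hypothetical policies with made-up risk and utility loss values.
policies = [
    Policy("A", 0.10, 0.60),
    Policy("B", 0.30, 0.30),
    Policy("C", 0.35, 0.50),  # dominated by B
    Policy("D", 0.70, 0.10),
    Policy("E", 0.10, 0.80),  # dominated by A
]

front = frontier(policies)
print([p.name for p in front])
```

Each policy left on the frontier represents a different tradeoff: moving along it lowers risk only at the cost of more utility loss, and vice versa.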

The researchers reviewed frontier initialization and improvement strategies, comparing the resulting frontiers after searching the same number of policies.

The researchers, according to the paper, “used the area under the frontier in the R-U space, denoted as AU, as the criteria of the frontier given the orientation of risk and utility loss. We reported the results after every 1000 policies while searching the first 5000 policies.”
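One way to picture the area measure is shown below. The article does not say how the paper computes the area, so this sketch makes its own assumptions: risk is placed on the horizontal axis, utility loss on the vertical axis, and the trapezoidal rule is applied between neighbouring frontier points. The two frontiers are hypothetical.

```python
def area_under_frontier(points):
    """Trapezoidal area under a frontier given as (risk, utility_loss) pairs.

    Assumption: risk is the x-axis and utility loss the y-axis. The paper's
    exact definition of the area may differ.
    """
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Two hypothetical frontiers covering the same range of risk values.
frontier_a = [(0.1, 0.6), (0.3, 0.3), (0.7, 0.1)]
frontier_b = [(0.1, 0.8), (0.3, 0.5), (0.7, 0.3)]

area_a = area_under_frontier(frontier_a)
area_b = area_under_frontier(frontier_b)
print(f"area A = {area_a:.2f}, area B = {area_b:.2f}")
```

Since both risk and utility loss are quantities to be minimized, a smaller area under these assumptions would suggest a frontier that sits closer to the origin. The comparison is only meaningful when the frontiers span the same risk range, as they do here.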

The R-U tradeoffs of the frontier identified by the best SHS configuration were then compared with those of a popular k-anonymity method and the HIPAA Safe Harbor method. The researchers found that k-anonymization strategies and fixed rule-based policies were inferior to SHS.

The researchers came to the conclusion that the SHS method “has the potential to be a method that overcomes the limitations of a single fixed rule-based policy while being interpretable to health data managers.”

The study also found that “R-U frontiers of de-identification policies can be discovered efficiently, allowing healthcare organizations to tailor protections to anticipated needs and trustworthiness of recipients.”


Patrick Kennedy

Patrick Kennedy is a highly accomplished journalist and editor with nearly two decades of experience in the field. With expertise in writing and editing content, Patrick has made significant contributions to various publications and organizations. Over the course of his career, Patrick has successfully managed teams of writers, overseeing the production of high-quality content and ensuring its adherence to professional standards. His exceptional leadership skills, combined with his deep understanding of journalistic principles, have allowed him to create cohesive and engaging narratives that resonate with readers. A notable area of specialization for Patrick lies in compliance, particularly in relation to HIPAA (Health Insurance Portability and Accountability Act). He has authored numerous articles delving into the complexities of compliance and its implications for various industries. Patrick's comprehensive understanding of HIPAA regulations has positioned him as a go-to expert, sought after for his insights and expertise in this field. Patrick's bachelor's degree is from the University of Limerick and his master's degree in journalism is from Dublin City University.
