
Digital Challenges with Patient Data Anonymization

The Challenges of Data Privacy

As machine learning is increasingly used to identify and alert clinicians to the signs of patient deterioration in hospitals, including through the integration of data into clinical decision support systems, key considerations around the evaluation of large amounts of complex healthcare data have emerged.1 Predictive algorithms offer potential advantages in their ability to synthesize and analyze data from diverse sources, such as laboratory results, imaging data, and doctors’ notes, and then incorporate them in ways that may help alert clinicians to signs of patient deterioration.1 Despite this potential, the application of machine learning in healthcare also presents unique challenges involving patient privacy and anonymization, data security, and the risks of leaks and data breaches, as well as ethical considerations inherent in the sharing of highly sensitive electronic medical record (EMR) data.2,3

Providing Relevant Data While Ensuring Security and Privacy

While the use of EMR data has significantly enhanced the quality and efficiency of medical research and healthcare delivery, concerns about patient privacy, arising from the sensitive nature of the information and the widespread practice of health information sharing, have led to government safeguards designed to protect EMR data. The Health Insurance Portability and Accountability Act (HIPAA), which established a set of data privacy rules for protected health information (PHI), and the 2009 Health Information Technology for Economic and Clinical Health Act (HITECH) that followed it provide a “regulatory roadmap” for data security.4 However, given the huge amounts of data that stream in and out of hospitals, as well as the proliferation of platforms and apps used for data sharing, healthcare institutions must also provide security measures that go beyond standard regulatory protections in order to protect patient privacy and avoid data leaks or breaches. Some of these measures include:

• Encryption algorithms and firewalls to ensure that information is safely transmitted over the internet 3,4

• Collaboration among clinical, technical, and C-Suite executives, so that only data that is relevant to particular needs is extracted

• An audit trail that can track all attempts to access patient data 4
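The second and third measures above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a description of any particular hospital system: the role names, the field names, the `ALLOWED_FIELDS` policy, and the `extract_for` helper are all hypothetical, and a real audit trail would be written to append-only, access-controlled storage rather than an in-memory list.

```python
from datetime import datetime, timezone

# Hypothetical policy: which fields each consumer is allowed to see.
# In practice this would be agreed among clinical, technical, and executive teams.
ALLOWED_FIELDS = {
    "sepsis_model": {"heart_rate", "temperature", "lactate"},
    "billing": {"insurance_id"},
}

# Stand-in for an audit trail; a real one would be append-only and persistent.
audit_log = []


def extract_for(role, user, record):
    """Return only the fields the role needs, logging every access attempt."""
    allowed = ALLOWED_FIELDS.get(role)
    granted = allowed is not None
    # Log the attempt before deciding, so denied requests are recorded too.
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "record_id": record["record_id"],
        "granted": granted,
    })
    if not granted:
        raise PermissionError(f"role {role!r} has no access policy")
    return {key: value for key, value in record.items() if key in allowed}
```

Logging before the permission check means that failed attempts also leave a trace, which is the point of tracking all attempts to access patient data rather than only the successful ones.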

Virtual Solutions

Another solution has been the implementation of cloud-based systems that have the ability to effectively and safely manage the exchange of relevant, real-time data to clinicians across the hospital enterprise, while anonymizing patient data. Such systems are increasingly being deployed to streamline data gathering, as well as boost patient privacy and data security. In the case of patient deterioration, virtual data “hubs” can also help clinicians respond in a more timely fashion to the signs of sepsis. This has the cascading potential to help implement treatment plans more precisely, streamline and expedite provider communication, and maintain the privacy of patients through “bank-level” security, sophisticated encryption methods, and real-time surveillance.4
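The source does not say how such systems anonymize patient data. One common building block is pseudonymization, sketched below as an assumption rather than as the method any specific platform uses: direct identifiers are dropped and the patient ID is replaced with a keyed hash, so records for the same patient can still be linked without exposing who the patient is. The field names, the `DIRECT_IDENTIFIERS` set, and the `pseudonymize` function are hypothetical, and pseudonymization on its own is weaker than full anonymization.

```python
import hashlib
import hmac

# Illustrative subset of direct identifiers; not a complete regulatory list.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonymize(record, secret_key):
    """Drop direct identifiers and replace patient_id with a keyed hash."""
    # HMAC-SHA256 with a secret key: without the key, tokens cannot be
    # recomputed from guessed patient IDs.
    token = hmac.new(
        secret_key,
        record["patient_id"].encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    cleaned["patient_token"] = token
    return cleaned
```

Because the same key and patient ID always produce the same token, readings taken over time can still be grouped per patient, which matters for detecting deterioration. The key itself would need to be stored and rotated under the institution's own security controls.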

Why Data Ethics Matter

In addition to privacy concerns, the advent of EMR, artificial intelligence, and machine learning calls for the development of standards and codes of conduct that balance technical, clinical, and commercial motivations with appropriate moral oversight.5 Key to these codes of conduct is an emphasis on transparency, protection of patients, and continual monitoring to ensure ethical and unbiased patient care.5 EMR and predictive algorithms may eventually raise questions about ethical considerations, such as racial, ethnic, and genetic biases that could inadvertently be “built” into health care algorithms based on patient EMR data, unless rigorous safeguards are maintained.1

While healthcare systems and clinicians will never be able to control the entire healthcare ecosystem, it is imperative that clinicians and healthcare executives understand machine learning and how measures can be put into place to minimize risk or bias in the use of EMR data. Overall, the use of data and predictive algorithms should encompass transparency, security, and an urgency to protect patient privacy.


1. Big Data and Machine Learning Algorithms for Health-Care Delivery, The Lancet, May 2019. https://www October 15, 2019.

2. Anonymizing and Sharing Medical Text Records, Information Systems Research: The Institute for Operations Research and the Management Sciences, April 2017. October 15, 2019.

3. Security Techniques for the Electronic Health Records, Journal of Medical Systems, August 2017. October 15, 2019.

4. Crucial data security measures every EMR must have in place, Becker’s Health IT & CIO Report, August 2017. October 17, 2019.

5. Ethics of Artificial Intelligence in Radiology: Summary of the Joint European and North American Multisociety Statement, RSNA, October 2019. October 17, 2019.