Training artificial intelligence (AI) models requires massive amounts of data, especially when those models are used in the healthcare industry. The output of these models must be highly accurate, as it directly influences the health of patients.
First and foremost, health data used for training AI models should be anonymized to protect patient confidentiality. Anonymization makes it possible to share health data for secondary purposes such as analysis, research, development, training, and quality control of AI algorithms. So how should data anonymization be performed without compromising patient privacy?
Let’s start with the definition of ‘data anonymization’. Data anonymization is the process of removing personally identifiable information from data sets (e.g., imaging such as CT, MRI, or X-ray scans, or videos such as operating room or colonoscopy recordings), so that the people the data describes, or who appear in the images or videos, remain anonymous.
People are identifiable if imaging or video data includes any reference to an identifier such as a name, an identification number, a personnel number, account data, a customer number, or any other personal data that can directly or indirectly identify the person.
Hospitals and clinics must share only anonymized data with third parties such as research organizations or healthcare software development companies. Sensitive metadata such as the patient’s name, social security number, and the hospital’s name and address should be erased. Direct identifiers must be removed or replaced with random values.
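As a minimal sketch of this replacement step (the field names and identifier list here are hypothetical, not a standard metadata schema), a direct identifier can be overwritten with a random token while unrelated fields are left untouched:

```python
import secrets

# Hypothetical set of direct-identifier fields; a real deployment would
# derive this list from the applicable regulation (e.g., HIPAA Safe Harbor).
DIRECT_IDENTIFIERS = {"patient_name", "ssn", "mrn", "hospital_name", "hospital_address"}

def pseudonymize(record: dict) -> dict:
    """Return a copy of `record` with direct identifiers replaced by random tokens."""
    anonymized = dict(record)
    for field in DIRECT_IDENTIFIERS & anonymized.keys():
        # Overwrite rather than delete, so downstream code that expects
        # the field to be present keeps working.
        anonymized[field] = secrets.token_hex(8)
    return anonymized

record = {"patient_name": "Jane Doe", "ssn": "123-45-6789", "age_group": "60-69"}
clean = pseudonymize(record)  # identifiers are now random tokens
```

Overwriting with random (non-reversible) values, rather than deleting fields, keeps the record structure intact for downstream tooling while breaking the link to the patient.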
Data anonymization, storage, and transfer are regulated by the GDPR in the EU and HIPAA in the US. A good example of this approach is the Safe Harbor standard in the HIPAA Privacy Rule. It specifies 18 types of identifiers that need to be removed or encrypted. If this is done properly, the data is considered anonymized in accordance with HIPAA.
This list includes:
- Names
- Geographic subdivisions smaller than a state
- All elements of dates (except year) directly related to an individual, and all ages over 89
- Telephone numbers
- Fax numbers
- Email addresses
- Social security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers, including license plates
- Device identifiers and serial numbers
- Web URLs
- IP addresses
- Biometric identifiers, including finger and voice prints
- Full-face photographs and any comparable images
- Any other unique identifying number, characteristic, or code
To meet GDPR and/or HIPAA compliance, not all fields associated with imaging or video data need to be removed. Medical research is often focused on a specific gender, pathology, age group, or geography. This means that some information in the metadata may be left as is, but only if it cannot identify the people in the data in any way.
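One way to implement this selective retention (again a sketch; the field names and allowlist are hypothetical and must be decided per study and per regulation) is to keep only an explicit allowlist of research-relevant, non-identifying fields and drop everything else by default:

```python
# Hypothetical allowlist of fields considered non-identifying for a given study.
RESEARCH_FIELDS = {"gender", "age_group", "pathology", "region"}

def retain_research_fields(record: dict) -> dict:
    """Keep only allowlisted fields; anything not explicitly allowed is dropped."""
    return {k: v for k, v in record.items() if k in RESEARCH_FIELDS}

record = {
    "patient_name": "Jane Doe",  # direct identifier: dropped
    "gender": "female",          # research-relevant: kept
    "age_group": "60-69",        # kept (note HIPAA groups all ages over 89)
    "pathology": "polyp",        # kept
}
subset = retain_research_fields(record)
```

An allowlist is safer than a blocklist here: a field the developers forgot to classify is dropped rather than leaked.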
Anonymized data is no longer considered personal health data, because the people in the images or videos can’t be identified. Thus, if the data is anonymized, no patient consent is required. On the other hand, if any detail could lead to uncovering the patient’s identity, patient consent is obligatory.