Medical AI and HIPAA Privacy

Medical AI development and deployment can implicate the HIPAA Privacy Rule and HIPAA Security Rule when individually identifiable health information is used, disclosed, or maintained by a HIPAA Covered Entity or Business Associate.

Context For Medical AI Privacy Discussions

A recurring theme in medical AI compliance is that privacy obligations turn on whether the activity involves protected health information and whether the organization falls within HIPAA’s regulated scope. Legal and compliance reviews often draw on counsel experienced in intellectual property, privacy, artificial intelligence, machine learning, and digital health, including advice on commercial contracts, open source issues, venture financings, and mergers and acquisitions. Practical risk assessment also benefits from familiarity with clinical settings, including mental health care and Veterans Affairs workflows, because intended uses and data flows can differ from general consumer technology patterns.

HIPAA Scope And Related Federal Frameworks

HIPAA is a sector-specific United States privacy and security framework rather than a general privacy law, and the statute’s name reflects portability rather than privacy. United States privacy law is commonly described as siloed, with separate regimes for financial privacy, education privacy, and health privacy.

The Health Information Technology for Economic and Clinical Health Act (HITECH Act) expanded HIPAA in 2009 and promoted adoption of health information technology and electronic health records through incentives and implementation support. The HITECH Act also strengthened HIPAA Privacy Rule and HIPAA Security Rule requirements, added breach notification obligations, and increased penalties for noncompliance.

State health privacy and medical record laws can apply to disclosures and record handling, and they can impose stricter requirements than HIPAA in some settings. This article addresses HIPAA as the primary framework, with limited references to other regimes that commonly arise in medical AI privacy analysis.

Regulators And Enforcement Authorities

The United States Department of Health and Human Services administers HIPAA and the Health Information Technology for Economic and Clinical Health Act (HITECH Act) through multiple components. The Office for Civil Rights enforces HIPAA and applies civil penalties and settlement processes. The Office of the National Coordinator for Health Information Technology has responsibilities tied to the HITECH Act. State attorneys general can also bring actions under HITECH Act authority for unauthorized use or disclosure of protected health information of residents of their states. Enforcement capacity and internal agency structures can change over time, including organizational adjustments intended to address enforcement caseload.

Protected Health Information Definition In Medical AI Use Cases

HIPAA obligations generally attach when protected health information is involved. Protected health information is individually identifiable health information that is transmitted or maintained in electronic media or any other form or medium.

Individually identifiable health information includes information that relates to an individual’s past, present, or future physical or mental health condition, the provision of health care to the individual, or the past, present, or future payment for the provision of health care to the individual. The information must identify the individual or create a reasonable basis to believe it can be used to identify the individual.

Medical AI data pipelines often involve combinations of clinical data, imaging, device data, payment and claims data, and derived labels that can become identifying when linked to other information. A protected health information determination should address both direct identifiers and indirect identifiability in the specific context of use and recipients.

Covered Entities And Business Associates

HIPAA applies to HIPAA Covered Entities and Business Associates. HIPAA Covered Entities include health care providers that conduct certain transactions electronically, health plans, and health care clearinghouses. Business Associates are persons or entities that create, receive, maintain, or transmit protected health information on behalf of a HIPAA Covered Entity in connection with functions or activities regulated by HIPAA.

Medical AI developers that use protected health information to provide services to a HIPAA Covered Entity are commonly positioned as Business Associates unless they independently qualify as a HIPAA Covered Entity. Business Associate Agreements are the contract mechanism that governs permitted uses and disclosures, required safeguards, subcontractor controls, and breach responsibilities.

All workforce members must receive HIPAA training, and annual training is widely regarded as industry best practice. Business Associates must ensure that all staff receive security awareness training and that staff with access to protected health information receive HIPAA training. Business Associate training should also cover Business Associate Agreement obligations, permitted use and disclosure limits, breach reporting timelines, and the administrative, physical, and technical safeguard practices that support the HIPAA Security Rule.

HIPAA Privacy Rule Baseline Permission Model

The HIPAA Privacy Rule generally prohibits use or disclosure of protected health information by a HIPAA Covered Entity or Business Associate unless a HIPAA Privacy Rule permission applies or a valid authorization is obtained from the individual.

A valid authorization is a detailed written document with specific content requirements. Voluntary consent alone is not sufficient to permit use or disclosure where authorization is required. Authorization collection introduces operational burden, including document management, scope control, revocation handling, and downstream flow-down to recipients.

Permitted Uses And Disclosures Relevant To Medical AI

Medical AI activities commonly rely on one of two compliance approaches when protected health information is needed.

Deidentification Safe Harbor Approach

Deidentification can remove data from HIPAA scope when the deidentification requirements are satisfied. When deidentified, the information is no longer protected health information, and HIPAA restrictions no longer apply to that information. This can enable model training, sharing, licensing, or contribution to data pools without HIPAA use and disclosure constraints.

HIPAA recognizes two deidentification pathways.

The expert determination method requires a written determination by a qualified statistical expert that the risk of identification is very small. This method turns on details that require careful evaluation, including expert qualifications, methodology, and the documented rationale for the risk assessment.

The identifier removal method requires removal of 18 categories of identifiers, such as names, phone numbers, email addresses, medical record numbers, device identifiers, IP addresses, and biometric identifiers. The HIPAA Covered Entity or Business Associate must also lack actual knowledge that the remaining information can identify the individual.
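
As a rough illustration, the identifier removal method can be expressed as a filtering step in a data pipeline. The sketch below is a minimal Python example under stated assumptions: the field names and record structure are hypothetical, and the identifier set shown covers only a subset of the 18 enumerated categories; a real pipeline must address all 18 and still confirm the absence of actual knowledge of identifiability.

```python
# Minimal sketch of the safe harbor identifier removal step.
# Field names are hypothetical; only a subset of the 18 enumerated
# identifier categories is shown for illustration.

SAFE_HARBOR_IDENTIFIERS = {
    "name", "phone", "email", "medical_record_number",
    "device_id", "ip_address", "biometric_id", "ssn",
}

def strip_identifiers(record: dict) -> dict:
    """Return a copy of the record with enumerated identifier fields removed."""
    return {k: v for k, v in record.items() if k not in SAFE_HARBOR_IDENTIFIERS}

record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "diagnosis_code": "E11.9",
    "lab_result": 6.8,
}
print(strip_identifiers(record))  # identifier fields dropped, clinical fields kept
```

Structured fields like these are the easy case; as the next paragraph notes, unstructured data such as clinical notes makes verifying full identifier removal far harder.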

Operational constraints can limit deidentification viability. Unstructured data can make verification of full identifier removal difficult. Excessive removal can degrade model utility by stripping signal needed for training and validation. Business Associate Agreements frequently need explicit permissions and process terms to allow a Business Associate to perform deidentification when protected health information originates from a HIPAA Covered Entity.

Deidentified data is increasingly available through curated initiatives that aggregate deidentified medical datasets for broader research and development use, including international sources and multi-institution pools intended to support medical AI development. Use of third-party deidentified datasets still requires diligence on provenance, method of deidentification, contractual restrictions, and controls against reidentification attempts.

Reidentification Studies And Actual Knowledge

Technical literature describes reidentification and inference risks for deidentified or partially deidentified datasets. HHS guidance has addressed how awareness of reidentification methods bears on the “actual knowledge” element of the identifier removal method. Awareness that reidentification techniques exist does not, by itself, establish actual knowledge that the deidentified information can identify an individual in the recipient’s hands. Risk increases when the disclosing party knows the recipient has the capability to reidentify and has used that capability in the past; that fact pattern can undermine reliance on the deidentification safe harbor.

Treatment, Payment, Health Care Operations, And Research Approach

When protected health information cannot be removed from scope through deidentification, organizations often consider whether the intended activity fits within the HIPAA Privacy Rule permissions for treatment, payment, and health care operations, and whether research permissions are available.

Treatment

Treatment permissions generally require that the activity be carried out by a health care provider, or by a health care provider working with another party, and that the activity relates to a specific individual’s care. Population-level analysis that may prompt providers to offer treatment is not treated as treatment for HIPAA purposes, even when it informs care decisions. This individual-patient focus places many model development and broad training activities outside treatment permissions, although certain personalized medicine workflows may align more closely with treatment-based processing.

Business Associate Agreements are commonly needed when an AI vendor processes protected health information on behalf of the provider in connection with treatment-related services.

Payment

Payment permissions commonly cover activities such as billing, claims management, and reimbursement processes. A hospital or health plan can use a Business Associate to run analytics that identify coding or billing errors, including mismatches or inaccuracies associated with CPT coding, when the Business Associate Agreement permits the processing and the activity fits within payment functions. Under this structure, protected health information processing by the AI system can occur without patient authorization.
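
The kind of billing-error analytics described above can be sketched briefly. In the hypothetical example below, the claim fields, procedure labels, and the procedure-to-CPT mapping are all invented for illustration; a real system would rely on payer-specific coding rules rather than a static dictionary.

```python
# Minimal sketch of a coding-mismatch check within a payment-permission workflow.
# Claim fields and the procedure-to-CPT mapping are hypothetical illustrations.

EXPECTED_CPT = {"office_visit_established": "99213", "ecg_routine": "93000"}

def find_coding_mismatches(claims: list[dict]) -> list[dict]:
    """Flag claims whose billed CPT code differs from the documented procedure."""
    mismatches = []
    for claim in claims:
        expected = EXPECTED_CPT.get(claim["procedure"])
        if expected is not None and claim["billed_cpt"] != expected:
            mismatches.append({**claim, "expected_cpt": expected})
    return mismatches

claims = [
    {"claim_id": "C1", "procedure": "office_visit_established", "billed_cpt": "99213"},
    {"claim_id": "C2", "procedure": "ecg_routine", "billed_cpt": "93010"},
]
print(find_coding_mismatches(claims))  # flags claim C2
```

The compliance point is the structure around the code, not the code itself: the Business Associate Agreement must permit this processing, and the activity must fit within payment functions for the no-authorization pathway to apply.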

Health Care Operations

Health care operations is broad and includes quality assessment and improvement, population-based activities aimed at improving health outcomes or reducing health care costs, and general administrative and business management functions. This breadth can make health care operations the most frequently evaluated permission for medical AI tools used by HIPAA Covered Entities.

A recurring constraint is that the Business Associate’s activities must be tied, at least in part, to the benefit of the HIPAA Covered Entity whose protected health information is being used. Data aggregation under a Business Associate Agreement can allow a Business Associate to combine protected health information from multiple HIPAA Covered Entities for analysis that relates to the health care operations of the participating HIPAA Covered Entities. Whether this concept extends to training a generalized model across multiple HIPAA Covered Entities under health care operations is a common area of uncertainty. A conservative approach treats generalized model training for the vendor’s independent purposes as outside the intended scope of the data aggregation permission, which supports a model segregation approach where models trained on a specific HIPAA Covered Entity’s protected health information are limited in use to that HIPAA Covered Entity.

Health care operations can also support the creation of deidentified data. A notable feature of this permission is that the use of protected health information to create deidentified information can be treated as part of health care operations when permitted by the Business Associate Agreement and performed on behalf of the HIPAA Covered Entity. Once deidentified, the resulting information can be used by the Business Associate for its own purposes, including commercial development and broader model training, because the information is no longer protected health information. This structure places significant weight on Business Associate Agreement terms, the documented purpose that ties deidentification to HIPAA Covered Entity benefit, and the controls that support compliance during the deidentification process.

Research

Research is defined as a systematic investigation, including research development, testing, and evaluation, designed to develop or contribute to generalizable knowledge. Research can overlap with health care operations in areas such as quality assessment and improvement. The distinction commonly turns on whether the primary purpose is to produce generalizable knowledge.

Research-based pathways can be better aligned with training models intended for use beyond the HIPAA Covered Entity that supplied the protected health information. That expanded portability can be offset by additional compliance obligations. Depending on the structure, research access may require individual authorizations or an institutional review board waiver of authorization. If those pathways are not used, access may be provided through a data use agreement for a limited data set.

A limited data set excludes many direct identifiers but can retain certain elements such as city, state, gender, dates related to the individual, and other identifiers not listed among the 18 enumerated identifiers. The data use agreement governs permitted uses, disclosures, safeguards, and downstream restrictions.

Activities can transition between health care operations and research. Documentation of the transition and adherence to the applicable requirements for the new category are part of operational compliance control.

Computer Processing As “Use” And Model Training Arguments

Regulatory commentary has included statements indicating that certain computer processing of data does not constitute a use, based on historical examples involving automated query processing. That commentary is guidance rather than binding text, and it predates modern machine learning training practices. Model training can be characterized as more transformative than simple query processing because it can involve learning representations from the data and, in some circumstances, memorization behaviors that become visible through overfitting, model inversion, or related attacks.

Regulatory materials addressing cybersecurity incidents have also treated automated processing as a use in other contexts, including computer processes that encrypt protected health information during ransomware attacks. That position supports treating model training on protected health information as a use for HIPAA Privacy Rule analysis. For operational risk management, permitted use pathways and deidentification remain the more defensible planning anchors.

HIPAA Minimum Necessary Rule And Data Volume Tension

The HIPAA Minimum Necessary Rule requires HIPAA Covered Entities and Business Associates to make reasonable efforts to use, disclose, and request only the minimum necessary protected health information to accomplish the intended purpose. Medical AI development commonly benefits from larger datasets, which creates an inherent tension between large-scale model training and minimum necessary constraints.

Technical and procedural measures can narrow the gap. Direct identifiers such as names and Social Security numbers can be replaced with unique internal identifiers when they are not required for the intended purpose. Tabular datasets can be reviewed for columns that do not contribute to the intended modeling task and removed prior to transfer or processing. Documented necessity rationales and field-level data mapping support audits and internal reviews.
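
The two narrowing measures above, replacing direct identifiers with internal identifiers and pruning unneeded fields, can be combined in one preprocessing step. A minimal sketch, assuming hypothetical field names and a hypothetical needed-column set; the salted hash stands in for a managed pseudonymization service, and real deployments need proper key management:

```python
# Minimal sketch of minimum-necessary narrowing: replace direct identifiers
# with stable internal pseudonyms and drop fields the modeling task does not need.
# Field names and the NEEDED_COLUMNS set are hypothetical.

import hashlib

def pseudonymize(value: str, salt: str = "internal-salt") -> str:
    """Derive a stable internal identifier; the salt must be kept secret."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

NEEDED_COLUMNS = {"patient_id", "age", "diagnosis_code", "lab_result"}

def minimize(record: dict) -> dict:
    """Keep only documented-necessary fields and pseudonymize the identifier."""
    out = {k: v for k, v in record.items() if k in NEEDED_COLUMNS}
    out["patient_id"] = pseudonymize(record["patient_id"])
    return out

record = {"patient_id": "MRN-001", "name": "Jane Doe", "age": 54,
          "diagnosis_code": "E11.9", "lab_result": 6.8}
print(minimize(record))  # name dropped, patient_id replaced with a pseudonym
```

A field-level map of why each retained column is necessary, kept alongside code like this, is what supports the documented necessity rationale during audits.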

Privacy-Risk Mitigation Techniques Beyond Traditional Access Controls

Distributed learning approaches, including federated learning, are used to reduce centralized access to protected health information by keeping data within institutional boundaries while sharing model updates or aggregated signals. Synthetic data is also used to enable training on data that resembles protected health information without constituting protected health information, including synthetic electronic health record generation and image synthesis workflows used in diagnostic model development. Synthetic data risk management turns on whether the synthetic output contains or reveals protected health information from source records, including leakage of identifiable traces. Evaluation plans should address memorization and reconstruction risks as part of privacy assurance.
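
The federated pattern can be illustrated with a toy model. The sketch below is an assumed, simplified federated averaging loop for a one-parameter linear model: each site computes a gradient on its own records, and only the averaged update crosses institutional boundaries. The site data and model are invented for illustration; production systems add secure aggregation and often differential privacy on the updates.

```python
# Minimal sketch of federated averaging: each site computes a local model
# update on its own data, and only the aggregated update leaves the boundary.
# The one-feature linear model and site data are hypothetical.

def local_gradient(weights, data):
    """Mean-squared-error gradient for a one-feature linear model y ~ w*x."""
    w = weights[0]
    n = len(data)
    return [sum(2 * (w * x - y) * x for x, y in data) / n]

def federated_round(weights, sites, lr=0.01):
    """Average per-site gradients; raw records never leave their site."""
    grads = [local_gradient(weights, site) for site in sites]
    avg = sum(g[0] for g in grads) / len(grads)
    return [weights[0] - lr * avg]

sites = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]  # both sites follow y = 2x
w = [0.0]
for _ in range(200):
    w = federated_round(w, sites, lr=0.05)
print(round(w[0], 2))  # converges toward 2.0
```

Note that gradient updates themselves can leak information about training records, which is why memorization and reconstruction risks still belong in the evaluation plan even for federated designs.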

Breaches, Notification, And Penalties

An impermissible use or disclosure of protected health information that compromises its privacy or security can constitute a breach under HIPAA, triggering notification obligations and associated response requirements.

HIPAA enforcement includes civil monetary penalties and settlement agreements, and criminal penalties can apply to knowing violations. Penalty tiers vary based on the level of culpability, and the most severe tier can involve substantial fines and imprisonment for obtaining protected health information for personal gain or malicious intent. Penalty amounts are subject to inflation adjustments, and enforcement actions have included multi-million-dollar settlements in recent years. Enforcement outcomes and annual penalty tables change over time, and compliance programs typically track published updates as part of governance.

Interaction With Other Privacy And Health Data Regimes

HIPAA is not the only privacy framework that can affect medical AI activities. Provisions of the California Consumer Privacy Act, as amended by the California Privacy Rights Act, include exceptions for protected health information regulated by HIPAA, which can reduce dual compliance obligations for protected health information activities. Separate provisions can still affect deidentified data transactions, including requirements that transfers characterized as sales include contractual prohibitions on reidentification. “Sale” can be defined broadly and can include exchanges without direct monetary compensation, and certain provisions can extend to entities that do not meet other California threshold criteria.

Federal Trade Commission enforcement has increased attention on the Health Breach Notification Rule for health data outside HIPAA scope, including actions involving consumer health applications and related services. Medical AI developers operating outside HIPAA Covered Entity and Business Associate structures still face privacy risk through Federal Trade Commission authority, sector-specific confidentiality rules such as 42 CFR Part 2 for federally assisted substance use disorder treatment programs, and a growing set of state health privacy and medical record laws.

Non-Legal Risk Factors And Regulatory Attention

Some activities can comply with HIPAA while still creating reputational risk if stakeholders view the data practice as opaque or inconsistent with patient expectations. Ethical concerns can increase scrutiny, and scrutiny can lead to direct costs, adverse publicity, and management diversion.

Regulatory attention can also expand based on broader policy themes. Public reporting and regulator inquiries have addressed concerns about bias in medical AI tools, including state-level inquiries to hospitals about the use of medical AI tools in relation to bias risk. Competitor misconduct can contribute to heightened oversight across a sector, which increases the value of internal controls that reduce discretionary risk exposure.

Operational Compliance Controls For Medical AI Programs

Medical AI privacy programs typically center on documented determinations of protected health information status, HIPAA Covered Entity and Business Associate status, and the permission pathway used for each data flow. Business Associate Agreements, data use agreements, and authorization packages should align with the intended use, model lifecycle, and downstream sharing model. Workforce training should cover HIPAA Privacy Rule permissions, HIPAA Security Rule safeguard expectations, breach response procedures, and the HIPAA Minimum Necessary Rule. Annual HIPAA training is industry best practice, and Business Associate security awareness training supports ongoing risk management in evolving threat environments.