Monetising Personal Health Data in the Aged Care Sector – Part 1: The Legal and Regulatory Pitfalls

Background

This alert has been prepared against the background of a number of reports that medication management service providers in the aged care sector are sharing, or are proposing to share, patient health data in the form of unit/individual level medication records with third parties. The reports indicate that the service providers receive a payment or other financial benefit for disclosing patient health data to these third parties.

This type of health information sharing involves a range of significant risks and the real possibility of regulatory action under Australia’s privacy laws.

The purpose of this alert is to provide a high-level account of the regulatory environment and the associated risks related to the sharing and/or monetisation of personal data.

Context

The increased use of advanced data analytics and forms of Artificial Intelligence (AI) in the health sector – particularly generative AI that uses machine and deep learning powered by large language models (LLMs) and multimodal foundation models (MFMs) – relies on the availability of data sets used to train the algorithms that underpin AI applications. Essentially, AI applications that use machine and deep learning models consist of two main components: data and algorithms.

These forms of AI depend on the widespread availability of training data.

The demand for high quality data to power health sector AI is growing rapidly and is likely to keep growing as AI use becomes normalised in health care settings. The highest quality, most accurate and most valuable health data is unit level data, i.e., the health data of individuals.

Sharing health data

The overarching Australian legislation that governs the collection, use, disclosure (sharing), security and other handling of personal information is the Privacy Act 1988 (Cth) which covers the Commonwealth public sector and the private sector on a national basis. Essentially it gives individuals a measure of control over these activities in respect of their personal information.

Personal information is information or an opinion about an identified individual, or an individual who is reasonably identifiable. Additional protection is given to personal information that is sensitive information. Sensitive information includes an individual’s health and genetic information as well as some aspects of biometric information.

The general rule under the Privacy Act 1988 is that personal health information cannot be used or disclosed unless the individual has consented to the disclosure, or the disclosure is directly related to the primary purpose for which the health information was collected.

For consent to be legally valid it must be informed, voluntary, current and specific, and the individual must have the capacity to consent. In the reports we have seen about the sharing of aged care patients’ health information with third parties, it is highly doubtful that these consent requirements have been met.

The alternative sharing pathway – disclosure that is directly related to the purpose for which the health information was collected – is also problematic. Aged care patients’ information is collected for the primary purpose of treatment, not for training AI or for unrelated third party disclosure.

Are there other sharing options?

The corollary of legal protection for personal and sensitive information is that the protection does not extend to information that is not personal information. The Privacy Act 1988 recognises that personal information that has been de-identified is no longer personal information and is not protected.

The Act states that “personal information is de-identified if the information is no longer about an identified individual or an individual who is reasonably identifiable.”

Theoretically, these concepts are straightforward: if information is about an identified individual it is protected by privacy law, but if it has been de-identified it is not. In practice, however, de-identification is risky, problematic and, even when implemented carefully in highly controlled environments, resource intensive.

The de-identification problem

De-identification refers to the process of removing, or altering, information in a dataset that directly identifies an individual. For example, a simple form of de-identifying an individual’s health information is to delete the name, date of birth, address and postcode, leaving nothing in the data set but the information about the person’s health, such as diagnosis, treatment and medication. In theory, the health data that remains after stripping out the directly identifying information is de-identified information.
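
By way of illustration, the sketch below shows this naive form of de-identification: removing the direct identifiers from a record while leaving the clinical content behind. The field names and record structure are invented for the example and do not reflect any particular system.

```python
# Illustrative sketch only: naive de-identification by removing direct identifiers.
# Field names are hypothetical; real records are far richer and messier.

DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address", "postcode"}

def strip_direct_identifiers(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

patient = {
    "name": "Jane Citizen",
    "date_of_birth": "1938-04-02",
    "address": "1 Example St",
    "postcode": "3000",
    "diagnosis": "type 2 diabetes",
    "medication": "metformin 500mg",
}

print(strip_direct_identifiers(patient))
# {'diagnosis': 'type 2 diabetes', 'medication': 'metformin 500mg'}
```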

However, the risk of re-identification under this model – and many other, more rigorous, de-identification models – is high, particularly where unit level data is concerned. The reason is that, whatever form of de-identification is used, there is a high likelihood that other information – often called ‘auxiliary information’ – can be matched with the de-identified information so as to re-identify it.

One of the most widely publicised examples of the re-identification of a (supposedly) de-identified data set occurred in 2016, when University of Melbourne researchers easily re-identified a longitudinal data set of the medical billing records of almost 3 million Australians. The data had been publicly released by the Commonwealth Department of Health on its open data website. It was derived from the Medicare Benefits Schedule (MBS) and the Pharmaceutical Benefits Scheme (PBS) and included all publicly reimbursed medical and pharmaceutical bills for about 10% of the population over a thirty-year period from 1984 to 2014.

The re-identification process that was used was ‘straightforward for anyone with technical skills about the level of an undergraduate computing degree.’ Two methods were used. The first involved decrypting the IDs of suppliers: the encryption employed by the Department – pseudorandom number generation – was easily guessed and reversed. The second involved individual linkage attacks on the data set. ‘Linkage attacks work by identifying a “digital fingerprint” in the data, meaning a combination of features that uniquely identifies a person. If two datasets have related records, one person’s digital fingerprint should be the same in both. This allows linking of a person’s data from the two different datasets – if one dataset has names then the other dataset can be re-identified.’ Linkage attacks rely on comparing information in one data set against the ‘de-identified’ information in another data set, the former being referred to as ‘auxiliary information’. As one of the world’s leading experts, Cynthia Dwork, has put it, ‘[s]uccinctly put, “De-identified” data isn’t, and the culprit is auxiliary information.’
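
The following sketch illustrates, in highly simplified form, how a linkage attack works: a combination of quasi-identifiers (here, sex, year of birth and a pair of service dates) acts as a ‘digital fingerprint’ that matches a ‘de-identified’ record to a named record in an auxiliary dataset. The data and field names are invented for illustration; real attacks operate on far larger datasets but follow the same logic.

```python
# Illustrative linkage attack sketch. All data is invented; the point is that
# a unique combination of quasi-identifiers links the two datasets.

# 'De-identified' health records: direct identifiers removed, quasi-identifiers remain.
deidentified = [
    {"sex": "F", "birth_year": 1938, "service_dates": ("2014-03-01", "2014-07-19"),
     "diagnosis": "type 2 diabetes"},
    {"sex": "M", "birth_year": 1942, "service_dates": ("2014-02-11", "2014-09-30"),
     "diagnosis": "atrial fibrillation"},
]

# Auxiliary information: named records sharing the same quasi-identifiers
# (e.g. gleaned from social media, press reports or another dataset).
auxiliary = [
    {"name": "Jane Citizen", "sex": "F", "birth_year": 1938,
     "service_dates": ("2014-03-01", "2014-07-19")},
]

def fingerprint(record):
    """The 'digital fingerprint': the combination of quasi-identifiers."""
    return (record["sex"], record["birth_year"], record["service_dates"])

# Re-identify by matching fingerprints across the two datasets.
index = {fingerprint(r): r for r in deidentified}
for aux in auxiliary:
    match = index.get(fingerprint(aux))
    if match is not None:
        print(f"{aux['name']} -> {match['diagnosis']}")
# Jane Citizen -> type 2 diabetes
```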

One approach to de-identification is to remove as much of the information in a data set as possible, so as to reduce the risk of re-identification through linkage attacks. However, as more information is removed the data set becomes degraded and its usefulness and accuracy decline. A cognate approach, the aggregation of information in data sets, raises similar problems.
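
As a minimal illustration of this trade-off, the sketch below generalises one quasi-identifier (exact age) into coarse bands: the number of distinct values an attacker can match on falls, but so does the precision available to clinicians, researchers or AI models. The values and band width are invented for the example.

```python
# Illustrative sketch: generalising age into bands reduces uniqueness but degrades utility.

ages = [67, 71, 74, 78, 83, 89, 94]

def age_band(age: int, width: int = 10) -> str:
    """Collapse an exact age into a coarse band, e.g. 74 -> '70-79'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

print([age_band(a) for a in ages])
# ['60-69', '70-79', '70-79', '70-79', '80-89', '80-89', '90-99']
# Fewer distinct values to match on, but exact ages (useful for dosing or risk
# models) are lost; widening the bands further degrades the data even more.
```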

Finally, de-identification is not a static exercise. Any approach to de-identification is determined by the context in which information is collected, stored, used, disclosed and secured, and that risk varies over time. A release of de-identified information that takes place now runs the risk of re-identification in the future, particularly as more and more auxiliary information becomes obtainable, whether through government open data programs or through the growing volume of publicly available personal information generated by social media. Despite the changing risk environment, regulatory obligations remain constant, leaving organisations that release de-identified data with ongoing legal obligations that extend over the entire lifecycle of the de-identified data. What is de-identified in accordance with best practice now may, with the passage of time, become readily re-identifiable.

Australian privacy law in a state of flux

The Privacy Act 1988 is currently under review and there is a high likelihood of significant amendments designed to strengthen it, to make it better suited to a rapidly developing technical environment, and to align it more closely with international benchmarks such as the EU’s General Data Protection Regulation (GDPR) and new legislation in many US states, such as the California Consumer Privacy Act.

The 2022 Privacy Act Review Report contains a detailed discussion of de-identification and the risks it poses. It canvasses a number of initiatives to strengthen privacy protections for de-identified data, including new obligations that treat de-identification as an ongoing process, informed by best available practice, in the context of the release of the information. These recommendations would raise the bar for de-identification significantly, primarily by requiring adherence to best practice methodologies rather than the current ‘reasonable steps’.

Amendments to the Privacy Act are expected within the life of the current Commonwealth government, i.e., by mid-2025.

Regulatory Guidance

There have been a number of initiatives that outline de-identification approaches and methodologies, all of which have limitations.

These are discussed in a separate Alert that can be found here: https://www.qualitycare.org.au/news/

Conclusion

The release of aged care patient health information, whether through a consent-based or purpose-based pathway or through de-identification, carries with it high levels of regulatory risk. Almost invariably, aged care patients have not provided upfront consent to widespread data sharing, and many are not in a position to provide valid consent. Equally, widespread data sharing is not directly related to the purpose for which their sensitive data was collected.

Monetising de-identified patient data raises significant re-identification risks, particularly at a granular, unit level. All of the most prominent de-identification frameworks are problematic because they do not adequately control for dynamic risks that vary, and typically increase, over the information lifecycle, and because implementing and governing them is a costly, resource intensive, long-term obligation.

This QCAA Alert was compiled by Professor David Watts.

David Watts, Professor, Thomas More Law School, Australian Catholic University, Melbourne.

David is one of Australia’s leading data protection experts, with experience as a regulator, policy maker, and public and private sector lawyer.
