Cyber Awareness: Artificial intelligence and protecting sensitive data

Over the last year, you’ve no doubt heard about a number of breakthroughs in Artificial Intelligence (AI) technologies. We want to discuss the risks associated with AI technologies and the critical importance of safeguarding sensitive data when using AI tools for work. Sensitive data includes, among other things, electronic Protected Health Information (ePHI), Personally Identifiable Information (PII), Identifiable Health Information (IHI), and proprietary research data.

What is AI?

AI is the capacity of computers to exhibit or simulate intelligent behavior. For example, ChatGPT can write songs or generate computer code, completing such tasks in seconds. Other types of AI can quickly analyze large data sets and generate detailed reports based on defined parameters.

Security, Privacy, and Reliability Concerns with AI Tools

  • Data Security
    • AI must be “trained” on data to provide useful output, in much the same way that humans learn through formal education.
    • Many commercial AI tools, including OpenAI’s ChatGPT, Microsoft’s Bing Chat, and Google’s Bard, are trained on data available in publications and online. They are also trained on data submitted to them by users.
    • When users submit sensitive or proprietary data to AI tools, that data can later surface in responses given to other people.
  • AI “Hallucinations”
    • Some AI technologies, such as “Large Language Models” or LLMs (ChatGPT, Google Bard, etc.), are designed to be flexible and to generate useful approximations when clear answers can’t be determined.
    • It’s critical to realize that these technologies can sometimes “hallucinate” content that looks, on the surface, like a very believable answer but is not based in fact at all. Therefore, it’s important to ask the LLM for references and to vet any content that is critical in nature.
    • Depending on the AI technology, up to a third of responses may be hallucinatory to some extent.
  • Data Handling and Protection
    • AI tools handle submitted data in different ways. Some may store or process data externally, which can raise concerns about data sovereignty (the country in which data resides) and compliance with data protection regulations.
    • Many AI tools, especially LLMs, retain extensive prompt history or allow users to upload files to inform prompts. Such data may not be sufficiently protected, may become the property of the technology vendor, and may introduce a data breach risk for Emory.

Best Practices

Consider the following best practices to avoid some of the risks mentioned above:

Approved AI Solutions: Be on the lookout for future guidance from Emory about approved AI tools and solutions that have been vetted for appropriate use with sensitive data.

Data Minimization:

  • Share only the minimum necessary data with AI tools, and avoid including sensitive information (Confidential or Restricted Data); a minimal redaction sketch follows this list. Unless the proper contractual agreements and security controls are in place, sensitive information may be used inappropriately or retained by companies that are training their AI tools.
  • Sharing Protected Health Information (PHI) or electronic PHI (ePHI) and proprietary research data with non-contracted or non-Emory AI systems is explicitly prohibited at Emory unless the proper contractual agreements are in place.
  • Submitting copyrighted or licensed materials or data to an AI tool without permission may present legal risk and should be avoided.
  • To the extent practical, AI tools should retain as little information as possible beyond what’s needed to complete their task.
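
As an illustration of data minimization, the sketch below scrubs a few obvious identifiers from text before it would be sent to an external AI tool. It is a hypothetical Python example: the redact function and its patterns are our own illustration, not an Emory-provided utility, and a best-effort filter like this is no substitute for keeping ePHI and PII out of prompts entirely.

    import re

    # Hypothetical patterns for a few common identifiers. A real filter
    # would need patterns tuned to the data it actually handles
    # (medical record numbers, names, dates of birth, etc.).
    REDACTION_PATTERNS = {
        "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
        "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    }

    def redact(text):
        # Replace likely identifiers with placeholders before the text
        # leaves your systems. This is best-effort only: it will miss
        # names and other free-text identifiers, as the example shows.
        for label, pattern in REDACTION_PATTERNS.items():
            text = pattern.sub("[" + label.upper() + " REDACTED]", text)
        return text

    prompt = "Summarize: Jane Doe, SSN 123-45-6789, contact jdoe@example.com"
    print(redact(prompt))
    # Prints: Summarize: Jane Doe, SSN [SSN REDACTED], contact [EMAIL REDACTED]
    # Note that the name "Jane Doe" slips through, which is why minimizing
    # what goes into the prompt matters more than any after-the-fact filter.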

Verify the Output:

  • AI is not currently a replacement for expert knowledge or trusted sources. It can assist you in your work, but AI-generated content should be avoided if you cannot validate the output of such tools.
  • Always validate computer code generated by an AI tool before use, and never use real data or elevated rights in any validation process; see the sketch after this list.
  • Double-check AI output for accuracy, and do not assume that it will be correct even if the output looks high-quality and realistic.
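
To make that concrete, here is a minimal sketch of validating a function an AI tool might have generated, using only fabricated test records. The dedupe_records function and the synthetic data are hypothetical, invented for this illustration; the point is that the untrusted code is exercised without any real ePHI, PII, or elevated rights.

    import unittest

    def dedupe_records(records):
        # Stand-in for a small function an AI tool might generate:
        # remove duplicate records while preserving order. Treat it
        # as untrusted until it passes tests like the ones below.
        seen = set()
        result = []
        for record in records:
            key = (record["id"], record["name"])
            if key not in seen:
                seen.add(key)
                result.append(record)
        return result

    class TestWithSyntheticDataOnly(unittest.TestCase):
        # Fabricated records only: no real names, IDs, or ePHI ever
        # enter the validation process.
        SYNTHETIC = [
            {"id": 1, "name": "Test Patient A"},
            {"id": 2, "name": "Test Patient B"},
            {"id": 1, "name": "Test Patient A"},  # deliberate duplicate
        ]

        def test_duplicates_removed(self):
            self.assertEqual(len(dedupe_records(self.SYNTHETIC)), 2)

        def test_order_preserved(self):
            ids = [r["id"] for r in dedupe_records(self.SYNTHETIC)]
            self.assertEqual(ids, [1, 2])

    if __name__ == "__main__":
        unittest.main()

Running the tests in an unprivileged environment catches obvious logic errors before the generated code ever touches production systems or real data.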

As AI technologies continue to evolve, the security and privacy implications will evolve as well. Stay up to date with Emory’s latest guidelines and approved AI tools, and always take into consideration the potential risk before entering data into an AI application.
