Shadow IT as a security problem is as old as the hills.
Shadow AI, its new avatar, is rightly viewed as a serious security problem.
Undoubtedly, Shadow AI is a major cybersecurity risk. But do we realize that, often unknowingly, it is also a privacy problem?
Beyond expanding the attack surface, creating security blind spots, acting as a potential carrier of malware, ransomware, and other vulnerabilities, and amplifying insider threats, Shadow AI tools have a far more sinister side that lurks in the shadow of the cybersecurity problem. Pun intended.
The core risk is that Shadow AI becomes a large-scale, uncontrolled, porous surface that continuously leaks personal data of employees, vendors, and customers, or sensitive company data, into public Large Language Models (LLMs). This information, mostly leaked inadvertently, is then logged, stored, and potentially used to train the public models, leading to data leaks and compliance violations.
Is Shadow AI really a privacy issue? What's the evidence?
Several examples, past and recent, provide ample proof that employees share varied data types with unsanctioned AI tools.
Let’s look at some examples:
Medical practitioners entered patient data into ChatGPT to get help with a letter. A legal analysis of the case noted that this raised serious GDPR and confidentiality concerns, as identifiable health information was sent to a third-party AI system without explicit patient consent or a suitable legal basis.1
A more recent 2025 report from a well-known AI security vendor demonstrated that 77% of enterprise GenAI users paste data into AI tools for productivity gains, with 22% of that text containing Personally Identifiable Information (PII) or Payments and Card Information (PCI) data. The same report also stated that 82% of this AI usage happened through personal, unmanaged accounts, outside of corporate IT control.2
Another well-known AI security vendor analyzed one million prompts and 20,000 files sent to 300 generative AI tools; more than 4% of prompts and over 20% of uploaded files contained sensitive corporate data.3
There is no dearth of evidence showing that employees regularly paste PII, card numbers, payroll, and customer records into public AI tools, mostly from personal, unmonitored accounts that sit outside enterprise IT logging and monitoring.
What compounds the privacy problem?
· Data Memorization: Large language models (LLMs) gather knowledge from large datasets sourced from the public internet, which may include sensitive personal information (PII). An adversary able to exploit these models can recover portions of the data the model was trained on and misuse what is extracted. This is possible via training data extraction attacks such as model inversion, memorization exploitation, or prompt-based data recovery (a minimal probe sketch follows this list).4
· User Oversharing: Given that LLMs make great companions in an increasingly lonely world, many users mistakenly assume these tools to be secure and even personify or anthropomorphize them. This lowers their hesitation to share secrets and sensitive personal information, such as corporate secrets or personal health details, in their prompts without realizing the consequences.1
· Data Retention Policies: Numerous users remain unaware that public LLM service providers often implement privacy policies indicating that user prompts and interaction data are collected, stored, and potentially utilized for future model training by default, unless users explicitly opt out, where such an option exists. Ambiguous language, omission of specifics, fine print, and a lack of transparency obscure how data is handled and retained. Freemium models are often offered on the premise of collecting user data for model training.5
· Inference Capabilities: Advanced LLMs possess the ability to infer sensitive personal attributes (such as a user’s residence, occupation, or health status) from seemingly innocuous input or contextual clues within user prompts, even if the user attempts to maintain anonymity.4,6
· Third-Party Integrations: LLMs frequently interact with external tools, agents, or APIs (such as email plugins or web browsers) to accomplish tasks, which further increases the risk of data being transferred without consent.6
· False Sense of Control Over Personal Data: LLM providers might overstate the control users have over their data or inflate their own obligations beyond what the policy guarantees. Policies generally contain statements of good intent such as "We are committed to protecting your personal data." However, these may not align with actual business practices, leading users to believe that their data is more secure than it is.5
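To make the memorization risk concrete, here is a minimal sketch of a prompt-based memorization probe run against a locally hosted open model. It is illustrative only: the model name, the prefix, and the decoding settings are assumptions chosen for demonstration, not findings about any specific public service, and the extraction attacks described by Carlini et al.4 are considerably more sophisticated.

```python
# Minimal sketch of a prompt-based memorization probe (assumption: a small open
# model loaded via the Hugging Face `transformers` library stands in for any LLM;
# "gpt2" and the prefix below are placeholders, not real findings).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A prefix an attacker suspects preceded sensitive text in the training data.
prefix = "Contact our billing team at"

inputs = tokenizer(prefix, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=False,                      # greedy decoding surfaces high-confidence continuations
    pad_token_id=tokenizer.eos_token_id,  # avoids a warning for models without a pad token
)

# If the continuation reproduces verbatim training text (an email address, a name,
# an API key), the model has memorized it and it can be extracted.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Run over many candidate prefixes, this same probing logic scales up into the extraction attacks cited above.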
What Do Global Privacy Laws and AI Regulations Expect?
1. People stay in control of their data
- Individuals should know what data is collected, why, and by whom.
- They must have rights to access, correct, delete, and sometimes port their data.
- AI doesn’t cancel these rights just because the system is “complex” or “opaque.”
2. Purpose limitation and data minimization
- Collect only what is necessary for a clearly stated purpose.
- Don’t silently repurpose data for new AI uses without a lawful basis and, often, new consent.
- Training, fine-tuning, and inference must still respect the original purpose and expectations.
3. Transparency and explainability
- Users should be told when AI is used and, at a high level, how it affects them.
- Organizations must be able to demonstrate what data goes in, what comes out, and why it’s lawful.
4. Risk-based governance and accountability
- Higher-risk AI uses (credit, hiring, health, biometrics, children, etc.) need stricter safeguards, impact assessments, and sometimes audits.7,8,10
- A named party is accountable: you must document, monitor, and be able to show regulators you did the right things.
5. Security and safeguards by design
- “Privacy by design and by default” and “security by design” are non-negotiable.6,7,8
- Access control, redaction, encryption, logging, retention limits, and breach response must be baked into the AI lifecycle.
6. Fairness, nondiscrimination, and harm prevention
- AI systems should not unlawfully discriminate or create unjustified harm.
- Many regimes require testing, monitoring, and mitigation of bias and adverse impacts on individuals and groups.6,7,8
7. Governance over the full lifecycle
- Obligations apply from data collection through training, deployment, monitoring, and decommissioning.
- Once data is fed into a model, it isn’t “off the hook” legally; lifecycle governance and, where possible, traceability are required.6,7,8,10
The 8-Step Defense Playbook for AI Data Privacy
1. Build and maintain an inventory of all AI systems, datasets, models, and vendors, including what personal data they touch, where it comes from, and why it is used (a minimal inventory record sketch follows this list).7,8
2. Classify data by sensitivity and map data flows end to end so you can spot repurposing, spillovers, and cross-border transfers.
3. Next, enforce strict data minimization. Collect only what is necessary, strip identifiers as early as possible, and define short, documented retention limits for training data, logs, and prompts.6,7
4. Embed privacy into the AI lifecycle. Run data protection or algorithmic impact assessments before deployment, maintain a risk register, assign mitigation owners, and trigger reassessment whenever the use case or dataset changes.7,8,10
5. Harden operations. Strong access control, encryption, redaction of sensitive fields, robust logging, and reproducible training pipelines that support audits and, where feasible, targeted unlearning (a redaction sketch follows this list).4,6,7,8
6. Use internal or third-party algorithmic audits to verify compliance with consent, purpose limitation, and fairness expectations.7,8,10
7. Tighten vendor governance with explicit contract clauses on data use, sub-processors, security standards, incident notification, and model retraining rights.6,7,10
8. Finally, prioritize transparency. Clear notices, AI labeling, user controls, and training so staff understand both capabilities and limits of AI systems handling personal data.6,7,8,10
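For step 1, below is a minimal sketch of what an AI-system inventory record might look like in code. The field names and example values are assumptions for illustration, not a formal schema from ISO 42001, NIST AI RMF, or any other framework.

```python
# Illustrative AI-system inventory record (field names are assumptions, not a standard schema).
from dataclasses import dataclass, field
from typing import List

@dataclass
class AISystemRecord:
    name: str                                  # the tool or model
    vendor: str                                # who operates it
    purpose: str                               # why it is used
    personal_data_categories: List[str] = field(default_factory=list)  # e.g. ["PII", "PCI"]
    data_sources: List[str] = field(default_factory=list)              # where the data comes from
    lawful_basis: str = ""                     # e.g. "contract", "consent"
    sanctioned: bool = False                   # False for newly discovered shadow AI tools

inventory = [
    AISystemRecord(
        name="Public chatbot (personal browser account)",
        vendor="Unknown",
        purpose="Drafting customer emails",
        personal_data_categories=["PII"],
        data_sources=["CRM exports"],
        lawful_basis="",                       # a missing lawful basis is an immediate follow-up item
        sanctioned=False,
    ),
]
```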
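For steps 3 and 5, here is a minimal sketch of pre-prompt redaction that strips identifiers before text leaves the corporate boundary. The regex patterns and the redact() helper are assumptions for illustration only; real deployments typically pair pattern matching with ML-based classifiers and enterprise DLP controls.

```python
# Minimal pre-prompt redaction sketch (patterns are illustrative, not a complete
# PII detector; they will miss many formats and should not be relied on alone).
import re

PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "CARD":  re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched sensitive fields with typed placeholders before the
    prompt is sent to any external LLM, and log only the redacted version."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Draft a refund letter for jane.doe@example.com, card 4111 1111 1111 1111."
print(redact(prompt))
# -> Draft a refund letter for [EMAIL], card [CARD].
```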
References (Harvard style)
1 Li, J. et al. (2023) ‘Security implications of AI chatbots in health care’, JMIR Medical Education / PubMed Central article on risks of using ChatGPT with patient data.
2 LayerX Security (2025) Enterprise AI and SaaS Data Security Report 2025. LayerX Security research report quantifying GenAI copy-paste and unmanaged-account risk.
3 Harmonic Security (2025) ‘22% of all files and 4.37% of prompts submitted to GenAI tools by employees contain sensitive data’, Harmonic Security data leakage analysis based on 1 million prompts and 20,000 files across 300 GenAI tools.
4 Carlini, N., Tramèr, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K. et al. (2021) ‘Extracting training data from large language models’, Proceedings of the USENIX Security Symposium.
5 OpenAI (2025) How your data is used to improve model performance and related data usage and enterprise privacy documentation. OpenAI policy resources describing logging, retention and use of prompts for model improvement.
6 European Union (2016) General Data Protection Regulation, Article 5 and associated guidance on data protection principles including lawfulness, fairness, transparency, purpose limitation and data minimisation.
7 Information Commissioner’s Office (ICO) (2023) Guidance on AI and Data Protection and Accountability and Governance in AI systems, including the use of Data Protection Impact Assessments.
8 National Institute of Standards and Technology (NIST) (2023) Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1, framework for risk-based governance of AI systems.
9 UC San Diego and Nanyang Technological University researchers (2024) ‘This Prompt Can Make an AI Chatbot Identify and Extract Personal Details from Your Chats’, Wired article describing the Imprompter prompt-injection attack against LLMs.
10 Hullen, N. (2024) ‘Top 10 operational impacts of the EU AI Act – Leveraging GDPR compliance’, IAPP Resource Center article on operational interplay between the EU AI Act and GDPR.

Praneeta Paradkar is a cybersecurity product leader specializing in AI Governance, Data Protection, and Privacy. She is the Chief Product Officer at Quilr, where she designs enterprise solutions to reduce shadow AI, sensitive data leakage, and unsafe AI usage.
With over 20 years of industry experience, her first love is Product Management. She has led roadmap, strategy, and execution for platforms that sit at the intersection of security, privacy, and AI.
Praneeta holds a Bachelor’s in Pharmaceutical Sciences, an MBA in Marketing, and a Post Graduate Diploma in Cloud Computing. She has built her career translating complex technical risks into clear, actionable solutions for global enterprises. Her work spans end-to-end AI Security and AI-powered DLP mapped to frameworks such as GDPR, NIST AI RMF, ISO 42001, OWASP LLM Top 10, and MITRE ATLAS.
She loves to write and is passionate about democratizing AI, cybersecurity, and product management. Through her initiative Plaintext Protocol, launching on 5 December 2025, she aims to make these topics accessible to both technical and non-technical audiences.
