The Current State of Generative AI and Personal Data: Strict Regulations and Practical Guides by Country (2025 Edition)
This article is for general informational purposes only and does not constitute legal advice. It is based on publicly available information as of September 8, 2025.
1) Why “Personal Data × Generative AI” Is So Complex
Generative AI involves personal data throughout its long lifecycle: collection → training (including preprocessing and reinforcement learning) → distribution (API/SaaS) → usage (inference and log storage). Training datasets come from diverse sources like scraping, purchased data, and internal logs, raising risks of “traces of personal data” remaining in model weights or derivative data. EU authorities have stated that “AI models trained on personal data are not necessarily anonymous”, reinforcing demands for lawfulness, transparency, and deletion measures (e.g., unlearning).
2) Key Definitions and Principles to Understand First
- Personal Data (GDPR): Any information relating to an identified or identifiable individual (including IPs and online identifiers). Special categories of data (Art. 9) are generally prohibited unless strict exceptions are met.
- Purpose Limitation and Data Minimization: GDPR requires that personal data be limited to what is necessary for specified purposes; California's CPRA similarly requires that processing be "reasonably necessary and proportionate" to the disclosed purpose.
- Legal Basis: In the EU, in addition to consent, contracts, and legal obligations, the legitimate interest (Art. 6(1)(f)) basis has been clarified by the EDPB (three conditions: legitimate interest, necessity, and balancing of interests). Application to generative AI training requires strict prior assessment and documentation.
3) Generative AI Data Flow and Risk Points
- Collection: Scraping, data purchases, internal data. In the EU, there is a TDM (Text and Data Mining) exception for copyrighted works, but if the rights holder opts out in a machine-readable format, commercial TDM is prohibited.
- Training and Evaluation: The EDPB highlights the uncertainty of applying legitimate interest and potential for personal data to be embedded in models. Risks include reidentification or defamation via AI hallucinations.
- Provision (API/SaaS) and Usage (Inference/Logs): In automated decisions (hiring, loans, etc.), many countries have increased obligations regarding transparency, human involvement, explanation rights, and opt-out mechanisms.
4) Key Regulations by Country/Region
European Union (EU) — GDPR + EU AI Act
- AI Act Timeline: Entered into force on August 1, 2024. Prohibited practices apply from February 2, 2025. Transparency obligations for GPAI (General-Purpose AI) phase in from August 2, 2025, and most high-risk AI requirements apply from August 2026.
- GDPR Application: Requires legal basis, purpose limitation, data minimization, and DPIA at each stage of model development and deployment. The EDPB has imposed strict criteria on data scraping and the use of legitimate interest for training.
- Copyright and TDM: DSM Directive Art. 3 (research) and Art. 4 (general TDM) provide exceptions, but the Art. 4 exception does not apply where rights holders reserve their rights in a machine-readable form; crawlers must detect and respect such reservations.
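One common machine-readable reservation signal is a robots.txt rule aimed at AI-training crawlers. The sketch below uses Python's standard `urllib.robotparser` to check a few such crawlers; the user-agent names listed are illustrative assumptions, and real sites may also signal reservations via meta tags or the TDM Reservation Protocol, which this sketch does not cover.

```python
from urllib import robotparser

# Illustrative AI-training user agents; actual names vary by vendor.
AI_AGENTS = ["GPTBot", "CCBot", "Google-Extended"]

def tdm_opt_out(robots_txt: str, url: str) -> dict:
    """Return, per AI user agent, whether robots.txt disallows fetching url.

    A Disallow rule targeting an AI crawler is treated here as one possible
    machine-readable opt-out signal; it is not the only valid form.
    """
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {agent: not rp.can_fetch(agent, url) for agent in AI_AGENTS}

sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(tdm_opt_out(sample, "https://example.com/article"))
# GPTBot is opted out here; the other agents fall back to the * rule.
```

A production crawler would fetch robots.txt per host, cache it, and log the decision so that opt-out compliance is auditable.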
United Kingdom — UK GDPR (DPA 2018) + ICO Guidance
- Under its pro-innovation stance, the UK's ICO has published the outcomes of its consultation series on training generative AI with personal data. Key focuses include meaningful transparency, explanation of the accuracy expectations and purposes of training data, and the practical effectiveness of exercising data subject rights.
- The government plans to advance AI regulation in 2025, led by sectoral regulators.
United States — State-Level Patchwork (CPRA/CCPA, Colorado AI Act, WA My Health My Data, etc.)
- California: The CPPA approved ADMT (Automated Decision-Making Technology) regulations on July 24, 2025, requiring prior notice, opt-out, explanation on request, and risk assessments. Effective dates depend on Office of Administrative Law (OAL) review, so track the practical compliance timeline.
- Colorado AI Act (SB24-205): Requires mitigation of discrimination risks from high-risk AI. Enforcement delayed from Feb 1, 2026 to June 30, 2026. Developers and deployers must ensure due diligence, monitoring, and disclosures.
- Washington My Health My Data Act: Effective from March 31, 2024 (June 30 for small businesses). Strict rules on consent, sale prohibition, and dedicated policies for health data. High impact on health-related generative AI and chatbots.
Japan — APPI (Act on the Protection of Personal Information) + TDM Exception (Copyright Act Art. 30-4)
- APPI Amendments (in force since April 2022): Strengthened breach-reporting obligations, introduced pseudonymized information, and tightened rules on cross-border transfers and consent. EU–Japan transfers are eased through mutual adequacy decisions.
- TDM Exception: Copyright Act Art. 30-4 permits wide usage of data for analysis not intended for enjoyment, but other laws (e.g., APPI, privacy, unfair competition) still apply.
China — PIPL + Algorithm/Generative AI Rules
- The PIPL requires security reviews, standard contracts, and certifications for cross-border transfers. The Interim Measures on Generative AI (enforced Aug 15, 2023) and Deep Synthesis Regulations (2022) mandate data authenticity, illegal content prevention, and lawful acquisition of personal data.
India — DPDPA 2023 (Implementation Rules Pending)
- Though passed in 2023, draft implementation rules are under discussion in 2025. The law adopts a model where cross-border data transfers are generally allowed unless blacklisted by the government. SDFs (Significant Data Fiduciaries) will face stricter obligations. Final rules and launch timeline remain key.
Singapore — PDPA + Model AI Governance Framework for GenAI (2024)
- Singapore's IMDA and the AI Verify Foundation released the Model AI Governance Framework for Generative AI (2024), complementing the PDPC's PDPA guidance and the AI Verify testing toolkit. Emphasis is placed on transparency of training data, risk management, and clear allocation of responsibility.
Brazil — LGPD + ANPD Enforcement Framework
- The LGPD is Brazil’s comprehensive data law. ANPD has issued sanction calculation rules (RDASA, 2023), and in 2024, international transfer regulations (19/2024) were introduced. Alignment of AI practices with transfer rules is progressing.
5) Common Issues and Pitfalls
- Source of Training Data: Public availability ≠ free to use. The EU mandates respect for opt-outs via machine-readable means and lawful acquisition and transparency for personal data.
- Model Deletion Requests: The EDPB argues that “models are not necessarily anonymous.” Even if full “erasure” is hard, reasonable measures like retraining, unlearning, or output suppression and proper explanations are required.
- Automated Significant Decisions: Under the EU AI Act and US state laws, there is growing enforcement of pre-notification, explanation, human involvement, and opt-out rights (e.g., California’s ADMT regulations).
6) Compliance Implementation Checklist for Practitioners
- Data Map: Visualize sources (scraping/purchase/internal), presence of personal/sensitive data, and cross-border destinations.
- Assign Legal Basis (Per Stage): Assess Art.6/9 or CPRA criteria separately for collection, training, distribution, and inference logs. If using legitimate interest, document the three criteria and explore alternatives.
- Respect Rights-Holder and Data-Subject Opt-Outs: Implement crawler detection of EU TDM opt-out signals (robots.txt, metadata).
- DPIA/Risk Assessments: Evaluate discrimination, safety, and privacy risks per use case (hiring, loans, healthcare), with mitigation and re-evaluation plans. Align with CA/CO requirements.
- Transparency: Combine privacy notices with GenAI-specific supplements (type/source of training data, rights exercise methods, contact points, model purpose explanations). Align with ICO’s “meaningful transparency”.
- Data Minimization & Retention: Preprocess to remove/generalize PII before training. Clearly define purpose and duration for storing prompts and output logs (per CPRA).
- Supplier Management: Contractually require compliance from model providers, data brokers, annotation BPOs, etc. (cross-border rules, subcontracting, security, deletion support, audit rights).
- Cross-Border Transfers: Follow SCC/adequacy for EU, mutual adequacy for Japan↔EU, security review/standard contracts for China, and India’s blacklist approach.
- Rights Handling: Define SLA for access, correction, deletion, and objection. For model-based requests, prepare reasonable response policies (e.g., search, retraining, weight suppression).
- Internal Governance: Standardize AI governance committees, model cards, evaluation reports, and major incident reporting (per EU AI Act).
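The minimization step in the checklist above (removing or generalizing PII before training) can be sketched with simple typed placeholders. This is a minimal illustration, not a complete PII taxonomy: the regex patterns are assumptions covering only a few identifier shapes, and real pipelines combine NER models, dictionaries, and human review.

```python
import re

# Illustrative patterns for a few common identifier shapes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{8,}\d"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with typed placeholders before training.

    Typed placeholders (rather than deletion) preserve sentence structure,
    which tends to matter for downstream model quality.
    """
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact jane.doe@example.com or +81 90-1234-5678 from 203.0.113.7"))
```

Running the redaction as a preprocessing pass, and logging match counts per pattern, gives the DPIA a concrete record of what was removed and when.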
7) Red Flags by Major Use Case
- Hiring and HR Management: Automated decisions (screening, evaluation, termination) now require pre-notification, explanation, and opt-out, especially under CA ADMT and CO AI laws. Bias evaluation and human oversight are mandatory.
- Healthcare/Wellness: The Washington My Health My Data Act imposes strict rules on broadly defined health data. Apps must have dedicated policies, consent mechanisms, and sales bans.
- Generative Content Distribution: Dual risks in the EU related to TDM opt-outs/copyright and reputational harm from output inaccuracies.
8) Summary: Building the Right Strategy (Fastest Path)
- Separate “Training” and “Usage” in design (distinct legal bases, notices, retention).
- Unify to EU-level “highest standards” (anticipate EDPB/AI Act transparency and assessment requirements).
- Implement TDM Opt-Out Compliance in Crawlers and build rights-holder response processes.
- Standardize CA/CO Templates (ADMT notices, opt-out and explanation procedures, risk assessment templates) across the organization.
- Segment Cross-Border Transfers by contract/tech: EU adequacy & SCC, Japan APPI, China PIPL, India DPDPA “lanes.”
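The cross-border "lanes" above can be encoded as a simple lookup table so that data flows are routed to the right transfer mechanism in code as well as in contracts. The jurisdiction keys and mechanism names below are simplified assumptions drawn from this article's summary, not a complete legal mapping.

```python
# Illustrative lane table; entries are simplified assumptions, not legal advice.
TRANSFER_LANES = {
    "EU": ["adequacy decision", "SCCs"],
    "JP": ["mutual EU adequacy", "APPI consent or equivalent safeguards"],
    "CN": ["CAC security assessment", "standard contract", "certification"],
    "IN": ["allowed unless government-blacklisted (DPDPA)"],
}

def mechanisms(jurisdiction: str) -> list:
    """Return candidate transfer mechanisms for a destination, or [] if unmapped."""
    return TRANSFER_LANES.get(jurisdiction.upper(), [])

print(mechanisms("cn"))
```

An unmapped destination returning an empty list can then be treated as a hard stop that escalates to legal review rather than a silent default.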
Appendix: Primary Sources for Reference
- EU AI Act Implementation Timeline (Banned Uses = Feb 2025, GPAI = Aug 2025, High Risk = Aug 2026…)
- EDPB Opinion 28/2024 (On AI models, personal data, legitimate interest, and anonymization).
- EU TDM (DSM Directive Art. 3/4) and Machine-Readable Opt-Outs.
- UK ICO Generative AI Consultation Results (Effectiveness of Transparency).
- CA ADMT Regulations (Approved July 2025).
- CO AI Law (SB24-205, Enforcement Extended to June 30, 2026).
- Japan APPI (Cross-Border Explanation/Consent, Mutual EU Adequacy) and Copyright Act Art. 30-4 (For Analytical Use).
- China’s Interim GenAI Measures / PIPL Cross-Border Regime.
- India DPDPA (2025 Draft Rules, Cross-Border “Blacklist” Approach).