Pseudonymization vs Anonymization: Reversible and Irreversible Data Protection under GDPR

Pseudonymization and anonymization are two different approaches to protecting personal data. The distinction matters for GDPR compliance and for building analytics that balance privacy and utility.

Pseudonymization: Reversible Protection

Pseudonymization replaces identifying information with pseudonyms or artificial identifiers, so that records can be linked back to a person only with additional information that is stored separately.

GDPR Definition

According to Article 4(5) GDPR, pseudonymization means "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information."

Key Characteristics

Reversible. Original data can be restored when the key or mapping table is available.

Still personal data. Pseudonymized data remains personal data under GDPR as long as it can be re-identified using the additional information (Recital 26).

Technical pattern:

graph TD
    A[Original Data] --> B[Generate Pseudonym]
    B --> C[Create Mapping Table]
    C --> D[Secure Key Storage]
    B --> E[Pseudonymized Data]
    D --> F[Recovery Capability]
    E --> F

Methods

Tokenization

Replace sensitive data with tokens stored in a protected database.

Tokens have no mathematical link to the original
High security when the token store is isolated
Full control over re-identification

Cryptographic Hashing

Use keyed cryptographic functions.

Deterministic for identical inputs
Comparison without re-identification
Strength depends on the key

Data Masking

Partially conceal identifiers while keeping structure.

Format preserved
Reversible only when the original value is kept in a separate store
Useful for test environments
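The keyed-hashing method can be sketched in Node.js as follows. This is a minimal sketch, assuming HMAC-SHA256 from the built-in crypto module as the keyed function; the helper name `pseudonymize`, the randomly generated key, and the 16-character truncation are illustrative assumptions, not a prescribed implementation.

```javascript
const { createHmac, randomBytes } = require('node:crypto');

// Hypothetical helper: derive a deterministic pseudonym with a keyed hash.
function pseudonymize(value, key) {
    return createHmac('sha256', key)
        .update(String(value))
        .digest('hex')
        .slice(0, 16); // truncated for readability; keep more digits to reduce collisions
}

// In practice the key would be loaded from a secret store, not generated per run.
const key = randomBytes(32);

const a = pseudonymize('192.168.1.100', key);
const b = pseudonymize('192.168.1.100', key);
const c = pseudonymize('192.168.1.100', randomBytes(32));

console.log(a === b); // same input and same key give the same pseudonym
console.log(a === c); // a different key gives an unrelated pseudonym
```

Because the output is deterministic per key, two records can be compared or joined without ever seeing the original value. An unkeyed hash of an IP address or email would not offer this protection, since the small input space can simply be enumerated.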

Pseudonymization Example in Analytics

Original User Data:

Email: user@example.com
IP: 192.168.1.100
Device ID: ABC123XYZ

After Pseudonymization:

User Token: USR_789456123
IP Hash: 4f3d2a1b9c8e7f6a
Device Hash: DEV_445566778

Mapping Table (stored separately):

USR_789456123 → user@example.com
4f3d2a1b9c8e7f6a → 192.168.1.100
DEV_445566778 → ABC123XYZ
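A mapping table like the one above could be maintained by a small token vault. The sketch below is illustrative only: the `TokenVault` class, its method names, and the in-memory `Map` storage are assumptions; a real vault would sit in a separate, access-controlled database.

```javascript
const { randomBytes } = require('node:crypto');

// Hypothetical in-memory token vault (stand-in for a separate, protected store).
class TokenVault {
    constructor() {
        this.tokenToValue = new Map();
        this.valueToToken = new Map();
    }

    // Return the existing token for a value, or issue a new random one.
    tokenize(value, prefix = 'USR_') {
        if (this.valueToToken.has(value)) {
            return this.valueToToken.get(value);
        }
        const token = prefix + randomBytes(8).toString('hex');
        this.tokenToValue.set(token, value);
        this.valueToToken.set(value, token);
        return token;
    }

    // Re-identification is possible only with access to the vault.
    detokenize(token) {
        return this.tokenToValue.get(token);
    }
}

const vault = new TokenVault();
const token = vault.tokenize('ABC123XYZ', 'DEV_');

// The analytics record carries only the token.
const analyticsRecord = { device: token, page: '/products' };
console.log(analyticsRecord);
console.log(vault.detokenize(token)); // lookup in the vault restores the device ID
```

The token is random, so nothing about the original value can be derived from it without the vault.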

Anonymization: Irreversible Protection

Anonymization is an irreversible process that prevents direct or indirect identification.

Irreversibility. Recital 26 GDPR describes anonymous information as "information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable."

Out of GDPR scope. Properly anonymized data sits outside GDPR.

Techniques

Aggregation

Combine records into groups for statistical results.

Pros:

Strong individual protection when groups are large enough
Preserves statistical significance
Suitable for reporting

Cons:

Loses detail
Limited analysis options
No individual tracking

Generalization

Replace specific values with broader categories.

Age ranges instead of exact age
Regional grouping instead of precise addresses
Time intervals instead of exact timestamps

Suppression

Remove identifying fields or records entirely.

Drop direct identifiers
Exclude unique records
Filter rare values
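Generalization and suppression are often applied together. The following is a rough sketch under stated assumptions: the record fields, the `REGION_BY_CITY` lookup, the ten-year age bands, and the threshold of 2 are all made up for illustration, and passing this step does not by itself prove that a dataset is anonymous.

```javascript
// Hypothetical raw records; every field name and value here is illustrative.
const records = [
    { name: 'Anna', age: 23, city: 'Berlin', ts: '2024-05-01T10:15:42Z' },
    { name: 'Ben', age: 27, city: 'Munich', ts: '2024-05-01T18:02:10Z' },
    { name: 'Cem', age: 25, city: 'Berlin', ts: '2024-05-01T21:40:05Z' },
    { name: 'Dana', age: 61, city: 'Lyon', ts: '2024-05-02T09:00:00Z' },
];

// Assumed lookup from city to a broader region.
const REGION_BY_CITY = { Berlin: 'DE', Munich: 'DE', Lyon: 'FR' };

// Generalization: drop the direct identifier and coarsen the remaining fields.
function generalize(record) {
    const low = Math.floor(record.age / 10) * 10;
    return {
        ageRange: `${low}-${low + 9}`,
        region: REGION_BY_CITY[record.city] ?? 'Other',
        day: record.ts.slice(0, 10), // date only, time of day removed
    };
}

// Suppression: remove rows whose combination of values occurs fewer than k times.
function suppressRare(rows, k) {
    const counts = new Map();
    for (const row of rows) {
        const key = JSON.stringify(row);
        counts.set(key, (counts.get(key) || 0) + 1);
    }
    return rows.filter((row) => counts.get(JSON.stringify(row)) >= k);
}

const released = suppressRare(records.map(generalize), 2);
console.log(released);
```

In this sample the three records from the same age band, region, and day survive, while the single record from Lyon is suppressed because it would be unique in the output.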

Pseudo-anonymization Risks

Many techniques presented as anonymization are in fact pseudonymization, because the data can still be re-identified with additional information or by linking it to other datasets.
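One simple way to probe this risk is to count how many records share each combination of indirect identifiers; a group of size 1 means someone is unique even though their ID was hashed. The sketch below uses that heuristic (the idea behind k-anonymity). The function name, the sample rows, and the choice of quasi-identifiers are assumptions for illustration, and a high count alone does not establish anonymity.

```javascript
// Smallest group size over the chosen quasi-identifiers (the "k" in k-anonymity).
function smallestGroupSize(rows, quasiIdentifiers) {
    const counts = new Map();
    for (const row of rows) {
        const key = quasiIdentifiers.map((q) => String(row[q])).join('|');
        counts.set(key, (counts.get(key) || 0) + 1);
    }
    return Math.min(...counts.values());
}

// Hypothetical "anonymized" export: IDs are hashed, but other fields remain.
const rows = [
    { userHash: 'a1', zip: '10115', birthYear: 1990, device: 'iPhone' },
    { userHash: 'b2', zip: '10115', birthYear: 1990, device: 'iPhone' },
    { userHash: 'c3', zip: '80331', birthYear: 1975, device: 'Pixel' },
];

const k = smallestGroupSize(rows, ['zip', 'birthYear', 'device']);
console.log(k); // a value of 1 means at least one person is unique in the export
```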

Comparison

Criteria	Pseudonymization	Anonymization
Reversibility	Reversible	Irreversible
GDPR Status	Personal data	Non-personal data
Re-identification risk	Low with proper implementation	Very low when done correctly, but never guaranteed zero
Data utility	High analytical value	Limited detail
Security focus	Key and mapping protection	Verifying irreversibility

Application in Analytics

When to Pseudonymize

User sessions. Track behavior across sessions with the option to link data when needed.

A/B testing. Keep users in stable experiment groups and analyze results without exposing their identities.

Personalization. Personalized content without revealing identity to analytics.
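For the A/B testing case, a stable group can be derived from the pseudonym itself, so the experiment system never needs the real identity. This is a minimal sketch: `assignVariant`, the experiment name, and the use of SHA-256 as the bucketing function are assumptions, not a reference to any particular testing tool.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: map a pseudonym to a stable experiment variant.
function assignVariant(pseudonym, experiment, variants = ['A', 'B']) {
    const digest = createHash('sha256')
        .update(`${experiment}:${pseudonym}`)
        .digest();
    // The first 4 bytes of the digest are used as an unsigned integer bucket.
    return variants[digest.readUInt32BE(0) % variants.length];
}

const first = assignVariant('USR_789456123', 'checkout-button');
const again = assignVariant('USR_789456123', 'checkout-button');
console.log(first === again); // the same pseudonym always lands in the same group
```

Including the experiment name in the hashed input keeps group membership independent across experiments.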

When to Anonymize

Public reporting. Aggregated reports for publication without disclosure risk.

Research. Data for scientific work or new algorithms.

Long-term storage. Historical archives for trend analysis.

Practical Implementation

Pseudonymization for User Journeys:

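const { createHmac } = require('node:crypto');

// Illustrative helpers (assumptions, not a specific library API)
const SECRET_KEY = process.env.PSEUDONYM_KEY || 'dev-only-key'; // keep the real key outside the analytics store
const generatePseudonym = (id) =>
    'USR_' + createHmac('sha256', SECRET_KEY).update(String(id)).digest('hex').slice(0, 16);
const trackEvent = (name, payload) => console.log(name, payload); // stand-in for an analytics client

// The user receives a stable pseudonym ('user-42' is an example real ID)
const userPseudonym = generatePseudonym('user-42');

// Events are linked to the pseudonym, never to the real ID
trackEvent('page_view', {
    user: userPseudonym,
    page: '/products',
    timestamp: Date.now()
});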

Anonymization for Aggregated Reporting:

// Data aggregated without recovery possibility
const aggregatedStats = {
    timeRange: 'daily',
    totalViews: 15420,
    uniqueVisitors: 8756,
    topPages: ['/home', '/products', '/about'],
    // Individual users not recoverable
};
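A report like this could be produced from pseudonymized events roughly as follows. The event shape and the `aggregateDaily` helper are hypothetical; the point of the sketch is that only counts leave the function, while the per-user events can be discarded afterwards.

```javascript
// Hypothetical pseudonymized events for one day.
const events = [
    { user: 'USR_1', page: '/home' },
    { user: 'USR_1', page: '/products' },
    { user: 'USR_2', page: '/home' },
    { user: 'USR_3', page: '/home' },
    { user: 'USR_3', page: '/about' },
    { user: 'USR_3', page: '/products' },
];

function aggregateDaily(dayEvents, topN = 3) {
    const viewsByPage = new Map();
    const visitors = new Set();
    for (const event of dayEvents) {
        viewsByPage.set(event.page, (viewsByPage.get(event.page) || 0) + 1);
        visitors.add(event.user);
    }
    // Pages sorted by view count, highest first.
    const topPages = [...viewsByPage.entries()]
        .sort((a, b) => b[1] - a[1])
        .slice(0, topN)
        .map(([page]) => page);
    return {
        timeRange: 'daily',
        totalViews: dayEvents.length,
        uniqueVisitors: visitors.size,
        topPages,
    };
}

const report = aggregateDaily(events);
console.log(report);
```

With very small audiences even aggregates can point to individuals, so low counts may need to be suppressed before publication.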

Legal and Ethical Notes

Technical:

Store re-identification keys separately
Restrict access to mapping tables
Encrypt additional information
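One way to act on "Encrypt additional information" is to encrypt each mapping-table entry before it is written to storage. The sketch below assumes AES-256-GCM from Node's built-in crypto module; the helper names and the per-run random key are illustrative, and real deployments would load the key from a key management system.

```javascript
const { createCipheriv, createDecipheriv, randomBytes } = require('node:crypto');

// Encrypt one mapping-table value; a fresh 12-byte IV is generated per entry.
function encryptEntry(plaintext, key) {
    const iv = randomBytes(12);
    const cipher = createCipheriv('aes-256-gcm', key, iv);
    const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
    return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Decrypt and authenticate; throws if the key is wrong or the data was altered.
function decryptEntry({ iv, ciphertext, tag }, key) {
    const decipher = createDecipheriv('aes-256-gcm', key, iv);
    decipher.setAuthTag(tag);
    return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}

const key = randomBytes(32); // 256-bit key, assumed to come from a separate key store
const stored = encryptEntry('ABC123XYZ', key);
const restored = decryptEntry(stored, key);
console.log(restored === 'ABC123XYZ'); // the round trip restores the original value
```

Keeping the key in a different system from the encrypted table means that access to the table alone is not enough to re-identify anyone.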

Organizational:

Separate roles and responsibilities
Audit access to re-identification
Key management policies

Recommendations

Choosing the Method

Pseudonymize when:

User tracking over time is needed
Personalization or targeting is required
Longitudinal studies are planned

Anonymize when:

Data goes public
Strict confidentiality is required
Individual identification is unnecessary

Implementation

Pseudonymization:

Use cryptographically strong algorithms
Secure key storage
Rotate pseudonyms

Anonymization:

Combine multiple techniques
Run regular re-identification risk assessments
Document procedures and decisions
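The "Rotate pseudonyms" recommendation above can be approximated by mixing a time period into the keyed hash, so that the same user gets a new pseudonym each period. This is only a sketch: the monthly period, the `rotatingPseudonym` helper, and the hard-coded demo key are assumptions, and rotation limits how long a user can be followed, which may conflict with longitudinal analysis.

```javascript
const { createHmac } = require('node:crypto');

// Hypothetical helper: the pseudonym changes whenever the period label changes.
function rotatingPseudonym(userId, key, date) {
    const period = date.toISOString().slice(0, 7); // UTC year and month, e.g. '2024-05'
    return createHmac('sha256', key)
        .update(`${period}:${userId}`)
        .digest('hex')
        .slice(0, 16);
}

const key = Buffer.from('demo-key-do-not-use-in-production');
const earlyMay = rotatingPseudonym('user-42', key, new Date('2024-05-03T00:00:00Z'));
const lateMay = rotatingPseudonym('user-42', key, new Date('2024-05-28T00:00:00Z'));
const june = rotatingPseudonym('user-42', key, new Date('2024-06-01T00:00:00Z'));

console.log(earlyMay === lateMay); // stable within the same month
console.log(earlyMay === june);    // changes when the month changes
```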

We have studied the effectiveness of different protection approaches. The choice between pseudonymization and anonymization depends on business needs, analytics requirements, and acceptable risk.

graph TD
    A[Personal Data] --> B{Data Linking Required?}
    B -->|Yes| C[Pseudonymization]
    B -->|No| D[Anonymization]
    C --> E[Reversible Protection]
    D --> F[Irreversible Protection]
    E --> G[Remains Personal Data]
    F --> H[Not Personal Data]

Effective protection requires planning and clear understanding of the available methods. The right choice keeps you compliant and keeps analytics valuable.
