De-identification

De-identification removes or modifies personal identifiers to protect privacy. In analytics, it lets you preserve analytical value while meeting privacy requirements.

What It Is

De-identification removes or transforms the direct and indirect identifiers in a dataset. It is more than stripping names and emails: effective de-identification also accounts for re-identification through combinations of attributes that look harmless on their own.

Identifier Types

Direct:

  • Names, email addresses
  • Phone numbers, residential addresses
  • Social security numbers

Indirect (Quasi-identifiers):

  • Demographics (age, gender)
  • Geographic location
  • Activity timestamps
  • Behavioral patterns

Standards

HIPAA Safe Harbor

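The HIPAA Safe Harbor method requires removing 18 specified categories of identifiers, and the organization must also have no actual knowledge that the remaining data could identify an individual. HIPAA itself governs US health data, but the list is widely used as a reference checklist in other domains.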

graph TD
    A[Raw Data] --> B[Remove 18 HIPAA Categories]
    B --> C[Check Residual Risks]
    C --> D{Risk < Threshold?}
    D -->|Yes| E[De-identified Data]
    D -->|No| F[Additional Processing]
    F --> C

Expert Determination

A qualified expert determines that re-identification risk is very small. The expert documents the analysis and justifies conclusions with statistical and scientific methods.

Expert Limits

  • No fixed validity period
  • Subjectivity in risk assessment
  • Requires regular review

In Web Analytics

In web analytics, de-identification mainly concerns the following data types.

User Activity

  • IP addresses (mask the last octet of IPv4 addresses; truncate IPv6 addresses to a shorter prefix)
  • User-Agent strings (generalize browser version)
  • Referrer URLs (strip query parameters)
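
The three transformations in this list can be sketched with the Python standard library. This is a minimal illustration, not a production implementation: the function names are hypothetical, and the /48 prefix for IPv6 is an assumption that the list above does not specify.

```python
import ipaddress
import re
from urllib.parse import urlsplit, urlunsplit


def mask_ip(ip: str) -> str:
    """Zero the host part of an address: the last octet for IPv4."""
    addr = ipaddress.ip_address(ip)
    # Assumption: keep a /48 prefix for IPv6; pick the length your policy requires.
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def generalize_user_agent(user_agent: str) -> str:
    """Reduce every dotted version number to its major component."""
    return re.sub(r"(\d+)(?:\.\d+)+", r"\1", user_agent)


def strip_query(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

For example, `mask_ip("192.168.1.45")` returns `"192.168.1.0"`, and `strip_query("https://example.com/search?q=jane+doe#top")` returns `"https://example.com/search"`. Note that paths can carry identifiers too, so stripping the query string alone may not be enough for every site.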

Temporal Data

  • Precise timestamps (round to hour or day)
  • Action sequences (add noise)
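
A minimal sketch of both temporal techniques, using only the standard library. The function names and the uniform jitter distribution are assumptions made for illustration; the "rounding" here truncates downward, which matches turning 14:32:15 into 14:00:00.

```python
import random
from datetime import datetime, timedelta


def truncate_to_hour(ts: datetime) -> datetime:
    """Drop minutes, seconds and microseconds."""
    return ts.replace(minute=0, second=0, microsecond=0)


def truncate_to_day(ts: datetime) -> datetime:
    """Drop the time of day entirely."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def jitter(ts: datetime, max_seconds: float, rng: random.Random) -> datetime:
    """Shift a timestamp by a random offset in [-max_seconds, +max_seconds].

    Assumption: uniform noise. Jitter larger than the gap between two
    events can change their order within a sequence.
    """
    offset = rng.uniform(-max_seconds, max_seconds)
    return ts + timedelta(seconds=offset)
```

Passing an explicit `random.Random` instance keeps the noise reproducible in tests while allowing a differently seeded generator in production.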

Geographic Data

  • Replace coordinates with regions
  • Group cities by population
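
Mapping coordinates to named regions needs a geographic lookup that is outside the scope of this article, so the sketch below uses a simpler stand-in: snapping coordinates to a coarse grid. The function names, the one-decimal default and the population band boundaries are all assumptions chosen for illustration.

```python
def coarsen_coordinates(lat: float, lon: float, decimals: int = 1) -> tuple:
    """Snap coordinates to a grid; one decimal is roughly 11 km of latitude."""
    return (round(lat, decimals), round(lon, decimals))


def population_band(population: int) -> str:
    """Replace a city with a size band (hypothetical boundaries)."""
    if population < 10_000:
        return "under 10k"
    if population < 100_000:
        return "10k-100k"
    if population < 1_000_000:
        return "100k-1M"
    return "over 1M"
```

A grid cell in a sparsely populated area can still contain very few people, so coarsened locations should be checked against actual population density before release.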

Before and After

Before:

IP: 192.168.1.45
Time: 2025-08-28 14:32:15
City: Almelo
Browser: Chrome 127.0.0.0

After:

IP: 192.168.1.0/24
Time: 2025-08-28 14:00:00
Region: Overijssel
Browser: Chrome 127.x.x

Re-identification Risks

Even when standards are met, residual risk remains, because combining sources can re-identify users. Common attack patterns:

  • Linkage: public data is matched against de-identified records on shared attributes.
  • Accumulation: information from multiple sources is gradually combined to build profiles.
  • Differencing: differences between datasets are analyzed to extract information about specific individuals.
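
To make the matching of public data against de-identified records concrete, here is a small sketch of how such a join works. The function name, field names and records are hypothetical; the point is only that a record becomes re-identifiable when its combination of quasi-identifiers matches exactly one person in an outside source.

```python
def link_records(deidentified, public, keys):
    """Pair each de-identified row with a public row when the match is unique."""
    # Index the public source by its quasi-identifier values.
    index = {}
    for row in public:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    matches = []
    for row in deidentified:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # exactly one candidate: re-identified
            matches.append((row, candidates[0]))
    return matches
```

Running a check like this against sources an attacker could plausibly obtain is one way to test a dataset before sharing it. Rows with several candidates are not returned here, but they still narrow down who the person might be.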

Technical Methods

Generalization

Replace specific values with broader categories. An exact age of 28 becomes the range 25-29; ranges should not overlap, so that every value falls into exactly one category.
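
A minimal sketch of age generalization. The function name and the five-year default width are assumptions; the buckets are deliberately non-overlapping, so an age of 28 lands in 25-29 and an age of 30 lands in 30-34.

```python
def age_range(age: int, width: int = 5) -> str:
    """Map an exact age to a fixed-width, non-overlapping range label."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"
```

Wider buckets lower the re-identification risk but also blur age-based analysis, so the width is a policy decision rather than a technical one.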

Suppression

Remove fields or records that risk identification.
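
Both forms of suppression can be sketched in a few lines. The function names are hypothetical, and the rule used for record suppression (drop any record whose combination of quasi-identifiers is shared by fewer than k records) is one common choice, not something this article prescribes; the default of k = 5 is likewise an assumption.

```python
from collections import Counter


def drop_fields(rows, fields):
    """Field suppression: remove the named fields from every record."""
    return [
        {name: value for name, value in row.items() if name not in fields}
        for row in rows
    ]


def suppress_small_groups(rows, quasi_identifiers, k=5):
    """Record suppression: keep only records whose group has at least k members."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] >= k]
```

Suppressing rare records skews the data toward common profiles, so it helps to log how many records were dropped and from which groups.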

Noise Addition

Add statistical distortion that preserves trends but blurs individual records.
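
As an illustration, the sketch below adds zero-mean Laplace noise to aggregate counts. Laplace noise is one common choice and is an assumption here, as are the function names; the sample is drawn as the difference of two exponential variables, which follows a Laplace distribution with the given scale.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace distribution centered on zero."""
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_counts(counts: dict, scale: float, rng: random.Random) -> dict:
    """Add independent noise to each count (hypothetical helper)."""
    return {key: value + laplace_noise(scale, rng) for key, value in counts.items()}
```

Because the noise averages out to zero, totals and trends over many buckets stay close to the truth while any single bucket is blurred. A larger scale gives more protection and less accuracy.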

Implementation Recommendations

Automation:

  • Apply de-identification at collection
  • Verify method effectiveness regularly

Quality Control:

  • Track impact on analytics accuracy
  • Balance privacy and utility
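
Regular verification and accuracy tracking both need something measurable. The sketch below pairs one privacy indicator (the size of the smallest group of records sharing the same quasi-identifiers) with one utility indicator (the relative error of a metric after de-identification). Both function names and the choice of indicators are assumptions for illustration.

```python
from collections import Counter


def smallest_group_size(rows, quasi_identifiers):
    """Privacy indicator: a value of 1 means at least one record is unique."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values()) if counts else 0


def relative_error(raw_value: float, deidentified_value: float) -> float:
    """Utility indicator: how far a metric moved after de-identification."""
    if raw_value == 0:
        return 0.0 if deidentified_value == 0 else float("inf")
    return abs(deidentified_value - raw_value) / abs(raw_value)
```

Tracking both numbers over time shows whether a change in method bought additional privacy at an acceptable cost in accuracy.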

Regulatory Compliance

GDPR

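Under GDPR, data falls outside the scope of personal data only if it is truly anonymous, meaning individuals cannot be identified by any means reasonably likely to be used (Recital 26). Pseudonymized data, which can still be linked back to a person with additional information, remains personal data. Regulators set a high bar for proving irreversibility.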

National Standards

Individual jurisdictions set their own criteria for de-identification, so check local rules when shaping policies.

For Analytics Platforms

Effective de-identification lets analytics platforms:

  • Study user behavior without violating privacy
  • Share aggregates with third parties
  • Comply across jurisdictions
  • Reduce breach impact

Statable has researched approaches to balancing protection and analytical value. Properly applied, de-identification can deliver strong privacy with limited loss of insight.

graph LR
    A[Raw Data] --> B[Identifier Classification]
    B --> C[Choose De-identification Method]
    C --> D[Apply Anonymization Techniques]
    D --> E[Assess Residual Risks]
    E --> F[De-identified Analytics]

De-identification done right is the foundation of responsible analytics. It protects users and still delivers business insight.
