Why privacy matters for security ML
Security telemetry (logs, user behaviour, device signals) is powerful for detecting threats, but it often contains sensitive personal data. Privacy-preserving ML lets teams build effective detectors while sharply reducing how much raw data ever leaves the device or network where it was collected.
Key approaches
Federated learning
Models train on-device and only send model updates (not raw data) to a central aggregator. This reduces the need to centralise user logs while still benefiting from diverse data.
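A minimal sketch of that flow, using a toy logistic-regression step and NumPy; the function names (`local_update`, `federated_average`) and the three-client round are illustrative, not a real framework API:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1):
    """One gradient step of logistic regression on a client's local
    data. Only the resulting weight delta leaves the device."""
    preds = 1.0 / (1.0 + np.exp(-X @ weights))
    grad = X.T @ (preds - y) / len(y)
    return -lr * grad  # an update, never the raw (X, y)

def federated_average(updates):
    """Server-side FedAvg: average the client deltas without ever
    seeing any client's raw data."""
    return np.mean(updates, axis=0)

# Hypothetical round with three clients and synthetic data.
rng = np.random.default_rng(0)
w = np.zeros(4)
clients = [(rng.normal(size=(20, 4)), rng.integers(0, 2, size=20))
           for _ in range(3)]
updates = [local_update(w, X, y) for X, y in clients]
w = w + federated_average(updates)
```

In production this loop runs over many rounds with client sampling and weighting by local dataset size; the sketch keeps only the core idea that the aggregator consumes deltas, not data.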
Differential privacy
Calibrated noise is added to model updates or query results so that the influence of any single data point is mathematically bounded, meaning individual records cannot be reliably reconstructed or inferred from the output. Combined with federated learning, this provides strong, quantifiable privacy assurances.
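The standard recipe (as in DP-SGD-style training) is: clip each update to bound its sensitivity, then add Gaussian noise scaled to that bound. A sketch, with the function name and the `noise_multiplier` default chosen here for illustration:

```python
import numpy as np

def dp_sanitise(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip an update's L2 norm to bound one client's influence,
    then add Gaussian noise calibrated to the clip norm."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=update.shape)
    return clipped + noise
```

The privacy budget (epsilon) then follows from the noise multiplier, the clipping norm, and how many rounds each client participates in; accounting for that budget is the part that needs the most care in practice.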
Secure aggregation
Cryptographic techniques let servers combine model updates from many devices without seeing any single device's contribution in cleartext.
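The core trick can be illustrated with pairwise additive masks that cancel in the sum: each pair of clients shares a random mask, one adds it and the other subtracts it, so the server sees only masked vectors yet recovers the exact total. This toy version uses a shared seed for brevity; a real protocol (e.g. the Bonawitz et al. design) derives the masks from pairwise key agreement and handles dropouts:

```python
import numpy as np

def pairwise_mask(n_clients, dim, seed=42):
    """Return a function giving client i's net mask. Masks are built so
    that summing all clients' masks yields exactly zero."""
    rng = np.random.default_rng(seed)
    shared = {(i, j): rng.normal(size=dim)
              for i in range(n_clients) for j in range(i + 1, n_clients)}
    def mask_for(i):
        m = np.zeros(dim)
        for (a, b), v in shared.items():
            if a == i:
                m += v        # i adds the mask it shares with a later peer
            elif b == i:
                m -= v        # and subtracts the one from an earlier peer
        return m
    return mask_for

def secure_sum(updates):
    """What the server computes: it only ever sees masked updates,
    but the pairwise masks cancel in the total."""
    mask_for = pairwise_mask(len(updates), updates[0].shape[0])
    masked = [u + mask_for(i) for i, u in enumerate(updates)]
    return np.sum(masked, axis=0)
```

Any single masked vector looks like random noise to the server; only the aggregate is meaningful, which is exactly the property federated learning needs.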
Use-cases in security
- On-device phishing detection that improves from signals across users without sending message contents to a central server.
- Anomaly detection for login patterns where raw logs remain on-premises and only anonymised model improvements are shared.
- Malware classification improvements where telemetry stays local and only aggregated learning occurs.
Practical considerations
Privacy-preserving techniques reduce risk but are not a silver bullet: they require careful engineering (noise calibration, privacy budgeting, dropout handling), ongoing auditing, and clear user consent. Organisations should prioritise transparent policies and independent verification of privacy claims over marketing labels.
Where this fits with Esrok
This post supports our AI + security pillar and links naturally to privacy guidance: Privacy, and to authentication discussions like Passkeys explained, which reduce credential exposure.