Secure Distributed-Learning on Threat Intelligence

An armasuisse S+T – EPFL/LDS Collaboration Project in the Framework of C4DT

In today's interconnected world, the number of new threats and incident indicators is constantly increasing. Detection and mitigation systems can no longer be adapted without an up-to-date, comprehensive knowledge base that can be used to decipher incident patterns and to train advanced models that predict and detect them.

Current efforts for sharing threat-intelligence data (e.g., the Malware Information Sharing Platform, MISP) rely on a centralized database to which all participating organizations have to upload their threat data.

Cybersecurity information is often extremely sensitive and confidential. Sharing it therefore introduces a tradeoff between the benefits of improved threat-response capabilities and the drawbacks of disclosing national-security-related information to foreign agencies or institutions. This results in the retention of valuable information (a.k.a. the free-rider problem), which considerably limits the efficacy of data sharing. The purpose of this project is to resolve the cybersecurity information-sharing tradeoff by enabling more accurate insights on larger amounts of more relevant collective threat-intelligence data.

This project will enable institutions to build better models by securely collaborating on valuable sensitive data that is not normally shared. This will expand the range of available intelligence, thus leading to new and better threat analyses and predictions.

This is made possible by provable technological guarantees: authorized users of the platform can access only the global insights (cyberthreat models) built on the data of the whole network, whereas the locally contributed data is never accessed or transferred and remains under the control of its source. We achieve this purpose

  • by designing a MISP-compatible distributed architecture without a centralized database (each institution keeps full control over its data records, which never leave its security perimeter), and
  • by integrating advanced cryptographic techniques based on the paradigm of multiparty homomorphic encryption, which makes it possible to efficiently and scalably compute machine-learning models on encrypted distributed data, and to enable a secure release of either the model or only the predictions produced by the model (model-as-a-service), according to the system model.
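The idea that only a global result is revealed while local contributions stay hidden can be illustrated with a minimal sketch. It does not reproduce the project's protocol: plain additive secret sharing is swapped in for multiparty homomorphic encryption, and every name below (`secure_sum`, `share`, `PRIME`, `SCALE`) is hypothetical. Each institution splits its local model update into random shares, so that no single share reveals anything about the update, and only the sum over all institutions is reconstructed.

```python
import secrets

PRIME = 2**61 - 1  # modulus of the toy field (assumption for this sketch)
SCALE = 10**6      # fixed-point scale used to encode floats as integers


def encode(x: float) -> int:
    """Encode a float as a fixed-point element of the field."""
    return round(x * SCALE) % PRIME


def decode(v: int) -> float:
    """Map a field element back to a signed float."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def share(value: int, n: int) -> list[int]:
    """Split value into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(local_updates: list[list[float]]) -> list[float]:
    """Return the coordinate-wise sum of all local updates.

    Party j only ever sees one random-looking share of every other
    party's update; only the combined total is reconstructed.
    """
    n = len(local_updates)
    dim = len(local_updates[0])
    # received[j][k]: running sum of the shares held by party j for coordinate k
    received = [[0] * dim for _ in range(n)]
    for update in local_updates:
        for k, x in enumerate(update):
            for j, s in enumerate(share(encode(x), n)):
                received[j][k] = (received[j][k] + s) % PRIME
    # Each party publishes only its partial sum; adding them yields the total.
    totals = [sum(received[j][k] for j in range(n)) % PRIME for k in range(dim)]
    return [decode(t) for t in totals]


if __name__ == "__main__":
    # Three institutions, each holding a two-coordinate local model update.
    updates = [[0.5, -1.0], [1.5, 2.0], [-0.25, 0.0]]
    print(secure_sum(updates))
```

Dividing the result by the number of institutions gives an averaged update, as in federated averaging. A real deployment would also need authenticated channels and protection against colluding or malicious parties, which this sketch leaves out.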

CYD Campus Collaborators

Dr. Vincent Lenders, Director of the Cyber-Defence Campus and the head of the Cyber Security and Data Science Department at armasuisse Science and Technology
Dr. Alain Mermoud, Scientific Project Manager at the Cyber-Defence Campus of armasuisse Science and Technology, EPFL Innovation Park
Dr. Colin Barschel, Scientific Project Manager at the Cyber-Defence Campus of armasuisse Science and Technology, EPFL Innovation Park
Dr. Metin Feridun, Scientific Project Manager at the Cyber-Defence Campus of armasuisse Science and Technology, ETH Zurich
Damian Pfammatter, Scientific Project Manager at the Cyber-Defence Campus of armasuisse Science and Technology, ETH Zurich

 

EPFL/LDS Researchers

Prof. Jean-Pierre Hubaux
Dr. Juan Ramón Troncoso-Pastoriza
Mickaël Misbach
Romain Bouyé
Joao Sa

 

Articles, Preprints

 

Thesis proposal

  • S. Gillard, T. Maillart, D. Percia David, A. Mermoud: “Machine-Learning Predictive Models for Distributed Privacy-Preserving Threat-Intelligence Platforms”, Information Science Institute, Geneva School of Economics and Management, University of Geneva