Sharing Data and Knowledge Without Compromising Privacy
In today's data-driven world, many companies hold valuable data but lack the expertise to use it effectively, while other organizations have the skills to analyze data but need access to diverse datasets to build robust machine learning models. Collaboration would benefit both sides. However, there's a catch: each side wants to keep its sensitive information private.
The Privacy Challenge
- Data owners want to protect the privacy of their training data.
- Model owners want to keep their models and training methods confidential, as they may contain valuable intellectual property.
Existing approaches, like federated learning and split learning, fall short of meeting both privacy needs at once: federated learning ships the model to every data owner, exposing the model owner's intellectual property, while split learning exchanges intermediate activations that can leak information about the training data.
Introducing Citadel
Citadel is a system designed to address these concerns. It builds on Intel SGX, a hardware technology that creates secure enclaves: protected memory regions in which code and data are shielded even from the operating system, the hypervisor, and the cloud provider.
How Citadel Works
- Runs distributed training across multiple enclaves, each representing a data owner.
- Uses an aggregator enclave for the model owner.
- Employs zero-sum masking and hierarchical aggregation to prevent any data or model leakage.
- Creates a strong information barrier between the enclaves.
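The zero-sum masking and hierarchical aggregation steps above can be illustrated with a small sketch. This is a simplified, hypothetical illustration of the general technique rather than Citadel's actual implementation: each data owner adds a random mask to its update, the masks are constructed to sum to zero, and partial sums computed up an aggregation tree stay masked until every contribution has been folded in.

```python
import random

def zero_sum_masks(n, dim, rng):
    """Generate n masks whose elementwise sum is zero."""
    masks = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
             for _ in range(n - 1)]
    # The last mask cancels the others, so the group sums to zero.
    masks.append([-sum(col) for col in zip(*masks)])
    return masks

def apply_mask(update, m):
    """A data-owner enclave releases only its masked update."""
    return [u + x for u, x in zip(update, m)]

def tree_sum(vectors):
    """Hierarchical aggregation: pairwise sums up a binary tree."""
    level = vectors
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append([a + b for a, b in
                            zip(level[i], level[i + 1])])
            else:
                nxt.append(level[i])  # odd vector carries up
        level = nxt
    return level[0]

rng = random.Random(0)
updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
masks = zero_sum_masks(len(updates), 2, rng)
masked = [apply_mask(u, m) for u, m in zip(updates, masks)]

# Intermediate aggregators only ever see masked values; the
# masks cancel in the final sum, recovering the true total.
total = tree_sum(masked)  # equals elementwise sum of updates
```

Note that any partial sum over fewer than all four masked updates still carries leftover mask terms, which is what keeps individual contributions hidden from intermediate aggregators.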
Advantages of Citadel
- Better scalability compared to other SGX-protected training systems.
- Stronger privacy guarantees.
- Cloud deployment tests with various machine learning models showed that Citadel can handle a large number of enclaves with minimal slowdown due to SGX.
Considerations
While Citadel presents a promising solution, it's important to note that:
- The effectiveness of such systems depends on the trustworthiness of the underlying hardware and software.
- The performance overhead introduced by SGX might not be negligible in all scenarios.
Nonetheless, Citadel takes a significant step towards enabling secure and private collaborative machine learning.