Federated Learning: Collaborative ML without Centralized Data

by Anurag Sinha, Co-Founder & Managing Director, Wissen Technology


In a pioneering move, Google research scientists Brendan McMahan and Daniel Ramage revealed in 2017 that the company had begun decentralizing certain machine learning tasks onto mobile devices. This shift was aimed at safeguarding sensitive data and preserving user privacy. It was built on the concept of federated learning, a revolutionary approach that enables collaborative machine learning without centralizing the data.

Traditionally, data for machine learning models is collected and stored in a centralized manner, raising concerns about data privacy and security. Federated learning addresses this challenge by allowing machine learning models to be trained across decentralized devices, such as smartphones, without the need to aggregate raw data in a central server.

The global federated learning market is projected to grow from $127 million in 2023 to $210 million by 2028. The surge in market value is a testament to the increasing adoption of this decentralized machine-learning paradigm across various industries and domains.

What Is Federated Learning?

Federated learning is a family of techniques that support machine learning in a distributed environment. Distributed environments are characterized by the lack of a centralized server that stores and delivers the data. Federated learning therefore requires the cooperation and coordination of multiple systems to train a shared model.

In a general context, “federation” is an agreement between two or more organizations (or systems) to conduct business with each other, to share resources, or to exchange information. The goal of a federation is to achieve fluid, collaborative relationships that allow organizations to perform their roles more effectively.

Traditionally, machine learning models are trained on a central server, necessitating the aggregation of vast amounts of data from various sources. However, federated learning flips this paradigm by allowing individual devices, such as smartphones, to participate in the training process while keeping their data within their respective environments. 

How Does Federated Learning Actually Work?

Federated learning decentralizes the standard model-training workflow. Instead of gathering data on a single server, the training process is pushed out to the individual or edge devices where the data already lives.

  • The process starts with a global model that is shared with all participating devices.
  • Each device then trains the model locally on its own data, without ever sharing that raw data.
  • After local training, only the model updates are sent to a central server, where they are aggregated into an improved global model.

This iterative process is ongoing, with devices working together to refine the global model while keeping their data localized. Because raw data remains on the devices, this strategy reduces data exposure while increasing privacy.
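To make this loop concrete, here is a minimal sketch of federated averaging (FedAvg), the aggregation scheme McMahan and colleagues described, written in plain NumPy. The linear model, synthetic client datasets, learning rate, and round counts are all illustrative assumptions rather than a production implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client: train locally on private data, return updated weights."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient for a linear model
        w -= lr * grad
    return w

# Simulated private datasets -- in a real deployment these never leave the devices.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + 0.1 * rng.normal(size=50)))

global_w = np.zeros(2)
for _ in range(20):                                  # federated rounds
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_update(global_w, X, y))  # only weights are shared
        sizes.append(len(y))
    # Server: average the updates, weighted by each client's sample count.
    global_w = np.average(updates, axis=0, weights=sizes)

print("learned weights:", global_w)  # approaches [2, -1]
```

Note that the server sees only weight vectors, never the `(X, y)` pairs; weighting the average by each client's sample count keeps large and small datasets fairly represented.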

What Are Federated Learning Types and Frameworks?

Federated learning spans various types and frameworks that contribute to its versatility across diverse domains. Each offers methods and strategies for collaborative machine learning while respecting data privacy and security.

Here are some types of federated learning:

1. Vertical Federated Learning

Vertical federated learning is suited to situations where multiple entities hold different features for the same set of users or records. It is used when data sources possess complementary attributes that, when brought together, improve the overall performance of the model.

2. Horizontal Federated Learning

This strategy, unlike vertical federated learning, involves parties whose datasets share the same feature space but cover different samples, such as two banks holding the same kinds of customer records for different customers. It is appropriate when the same type of data is collected from multiple sources; the sketch below contrasts the two partitioning schemes.
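The toy table of users and features below is invented purely for illustration, but it shows how the two schemes slice a dataset differently:

```python
import numpy as np

# A toy dataset: rows are users, columns are features.
# columns: [age, income, credit_score, purchase_count]
data = np.array([
    [25, 40_000, 650, 12],
    [37, 85_000, 720,  4],
    [52, 60_000, 690,  9],
    [29, 55_000, 710, 15],
])

# Horizontal FL: same features, different users.
# e.g. two banks each holding all four columns for disjoint customers.
bank_a = data[:2, :]   # users 0-1, all features
bank_b = data[2:, :]   # users 2-3, all features

# Vertical FL: same users, different features.
# e.g. a bank and a retailer describing the SAME customers differently.
bank   = data[:, :3]   # all users: age, income, credit_score
retail = data[:, 3:]   # all users: purchase_count

print(bank_a.shape, bank_b.shape)  # (2, 4) (2, 4) -- split by samples
print(bank.shape, retail.shape)    # (4, 3) (4, 1) -- split by features
```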

3. Federated Transfer Learning

This type applies federated learning to adapt pre-trained models to new tasks. It is useful when participants have related tasks but too little data to train a model from scratch.
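One common pattern is to freeze a shared pre-trained feature extractor and federate only a small task head. The sketch below uses invented data and a stand-in "backbone" matrix; it is one possible shape of federated transfer learning, not a canonical recipe:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a shared pre-trained feature extractor (frozen everywhere).
W_pretrained = rng.normal(size=(10, 4))

def extract(X):
    """Frozen pre-trained features; no participant updates these weights."""
    return np.tanh(X @ W_pretrained)

def local_head_update(head, X, y, lr=0.1, epochs=10):
    """Fine-tune only the small task head on one client's private data."""
    h, F = head.copy(), extract(X)
    for _ in range(epochs):
        grad = 2 * F.T @ (F @ h - y) / len(y)  # MSE gradient w.r.t. the head
        h -= lr * grad
    return h

# Each client has only a few labeled examples for a shared target task.
true_head = np.array([1.0, -0.5, 0.25, 2.0])
clients = []
for _ in range(4):
    X = rng.normal(size=(8, 10))
    clients.append((X, extract(X) @ true_head + 0.05 * rng.normal(size=8)))

head = np.zeros(4)
for _ in range(30):  # federate only the head, never the big backbone
    head = np.mean([local_head_update(head, X, y) for X, y in clients], axis=0)

print("federated head:", head)  # approaches true_head
```

Because only the tiny head is communicated, this also keeps per-round bandwidth low even when the backbone is large.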

Here are some federated learning frameworks:

FATE (Federated AI Technology Enabler)

FATE, an open-source project by WeBank, helps create a secure computing environment for federated learning workloads. It provides a suite of tools for data encryption, secure computation, and federated learning algorithms. FATE is designed to address data security and compliance concerns, making it well suited to sectors with stringent privacy rules. What stands out is its three guiding principles:

  • Data isolation: data stays localized within each party's environment
  • Losslessness: the federated model's quality matches non-federated training
  • Flexibility: federated modeling pipelines can be built and scaled as needed

PySyft

PySyft is a Python-based open-source federated learning framework focused on secure, privacy-preserving decentralized machine learning. It enables data scientists to perform tasks such as model training, inference, and aggregation while maintaining data privacy across many devices.

IBM Federated Learning

IBM Federated Learning is a system for training machine learning models across numerous devices while maintaining data privacy. To maintain the security of sensitive data during the federated learning process, it employs modern encryption techniques and differential privacy features.
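Setting IBM's actual interfaces aside, the differential-privacy idea mentioned above usually amounts to clipping each client's update and adding calibrated noise before aggregation. The following is a generic Gaussian-mechanism sketch in NumPy, not IBM Federated Learning's API; the clip norm and noise scale are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def privatize_update(update, clip_norm=1.0, noise_std=0.1):
    """Clip an update's L2 norm, then add Gaussian noise (generic DP-style step)."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))  # bound one client's influence
    return clipped + rng.normal(scale=noise_std, size=update.shape)

# Three clients' raw model updates (invented numbers).
updates = [np.array([0.8, -0.3]), np.array([2.5, 1.0]), np.array([-0.4, 0.6])]

# The server aggregates only the privatized updates.
noisy = [privatize_update(u) for u in updates]
print("DP-aggregated update:", np.mean(noisy, axis=0))
```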

Why Prefer Federated Learning?

Federated learning provides several compelling benefits that make it the preferable solution in some situations. One of the key advantages is increased data privacy: because raw data remains on individual devices and is never directly shared, the risk of exposing sensitive information is greatly reduced. This is especially critical when dealing with personal or confidential information.

Furthermore, federated learning eliminates the need for large-scale data transfers to a central server, cutting communication costs and bandwidth use. Its decentralized nature also lets enterprises harness distributed computing, tapping into the computational resources of many devices without having to relocate the data.
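A back-of-envelope comparison illustrates the bandwidth argument; every figure below is an invented assumption, and real deployments usually sample only a fraction of devices per round, making federated transfer cheaper still:

```python
# Rough bandwidth comparison (all figures are illustrative assumptions).
num_devices = 10_000
raw_mb_per_device = 500        # raw data sitting on each device
model_params = 100_000         # a small on-device model
bytes_per_param = 4            # float32
rounds = 50                    # federated training rounds

centralized_mb = num_devices * raw_mb_per_device       # ship all raw data once
update_mb = model_params * bytes_per_param / 1e6       # one model update: 0.4 MB
federated_mb = num_devices * update_mb * rounds * 2    # download + upload each round

print(f"centralized: {centralized_mb:,.0f} MB")  # 5,000,000 MB
print(f"federated:   {federated_mb:,.0f} MB")    # 400,000 MB
```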

Real-life Applications of Federated Learning

An intriguing study carried out at the start of 2023 extended the potential of federated learning by allowing it to operate on data streams instead of static datasets, a sign of how much there still is to explore and innovate in this space. For context, here is how federated learning can be leveraged across industries:

Healthcare

In the healthcare sector, federated learning enables medical institutions to develop robust models for disease diagnosis while protecting personal privacy. By training models on decentralized data sources such as individual hospitals, medical data is kept secure, while pooled insights help improve diagnostic accuracy.

Manufacturing

Federated learning plays a key role in manufacturing by optimizing production processes. Different factories in a production network can train models cooperatively to predict equipment faults or improve product quality. This translates into increased efficiency while keeping proprietary manufacturing data within the individual facilities.

Transportation

Federated learning is used in transportation to improve the performance of autonomous cars. Each car collects data from its surroundings and helps in the collaborative refinement of driving models. This enables autonomous driving algorithms to be safer and more accurate without the need to centralize sensitive location data.

In a Nutshell

Federated learning is, indeed, a game-changing strategy. It enables collaborative machine learning without requiring sensitive data to be centralized, making it a revolutionary approach in highly regulated industries ranging from healthcare and manufacturing to transportation.

As the worldwide market for federated learning expands, so does its potential to change how we harness the power of data. At the very least, federated learning can drive sustained innovation through efficient ML model training, all while ensuring privacy through data localization.