VFL Model Training
In a vertical federated learning (VFL) process, two or more parties collaboratively train a model using datasets that share a set of overlapping records but hold different features. Each party has partial information about the overlapping subjects in the dataset. Therefore, before running a VFL training session, a private record linkage (PRL) session is performed to find the intersection and align the datasets.
There are two types of parties participating in the training:
The Active Party owns the labels, and may or may not also contribute data.
The Passive Party contributes only data.
For example, in data sharing between a hospital (party B, the Active Party) and a medical imaging centre (party A, the Passive Party), only a subset of the hospital's patients will exist in the imaging centre's data. The hospital can run a PRL session to determine the target subset for VFL model training.
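To illustrate what the linkage step produces, here is a minimal sketch that computes the overlap between two parties' record identifiers and the row order that aligns them. It is not the integrate.ai PRL implementation and it is not private: a real PRL session finds the intersection without exposing raw identifiers. The function name `align_records` and the patient IDs are hypothetical, and the sketch assumes identifiers are unique within each dataset.

```python
def align_records(ids_a, ids_b):
    """Return the shared IDs (sorted) and each party's row indices for them."""
    index_a = {rid: i for i, rid in enumerate(ids_a)}
    index_b = {rid: i for i, rid in enumerate(ids_b)}
    # The overlap is the set of identifiers present in both datasets.
    shared = sorted(index_a.keys() & index_b.keys())
    # Row indices in the same order, so row k refers to the same subject on both sides.
    rows_a = [index_a[rid] for rid in shared]
    rows_b = [index_b[rid] for rid in shared]
    return shared, rows_a, rows_b


# Hypothetical identifiers held by each party.
imaging_ids = ["p07", "p02", "p11"]          # party A (Passive Party)
hospital_ids = ["p01", "p02", "p07", "p09"]  # party B (Active Party)

shared, rows_a, rows_b = align_records(imaging_ids, hospital_ids)
```

Only the records listed in `shared` would take part in the subsequent VFL training session; each party reorders its own rows using its own index list.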
VFL Session Overview
For example, a hospital may hold patient blood tests and cancer outcome information, while the corresponding imaging data is owned by an imaging centre. The two want to collaboratively train a cancer diagnosis model on both the imaging data and the blood test data. The hospital (Active Party) owns the outcomes and the patient blood tests, and the imaging centre (Passive Party) owns the imaging data.
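The division of roles can be sketched with a toy vertically partitioned logistic regression. This is an illustrative assumption, not the integrate.ai training protocol: it runs both parties in one process and omits the encryption or other privacy protection a real VFL system applies to exchanged values. All function names, feature values, and labels are hypothetical. Each party computes partial scores from its own features, and only the Active Party, which holds the labels, computes the residuals used for the updates.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def partial_scores(weights, rows):
    """Each party computes scores from its own features only."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


def local_update(weights, rows, residuals, lr):
    """One gradient step for a party, using residuals from the Active Party."""
    n = len(rows)
    for j in range(len(weights)):
        grad = sum(r * row[j] for r, row in zip(residuals, rows)) / n
        weights[j] -= lr * grad


def train_vfl(passive_rows, active_rows, labels, epochs=1000, lr=0.5):
    w_passive = [0.0] * len(passive_rows[0])
    w_active = [0.0] * len(active_rows[0])
    for _ in range(epochs):
        s_passive = partial_scores(w_passive, passive_rows)  # sent to the Active Party
        s_active = partial_scores(w_active, active_rows)
        # Only the Active Party holds labels, so it computes the residuals.
        residuals = [sigmoid(p + a) - y
                     for p, a, y in zip(s_passive, s_active, labels)]
        local_update(w_passive, passive_rows, residuals, lr)  # residuals sent back
        local_update(w_active, active_rows, residuals, lr)
    return w_passive, w_active


# Hypothetical rows, already aligned so that row k is the same patient on both sides.
imaging_rows = [[0.1], [0.9], [0.2], [0.8]]                    # Passive Party: one imaging feature
blood_rows = [[1.0, 0.2], [1.0, 0.7], [1.0, 0.1], [1.0, 0.9]]  # Active Party: bias term, one blood feature
outcomes = [0, 1, 0, 1]                                        # labels, Active Party only

w_passive, w_active = train_vfl(imaging_rows, blood_rows, outcomes)
scores = [p + a for p, a in zip(partial_scores(w_passive, imaging_rows),
                                partial_scores(w_active, blood_rows))]
predictions = [1 if sigmoid(s) > 0.5 else 0 for s in scores]
```

Neither party ever reads the other's feature rows in this sketch; they exchange only partial scores and residuals, which is the structural idea behind training on vertically split data.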
A simplified model of the process is shown below.
integrate.ai VFL Flow
The following diagram outlines the training flow in the integrate.ai implementation of VFL.
integrate.ai supports the following model types:
Private Record Linkage (PRL) - also known as overlap
Horizontal Federated Learning (HFL) is also supported.