HFL FFNet Model Training with a Sample Local Dataset (iai_ffnet)

To help you get started with federated learning in the integrate.ai system, we’ve provided a tutorial based on synthetic data with pre-built configuration files that you can run using a task runner. In this example, you will train a federated feedforward neural network (iai_ffn) using data from two datasets. The datasets, model, and data configuration are provided for you.

Make sure you have installed the SDK and registered your dataset(s).

Understanding Models

integrate.ai has several standard model classes available, including:

  • Feedforward Neural Nets (iai_ffn) - uses the same activation for each hidden layer.

  • Generalized Linear Models (iai_glm) - uses a linear feedforward layer.

  • Gradient Boosted Models (iai_gbm) - uses the sklearn histogram-based gradient boosting estimators (HistGradientBoostingClassifier and HistGradientBoostingRegressor).

  • Linear Inference Models (iai_linear_inference) - performs statistical inference on model coefficients for linear and logistic regression.

  • Principal Component Analysis (iai_pca) - performs a multivariate linear transformation that projects the data onto the principal components calculated by the analysis.

Model configuration

These standard models are defined using JSON configuration files during session creation. The model configuration (model_config) is a JSON object that contains the model parameters for the session.

model_config = {
    "experiment_name": "HFL FFNet session",
    "experiment_description": "Testing the HFL FFNet configuration",
    "strategy": {
        "name": "FedAvg",  # Name of the federated learning strategy
        "params": {},
    },
    "model": {  # Parameters specific to the model type
        "params": {
            "input_size": 15,
            "hidden_layer_sizes": [5, 5, 5],
            "output_size": 2,
        },
    },
    "balance_train_datasets": False,  # When True, performs undersampling on the training datasets
    "ml_task": {  # Specifies the machine learning task
        "type": "classification",
        "params": {
            "loss_weights": None,
        },
    },
    "optimizer": {
        "name": "SGD",  # Name of the PyTorch optimizer used
        "params": {
            "learning_rate": 0.2,
            "momentum": 0.0,
        },
    },
    "differential_privacy_params": {  # Defines the differential privacy parameters
        "epsilon": 4,
        "max_grad_norm": 7,
    },
    "save_best_model": {
        "metric": "loss",  # To disable this and save the model from the last round, set to None
        "mode": "min",
    },
}
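To make the model parameters concrete, the sketch below works out the layer shapes implied by input_size, hidden_layer_sizes, and output_size. It assumes a plain fully connected network with one bias vector per layer. The helper names layer_shapes and count_parameters are illustrative only and are not part of the integrate.ai SDK, and the actual iai_ffnet architecture may differ.

```python
from typing import List, Tuple


def layer_shapes(
    input_size: int, hidden_layer_sizes: List[int], output_size: int
) -> List[Tuple[int, int]]:
    """Return (inputs, outputs) for each fully connected layer, in order."""
    sizes = [input_size, *hidden_layer_sizes, output_size]
    return list(zip(sizes[:-1], sizes[1:]))


def count_parameters(shapes: List[Tuple[int, int]]) -> int:
    """Count weights plus one bias per output unit (an assumption of this sketch)."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)


# Values copied from the "model" -> "params" block of model_config.
params = {"input_size": 15, "hidden_layer_sizes": [5, 5, 5], "output_size": 2}

shapes = layer_shapes(**params)
print(shapes)                    # [(15, 5), (5, 5), (5, 5), (5, 2)]
print(count_parameters(shapes))  # 152
```

Under these assumptions, the three hidden layers of width 5 sit between a 15-feature input and a 2-class output, which is why output_size is 2 for this binary classification example.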

There are five main properties with specific key-value pairs used to configure the model:

  • strategy - Select one of the available federated learning strategies from the strategy library.

  • model - Defines the specific parameters required for the model type.

  • ml_task - Defines the machine learning task type (for example, classification) and its associated parameters.

  • optimizer - Defines the parameters for the PyTorch optimizer.

  • differential_privacy_params - Defines the differential privacy parameters. See Differential Privacy for more information.
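As background for the strategy property: FedAvg (federated averaging) combines client updates by averaging their parameters, weighted by the number of local training samples. The following is a minimal plain-Python sketch of that averaging step only. The function name fed_avg and the toy client values are hypothetical, and the sketch does not show how the integrate.ai server implements the strategy.

```python
from typing import List, Sequence


def fed_avg(
    client_weights: Sequence[Sequence[float]], num_samples: Sequence[int]
) -> List[float]:
    """Average flat parameter vectors, weighted by each client's sample count."""
    if not client_weights or len(client_weights) != len(num_samples):
        raise ValueError("need one sample count per client")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, n in zip(client_weights, num_samples):
        if len(weights) != n_params:
            raise ValueError("all clients must send the same number of parameters")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


# Two hypothetical clients with toy two-parameter models.
client_a = [1.0, 2.0]  # trained on 100 samples
client_b = [3.0, 6.0]  # trained on 300 samples

global_weights = fed_avg([client_a, client_b], [100, 300])
print(global_weights)  # [2.5, 5.0]
```

The client with more samples pulls the average toward its own parameters, which is the behaviour the weighting is meant to capture.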

The example in the notebook is a model provided by integrate.ai. For this tutorial, you do not need to change any of the values.

Data configuration

The data configuration is a JSON object in which you specify the predictor and target columns that describe the input data. The structure is the same for both GLM and FFN models.

data_config = {
    "predictors": consumer_features,
    "target": target,
}
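The variables consumer_features and target are expected to be defined before this point. As an illustration only, the sketch below builds a data configuration from hypothetical column names (feature_0 to feature_14 and a target column named y, none of which come from the sample dataset) and checks that the number of predictors matches the model's input_size of 15. Treating that equality as a requirement is an assumption here, based on the predictors being the model inputs.

```python
from typing import Dict, List, Union


def build_data_config(predictors: List[str], target: str) -> Dict[str, Union[List[str], str]]:
    """Build a data_config-style dict after basic sanity checks."""
    if len(set(predictors)) != len(predictors):
        raise ValueError("predictor column names must be unique")
    if target in predictors:
        raise ValueError("the target column must not also be a predictor")
    return {"predictors": list(predictors), "target": target}


# Hypothetical column names, used only for this example.
example_features = [f"feature_{i}" for i in range(15)]
example_target = "y"

example_data_config = build_data_config(example_features, example_target)

# input_size is taken from the model configuration shown earlier.
input_size = 15
if len(example_data_config["predictors"]) != input_size:
    raise ValueError("number of predictors does not match input_size")

print(len(example_data_config["predictors"]))  # 15
print(example_data_config["target"])           # y
```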

Once you have created or updated the model and data configurations, the next step is to create a training session to begin working with the model and datasets.

Specify dataset names

Use variables to specify the names of your registered datasets. The names must match the names given during dataset registration in the UI.

consumer_train_path = 'consumer_train'
consumer_test_path = 'consumer_test'
acme_data = 'acme_train' 

Create and start the training session

Federated learning models created in integrate.ai are trained through sessions. You define the parameters required to train a federated model, including data and model configurations, in a session.

Create a new session each time you want to train a new model.

The code sample demonstrates creating and starting a session that requires two clients, one for each training dataset (min_num_clients), and runs for two rounds (num_rounds). It returns a session ID that you can use to track and reference your session.

The package_name specifies the federated learning model package. In this example, it is iai_ffnet; however, other packages are supported. See Model packages for more information.

# Create the task builder
from integrate_ai_sdk.taskgroup.taskbuilder.integrate_ai import IntegrateAiTaskBuilder
from integrate_ai_sdk.taskgroup.base import SessionTaskGroup

iai_tb_aws = IntegrateAiTaskBuilder(client=client, task_runner_id="")

# Create the session
hfl_session = client.create_fl_session(
    name="Testing notebook",
    description="I am testing session creation through a notebook",
    min_num_clients=2,
    num_rounds=2,
    package_name="iai_ffnet",
    model_config=model_config,
    data_config=data_config,
).start()

hfl_session.id

Join clients to the session

The next step is to join the session with the sample data. This example includes two datasets that simulate two clients, matching the min_num_clients argument. To run the example, add two client tasks to the task group, one for each dataset.

The session begins training after the minimum number of clients have joined the session.

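# train_path1/test_path1 and train_path2/test_path2 must be set to the
# names of your registered datasets before running this cell.
task_group = (
    SessionTaskGroup(hfl_session)
    .add_task(iai_tb_aws.hfl(train_path=train_path1, test_path=test_path1))
    .add_task(iai_tb_aws.hfl(train_path=train_path2, test_path=test_path2))
)
task_group_context = task_group.start()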

Wait for session results

Depending on the type of session and the size of the datasets, sessions may take some time to run. You can poll the server to determine the session status, or wait for the session status to change to “Completed” in the UI.
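A polling loop for this step could look like the sketch below. It is deliberately generic: get_status is a hypothetical callable that you would implement with whatever status call your SDK version provides, and the status strings "completed" and "failed" are assumptions rather than documented values.

```python
import time
from typing import Callable, Sequence


def wait_for_status(
    get_status: Callable[[], str],
    done_states: Sequence[str] = ("completed",),
    failed_states: Sequence[str] = ("failed",),
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
) -> str:
    """Call get_status() repeatedly until it reports a done or failed state."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        normalized = str(status).lower()  # compare case-insensitively
        if normalized in done_states:
            return status
        if normalized in failed_states:
            raise RuntimeError(f"session ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"session still {status!r} after {timeout} seconds")
        time.sleep(poll_interval)


# Demonstration with a fake status source instead of a real session.
fake_statuses = iter(["created", "running", "Completed"])
final_status = wait_for_status(lambda: next(fake_statuses), poll_interval=0.0)
print(final_status)  # Completed
```

In a real notebook you would replace the fake status source with a small function that queries your session, and choose a poll interval that does not put unnecessary load on the server.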

HFL FFNet Session Metrics

Congratulations, you have your first federated model! You can test it by retrieving the metrics and making predictions.

To retrieve the session metrics:

hfl_session.metrics().as_dict()

To plot the session metrics:

fig = hfl_session.metrics().plot()
