HFL FFNet Model Training with a Sample Local Dataset (iai_ffnet)

To help you get started with federated learning in the integrate.ai system, we’ve provided a tutorial based on synthetic data with pre-built configuration files that you can run using a task runner. In this example, you will train a federated feedforward neural network (iai_ffnet) using data from two datasets. The datasets, model, and data configuration are provided for you.

The sample notebook (integrateai_HFL.ipynb) contains runnable code snippets for exploring the SDK and should be used in parallel with this example. This documentation provides supplementary and conceptual information to expand on the code demonstration.

Tip: make sure you have installed the SDK and set up a task runner.

Open the integrateai_HFL.ipynb notebook to test the code as you walk through this exercise.

Understanding Models

integrate.ai has several standard model classes available, including:

  • Feedforward Neural Nets (iai_ffnet) - uses the same activation for each hidden layer.

  • Generalized Linear Models (iai_glm) - uses a linear feedforward layer.

  • Gradient Boosted Models (iai_gbm) - uses the scikit-learn implementation of histogram-based gradient boosting (HistGradientBoosting) models.

  • Linear Inference Models (iai_linear_inference) - performs statistical inference on model coefficients for linear and logistic regression.

  • Principal Component Analysis (iai_pca) - performs a multivariate linear transformation that projects the data onto its principal components.

Model configuration

These standard models are defined using JSON configuration files during session creation. The model configuration (model_config) is a JSON object that contains the model parameters for the session.

model_config = {
    "experiment_name": "test_synthetic_tabular",
    "experiment_description": "test_synthetic_tabular",
    "strategy": {
        "name": "FedAvg",  # Name of the federated learning strategy
        "params": {},
    },
    "model": {  # Parameters specific to the model type
        "params": {
            "input_size": 15,
            "hidden_layer_sizes": [6, 6, 6],
            "output_size": 2,
        },
    },
    "balance_train_datasets": False,  # When True, performs undersampling on the dataset
    "ml_task": {  # Specifies the machine learning task
        "type": "classification",
        "params": {
            "loss_weights": None,
        },
    },
    "optimizer": {
        "name": "SGD",  # Name of the PyTorch optimizer used
        "params": {
            "learning_rate": 0.2,
            "momentum": 0.0,
        },
    },
    "differential_privacy_params": {  # Defines the differential privacy parameters
        "epsilon": 4,
        "max_grad_norm": 7,
    },
    "save_best_model": {
        "metric": "loss",  # To disable this and save the model from the last round, set to None
        "mode": "min",
    },
}

There are five main properties with specific key-value pairs used to configure the model:

  • strategy - Select one of the available federated learning strategies from the strategy library.

  • model - Defines the specific parameters required for the model type.

  • ml_task - Defines the machine learning task type (for example, classification) and its associated parameters.

  • optimizer - Defines the parameters for the PyTorch optimizer.

  • differential_privacy_params - Defines the differential privacy parameters. See Differential Privacy for more information.
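To make the strategy property more concrete, here is a minimal sketch of the idea behind federated averaging (FedAvg): each client trains locally, and the server combines the client weights into a global model, weighting each client by its number of training samples. This is a generic illustration in plain Python, not the integrate.ai implementation; the function name fed_avg and the toy weight dictionaries are hypothetical.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, List[float]]


def fed_avg(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its sample count.

    `updates` is a list of (weights, num_samples) pairs, one per client.
    Hypothetical helper for illustration only.
    """
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("need at least one training sample")
    first_weights = updates[0][0]
    averaged: Weights = {}
    for name, values in first_weights.items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in updates) / total
            for i in range(len(values))
        ]
    return averaged


# Two toy clients: the second holds three times as much data,
# so its weights pull the average three times as hard.
client_a = ({"layer0": [1.0, 2.0]}, 100)
client_b = ({"layer0": [3.0, 6.0]}, 300)
global_weights = fed_avg([client_a, client_b])
print(global_weights)  # {'layer0': [2.5, 5.0]}
```

In a real session this aggregation would repeat once per round, with the averaged weights sent back to the clients as the starting point for the next round.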

The example in the notebook is a model provided by integrate.ai. For this tutorial, you do not need to change any of the values.
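To get a feel for how small the example network is, the sketch below counts the trainable parameters implied by the model params. It assumes plain fully connected layers with a bias term per output unit; the actual iai_ffnet architecture may include other components, so treat the number as an estimate. The helper count_dense_parameters is hypothetical.

```python
def count_dense_parameters(input_size, hidden_layer_sizes, output_size):
    """Count weights and biases for a stack of fully connected layers.

    Assumes each layer is a dense layer with one bias per output unit.
    """
    sizes = [input_size, *hidden_layer_sizes, output_size]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# Values copied from model_config["model"]["params"] in this tutorial.
params = {"input_size": 15, "hidden_layer_sizes": [6, 6, 6], "output_size": 2}
total = count_dense_parameters(**params)
print(total)  # 194 = (15*6+6) + (6*6+6) + (6*6+6) + (6*2+2)
```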

Data configuration

The data configuration is a JSON object in which you specify the predictor and target columns that describe the input data. The structure is the same for both GLM and FFN models.

data_config = {
    "predictors": ["x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10", "x11", "x12", "x13", "x14"],
    "target": "y",
}
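Before creating a session, it can help to confirm that the two configurations agree with each other. The sketch below is a hypothetical pre-flight check, not part of the SDK. It assumes that each predictor column maps to exactly one input feature, so that input_size should equal the number of predictors.

```python
def check_configs(model_config, data_config):
    """Return a list of human-readable problems; an empty list means OK."""
    predictors = data_config["predictors"]
    input_size = model_config["model"]["params"]["input_size"]
    problems = []
    if len(predictors) != input_size:
        problems.append(
            f"input_size is {input_size} but {len(predictors)} predictors are listed"
        )
    if len(set(predictors)) != len(predictors):
        problems.append("predictors contains duplicate column names")
    if data_config["target"] in predictors:
        problems.append("target column is also listed as a predictor")
    return problems


# Trimmed copies of the tutorial configurations (only the keys the check reads).
example_model_config = {"model": {"params": {"input_size": 15}}}
example_data_config = {
    "predictors": [f"x{i}" for i in range(15)],  # "x0" ... "x14"
    "target": "y",
}

issues = check_configs(example_model_config, example_data_config)
print(issues or "configurations look consistent")
```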

Once you have created or updated the model and data configurations, the next step is to create a training session to begin working with the model and datasets.

Specify paths to datasets

aws_taskrunner_profile = "<workspace name>" # This is your workspace name. For example, if your login URL is `https://login-abc.integrateai.net/`, then your workspace name is `abc`.
aws_taskrunner_name = "<taskrunner>" # Task runner name - must match what was supplied in UI to create task runner. Example: `abcrunner`

#By default, an S3 bucket is created when you create a new task runner. This example assumes you have uploaded the files to the default bucket.
base_aws_bucket = f'{aws_taskrunner_profile}-{aws_taskrunner_name}.integrate.ai'

# Example datapaths. Make sure that the data you want to work with exists in the base_aws_bucket for your task runner.
# HFL datapaths
train_path1 = f's3://{base_aws_bucket}/synthetic/train_silo0.parquet'
test_path1 = f's3://{base_aws_bucket}/synthetic/test.parquet'
train_path2 = f's3://{base_aws_bucket}/synthetic/train_silo1.parquet'
test_path2 = f's3://{base_aws_bucket}/synthetic/test.parquet'
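If you work with more than two silos, writing each path by hand becomes repetitive. The following sketch builds the same paths programmatically. The helper build_hfl_paths is hypothetical, and it assumes the bucket naming and the synthetic/train_silo<N>.parquet layout used in this example; adjust it to match where your files are actually stored.

```python
def build_hfl_paths(workspace, runner, n_silos, folder="synthetic"):
    """Return one (train_path, test_path) pair per silo.

    Mirrors the naming used in this tutorial: a default bucket named
    '<workspace>-<runner>.integrate.ai' and a shared test file.
    """
    bucket = f"{workspace}-{runner}.integrate.ai"
    test_path = f"s3://{bucket}/{folder}/test.parquet"
    return [
        (f"s3://{bucket}/{folder}/train_silo{i}.parquet", test_path)
        for i in range(n_silos)
    ]


# "abc" and "abcrunner" are the placeholder names used in the comments above.
paths = build_hfl_paths("abc", "abcrunner", n_silos=2)
for train_path, test_path in paths:
    print(train_path, test_path)
```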

Create and start the training session

Federated learning models created in integrate.ai are trained through sessions. You define the parameters required to train a federated model, including data and model configurations, in a session.

Create a new session each time you want to train a new model.

The code sample demonstrates creating and starting a session that expects two clients, one per training dataset (min_num_clients), and runs for two rounds (num_rounds). It returns a session ID that you can use to track and reference your session.

The package_name specifies the federated learning model package. In the example, it is iai_ffnet; however, other packages are supported. See Model packages for more information.

#Create the task builder
from integrate_ai_sdk.taskgroup.taskbuilder.integrate_ai import IntegrateAiTaskBuilder
from integrate_ai_sdk.taskgroup.base import SessionTaskGroup

iai_tb_aws = IntegrateAiTaskBuilder(client=client,
   task_runner_id=aws_taskrunner_name)

#Create the session
hfl_session = client.create_fl_session(
    name="Testing notebook",
    description="I am testing session creation through a notebook",
    min_num_clients=2,
    num_rounds=2,
    package_name="iai_ffnet",
    model_config=model_config,
    data_config=data_config,
).start()

hfl_session.id

Join clients to the session

The next step is to join the session with the sample data. This example has data for two datasets simulating two clients, as specified with the min_num_clients argument. Therefore, to run this example, you add two client tasks to the taskbuilder.

The session begins training after the minimum number of clients have joined the session.

task_group = (
    SessionTaskGroup(hfl_session)
    .add_task(iai_tb_aws.hfl(train_path=train_path1, test_path=test_path1))
    .add_task(iai_tb_aws.hfl(train_path=train_path2, test_path=test_path2))
)
task_group_context = task_group.start()

Poll for session results

Depending on the type of session and the size of the datasets, sessions may take some time to run. In the sample notebook and this tutorial, we poll the server to determine the session status.

You can log information about the session during this time. In this example, we are logging the current round and the clients that have joined the session.

import json

# Print the current status of each submitted task
for i in task_group_context.contexts.values():
    print(json.dumps(i.status(), indent=4))

# Monitor the task logs
task_group_context.monitor_task_logs()

# Wait for the tasks to complete (success = True)
task_group_context.wait(60*5, 2)

Another popular option is to log hfl_session.metrics().as_dict() to view the in-progress training metrics.
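The polling pattern itself can be sketched without the SDK. The example below is a generic wait loop that checks a condition at a fixed interval until it succeeds or a timeout expires. The source does not document the arguments of task_group_context.wait, so the timeout and interval parameters here are assumptions, and wait_until and fake_status are hypothetical names used only for illustration.

```python
import time


def wait_until(is_done, timeout_s, interval_s):
    """Call is_done() every interval_s seconds until it returns True.

    Returns True on success, or False if timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in for a real status check: reports completion on the third call.
remaining = [3]


def fake_status():
    remaining[0] -= 1
    return remaining[0] <= 0


finished = wait_until(fake_status, timeout_s=5, interval_s=0.01)
print(finished)  # True
```

Using a monotonic clock for the deadline keeps the loop correct even if the system clock is adjusted while it is waiting.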

HFL FFNET Session Metrics

Congratulations, you have your first federated model! You can test it by retrieving the metrics and making predictions.

To retrieve the session metrics:

hfl_session.metrics().as_dict()

To plot the session metrics:

fig = hfl_session.metrics().plot()
