HFL-GLM Model Training (iai_glm)

This example demonstrates training a generalized linear model (GLM) using horizontal federated learning (HFL).

Before you get started, make sure that you have completed the environment setup and registered your datasets.

Specify dataset names

Use variables to specify the names of your registered datasets. The names must match those given during dataset registration in the UI.

consumer_train_name = 'consumer_train'
consumer_test_name = 'consumer_test'
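
The examples that follow also reference consumer_features, target, and exposure, which are assumed to be defined during environment setup. Purely for illustration, hypothetical definitions might look like the following; the column names are placeholders, not part of the SDK.

consumer_features = ["age", "region", "vehicle_age"]  # hypothetical predictor columns
target = "claim_count"                                # hypothetical target column
exposure = "exposure"                                 # hypothetical exposure/weight column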

Model and data configs for HFL-GLM

Specify the model configuration for the generalized linear model.

model_config = {
    "strategy": {"name": "FedIRLS", "params": {}},
    "model": {
        "params": {
            "input_size": len(consumer_features),
            "output_activation": "exp",
        }
    },
    "ml_task": {
        "type": "regression",
        "loss_function": "poisson",
        "params": {},
    },
    "seed": 23,  # for reproducibility
}

The model configuration (model_config) is a JSON object that contains the model parameters for the session.

  • strategy - Select one of the available federated learning strategies from the strategy library.

  • model - Defines the specific parameters required for the model type.

    • input_size - The number of input features. In the example, this is len(consumer_features), the length of the predictor list.

    • output_activation - Set this to the “inverse link function” of the GLM: for example, sigmoid for the logit link and exp for the log link. Currently supported values are sigmoid, exp, and tanh.

  • ml_task - Defines the federated learning task and associated parameters.

    • type - Choose either classification or regression.

    • loss_function - The following values are supported: logistic, mse, poisson, gamma, tweedie, and inverseGaussian. The tweedie loss takes an additional power parameter that controls the underlying target distribution. Choose the loss_function based on the type of the target variable: for example, logistic for a binary target and poisson for counts (see the classification sketch after this list).
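
To make the link/loss pairing concrete, here is a sketch of the same configuration adapted to a binary-classification GLM, using the sigmoid activation (logit link) and logistic loss described above. It only recombines values listed in this guide; for the tweedie loss, the power parameter would presumably be passed through the ml_task params, though the exact key is not shown here.

model_config_binary = {
    "strategy": {"name": "FedIRLS", "params": {}},
    "model": {
        "params": {
            "input_size": len(consumer_features),
            "output_activation": "sigmoid",  # inverse of the logit link
        }
    },
    "ml_task": {
        "type": "classification",
        "loss_function": "logistic",  # suited to a binary target
        "params": {},
    },
    "seed": 23,
}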

Data configuration

The data configuration is a JSON object in which you specify the predictor and target columns that describe the input data. The structure is the same for both GLM and FNN.

Example data config:

data_config = {
    "predictors": consumer_features,  # list of predictor column names
    "target": target,                 # target column name
    "sample_weight": exposure,        # optional; see About sample_weight below
}

About sample_weight

By default, each sample has an equal weight when computing the loss and subsequent quantities such as the gradients. Sample weights let you incorporate the known credibility of each observation into the model.

For example, if modeling claims frequency, one observation might relate to one month’s exposure, and another to one year’s exposure. There is more information and less variability in the observation relating to the longer exposure period, and this can be incorporated in the model by defining the sample weight to be the exposure of each observation. In this way observations with higher exposure are deemed to have lower variance, and the model will consequently be more influenced by these observations.
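
Conceptually, a sample weight scales each observation's contribution to the loss. As a minimal illustration in plain NumPy (not integrate.ai code), a weighted Poisson negative log-likelihood might be computed as follows; dividing by the total weight is one common normalization.

import numpy as np

def weighted_poisson_nll(y, mu, w):
    # y: observed counts; mu: predicted means (after the exp inverse link);
    # w: per-sample weights, e.g. exposure.
    # Each term is the Poisson negative log-likelihood up to an additive
    # constant, scaled by its weight, so high-exposure observations pull
    # harder on the fitted parameters.
    return np.sum(w * (mu - y * np.log(mu))) / np.sum(w)

In the claims-frequency example above, w would be the exposure column, so the one-year observation carries twelve times the weight of the one-month observation.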

To use sample weights, you must add the parameter sample_weight to the data configuration, as shown in the example above.

Once you have created or updated the model config and data config, the next step is to create a training session to begin working with the model and datasets.

Create and start the training session

# Create the task builder
from integrate_ai_sdk.taskgroup.taskbuilder.integrate_ai import IntegrateAiTaskBuilder
from integrate_ai_sdk.taskgroup.base import SessionTaskGroup

iai_tb_aws = IntegrateAiTaskBuilder(client=client, task_runner_id="")  # supply your task runner ID

# Create the session
glm_session = client.create_fl_session(
    name="Testing HFL GLM session",
    description="I am testing FedIRLS session creation through a notebook",
    min_num_clients=1,
    num_rounds=2,
    package_name="iai_glm",
    model_config=model_config,
    data_config=data_config,
).start()

glm_session.id  # Displays the session ID for reference

Federated learning models created in integrate.ai are trained through sessions. You define the parameters required to train a federated model, including data and model configurations, in a session. Create a session each time you want to train a new model.

The code sample demonstrates creating and starting a session with a minimum of one training client (min_num_clients=1) and two training rounds (num_rounds=2). It returns a session ID that you can use to track and reference your session.

The package_name specifies the federated learning model package. In this example it is iai_glm; however, other packages are supported. See Model packages for more information.

Create and start a task group

fl_task_group_context = (
    SessionTaskGroup(glm_session)
    .add_task(iai_tb_aws.hfl(train_dataset_name=consumer_train_name,
                             test_dataset_name=consumer_test_name,
                             batch_size=baseline_batch_size))  # assumes a batch size variable defined earlier
    .start()
)

glm_session.id  # Prints the session ID for reference

Wait for session results

Depending on the type of session and the size of the datasets, sessions may take some time to run. You can poll the server to determine the session status, or wait for the session status to change to “Completed” in the UI.
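
For example, integrate.ai's sample notebooks poll the task group context returned above. Treat the following as a sketch, since helper names and signatures may vary by SDK version.

import json

# Inspect the status of each task in the group
for task_context in fl_task_group_context.contexts.values():
    print(json.dumps(task_context.status(), indent=4))

# Block until all tasks complete (here: up to 5 minutes, checking every 2 seconds)
fl_task_group_context.wait(60 * 5, 2)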

When the session is complete, you can test it by making predictions. For more information, see Making Predictions.
