HFL-GLM Model Training (iai_glm)¶
This example demonstrates training a generalized linear model (GLM) using horizontal federated learning (HFL).
Before you get started, make sure that you have completed the environment setup and registered your datasets.
Specify dataset names
Use variables to specify the names of your registered datasets. The names must match the names given during dataset registration in the UI.
consumer_train_name = 'consumer_train'
consumer_test_name = 'consumer_test'
Model and data configs for HFL-GLM¶
Specify the model configuration for the generalized linear model.
model_config = {
"strategy": {"name": "FedIRLS", "params": {}},
"model": {
"params": {
"input_size": len(consumer_features),
"output_activation": 'exp'
}
},
"ml_task": {
"type": "regression",
"loss_function": "poisson",
"params": {},
},
"seed": 23, # for reproducibility
}
The model configuration (model_config) is a JSON object that contains the model parameters for the session.

strategy - Select one of the available federated learning strategies from the strategy library.

model - Defines the specific parameters required for the model type.
  input_size - The number of input features (in this example, len(consumer_features)).
  output_activation - Should be set to the "inverse link function" for the GLM: for example, sigmoid for the logit link, and exp for the log link. Currently supported values include sigmoid, exp, and tanh.

ml_task - Defines the federated learning task and associated parameters.
  type - Choose between classification or regression.
  loss_function - The following values are supported: logistic, mse, poisson, gamma, tweedie, and inverseGaussian. tweedie has an additional power parameter to control the underlying target distribution. Choose the loss_function based on the type of the target variable: for example, logistic for a binary target and poisson for counts.
Data configuration¶
The data configuration is a JSON object where the user specifies predictor and target columns that are used to describe input data. This is the same structure for both GLM and FNN.
Example data config:
data_config = {
"predictors": consumer_features,
"target": target,
"sample_weight": exposure,
}
About sample_weight
By default, each sample has an equal weight when computing the loss and other subsequent quantities (such as the gradients). However, different sample weights may allow information about the known credibility of each observation to be incorporated in the model.
For example, if modeling claims frequency, one observation might relate to one month’s exposure, and another to one year’s exposure. There is more information and less variability in the observation relating to the longer exposure period, and this can be incorporated in the model by defining the sample weight to be the exposure of each observation. In this way observations with higher exposure are deemed to have lower variance, and the model will consequently be more influenced by these observations.
To use sample weights, you must add the parameter sample_weight to the data configuration.
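The effect of exposure-based weighting can be illustrated with a small, self-contained Poisson loss. This is only a sketch of the general idea, not integrate.ai's internal loss implementation; the function name and the numbers are illustrative.

```python
import math

def weighted_poisson_nll(y_true, y_pred, weights=None):
    """Weighted average Poisson negative log-likelihood (up to a constant).

    With equal weights every observation contributes equally; with
    exposure-based weights, high-exposure observations dominate the loss.
    """
    if weights is None:
        weights = [1.0] * len(y_true)
    total = sum(
        w * (mu - y * math.log(mu))
        for w, y, mu in zip(weights, y_true, y_pred)
    )
    return total / sum(weights)

y = [2.0, 1.0]    # observed counts
mu = [1.5, 1.2]   # predicted means

# Equal weights vs. weighting the first observation by 12 months of exposure:
print(weighted_poisson_nll(y, mu))
print(weighted_poisson_nll(y, mu, weights=[12.0, 1.0]))
```

With the exposure weights, the first (12-month) observation contributes twelve times as much to the loss as the second, which is exactly how higher-credibility observations exert more influence on the fitted model.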
Once you have created or updated the model config and data config, the next step is to create a training session to begin working with the model and datasets.
Create and start the training session¶
# Create the task builder
from integrate_ai_sdk.taskgroup.taskbuilder.integrate_ai import IntegrateAiTaskBuilder
from integrate_ai_sdk.taskgroup.base import SessionTaskGroup
iai_tb_aws = IntegrateAiTaskBuilder(client=client, task_runner_id="")
# Create the session
glm_session = client.create_fl_session(
name="Testing HFL GLM session",
description="I am testing FedIRLS session creation through a notebook",
min_num_clients=1,
num_rounds=2,
package_name="iai_glm",
model_config=model_config,
data_config=data_config,
).start()
glm_session.id
Federated learning models created in integrate.ai are trained through sessions. You define the parameters required to train a federated model, including data and model configurations, in a session. Create a session each time you want to train a new model.
The code sample demonstrates creating and starting a session with one training dataset (specified as min_num_clients) and two rounds (num_rounds). It returns a session ID that you can use to track and reference your session.
The package_name specifies the federated learning model package. In this example it is iai_glm; however, other packages are supported. See Model packages for more information.
Create and start a task group
fl_task_group_context = (
    SessionTaskGroup(glm_session)
    .add_task(
        iai_tb_aws.hfl(
            train_dataset_name=consumer_train_name,
            test_dataset_name=consumer_test_name,
            batch_size=baseline_batch_size,
        )
    )
    .start()
)
glm_session.id  # Prints the session ID for reference
Wait for session results
Depending on the type of session and the size of the datasets, sessions may take some time to run. You can poll the server to determine the session status, or wait for the session status to change to “Completed” in the UI.
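The exact call for reading a session's status depends on the SDK version, so the sketch below abstracts it behind a caller-supplied get_status function; get_status is an assumption standing in for whatever call returns your session's status string, not an integrate.ai API.

```python
import time

def poll_until_complete(get_status, interval_s=10, timeout_s=3600):
    """Call get_status() every interval_s seconds until the session
    reports 'Completed' or 'Failed', or timeout_s elapses.

    get_status is a hypothetical zero-argument callable returning the
    current session status string.
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = get_status()
        if status in ("Completed", "Failed"):
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"session did not finish within {timeout_s}s")
```

A long interval keeps the polling load on the server low; sessions on large datasets can run for a while, so a generous timeout is usually appropriate.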
When the session is complete, you can test it by making predictions. For more information, see Making Predictions.