Sample Weighting

By default, every sample carries equal weight when the loss, and the quantities derived from it such as the gradients, are computed. Assigning different sample weights lets you incorporate what is known about the credibility of each observation into the model.

For example, when modelling claim frequency, one observation might relate to one month of exposure and another to one year of exposure. The observation with the longer exposure period carries more information and less variability. This can be reflected in the model by setting each observation's sample weight to its exposure: observations with higher exposure are then treated as having lower variance, and the model is correspondingly more influenced by them.
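
To make the effect of exposure weighting concrete, the sketch below fits the simplest possible model, a single constant claim frequency, by minimising a weighted squared error. The two policies and their claim counts are hypothetical, and squared error is chosen only because its weighted minimiser is a plain weighted mean; the platform's actual loss may differ.

```python
# Sketch: how exposure weights change a fitted claim frequency.
# Hypothetical data: two policies, each with exactly one claim.
exposures = [1 / 12, 1.0]  # one month and one year of exposure
claims = [1, 1]

# Observed claim frequency per unit of exposure (claims per year).
frequencies = [c / e for c, e in zip(claims, exposures)]


def weighted_mean(values, weights):
    """Constant m minimising sum(w_i * (y_i - m) ** 2)."""
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


unweighted = weighted_mean(frequencies, [1.0] * len(frequencies))
exposure_weighted = weighted_mean(frequencies, exposures)

print(f"unweighted estimate:        {unweighted:.3f}")
print(f"exposure-weighted estimate: {exposure_weighted:.3f}")
```

Without weights, the one-month policy (observed frequency 12 per year) pulls the estimate up to 6.5. With exposure weights, the estimate equals total claims divided by total exposure (24/13, about 1.846), so the year-long policy dominates, as described above.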

To use sample weights, add the sample_weight parameter to the data configuration, setting it to the name of the column that contains the weights.

In the code examples below, the sample weight column is named exposure.

Sample weight usage

# HFL sample weight example

data_config = {
    "predictors": ...,
    "target": ...,
    "sample_weight": "exposure"
}

# VFL sample weight example

data_config = {
    "active": {
        "predictors": ...,
        "target": ...,
        "sample_weight": "exposure"
    },
    "passive": {
        ...
    }
}
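
The two configurations differ only in where the sample_weight key sits: at the top level for HFL, and inside the "active" section for VFL. The helper below is a hypothetical illustration of that lookup (sample_weight_column is not part of any library API); it keeps the elided values from the examples as Python's `...` literal.

```python
# Hypothetical helper (not part of any library API): look up the weight
# column named in a data configuration shaped like the examples above.
def sample_weight_column(data_config):
    """Return the configured sample weight column, or None if unweighted."""
    if "active" in data_config:
        # VFL: only the active party's section names a weight column.
        return data_config["active"].get("sample_weight")
    # HFL: the key sits at the top level of the configuration.
    return data_config.get("sample_weight")


hfl_config = {"predictors": ..., "target": ..., "sample_weight": "exposure"}
vfl_config = {
    "active": {"predictors": ..., "target": ..., "sample_weight": "exposure"},
    "passive": ...,
}
print(sample_weight_column(hfl_config))  # exposure
print(sample_weight_column(vfl_config))  # exposure
```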

Sample weights must be strictly positive; no other preprocessing is necessary. In both HFL and VFL, the weights are automatically normalised to sum to 1 within each batch, so that the loss is scaled consistently across batches.
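
A minimal sketch of per-batch normalisation, assuming the batch loss is a weighted average of per-sample losses (the function name weighted_batch_loss is hypothetical, not the platform's internal code). It also shows why only the relative weights matter: multiplying every weight in a batch by the same constant leaves the loss unchanged. The positivity check guards against a zero total, which would make normalisation undefined, and against negative weights, which would flip the sign of a sample's contribution.

```python
def weighted_batch_loss(per_sample_losses, weights):
    """Combine per-sample losses using weights normalised within the batch."""
    if len(per_sample_losses) != len(weights):
        raise ValueError("need one weight per sample")
    if any(w <= 0 for w in weights):
        raise ValueError("sample weights must be strictly positive")
    total = sum(weights)
    # Normalise so the weights in this batch sum to 1.
    normalised = [w / total for w in weights]
    return sum(n * loss for n, loss in zip(normalised, per_sample_losses))


losses = [1.0, 2.0, 3.0]
exposure = [1.0, 1.0, 2.0]
print(weighted_batch_loss(losses, exposure))                     # 2.25
print(weighted_batch_loss(losses, [10 * e for e in exposure]))  # 2.25
```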

In VFL, only the active party can provide sample weights, because it is the party that computes the loss and the quantities derived from it (e.g., gradients).
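
The sketch below illustrates why the weights belong with the active party, under assumptions not stated in this page: a binary target, a binary cross-entropy loss on logits, and logits that already combine the contributions of all parties. The active party, which holds the labels and weights, computes the weighted loss and its gradient with respect to the logits; the weights themselves are not sent, and they reach the passive parties' updates only through that gradient. The function name active_party_step is hypothetical, and the platform's actual protocol may differ.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def active_party_step(logits, labels, weights):
    """Weighted BCE loss and its gradient w.r.t. the logits (active party only)."""
    total = sum(weights)
    normalised = [w / total for w in weights]  # per-batch normalisation
    loss = 0.0
    grads = []
    for z, y, n in zip(logits, labels, normalised):
        # BCE with logits: softplus(z) - y * z
        loss += n * (softplus(z) - y * z)
        # Derivative of the weighted term: n * (sigmoid(z) - y)
        grads.append(n * (sigmoid(z) - y))
    return loss, grads


logits = [0.0, 0.0]   # hypothetical combined outputs of all parties
labels = [1, 0]
exposure = [1.0, 3.0]
loss, grads = active_party_step(logits, labels, exposure)
print(grads)  # [-0.125, 0.375]; only these gradients go to passive parties
```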