Model Packages

Feed Forward Neural Nets (iai_ffnet)

Feedforward neural nets are a type of neural network in which information flows through the nodes in a single direction, from input to output, with no cycles.

Examples of use cases include:

  • Classification tasks, such as image recognition or churn and conversion prediction.

  • Regression tasks, such as forecasting revenues and expenses, or determining the relationship between drug dosage and blood pressure.
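To illustrate the one-directional information flow, here is a minimal sketch of a feedforward pass in NumPy. The layer sizes, ReLU activation, and weight initialization are illustrative assumptions, not the actual iai_ffnet implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, weights, biases):
    """Propagate the input through each layer in a single direction."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)              # hidden layers
    return h @ weights[-1] + biases[-1]  # linear output layer

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 3))]
biases = [np.zeros(8), np.zeros(3)]
out = forward(rng.normal(size=(2, 4)), weights, biases)  # shape (2, 3)
```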

HFL FFNet

The iai_ffnet model is a feedforward neural network for horizontal federated learning (HFL) that uses the same activation for each hidden layer.

This model only supports classification and regression. Custom loss functions are not supported.

Privacy

DP-SGD (differentially private stochastic gradient descent) is applied as an additional privacy-enhancing technology. The basic idea of this approach is to modify the gradients used in stochastic gradient descent (SGD), which lies at the core of almost all deep learning algorithms.
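The gradient modification in DP-SGD has two steps: clip each example's gradient to bound its influence, then add calibrated Gaussian noise to the average. The sketch below illustrates that idea only; the clipping norm and noise multiplier are hypothetical values, and this is not the platform's actual DP-SGD implementation.

```python
import numpy as np

def privatize_gradients(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, average, then add Gaussian noise."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # bound each example's contribution
    mean_grad = np.mean(clipped, axis=0)
    # Noise scale is proportional to the clipping norm (sensitivity of the average).
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(per_example_grads),
                       size=mean_grad.shape)
    return mean_grad + noise

rng = np.random.default_rng(0)
grads = [rng.normal(size=5) for _ in range(8)]       # 8 per-example gradients
private_grad = privatize_gradients(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```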

VFL SplitNN

integrate.ai also supports the SplitNN model for vertical federated learning (VFL). In this model, a neural network is trained on data held across multiple clients. A private record linkage (PRL) session is required to align all datasets involved. There are two types of sessions: train and predict. To make predictions, both the PRL session ID and the ID of the corresponding training session are required.

For more information, see PRL Session and VFL SplitNN Model Training.
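In a SplitNN, each party computes only part of the network: a client holding the features runs the lower layers and sends intermediate activations (often called "smashed data") to the party holding the labels, which completes the forward pass. The sketch below shows that split for a binary task; the layer shapes, sigmoid output, and function names are illustrative assumptions, not the actual integrate.ai API.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Client side: raw features never leave the client;
# only the cut-layer activations are shared.
def client_forward(x, W_client):
    return relu(x @ W_client)  # "smashed data" sent to the label holder

# Label-holder side: completes the forward pass on the received activations.
def server_forward(smashed, W_server):
    logits = smashed @ W_server
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid for a binary task

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))          # features stay on the client
W_client = rng.normal(size=(6, 3))
W_server = rng.normal(size=(3, 1))
preds = server_forward(client_forward(x, W_client), W_server)
```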

Generalized Linear Models (GLMs)

This model class supports a variety of regression models, including linear, logistic, Poisson, Gamma, Tweedie, and inverse Gaussian regression. Regularizing the model coefficients with the elastic net penalty is also supported.

Examples of use cases include [1]:

  • Agriculture / weather modeling: number of rain events per year, amount of rainfall per event, total rainfall per year

  • Risk modeling / insurance policy pricing: number of claim events / policyholder per year, cost per event, total cost per policyholder per year

  • Predictive maintenance: number of production interruption events per year, duration of interruption, total interruption time per year

The iai_glm model trains generalized linear models by treating them as a special case of single-layer neural nets with particular output activation functions.
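The "single-layer net with a particular output activation" view can be sketched as follows: a GLM is a linear layer whose output passes through the inverse link function of the chosen family (identity for linear regression, sigmoid for logistic, exponential for Poisson). This is a conceptual illustration of the equivalence, not the iai_glm implementation.

```python
import numpy as np

# Each GLM family corresponds to an output activation (the inverse link).
INVERSE_LINKS = {
    "linear":   lambda z: z,                          # identity link
    "logistic": lambda z: 1.0 / (1.0 + np.exp(-z)),   # logit link
    "poisson":  lambda z: np.exp(z),                  # log link
}

def glm_predict(X, w, b, family):
    """A GLM prediction is a single linear layer plus an output activation."""
    return INVERSE_LINKS[family](X @ w + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
w, b = rng.normal(size=3), 0.0
rates = glm_predict(X, w, b, "poisson")  # always positive, as a rate must be
```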

Privacy

DP-SGD (differentially private stochastic gradient descent) is applied as an additional privacy-enhancing technology. The basic idea of this approach is to modify the gradients used in stochastic gradient descent (SGD), which lies at the core of almost all deep learning algorithms.

References

[1] https://scikit-learn.org/stable/modules/linear_model.html#generalized-linear-regression

Gradient Boosted Models (GBMs)

Gradient boosting is a machine learning algorithm that builds a predictive model as an ensemble of weak learners (typically decision trees) trained sequentially, each one correcting the errors of the current ensemble, which helps minimize the model's bias error. The gradient boosting model provided by integrate.ai uses the scikit-learn implementation: HistGradientBoostingClassifier for classification tasks and HistGradientBoostingRegressor for regression tasks.

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a dimensionality reduction technique that reduces the number of features used to represent a dataset. It generates a transformation that projects the data onto a smaller set of derived features (principal components), chosen to retain as much of the information in the original dataset as possible.

The main use of PCA is to help researchers uncover patterns underlying the data. These patterns are then helpful mainly in two applications:

  1. Discovering new groups and clusters, for example among patients.

  2. Modelling more efficiently: achieving the same or even better predictive power with fewer inputs and simpler models.

The iai_pca session outputs a transformation matrix that is generated in a distributed manner. This transformation can be applied to each dataset of interest.
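As a local, non-federated point of comparison, a PCA transformation matrix can be computed from the SVD of the centered data and then applied to any dataset with the same features. The function below is an illustrative NumPy sketch, not the distributed FedSVD procedure the session actually runs.

```python
import numpy as np

def pca_transform_matrix(X, n_components):
    """Compute a PCA transformation matrix from the SVD of centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_components].T  # columns are the principal directions

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))           # 100 samples, 10 features
T = pca_transform_matrix(X, n_components=3)
X_reduced = (X - X.mean(axis=0)) @ T     # apply the matrix to a dataset
```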

The underlying technology of this session type is FedSVD (federated singular value decomposition), which can be used in other dimensionality reduction methods.