Release Notes¶
08 February 2025: Release 9.17.0¶
This release introduces the following new features and improvements:
Allow customer to use an existing Azure virtual network
Allow customer to specify an arbitrary storage path for Azure task runners
Allow customer to use an Azure task runner with a different Azure subscription, in EU region
Allow customer to use a custom Azure provisioner role instead of the default Contributor role
Enable Private link functionality for Azure
Allow customer to configure an ACR for use with their Azure task runner
PCA scalability improvements (using dask backend)
Allow customer to add prior sample weights for GLM and FFNet
Added a test script for AWS task runners that can be run from the command line. Details are available here.
Updated the metrics plot in the SDK to adaptively adjust the number of rows based on the number of metrics to show
Enabled the display of parquet metadata in client log output in the UI for debugging
Added UI visualization to show the underlying task infrastructure information, which was previously only available through the AWS Console
Updating from earlier versions
Required: Update your SDK to the latest version.
Resolved Issues
Fixed an issue where the Azure client failed with no logs in UI
Passive non-intersections are now only considered when hide_intersection=True
Fixed the dataset plot of EDA in the SDK
Updated the description of attachment functionality
Fixed rate limiting for flask limitation
Fixed an issue where the server exited with code 0 while the log showed failures
18 December 2024: Release 9.16.0¶
This release introduces the following new features and improvements:
UI design updates for improved usability.
Added the ability to see all task runner deployment logs in the UI.
Added the ability to generate single client 2D EDA histograms (both columns in same dataset) as well as cross-client 2D EDA histograms.
Verified support for Azure private links.
Added OIDC-based SSO IDP integration. Contact your integrate.ai Customer Support Engineer for more information.
Updating from earlier versions
Required: Update your SDK to the latest version.
Known Issues
Sessions run with a VFL data config that contains predictors set to [] for the active client may fail if one or more columns are of incompatible data type (for example: id column which is of string data type). The predictor supports only float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
VFL-GBM prediction results are not stored correctly and therefore metrics cannot be retrieved for these sessions on AWS and Azure.
Test data is optional for PRL, but this needs to be consistent across all the clients. That is, if you do not specify the test data for one client, then do not specify it for any clients. Likewise, if you provide the test data for one client, you must provide it for each client joining the session.
Storage paths for task runners: only the default S3 bucket and other buckets ending in *integrate.ai are supported as storage paths for task runners. If you are not using the default bucket created by task runner when it was provisioned, ensure that your data is hosted in an S3 bucket with a URL ending in *integrate.ai. Otherwise, the data will not be accessible to the task runner. The bucket must also be located in the same AWS region as the task runner.
Starting many sessions simultaneously may trigger a throttling exception from AWS APIs. The error appears as: “An error occurred (ThrottlingException) when calling…” in the logs.
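The predictor data-type restriction listed above can be checked before a session is started. The following is a minimal sketch and is not part of the integrate.ai SDK: the helper name unsupported_columns and its dictionary input are assumptions made for illustration, and only the list of supported type names is taken from the known issue.

```python
# Supported predictor dtype names, as listed in the known issue above.
SUPPORTED_DTYPES = frozenset({
    "float64", "float32", "float16",
    "complex64", "complex128",
    "int64", "int32", "int16", "int8", "uint8",
    "bool",
})


def unsupported_columns(column_dtypes):
    """Return the names of columns whose dtype name is not supported.

    `column_dtypes` maps column name -> dtype name (a plain string).
    Hypothetical helper; not an integrate.ai SDK function.
    """
    return [
        name
        for name, dtype in column_dtypes.items()
        if str(dtype) not in SUPPORTED_DTYPES
    ]


# Example: an id column stored as text is flagged, numeric columns are not.
example = {"id": "object", "age": "int64", "income": "float32"}
print(unsupported_columns(example))
```

With pandas, a mapping of this shape can be built with {name: str(dtype) for name, dtype in df.dtypes.items()}; string columns typically report the dtype name object. If any column is flagged, one possible response is to list the predictors explicitly rather than passing [], assuming that an empty list selects every column.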
Resolved Issues
On the Members > Admin page, not all users listed are Administrators. The listed user roles are correct.
Corrected several error messages for improved usability.
21 Oct 2024: Release 9.15.0¶
This release introduces the following new features and improvements:
Added support for AWS UK regions.
Added the ability to configure an AWS task runner to use a specific ECR. Contact Customer Success for details.
Added the ability to run Azure tasks in an Azure Private Network. See the documentation for details.
Added the ability for users to specify their own provisioner and runtime principal for Azure task runners.
Improved the user experience with creating an Azure task runner by minimizing the task runner permission requirements.
Added the option to install a task runner on-premises in a VM or dedicated server.
Improved log readability for ease of use.
Improved the graph legibility in the UI and corrected some metric displays.
The task type is now indicated in the task selector drop-down on the session details page.
Removed the requirement for user-specified timeouts. Timeout values are now managed by the system.
Updated the workspace homepage.
Updated the UI for the Register task runner workflow.
Migration from earlier versions
Required: Update the policies for the iai-taskrunner-provisioner and iai-taskrunner-runtime. Follow the AWS configuration for task runners documentation.
Required: You must specify the provisioner and runtime principals when creating an Azure task runner. See the documentation for details.
Recommended: Upgrading your SDK is recommended but not mandatory with this release.
Known Issues
Sessions run with a VFL data config that contains predictors set to [] for the active client may fail if one or more columns are of incompatible data type (for example: id column which is of string data type). The predictor supports only float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
VFL-GBM prediction results are not stored correctly and therefore metrics cannot be retrieved for these sessions on AWS and Azure.
Test data is optional for PRL, but this needs to be consistent across all the clients. That is, if you do not specify the test data for one client, then do not specify it for any clients. Likewise, if you provide the test data for one client, you must provide it for each client joining the session.
Storage paths for task runners: only the default S3 bucket and other buckets ending in *integrate.ai are supported as storage paths for task runners. If you are not using the default bucket created by task runner when it was provisioned, ensure that your data is hosted in an S3 bucket with a URL ending in *integrate.ai. Otherwise, the data will not be accessible to the task runner. The bucket must also be located in the same AWS region as the task runner.
Starting many sessions simultaneously may trigger a throttling exception from AWS APIs. The error appears as: “An error occurred (ThrottlingException) when calling…” in the logs.
On the Members > Admin page, not all users listed are Administrators. The listed user roles are correct.
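The storage path restriction above can be approximated with a quick check before data is registered. This is a sketch under stated assumptions, not SDK behaviour: it assumes that "ending in *integrate.ai" refers to the bucket name in an s3:// URL, and the function name is_supported_storage_path and the default_bucket argument are hypothetical. It does not check the AWS region of the bucket.

```python
from urllib.parse import urlparse

# Suffix taken from the known issue above; the interpretation that it
# applies to the bucket name is an assumption.
ALLOWED_SUFFIX = "integrate.ai"


def is_supported_storage_path(path, default_bucket=None):
    """Return True if `path` is an s3:// URL in an accepted bucket.

    `default_bucket` is the name of the bucket created when the task
    runner was provisioned (hypothetical parameter, optional).
    """
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        return False
    bucket = parsed.netloc
    return bucket == default_bucket or bucket.endswith(ALLOWED_SUFFIX)


# Example with made-up bucket names.
print(is_supported_storage_path("s3://partner-data.integrate.ai/train.parquet"))
print(is_supported_storage_path("s3://other-bucket/train.parquet"))
```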
Resolved Issues
Fixed the issue causing session timestamps in the UI to not adjust correctly for user timezones.
Fixed an issue that caused 1D EDA histograms to not display correctly.
Fixed an issue that allowed a user to set invalid parameters for vCPU and memory when creating an Azure task runner.
Fixed an issue that caused client tasks to be marked as succeeded when the task had an error.
Fixed an issue with the session metrics showing in the wrong format in the session details.
Corrected the permission descriptions for user roles. Note that only Administrators can see these descriptions.
Fixed an issue that caused task runner creation to fail when the AWS region us-west-1 was selected.
Added support for handling data with duplicated IDs in PRL sessions. See the documentation for details.
10 Sept 2024: Release 9.14.0¶
This release introduces the following new features and improvements:
Dataset connections are now synchronized. Any change you make to the dataset or the connected task runner is automatically reflected in the partner’s workspace.
HFL Scalability improvements.
2D EDA histograms are now supported at scale on Azure.
The Tweedie Distribution is now supported for both HFL and VFL GLM sessions.
The SDK is now available as a wheel (.whl) package.
Server logs are now available in the workspace UI for all sessions.
Added support for West and North Europe regions for Azure task runners.
The start time for sessions is now displayed on the Session Details page in the UI.
The name of the task runner used for a task is now displayed with the detailed session log information.
By default, all sessions except VFL/HFL GBM now use dask as the backend instead of pandas to improve scalability.
Migration from earlier versions
Update your SDK
Update the custom trust policy for the iai-taskrunner-provisioner. Follow the AWS configuration for task runners documentation.
Known Issues
If an existing VPC is required, additional information must be provided to integrate.ai for use with task runners. Contact Customer Success for more information.
Sessions run with a VFL data config that contains predictors set to [] for the active client may fail if one or more columns are of incompatible data type (for example: id column which is of string data type). The predictor supports only float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
VFL-GBM prediction results are not stored correctly and therefore metrics cannot be retrieved for these sessions on AWS and Azure.
For PRL sessions, exact matching is enabled by default. Therefore, if there are duplicate IDs in the dataset, the session will fail with an error message. Workaround: if you require fuzzy matching, set match_threshold=0.99 when creating the session. This forces the use of fuzzy matching and gives effectively the same result as the exact matching. Note that exact matching currently requires approximately 4x less time to complete.
Test data is optional for PRL, but this needs to be consistent across all the clients. That is, if you do not specify the test data for one client, then do not specify it for any clients. Likewise, if you provide the test data for one client, you must provide it for each client joining the session.
Storage paths for task runners: only the default S3 bucket and other buckets ending in *integrate.ai are supported as storage paths for task runners. If you are not using the default bucket created by task runner when it was provisioned, ensure that your data is hosted in an S3 bucket with a URL ending in *integrate.ai. Otherwise, the data will not be accessible to the task runner. The bucket must also be located in the same AWS region as the task runner.
Starting many sessions simultaneously may trigger a throttling exception from AWS APIs. The error appears as: An error occurred (ThrottlingException) when calling… in the logs.
Session timestamps in the UI do not adjust correctly for user timezones. The timestamp in the database in UTC is correct.
On the Members > Admin page, not all users listed are Administrators. The listed user roles are correct.
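One way to reduce the chance of the throttling exception listed above is to stagger session starts rather than launching them all at once. The sketch below illustrates the idea only: start_session stands for whatever callable you use to start a session (it is a placeholder, not an SDK function), and the pause length is an arbitrary assumption rather than a documented limit.

```python
import time


def start_staggered(configs, start_session, pause_seconds=5.0, sleep=time.sleep):
    """Start one session per config, pausing between consecutive starts.

    `start_session` is a caller-supplied callable (placeholder, not part
    of the integrate.ai SDK). `sleep` is injectable so the pause can be
    replaced in tests.
    """
    results = []
    for index, config in enumerate(configs):
        if index > 0:
            # Wait before every start except the first one.
            sleep(pause_seconds)
        results.append(start_session(config))
    return results


# Example with a stand-in start function and a very short pause.
started = start_staggered(["a", "b", "c"], lambda cfg: f"started {cfg}", pause_seconds=0.01)
print(started)
```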
Resolved Issues
Fixed the issue that was causing VFL with empty predictors (None) to fail
Fixed an issue where charts for VFL-GLM sessions running regression models showed num_examples as default instead of loss.
Fixed an issue with 2D histograms for EDA Intersect.
Fixed an issue that was preventing the use of client names with special characters in PRL sessions.
Fixed an issue with metrics plots for VFL training sessions.
Fixed an issue that was causing Azure logs to be truncated in the UI.
Activation emails have been updated to a new standardized layout.
14 August 2024: Release 9.13.0¶
This release introduces the following new features:
User is now able to invite a partner to evaluate their dataset using the Dataset Connection feature.
Datasets stored in either AWS or Azure are supported
Any attachments on the dataset are shared
Connections are not currently synchronized. If a change is made by the owner, it is not propagated to the partner.
The Feature Importance calculation is now enabled by default.
Scalability improvements
PRL execution time reduced
1D EDA Intersect memory usage and execution time reduced
VFL-GLM memory consumption reduced by a factor of 4 and execution time reduced by a factor of 30
2D EDA Intersect now supports large datasets
Migration from earlier versions
Required changes:
- Update your SDK. This is a required update for this release due to breaking changes in the SDK.
- Workspaces must be updated to support connected datasets.
- If you did not update your AWS configuration before or after the 9.12.0 release, follow the [AWS configuration for task runners documentation](/deployment.md#aws-configuration-for-task-runners) to re-create the `iai-taskrunner-provisioner` role and `iai-taskrunner-runtime` role.
- Update the custom trust policy for the iai-taskrunner-role.
- Attach any required permission boundaries to the roles.
- The documentation describes how to create the policies, custom trust policies, and roles required for integrate.ai task runners to run in your AWS environment.
- If you are required to use an existing VPC or AMI, additional information must be provided to integrate.ai for use with task runners. Contact Customer Support for details.
Known Issues
Sessions run with a VFL data config that contains predictors set to [] for the active client may fail if one or more columns are of incompatible data type (for example: id column which is of string data type). The predictor supports only float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
VFL-GBM prediction results are not stored correctly and therefore metrics cannot be retrieved for these sessions on AWS and Azure.
For PRL sessions, exact matching is enabled by default. Therefore, if there are duplicate IDs in the dataset, the session will fail with an error message. Workaround: if you require fuzzy matching, set match_threshold=0.99 when creating the session. This forces the use of fuzzy matching and gives effectively the same result as the exact matching. Note that exact matching currently requires approximately 4x less time to complete.
For PRL sessions, row_number is a reserved key in the dask backend. Therefore if "id_columns": ["row_number"] is used in a PRL session, the session fails.
Test data is optional for PRL, but this needs to be consistent across all the clients. That is, if you do not specify the test data for one client, then do not specify it for any clients. Likewise, if you provide the test data for one client, you must provide it for each client joining the session.
Storage paths for task runners: only the default S3 bucket and other buckets ending in *integrate.ai are supported as storage paths for task runners. If you are not using the default bucket created by task runner when it was provisioned, ensure that your data is hosted in an S3 bucket with a URL ending in *integrate.ai. Otherwise, the data will not be accessible to the task runner. The bucket must also be located in the same AWS region as the task runner.
Proxy task runner mode has been disabled in this release. This means that task runners are not proxied through a load balancer and will therefore not have a static IP address.
Starting many sessions simultaneously may trigger a throttling exception from AWS APIs. The error appears as: An error occurred (ThrottlingException) when calling… in the logs.
Session timestamps in the UI do not adjust correctly for user timezones. The timestamp in the database in UTC is correct.
On the Members > Admin page, not all users listed are Administrators. The listed user roles are correct.
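Two of the PRL issues above (the all-or-none rule for test data and the reserved row_number key) are easy to check before a session is created. The sketch below assumes a simplified configuration shape: the key test_path and the function check_prl_clients are hypothetical and chosen for illustration, while id_columns and row_number come from the known issue.

```python
# Reserved key named in the known issue above.
RESERVED_ID_COLUMNS = {"row_number"}


def check_prl_clients(clients):
    """Return a list of problems found in a PRL client configuration.

    `clients` maps client name -> dict with an "id_columns" list and an
    optional "test_path" entry (hypothetical key for the test data).
    """
    problems = []

    # Test data must be given for every client or for none of them.
    has_test = {name: bool(cfg.get("test_path")) for name, cfg in clients.items()}
    if len(set(has_test.values())) > 1:
        missing = sorted(name for name, present in has_test.items() if not present)
        problems.append("test data missing for: " + ", ".join(missing))

    # row_number is reserved in the dask backend and cannot be an id column.
    for name, cfg in clients.items():
        reserved = RESERVED_ID_COLUMNS.intersection(cfg.get("id_columns", []))
        if reserved:
            problems.append(
                f"{name} uses reserved id column(s): " + ", ".join(sorted(reserved))
            )
    return problems


# Example with made-up client names and paths.
print(check_prl_clients({
    "client_a": {"id_columns": ["row_number"], "test_path": "test_a.parquet"},
    "client_b": {"id_columns": ["id"]},
}))
```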
11 July 2024: Release 9.12.0¶
This release introduces the following new features:
User is now able to invite a partner to evaluate their dataset using the Dataset Connection feature.
Currently only datasets stored in AWS are available for Dataset Connection
Attachments are currently not shared
Overall improvement on session and model training to handle larger datasets
Can support PRL sessions with 20M rows and 35% overlap with 40 features
PRL and VFL in Azure are only supported for small datasets (only with pandas)
Migration from earlier versions
Required changes:
- Update your SDK. This is a required update for this release due to breaking changes in the SDK.
- Update the custom trust policy for the iai-taskrunner-role.
- If you did not update your AWS configuration before or after the 9.11.0 release, follow the [AWS configuration for task runners documentation](/deployment.md#aws-configuration-for-task-runners) to re-create the iai-taskrunner-provisioner role and iai-taskrunner-runtime role.
- Attach any required permission boundaries to the roles
- The documentation describes how to create the policies, custom trust policies, and roles required for integrate.ai task runners to run in your AWS environment.
- As in the previous release, if an existing VPC is required, additional information must be provided to integrate.ai for use with task runners.
Notice of Upcoming Changes:
The ability to specify VPC information through the UI will be added in a future release.
Known issues
For PRL sessions, exact matching is enabled by default. Therefore, if there are duplicate IDs in the dataset, the session will fail with the error message: CLKs must be unique for exact matching. Workaround: if you require fuzzy matching, set match_threshold=0.99 when creating the session. This forces the use of fuzzy matching and gives effectively the same result as the exact matching. Note that exact matching currently requires approximately 4x less time to complete.
For PRL sessions, row_number is a reserved key in the dask backend. Therefore if "id_columns": ["row_number"] is used in a PRL session, the session fails.
EDA in Intersect mode: The hide_intersection functionality has been disabled in this release. The hide_intersection option has been set to False by default.
Storage paths for task runners: only the default S3 bucket and other buckets ending in *integrate.ai are supported as storage paths for task runners. If you are not using the default bucket created by task runner when it was provisioned, ensure that your data is hosted in an S3 bucket with a URL ending in *integrate.ai. Otherwise, the data will not be accessible to the task runner.
VFL GBM prediction results are not stored correctly and therefore metrics cannot be retrieved for these sessions.
On the Members > Admin page, not all users listed are Administrators. The listed user roles are correct.
Proxy task runner mode has been disabled in this release. This means that task runners are not proxied through a load balancer and will therefore not have a static IP address.
Starting many sessions simultaneously may trigger a throttling exception from AWS APIs. The error appears as: An error occurred (ThrottlingException) when calling… in the logs.
Custom model package names may not contain uppercase letters or special characters other than underscore.
Test data is optional for PRL, but this needs to be consistent across all the clients. That is, if you don’t specify the test data for one client, then it shouldn’t be specified for any clients. Likewise, if you provide the test data for one client, you need to provide it for each client joining the session.
Session timestamps in the UI do not adjust correctly for user timezones. The timestamp in the database in UTC is correct.
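Because exact matching fails when IDs are duplicated, it can help to look for duplicates before choosing between exact matching and the match_threshold=0.99 workaround described above. The following sketch uses only the Python standard library; the helper names duplicated_ids and suggested_match_threshold are assumptions for illustration, and how the threshold is passed to a session is not shown.

```python
from collections import Counter


def duplicated_ids(ids):
    """Return the sorted list of ID values that occur more than once."""
    counts = Counter(ids)
    return sorted(value for value, count in counts.items() if count > 1)


def suggested_match_threshold(ids):
    """Return 0.99 (the workaround value above) if duplicates exist, else None.

    None here means that exact matching, the default, can be kept.
    """
    return 0.99 if duplicated_ids(ids) else None


# Example with made-up IDs.
sample_ids = ["a1", "b2", "a1", "c3"]
print(duplicated_ids(sample_ids))
print(suggested_match_threshold(sample_ids))
```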
14 June 2024: Release 9.11.0¶
This release introduces the following new features:
Revised task runner provisioner and runtime roles/policies for use in enterprise environments.
Ability to use an existing customer VPC when necessary.
Ability to attach and download python notebooks (*.ipynb) to datasets.
Migration from earlier versions
Required changes:
- Follow the [AWS configuration for task runners documentation](/deployment.md#aws-configuration-for-task-runners) to re-create the iai-taskrunner-provisioner role and iai-taskrunner-runtime role.
- Attach any required permission boundaries to the roles
- The documentation describes how to create the policies, custom trust policies, and roles required for integrate.ai task runners to run in your AWS environment.
- With this release, if an existing VPC is required, additional information must be provided to integrate.ai for use with task runners.
The following information is required:
<ul>
<li>VPC ID</li>
<li>Public subnet IDs</li>
<li>Private subnet IDs</li>
<li>Name of the taskrunner to create</li>
<li>AWS Region for the task runner</li>
<li>Required memory and vCPU values</li>
</ul>
Notice of Upcoming Changes:
Release 9.12.0 will also include an update to the custom trust policy for the iai-taskrunner-role. Customers will be required to update the role in their AWS account.
The ability to specify VPC information through the UI will be added in a future release.
Known issues:
EDA in Intersect mode: The hide_intersection functionality has been disabled in this release. The hide_intersection option has been set to False by default. Contact your integrate.ai CSE for more information.
For PRL sessions, row_number is a reserved key in the dask backend. Therefore if "id_columns": ["row_number"] is used in a PRL session, the session fails.
Storage paths for task runners: only the default S3 bucket and other buckets ending in *integrate.ai are supported as storage paths for task runners. If you are not using the default bucket created by task runner when it was provisioned, then ensure that your data is hosted in an S3 bucket with a URL ending in *integrate.ai. Otherwise, the data will not be accessible to the task runner.
In some instances, task logs may not appear in the UI. The logs do exist, the issue is that they are not displayed.
VFL GBM prediction results are not stored correctly and therefore metrics cannot be retrieved for these sessions.
On the Members > Admin page, not all users listed are Administrators. The listed user roles are correct.
Proxy task runner mode has been disabled in this release. This means that task runners are not proxied through a load balancer and will therefore not have a static IP address. The documentation for how to whitelist a task runner has been removed.
Starting many sessions simultaneously may trigger a throttling exception from AWS APIs. The error appears as: An error occurred (ThrottlingException) when calling... in the logs.
Custom model package names may not contain uppercase letters or special characters other than underscore.
Test data is optional for PRL, but this needs to be consistent across all the clients. That is, if you don’t specify the test data for one client, then it shouldn’t be specified for any clients. Likewise, if you provide the test data for one client, you need to provide it for each client joining the session.
Session timestamps in the UI do not adjust correctly for user timezones. The timestamp in the database in UTC is correct.
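The naming rule for custom model packages in the list above can be expressed as a regular expression. This is a sketch of one reading of the rule, not the validation the product performs: it assumes that digits are allowed alongside lowercase letters and underscores, and the function name is_valid_package_name is hypothetical.

```python
import re

# Lowercase letters, digits (assumed to be allowed) and underscores only.
_PACKAGE_NAME = re.compile(r"[a-z0-9_]+")


def is_valid_package_name(name):
    """Return True if `name` has no uppercase or special characters."""
    return _PACKAGE_NAME.fullmatch(name) is not None


# Example with made-up package names.
for candidate in ("my_model", "MyModel", "my-model"):
    print(candidate, is_valid_package_name(candidate))
```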
Fixes:
With certain datasets, the feature importance evaluation failed due to a casting error.
In some instances, task logs did not appear in the UI. The logs existed but were not displayed.
Dataset descriptions did not maintain new lines.
Error logs with request IDs were not showing up in CloudWatch/Elasticsearch.
15 May 2024: Release 9.10.0¶
This release introduces the following new features:
Ability to switch to a dask backend for improved speed when processing large datasets in PRL sessions (tested up to 13M rows). A description of how to enable this mode and what configuration settings are available is provided in the documentation here.
You can now use a registered dataset name in any of the following session types instead of specifying file name and path:
EDA
PRL
VFL (SplitNN, GLM, GBM) train and predict
HFL
The Library and dataset details pages in the UI have been updated. You can now see the results of the latest EDA sessions run with each dataset in the UI.
You can now see the version of the product and the date it was released in the navigation bar on the Settings page in the workspace UI.
Documentation has been added for the Data Valuation features, Dataset Influence and Feature Importance.
Known issues:
EDA in Intersect mode: The hide_intersection functionality has been disabled in this release. The hide_intersection option has been set to False by default.
For PRL sessions, row_number is a reserved key in the dask backend. Therefore if "id_columns": ["row_number"] is used in a PRL session, the session fails.
Storage paths for task runners: only the default S3 bucket and other buckets ending in *integrate.ai are supported as storage paths for task runners. If you are not using the default bucket created by task runner when it was provisioned, then ensure that your data is hosted in an S3 bucket with a URL ending in *integrate.ai. Otherwise, the data will not be accessible to the task runner.
In some instances, task logs may not appear in the UI. The logs do exist, the issue is that they are not displayed.
VFL GBM prediction results are not stored correctly and therefore metrics cannot be retrieved for these sessions.
On the Members > Admin page, not all users listed are Administrators. The listed user roles are correct.
Proxy task runner mode has been disabled in this release. This means that task runners are not proxied through a load balancer and will therefore not have a static IP address. The documentation for how to whitelist a task runner has been removed.
Starting many sessions simultaneously may trigger a throttling exception from AWS APIs. The error appears as: An error occurred (ThrottlingException) when calling... in the logs.
Custom model package names may not contain uppercase letters or special characters other than underscore.
Fixes:
Dataset names can now include spaces and special characters.
The output of a VFL prediction session is now saved under the folder named by the predict session ID itself instead of under the training session ID.
Fixed an issue that was causing the Terms of Service to be blank/empty.
The Credit Usage system has been deprecated and removed from the product.
Adding attachments to a dataset is now fully supported.
When deleting a task runner, the task runner no longer appears and disappears from the list in the UI more than once before the deletion is complete.
21 April 2024: Release 9.9.0¶
This release introduces the following new features:
As a Data Custodian user who has registered datasets, you can now see sessions that have used your datasets.
Session and task runner detail pages have been expanded to full page width for better readability.
A record of deleted task runners is now logged in CloudWatch for auditing purposes.
Session and task logs are now accessible through the session details page in the UI.
Python 3.9 is now the minimum supported version for use with the SDK.
The Credit usage tracking system is being deprecated. Related UI components will be removed in the next release.
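Since Python 3.9 is the minimum supported version for the SDK as of this release, a script can verify the interpreter version before importing the SDK. A minimal sketch using only the standard library follows; the function name meets_minimum is an assumption for illustration.

```python
import sys

# Minimum version stated in the release note above.
MINIMUM_PYTHON = (3, 9)


def meets_minimum(version_info=None):
    """Return True if the given (or running) Python is at least 3.9."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the major and minor components.
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


if not meets_minimum():
    print("Python 3.9 or later is required for the SDK.")
```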
Known issues
Adding attachments to a dataset is not fully supported.
When updating a dataset, existing attachments are not displayed on the Update page.
Attempting to add a new attachment removes all existing attachments.
When deleting a task runner, the task runner may appear and disappear from the list in the UI more than once before the deletion is complete. A task runner is fully deleted after the destroying step has been completed and the entry is permanently removed from the list.
On the Members > Admin page, not all users listed are Administrators. The listed user roles are correct.
Fixes:
Logging has been improved across the platform.
A number of inaccuracies in the documentation have been corrected.
The allow/disallow Custom Models toggle now works correctly when set by a Data Custodian user.
19 Mar 2024: Enterprise-Networking Ready Task Runners¶
This release introduces the following new features:
Task runners are now proxied through a load balancer with a fixed IP address to enable customers to allow ingress/egress through firewalls
Added dataset registration for HFL datasets
Removed the requirement to agree to the Terms of Service in the UI before being able to use the product
Added an embedded SDK package for use in enterprise environments
Enabled debug log levels in the task logs for improved troubleshooting
Known issues:
You must update existing task runners after this release is deployed. Click the Edit button for a task runner and make any change, such as increasing the amount of memory, then click Save. This will force the task runner to update and pick up the latest changes.
Dataset registration for EDA and HFL sessions only
VFL sessions will be added in a future release.
This means that the VFL sessions do not support using only a registered dataset name. You must provide both a name and path for each dataset.
Renewal term for credits may appear to be in the past when in Discovery mode
Fixes:
Fixed the region limitation for task runners
Fixed the state issues with destroyed task runners so that they are properly removed from the system and the UI
Fixed an issue with SplitNN models that occurred when more than 2 clients joined the session with hide_intersection set to True.
Fixed the VFL prediction download paths
Standardized type faces and other UI improvements
12 Feb 2024: Role-based Access Control (RBAC)¶
This release introduces the following new features:
- Role-based access control with three new built-in roles:
Administrator - responsible for all aspects of managing the workspace
Model builder - responsible for running sessions and analyzing results
Data custodian - responsible for registering and maintaining dataset listings.
- Added task level logs to the session details in the UI for ease of troubleshooting
- Added the version number of the release (and therefore the task runners/client/server) on the Session Details page
- Added example URLs in the UI to improve usability
Known issues:
Task runners can only be created in the ca-central-1 region of AWS.
Will be fixed in next release
Dataset registration for EDA sessions only
HFL sessions will be added in next release
Destroyed task runners remain in the list in the UI
Will be fixed in next release
Renewal term for credits may appear to be in the past when in Discovery mode
Fixes:
Fixed the GRPC timeout errors
Fixed VFL-GLM for linear regression
Corrected the runtime role policy for task runners
Numerous UI improvements and corrections, including modal behaviour
19 Dec 2023: PCA, VFL-GBM, and Dataset Registration¶
This release provides several new features:
Principal Component Analysis (PCA) - you can now run PCA sessions using integrate.ai. An overview is available here.
VFL-GBM - you can now run Gradient Boosted Modeling in VFL sessions.
Dataset registration for EDA - You can now register a dataset with a task runner for exploratory data analysis. This simplifies the use of datasets as you no longer need to specify the path to the file in the task group configuration. Instead, you can provide only the registered dataset name, and the task runner will locate the dataset.
Version: 9.6.6
28 Aug 2023: Session Credit Usage¶
This release provides users with the ability to see their credit usage in their workspace UI. Each training or analytic session uses a certain number of credits from the user’s allotment. This usage can now be monitored through a graph, with daily details. Users can also request additional credit when needed.
Version: 9.6.2
14 Aug 2023: Azure Task Runners¶
This release expands the Control Plane system architecture to include Microsoft Azure Task Runners.
Task runners simplify the process of creating an environment to run federated learning tasks. They use the serverless capabilities of cloud environments, which greatly reduces the administration burden and ensures that resource cost is only incurred while task runners are in use.
For more information about task runners and control plane capabilities, see Using integrate.ai.
A tutorial for using Azure task runners is available here.
Version: 9.6.1
14 July 2023: AWS Task Runners¶
This release introduces the Control Plane system architecture and AWS Task Runners.
Task runners simplify the process of creating an environment to run federated learning tasks. They use the serverless capabilities of cloud environments (such as AWS Batch and Fargate), which greatly reduces the administration burden and ensures that resource cost is only incurred while task runners are in use.
For more information about task runners and control plane capabilities, see Using integrate.ai.
Version: 9.6.0
17 May 2023: 2D Histograms for EDA Intersect & Linear Inference¶
This release introduces two new features:
The ability to generate 2D histograms for EDA sessions in Intersect mode. This feature requires the addition of a paired_cols parameter. For more information, see the Intersect Mode tutorial.
A new model package for linear inference. This package is particularly useful for GWAS training. For more information, see Linear Inference Sessions.
This release also introduces a single release version number to describe the release package. This release is version 9.5.0.
27 April 2023: PRL, VFL, and EDA Intersect¶
This release introduces the following new features:
The ability to perform private record linkage (PRL) on two datasets. A guide is available here.
The ability to perform exploratory data analysis (EDA) in Intersect mode, using a PRL session result. A guide is available here.
The ability to perform Vertical Federated Learning (VFL) in both training and prediction mode. A guide is available here.
Note: this release does not support Python 3.11 due to a known issue in that Python release.
Versions:
SDK: 0.9.2
Client: 2.4.2
Server: 2.6.0
CLI Tool: 0.0.46
30 Jan 2023: User Authentication¶
This release added two new features:
The ability to train Gradient Boosted HFL Models. A guide is available here.
Bug fixes:
Fixed an issue where clients could get disconnected from the server when training large models
Versions:
SDK: 0.5.36
Client: 2.0.18
Server: 2.2.19
CLI Tool: 0.0.38
08 Dec 2022: Integration with AWS Fargate¶
This release introduces the ability to run an IAI training server on AWS Fargate through the integrate.ai SDK. With an integrate.ai training server running on Fargate, your data in S3 buckets, and clients running on AWS Batch, you can use the SDK to manage and run fully remote training sessions.
Versions:
SDK: 0.5.13
Client: 2.0.11
CLI Tool: 0.0.33
02 Nov 2022: Integration with AWS Batch¶
This release introduces the ability to run AWS Batch jobs through the integrate.ai SDK. Building on the previous release, which added support for data hosted in S3 buckets, you can now start and run a remote training session on remote data with jobs managed by AWS Batch.
Features:
Added the ability to run the iai client through AWS Batch
Added the ability for the iai client to retrieve a token through the IAI_TOKEN environment variable
Added a version command for the iai client:
iai client version
Note: Docker must be running for the version command to return a value.
Added support for a new session “pending” status
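As a rough illustration of the IAI_TOKEN feature above, the following sketch shows one way a script could read the token from the environment before launching a client job. Only the IAI_TOKEN variable name comes from these notes; the get_iai_token helper is hypothetical and is not part of the integrate.ai SDK or CLI.

```python
import os


def get_iai_token(env=None):
    """Return the token stored in IAI_TOKEN, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to test. This helper is hypothetical, not an integrate.ai API.
    """
    env = os.environ if env is None else env
    token = env.get("IAI_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "IAI_TOKEN is not set; export it before starting the client."
        )
    return token
```

Failing early with a clear message is usually easier to debug in a batch job than an authentication error raised later in the run.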
BREAKING CHANGE:¶
Session status mapping in the SDK has been updated as follows:
created -> Created
started -> Running
pending -> Pending
failed -> Failed
succeeded -> Completed
canceled -> Canceled
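If your own scripts compare against the old status strings, a small lookup table can help with the migration. The sketch below simply encodes the mapping listed above; the names SESSION_STATUS_LABELS and to_sdk_status are hypothetical and do not reflect how the SDK implements the mapping internally.

```python
# Mapping from the previous status strings to the updated SDK values,
# copied from the list in these release notes.
SESSION_STATUS_LABELS = {
    "created": "Created",
    "started": "Running",
    "pending": "Pending",
    "failed": "Failed",
    "succeeded": "Completed",
    "canceled": "Canceled",
}


def to_sdk_status(raw_status):
    """Translate an old status string to its updated value (hypothetical helper)."""
    try:
        return SESSION_STATUS_LABELS[raw_status]
    except KeyError:
        raise ValueError(f"Unknown session status: {raw_status!r}") from None
```

For example, code that previously checked for "succeeded" would now check for to_sdk_status("succeeded"), which is "Completed".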
Bug fixes:
Fixed an issue with small batch sizes
Versions:
SDK: 0.3.31
Client: 1.0.15
CLI Tool: 0.0.31
06 Oct 2022: Exploratory Data Analysis & S3 Support¶
Features:
Exploratory Data Analysis (EDA) - integrate.ai now supports the ability to generate histograms for each feature of a dataset. Use the results of the EDA session to calculate summary statistics for both continuous and categorical variables. See more about the feature here.
This feature has Differential Privacy applied automatically to each histogram to add noise and reduce privacy leakage. The Differential Privacy settings are dynamic and applied to best suit each dataset individually, to ensure privacy protection without excessive noise.
S3 data path support - load data from an S3 bucket for the iai client hfl and iai client eda commands. You can use S3 URLs as the data_path given that your AWS CLI environment is properly configured. Read more on how to configure this integration here.
Client logging via the iai client log command - this new feature in the integrate-ai CLI package allows a user to access session logs from clients, to be used as a tool to help debug failed sessions. Access this using the iai client log command.
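The EDA feature above mentions using histogram results to calculate summary statistics. As a minimal sketch of that idea for a continuous variable, the function below approximates the mean and standard deviation from bin edges and counts by treating every value as sitting at its bin midpoint. The plain-list input format and the histogram_summary name are assumptions for illustration; the actual structure of EDA session results in the SDK is not shown in these notes.

```python
import math


def histogram_summary(bin_edges, counts):
    """Approximate count, mean, and std from a histogram via bin midpoints.

    bin_edges has one more entry than counts. Counts may be non-integer,
    since Differential Privacy noise can perturb them.
    """
    if len(bin_edges) != len(counts) + 1:
        raise ValueError("bin_edges must have exactly len(counts) + 1 entries")
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram has no positive mass")
    # Midpoint of each bin stands in for every value that fell in it.
    mids = [(lo + hi) / 2 for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]
    mean = sum(m * c for m, c in zip(mids, counts)) / total
    variance = sum(c * (m - mean) ** 2 for m, c in zip(mids, counts)) / total
    # Clamp at zero in case noisy negative counts push the variance below it.
    return {"count": total, "mean": mean, "std": math.sqrt(max(variance, 0.0))}
```

Because the values inside each bin are unknown and the counts carry added noise, these figures are estimates rather than exact statistics of the underlying data.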
Versions:
SDK: 0.3.20
Client: 1.0.8
CLI Tool: 0.0.21
14 Sept 2022: Infrastructure upgrades for session abstraction¶
SDK Version: 0.3.5
Client Version: 1.0.2
CLI Tool Version: 0.0.21