Using Integrate.ai¶

After the IT Administrator has deployed the required components to the cloud environment, the Workspace Administrator can log in, add users, and register a task runner.

Workspace Administrator Workflow¶

Workspace Administrators have full control over the entire workspace, from adding and removing users and assigning roles, to controlling administrative and billing information. There must always be at least one user with this role to manage the workspace.

Invite users to the workspace.
Register an AWS task runner or Azure task runner for the data custodians and model builders to use to register datasets and perform model training.
If the enterprise IT landscape requires ingress and egress exceptions for firewalls, or other specific configuration, provide those details during the task runner registration in the Advanced settings.

Register an AWS Task Runner¶

Task runners simplify the process of running training sessions on your data.

Note: Before you register a task runner, ensure that you have completed the AWS configuration for task runners.

To register an AWS task runner:

Log in to your integrate.ai workspace.
In the left navigation bar, click Settings.
Under Workspace, click Task Runners.
Click Register to start registering a new task runner.
Select the service provider - Amazon Web Services.
Provide the following information:

Task runner name - must be unique
Provisioner role ARN - the ARN created by the IT Administrator.
Runtime role ARN - the ARN created by the IT Administrator.
Region - select the AWS region to run in from the dropdown
Storage Path - by default the task runner creates a bucket for you to upload data into (e.g. s3://{aws_taskrunner_profile}-{aws_taskrunner_name}.integrate.ai ).
Only the default S3 bucket and other buckets ending in *integrate.ai are supported. If you are not using the default bucket created by the task runner when it was provisioned, ensure that your data is hosted in an S3 bucket with a URL ending in *integrate.ai. Otherwise, the data will not be accessible to the task runner.
vCPU - the number of virtual CPUs the task runner is allowed to use. The default is 8.
Memory size (GB) - the amount of memory to allocate to the task runner. The default is 32 GB.

Click Save. Wait for the status to change to Online.
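Before filling in the form, it can help to sanity-check the values you were given. The following is a minimal Python sketch, not part of the integrate.ai product: it checks that a string has the general shape of an IAM role ARN and that a storage path is an s3:// URI whose bucket name ends in integrate.ai, as described above. The function names and the sample role name are hypothetical.

```python
import re
from urllib.parse import urlparse

# General shape of an IAM role ARN:
# arn:<partition>:iam::<12-digit account ID>:role/<optional path and role name>
ROLE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def is_role_arn(value: str) -> bool:
    """Return True if value looks like an IAM role ARN."""
    return ROLE_ARN.match(value) is not None


def is_supported_storage_path(path: str) -> bool:
    """Return True if path is an s3:// URI whose bucket name ends in integrate.ai."""
    parsed = urlparse(path)
    return parsed.scheme == "s3" and parsed.netloc.endswith("integrate.ai")


# Hypothetical values, for illustration only.
print(is_role_arn("arn:aws:iam::123456789012:role/iai-provisioner"))
print(is_supported_storage_path("s3://myprofile-myrunner.integrate.ai"))
print(is_supported_storage_path("s3://some-other-bucket/data"))
```

These checks only catch formatting mistakes. They do not confirm that the roles exist or that the task runner can reach the bucket.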

Optional Advanced settings for AWS task runners¶

The Advanced settings section of the form provides several options that give you fine-grained control over the task runner.

Container registry URL - Provide the URL to the S3 bucket containing the integrate.ai client image.
The format is: s3://<bucket URL>/<image name>.
Use an existing VPC - Provide the following information for your existing VPC configuration.
- Existing VPC ID
- Existing VPC public subnets
- Existing VPC private subnets
- Existing client security group
- Existing server security group
Create a new VPC in a different CIDR block - Provide the following information to create a new VPC in a specified CIDR block.
- Custom VPC CIDR
- Custom private subnet CIDR
- Custom public subnet CIDR
- Custom CIDR newbits
Use existing KMS keys - Provide the following information to use your own KMS keys instead of those generated by integrate.ai for the task runner.
- KMS data ID
- KMS secret ID
Use golden AMI - Provide the AMI ID for the golden AMI.
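When planning the custom CIDR values, it can be useful to preview how a VPC block divides into subnets. The sketch below uses Python's standard ipaddress module. It assumes that "newbits" follows the common Terraform cidrsubnet convention (the number of bits added to the VPC prefix to form each subnet), which this page does not confirm. The function names and the 10.20.0.0/16 range are hypothetical.

```python
import ipaddress
from typing import List


def split_vpc(vpc_cidr: str, newbits: int, count: int) -> List[str]:
    """Return the first `count` subnets made by extending the VPC prefix by `newbits` bits."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = vpc.subnets(prefixlen_diff=newbits)
    return [str(next(subnets)) for _ in range(count)]


def check_layout(vpc_cidr: str, private_cidr: str, public_cidr: str) -> bool:
    """Return True if both subnets sit inside the VPC block and do not overlap each other."""
    vpc = ipaddress.ip_network(vpc_cidr)
    private = ipaddress.ip_network(private_cidr)
    public = ipaddress.ip_network(public_cidr)
    return (
        private.subnet_of(vpc)
        and public.subnet_of(vpc)
        and not private.overlaps(public)
    )


# Assumed example: a /16 VPC with newbits=4 yields /20 subnets.
print(split_vpc("10.20.0.0/16", newbits=4, count=4))
# ['10.20.0.0/20', '10.20.16.0/20', '10.20.32.0/20', '10.20.48.0/20']
print(check_layout("10.20.0.0/16", "10.20.0.0/20", "10.20.16.0/20"))
# True
```

Confirm the exact meaning of each CIDR field with your IT Administrator before relying on values derived this way.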

After successfully creating a task runner, you can use it to perform training tasks. You can reuse the task runner; there is no need to create a new one for each task.


Register a dataset (AWS)¶

Register your dataset through the workspace, by following the steps below.

Log in to your integrate.ai workspace.
Click Datasets in the left navigation bar.
On the Datasets page, click Register dataset.
Select a task runner to manage tasks related to your dataset.
Note: If no task runners exist, ask your Workspace Administrator or a Model Builder to create one.
Click Next.
On the Dataset details and privacy controls page, type a name and description for the dataset.
Specify the URI of the dataset, using the s3:// format. Ensure that the prepared Parquet or CSV files are located in an S3 bucket that your task runner can access.
(Optional) If you have metadata to associate with the dataset, upload it in the Attachments section.
Click Connect.

Your dataset is now registered and can be used in a notebook.
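To reduce typos when entering the dataset URI, you can inspect it before pasting it into the form. The following Python sketch is an illustration only: it splits an s3:// URI into bucket and key and reports whether the key ends in a Parquet or CSV extension. The function name and the sample URI are hypothetical, and a URI that points to a folder of files is simply reported as not being a single supported file.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUPPORTED_SUFFIXES = {".parquet", ".csv"}


def describe_dataset_uri(uri: str) -> dict:
    """Split an s3:// URI into bucket and key, and flag Parquet or CSV file names."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Expected an s3://<bucket>/<key> URI, got: {uri!r}")
    key = parsed.path.lstrip("/")
    suffix = PurePosixPath(key).suffix.lower()
    return {
        "bucket": parsed.netloc,
        "key": key,
        "is_supported_file": suffix in SUPPORTED_SUFFIXES,
    }


# Hypothetical URI, for illustration only.
info = describe_dataset_uri("s3://myprofile-myrunner.integrate.ai/datasets/train.parquet")
print(info["bucket"])             # myprofile-myrunner.integrate.ai
print(info["key"])                # datasets/train.parquet
print(info["is_supported_file"])  # True
```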


Register an Azure Task Runner¶

Task runners simplify the process of running training sessions on your data.

Note: Before you register a task runner, ensure that you have completed the Azure configuration for task runners.

To register an Azure task runner:

Log in to your integrate.ai workspace.
In the left navigation bar, click Settings.
Under Workspace, click Task Runners.
Click Register to start registering a new task runner.
Select the service provider - Microsoft Azure.
Provide the following information:

Task runner name - must be unique
Region - select from the list
Resource group - must be an existing dedicated resource group
Service principal ID - this is the appId from the Azure CLI output of creating a service principal.
Service principal secret - this is the password from the Azure CLI output.
Runtime Service principal ID - this is the application ID of the App Registration created.
Runtime Service principal secret - this is the secret generated for the application.
Subscription ID - the ID of your Microsoft Azure subscription. Can be found on the Azure dashboard.
Tenant ID - this is the tenantId from the Azure CLI output of creating a service principal.
vCPU - the number of virtual CPUs the task runner is allowed to use. The default and maximum is 4.
Memory size (MB) - the amount of memory to allocate to the task runner. The default and maximum is 16 GB. This amount can be decreased, but not increased.
Storage account - displays the default storage account name: storage.
Storage path - displays the default storage account path: azure://sessionstorage.

Click Save. Wait for the status to change to Online.
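Several of the fields above come from the JSON that the Azure CLI prints when a service principal is created. The Python sketch below shows one way to map that output onto the form fields and to check the vCPU and memory limits stated above. It is an illustration under assumptions: the function names and sample values are hypothetical, and the tenant key is looked up as either tenantId or tenant, since the exact key name can depend on the CLI command and version.

```python
import json

MAX_VCPU = 4         # default and maximum stated for Azure task runners
MAX_MEMORY_GB = 16   # default and maximum stated for Azure task runners


def form_fields_from_cli(cli_output: str) -> dict:
    """Map service principal JSON from the Azure CLI onto the registration form fields."""
    sp = json.loads(cli_output)
    return {
        "Service principal ID": sp["appId"],
        "Service principal secret": sp["password"],
        # Assumption: accept either key name for the tenant.
        "Tenant ID": sp.get("tenantId") or sp["tenant"],
    }


def within_limits(vcpu: int, memory_gb: float) -> bool:
    """Return True if the requested resources do not exceed the stated maximums."""
    return 0 < vcpu <= MAX_VCPU and 0 < memory_gb <= MAX_MEMORY_GB


# Placeholder values, for illustration only. Never commit real secrets.
sample = '{"appId": "app-id-placeholder", "password": "secret-placeholder", "tenant": "tenant-id-placeholder"}'
fields = form_fields_from_cli(sample)
print(fields["Service principal ID"])  # app-id-placeholder
print(fields["Tenant ID"])             # tenant-id-placeholder
print(within_limits(vcpu=4, memory_gb=16))  # True
print(within_limits(vcpu=8, memory_gb=16))  # False
```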

Optional Advanced Settings for Azure task runners¶

The Advanced settings section of the form provides several options that give you fine-grained control over the task runner.

Container registry URL - Provide the URL to the Azure container registry containing the integrate.ai container images. The format is: <container registry>.azurecr.io/<image name>.
For example: iairepo.azurecr.io/edge/fl-client.
For more information about container registries, see the Azure Container Registry documentation.

Bind to an existing Azure Virtual Network - Enable this setting to provide the subnet ID for your existing Azure Virtual Network.
Managed identity to access ACR - Provide the resource_id of a managed identity that has the built-in AcrPull role. This supports Private Link access to the Azure Container Registry (ACR).
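The container registry URL format described above can be checked with a short pattern before you enter it. This Python sketch is illustrative only: the function name is hypothetical, and the pattern is deliberately loose, so it confirms the <container registry>.azurecr.io/<image name> shape without enforcing every Azure naming rule.

```python
import re
from typing import Optional, Tuple

# <container registry>.azurecr.io/<image name>, where the image name may contain
# path segments separated by "/", ".", "_" or "-".
ACR_IMAGE = re.compile(
    r"^(?P<registry>[A-Za-z0-9]+)\.azurecr\.io/"
    r"(?P<image>[a-z0-9]+(?:[._/-][a-z0-9]+)*)$"
)


def parse_acr_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (registry, image) if url matches the expected shape, else None."""
    match = ACR_IMAGE.match(url)
    if match is None:
        return None
    return match.group("registry"), match.group("image")


print(parse_acr_url("iairepo.azurecr.io/edge/fl-client"))  # ('iairepo', 'edge/fl-client')
print(parse_acr_url("https://iairepo.azurecr.io/edge/fl-client"))  # None
```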

After successfully creating a task runner, you can use it to perform training tasks. You can reuse the task runner; there is no need to create a new one for each task.


Register a dataset (Azure)¶

Register your dataset through the workspace UI, by following the steps below.

Log in to your integrate.ai workspace.
Click Datasets in the left navigation bar.
On the Datasets page, click Register dataset.
Select a task runner to manage tasks related to your dataset. Note: If no task runners exist, ask your Workspace Administrator or a Model Builder to create one.
Click Next.
On the Dataset details and privacy controls page, type a name and description for the dataset.
Specify the URI of the dataset, using the azure:// format. Ensure that the prepared Parquet or CSV files are located in Azure Blob storage that your task runner can access.
(Optional) If you have metadata to associate with the dataset, upload it in the Attachments section.
Click Connect.

Your dataset is now registered and can be used in a notebook.

