Using Integrate.ai¶
After the IT Administrator has deployed the required components to the cloud environment, the Workspace Administrator can log in and start adding users and registering a task runner.
Workspace Administrator Workflow¶
Workspace Administrators have full control over the entire workspace, from adding and removing users and assigning roles, to controlling administrative and billing information. There must always be at least one user with this role to manage the workspace.
Invite users to the workspace.
Register an AWS task runner or Azure task runner for the data custodians and model builders to use to register datasets and perform model training.
If the enterprise IT landscape requires ingress and egress exceptions for firewalls, or other specific configuration, provide those details during the task runner registration in the Advanced settings.
Register an AWS Task Runner¶
Task runners simplify the process of running training sessions on your data.
Note: before attempting to register a task runner, ensure you have completed the AWS configuration for task runners.
To register an AWS task runner:
Log in to your integrate.ai workspace.
In the left navigation bar, click Settings.
Under Workspace, click Task Runners.
Click Register to start registering a new task runner.
Select the service provider - Amazon Web Services.
Provide the following information:
Task runner name
- must be unique.
Provisioner role ARN
- the ARN created by the IT Administrator.
Runtime role ARN
- the ARN created by the IT Administrator.
Region
- select the AWS region to run in from the dropdown.
Storage path
- by default the task runner creates a bucket for you to upload data into (e.g. s3://{aws_taskrunner_profile}-{aws_taskrunner_name}.integrate.ai). Only the default S3 bucket and other buckets ending in *integrate.ai are supported. If you are not using the default bucket created by the task runner when it was provisioned, ensure that your data is hosted in an S3 bucket with a URL ending in *integrate.ai. Otherwise, the data will not be accessible to the task runner.
vCPU
- the number of virtual CPUs the task runner is allowed to use. The default is 8.
Memory size (GB)
- the amount of memory to allocate to the task runner. The default is 32 GB.
Click Save. Wait for the status to change to Online.
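Before filling in the form, it can help to sanity-check the values you received from the IT Administrator. The Python sketch below is not part of integrate.ai. The class `AwsTaskRunnerForm` and the helper `default_storage_path` are hypothetical names, and the role names in the example are made up. The sketch performs three checks:

- Both role ARNs have the usual IAM role shape (`arn:aws:iam::<12-digit account>:role/<name>`).
- The storage path is an `s3://` URL.
- The bucket name ends in `integrate.ai`, as the Storage path rule above requires.

It also records the form defaults of 8 vCPUs and 32 GB.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Usual IAM role ARN shape: arn:<partition>:iam::<12-digit account>:role/<optional path/>name
ROLE_ARN = re.compile(r"^arn:aws(?:-[a-z]+)*:iam::\d{12}:role/[\w+=,.@/-]+$")


def default_storage_path(profile: str, runner_name: str) -> str:
    """Build the default bucket URL following the pattern quoted in these docs."""
    return f"s3://{profile}-{runner_name}.integrate.ai"


@dataclass
class AwsTaskRunnerForm:
    """Hypothetical container for the values typed into the registration form."""

    name: str
    provisioner_role_arn: str
    runtime_role_arn: str
    region: str
    storage_path: str
    vcpu: int = 8        # form default
    memory_gb: int = 32  # form default

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means nothing was flagged."""
        found = []
        for label, arn in (("provisioner", self.provisioner_role_arn),
                           ("runtime", self.runtime_role_arn)):
            if not ROLE_ARN.match(arn):
                found.append(f"{label} role ARN looks malformed: {arn}")
        if not self.region:
            found.append("region is required")
        url = urlparse(self.storage_path)
        if url.scheme != "s3":
            found.append("storage path must be an s3:// URL")
        elif not url.netloc.endswith("integrate.ai"):
            found.append(f"bucket {url.netloc} does not end in integrate.ai")
        return found


if __name__ == "__main__":
    form = AwsTaskRunnerForm(
        name="runner1",
        provisioner_role_arn="arn:aws:iam::123456789012:role/iai-provisioner",
        runtime_role_arn="arn:aws:iam::123456789012:role/iai-runtime",
        region="us-east-1",
        storage_path="s3://my-data-bucket",  # not an *integrate.ai bucket
    )
    print(form.problems())  # ['bucket my-data-bucket does not end in integrate.ai']
```

Task runner name uniqueness is enforced by the workspace, so the sketch does not try to check it locally.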
Optional Advanced settings for AWS task runners¶
There are several options in the Advanced settings section of the form that give you fine-grained control over the task runner.
Container registry URL
- Provide the URL to the S3 bucket containing the integrate.ai client image. The format is: s3://<bucket URL>/<image name>.
Use an existing VPC
- Provide the following information for your existing VPC configuration:
Existing VPC ID
Existing VPC public subnets
Existing VPC private subnets
Existing client security group
Existing server security group
Create a new VPC in a different CIDR block
- Provide the following information to create a new VPC in a specified CIDR block:
Custom VPC CIDR
Custom private subnet CIDR
Custom public subnet CIDR
Custom CIDR newbits
Use existing KMS keys
- Provide the following information to use your own KMS keys instead of those generated by integrate.ai for the task runner:
KMS data ID
KMS secret ID
Use golden AMI
- Provide the AMI ID for the golden AMI.
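The custom CIDR settings are easy to get wrong by hand. The Python sketch below uses only the standard `ipaddress` module and is not part of integrate.ai. It assumes, without confirmation from these docs, that Custom CIDR newbits means the number of bits added to the VPC prefix to size each subnet, as in Terraform's `cidrsubnet()` function. It also checks that the private and public subnet CIDRs fall inside the VPC CIDR and do not overlap. The function names and CIDR values are illustrative.

```python
import ipaddress
from itertools import islice


def carve_subnets(vpc_cidr: str, newbits: int, count: int) -> list[str]:
    """Return the first `count` subnets whose prefix is `newbits` longer than the VPC's.

    Assumption: "newbits" is interpreted as in Terraform's cidrsubnet().
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = list(islice(vpc.subnets(prefixlen_diff=newbits), count))
    if len(subnets) < count:
        raise ValueError(
            f"{vpc} only holds {len(subnets)} /{vpc.prefixlen + newbits} subnets"
        )
    return [str(s) for s in subnets]


def layout_problems(vpc_cidr: str, private_cidr: str, public_cidr: str) -> list[str]:
    """Check that both subnets sit inside the VPC and do not overlap each other."""
    vpc = ipaddress.ip_network(vpc_cidr)
    private = ipaddress.ip_network(private_cidr)
    public = ipaddress.ip_network(public_cidr)
    problems = []
    for label, net in (("private", private), ("public", public)):
        if not net.subnet_of(vpc):
            problems.append(f"{label} subnet {net} is outside VPC {vpc}")
    if private.overlaps(public):
        problems.append(f"private subnet {private} overlaps public subnet {public}")
    return problems


if __name__ == "__main__":
    print(carve_subnets("10.1.0.0/16", 8, 2))  # ['10.1.0.0/24', '10.1.1.0/24']
    print(layout_problems("10.1.0.0/16", "10.1.0.0/20", "10.1.16.0/20"))  # []
```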
After successfully creating a task runner, you can use it to perform training tasks. You can reuse the task runner; there is no need to create a new one for each task.
Register a dataset (AWS)¶
Register your dataset through the workspace by following the steps below.
Log in to your integrate.ai workspace.
Click Library in the left navigation bar.
On the Datasets tab, click Register dataset.
Select a task runner to manage tasks related to your dataset.
Note: If no task runners exist, ask your Workspace Administrator or a Model Builder to create one.
Click Next.
On the Dataset details and privacy controls page, type a name and description for the dataset.
Specify the URI of the dataset using the s3:// format. Ensure that the prepared Parquet or CSV file(s) are located in the S3 bucket that your task runner has access to.
(Optional) If you have metadata to associate with the dataset, upload it in the Attachments section.
Click Connect.
Your dataset is now registered and can be used in a notebook.
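A dataset URI that points at the wrong bucket or file type only fails once the task runner tries to read it. The following Python code is a minimal pre-flight sketch. The function name `check_s3_dataset_uri` is hypothetical, and treating a key without an extension as a folder of prepared files is an assumption. The sketch splits an `s3://` URI into bucket and key, applies the `*integrate.ai` bucket rule from the task runner settings, and accepts only `.parquet` or `.csv` files.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUPPORTED_SUFFIXES = {".parquet", ".csv"}


def check_s3_dataset_uri(uri: str) -> list[str]:
    """Return problems found in a dataset URI; an empty list means it looks usable."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        return ["URI must use the s3:// format"]
    problems = []
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    if not bucket.endswith("integrate.ai"):
        problems.append(f"bucket {bucket} does not end in integrate.ai")
    # Assumption: a key without an extension is a folder of prepared files.
    suffix = PurePosixPath(key).suffix.lower() if key else ""
    if suffix and suffix not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported file type {suffix}; expected .parquet or .csv")
    return problems


if __name__ == "__main__":
    print(check_s3_dataset_uri("s3://acme-runner1.integrate.ai/claims/train.parquet"))  # []
    print(check_s3_dataset_uri("s3://my-data-bucket/claims/train.json"))
```

The check is purely syntactic. It cannot tell whether the task runner's role actually has read access to the object.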
Register an Azure Task Runner¶
Task runners simplify the process of running training sessions on your data.
Note: before attempting to register a task runner, ensure you have completed the Azure configuration for task runners.
To register an Azure task runner:
Log in to your integrate.ai workspace.
In the left navigation bar, click Settings.
Under Workspace, click Task Runners.
Click Register to start registering a new task runner.
Select the service provider - Microsoft Azure.
Provide the following information:
Task runner name
- must be unique.
Region
- select from the list.
Resource group
- must be an existing dedicated resource group.
Service principal ID
- this is the appId from the Azure CLI output of creating a service principal.
Service principal secret
- this is the password from the Azure CLI output.
Runtime Service principal ID
- this is the application ID of the App Registration that was created.
Runtime Service principal secret
- this is the secret generated for the application.
Subscription ID
- the ID of your Microsoft Azure subscription. It can be found on the Azure dashboard.
Tenant ID
- this is the tenantId from the Azure CLI output of creating a service principal.
vCPU
- the number of virtual CPUs the task runner is allowed to use. The default is 4.
Memory size (MB)
- the amount of memory to allocate to the task runner. The default is 16 GB. This amount can be decreased, but not increased.
Storage path
- displays the default storage account name: <taskrunnername>storage.
Click Save. Wait for the status to change to Online.
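The Service principal ID, Service principal secret, and Tenant ID fields all come from the JSON that the Azure CLI prints when the service principal is created. The Python sketch below maps that output onto the form fields; its helper names are hypothetical and not part of integrate.ai. The docs call the tenant field `tenantId`, but recent versions of `az ad sp create-for-rbac` print it as `tenant`, so the sketch accepts either.

The sketch also checks the default storage account name. Azure storage account names must be 3 to 24 lowercase letters and digits. Assume the default really is the task runner name followed by `storage` (the product may normalize the name). In that case a long, hyphenated, or mixed-case task runner name would not fit.

```python
import json
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
STORAGE_ACCOUNT = re.compile(r"^[a-z0-9]{3,24}$")


def azure_form_fields(cli_output: str) -> dict[str, str]:
    """Map the JSON printed when the service principal was created to form fields."""
    sp = json.loads(cli_output)
    return {
        "Service principal ID": sp["appId"],
        "Service principal secret": sp["password"],
        # These docs name the field tenantId; some Azure CLI versions print "tenant".
        "Tenant ID": sp.get("tenantId") or sp["tenant"],
    }


def default_storage_account(runner_name: str) -> str:
    """Assumed default storage account name: <taskrunnername>storage."""
    account = f"{runner_name}storage"
    if not STORAGE_ACCOUNT.match(account):
        raise ValueError(
            f"{account!r} is not a valid Azure storage account name; "
            "use a short task runner name made of lowercase letters and digits"
        )
    return account


if __name__ == "__main__":
    sample = ('{"appId": "<app-id>", "displayName": "iai-provisioner", '
              '"password": "<secret>", "tenant": "<tenant-id>"}')
    print(azure_form_fields(sample))
    print(default_storage_account("runner1"))  # runner1storage
```

The runtime service principal values come from the App Registration rather than this CLI output, so the sketch leaves them out.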
Optional Advanced settings for Azure task runners¶
There are several options in the Advanced settings section of the form that give you fine-grained control over the task runner.
Container registry URL
- Provide the URL to the Azure container registry containing the integrate.ai container images. The format is: <container registry>.azurecr.io/<image name>. For example: iairepo.azurecr.io/edge/fl-client.
More information about container registries is available here.
Bind to an existing Azure Virtual Network
- Enable this setting to provide the subnet ID for your existing Azure Virtual Network.
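Both Azure advanced settings expect specific string formats. The Python sketch below uses hypothetical helpers, not an integrate.ai API. It splits a registry URL of the documented form `<container registry>.azurecr.io/<image name>` and builds the standard Azure resource ID for a subnet. It assumes that the subnet ID field expects the full resource ID, which these docs do not state. The registry-name rule of 5 to 50 alphanumeric characters is Azure Container Registry's own. The subscription, resource group, and network names in the example are made up.

```python
import re

# Registry name: 5-50 alphanumeric characters; image: lowercase path, optional tag.
ACR_URL = re.compile(
    r"^(?P<registry>[A-Za-z0-9]{5,50})\.azurecr\.io/"
    r"(?P<image>[a-z0-9._/-]+(?::[\w.-]+)?)$"
)


def parse_registry_url(url: str) -> tuple[str, str]:
    """Split <container registry>.azurecr.io/<image name> into its two parts."""
    match = ACR_URL.match(url)
    if match is None:
        raise ValueError(f"expected <registry>.azurecr.io/<image name>, got {url!r}")
    return match.group("registry"), match.group("image")


def subnet_resource_id(subscription_id: str, resource_group: str,
                       vnet: str, subnet: str) -> str:
    """Standard Azure resource ID for a subnet inside an existing virtual network."""
    return (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Network/virtualNetworks/{vnet}/subnets/{subnet}")


if __name__ == "__main__":
    print(parse_registry_url("iairepo.azurecr.io/edge/fl-client"))  # ('iairepo', 'edge/fl-client')
    print(subnet_resource_id("<subscription-id>", "iai-rg", "corp-vnet", "iai-subnet"))
```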
After successfully creating a task runner, you can use it to perform training tasks. You can reuse the task runner; there is no need to create a new one for each task.
Register a dataset (Azure)¶
Register your dataset through the workspace UI by following the steps below.
Log in to your integrate.ai workspace.
Click Library in the left navigation bar.
On the Datasets tab, click Register dataset.
Select a task runner to manage tasks related to your dataset. Note: If no task runners exist, ask your Workspace Administrator or a Model Builder to create one.
Click Next.
On the Dataset details and privacy controls page, type a name and description for the dataset.
Specify the URI of the dataset using the azure:// format. Ensure that the prepared Parquet or CSV file(s) are located in the Azure Blob Storage that your task runner has access to.
(Optional) If you have metadata to associate with the dataset, upload it in the Attachments section.
Click Connect.
Your dataset is now registered and can be used in a notebook.
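Both dataset registration flows expect prepared Parquet or CSV files. Before uploading files to Azure Blob Storage (or to an S3 bucket), a quick local check can catch a mis-exported file. The sketch below is an illustration, not part of integrate.ai, and the function name is hypothetical. It relies on the fact that Parquet files begin and end with the magic bytes `PAR1`. It then applies a deliberately loose CSV heuristic of its own: the header row must be UTF-8 text containing a comma.

```python
import os

PARQUET_MAGIC = b"PAR1"  # Parquet files begin and end with these four bytes


def sniff_dataset_file(path: str) -> str:
    """Classify a local file as "parquet" or "csv", or raise ValueError.

    Only the first 4 KiB and the last 4 bytes are read, so large files are cheap to check.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4096)
        tail = b""
        if size >= 8:
            f.seek(-4, os.SEEK_END)
            tail = f.read(4)
    if head[:4] == PARQUET_MAGIC and tail == PARQUET_MAGIC:
        return "parquet"
    # Loose CSV heuristic (an assumption): a UTF-8 header row containing a comma.
    header_bytes = head.split(b"\n", 1)[0]
    try:
        header = header_bytes.decode("utf-8-sig")
    except UnicodeDecodeError:
        raise ValueError(f"{path} is neither Parquet nor UTF-8 text") from None
    if "," not in header:
        raise ValueError(f"{path}: header row has no commas, so it does not look like CSV")
    return "csv"
```

The CSV heuristic has known limits. It rejects single-column CSV files and files using other delimiters. It can also reject a header longer than 4 KiB if the cut falls inside a multi-byte character. Treat a rejection as a prompt to inspect the file, not as proof that it is unusable.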