Deploy MOSTLY AI to an Azure AKS cluster

You can deploy MOSTLY AI to an Azure AKS cluster. This page covers the prerequisites and the pre-deployment, deployment, and post-deployment steps for a successful installation.

Prerequisites

Pre-deployment

Before you deploy MOSTLY AI to an Azure AKS cluster, you need to complete several pre-deployment tasks.

Task 1. Create an AKS cluster

Use the Create Kubernetes cluster wizard in Azure to create your AKS cluster.

Steps

  1. Log in to Microsoft Azure and select Create a resource.
  2. In the services search bar, type aks and press Enter.
  3. From the results, select Azure Kubernetes Service (AKS).
  4. Click Create.
    Step result: The Create Kubernetes cluster wizard starts.
  5. Configure the Basics page of the wizard.
    💡

    You can change the settings below to suit your needs. The steps provide a minimal configuration to help you set up a new cluster quickly.

    1. For Subscription, select the Azure subscription you want to use for the cluster.
      For more information, see the Prerequisites.
    2. For Resource group, define a new resource group for your MOSTLY AI cluster.
      For more information, see Manage resource groups (opens in a new tab) in the Azure documentation.
    3. For Cluster preset configuration, select Production Standard.
    4. For Kubernetes cluster name, define the cluster name.
      For example, MOSTLYAI-AKS.
    5. (Optional) For Region, select the region to use for the cluster.
    6. Retain the remaining default options and click Next.
  6. Configure the Node pools page of the wizard.
    1. For the agentpool node pool, click the Node size entry.
    2. On the Update node pool page, for Node size, click Choose a size.
    3. On the Select a VM size page, select D8s_v3 and click Select.
    4. Back on the Update node pool page, for Scale method, select Manual and set Node count to 1.
    5. For the userpool node pool, click the Node size entry.
    6. On the Update node pool page, for Node size, click Choose a size.
    7. On the Select a VM size page, select Standard_D16ls_v5 and click Select.
    8. Back on the Update node pool page, for Scale method, select Manual and set Node count to 2.
    9. Click Next.
  7. Configure the Networking page of the wizard.
    1. For Network policy, select None.
  8. (Optional) Review the Integrations, Advanced, and Tags pages.
  9. Click Review + Create.
  10. When the final configuration validation passes, click Create.

Result

The cluster deployment starts. Wait several minutes until Azure creates your AKS cluster.

When the cluster is created, Azure reports that the deployment is complete.
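
If you prefer to script cluster creation instead of using the wizard, the node pool choices above map roughly onto Azure CLI calls. The following is a minimal sketch under stated assumptions, not a tested equivalent of the wizard: the region (westeurope) is a placeholder, Standard_D8s_v3 is assumed to be the full name of the D8s_v3 size, and the settings applied by the Production Standard preset are not reproduced. The run helper and the DRY_RUN variable are hypothetical. With DRY_RUN=1, the default in this sketch, the function only prints the commands.

```shell
#!/bin/sh
# Sketch: create the AKS cluster from Task 1 with the Azure CLI.
# Assumptions (not part of the wizard steps): LOCATION, and Standard_D8s_v3
# as the full name of the D8s_v3 size.
RESOURCE_GROUP="${RESOURCE_GROUP:-MOSTLYAI-AKS}"
CLUSTER_NAME="${CLUSTER_NAME:-MOSTLYAI-AKS}"
LOCATION="${LOCATION:-westeurope}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_cluster() {
  run az group create --name "$RESOURCE_GROUP" --location "$LOCATION"
  # System node pool: one D8s_v3 node, fixed node count.
  run az aks create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --nodepool-name agentpool \
    --node-count 1 \
    --node-vm-size Standard_D8s_v3 \
    --generate-ssh-keys
  # User node pool: two Standard_D16ls_v5 nodes, fixed node count.
  run az aks nodepool add \
    --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER_NAME" \
    --name userpool \
    --mode User \
    --node-count 2 \
    --node-vm-size Standard_D16ls_v5
}
```

Call create_cluster to review the three commands first. To execute them, set DRY_RUN=0 before calling the function.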

Task 2. Log in to the Azure CLI

As listed in the Prerequisites, install the Azure CLI and use it to log in to your account.

Steps

  1. In a web browser, log in to your Azure account.
  2. Log in to the Azure CLI.
    az login

Result

The command opens a browser window where you authenticate the Azure CLI.

In the command line, you should see output similar to the following:

💡

Real values are obfuscated with asterisks (*).

A web browser has been opened at https://login.microsoftonline.com/organizations/oauth2/v2.0/authorize. Please continue the login in the web browser. If no web browser is available or if the web browser fails to open, use device code flow with `az login --use-device-code`.
[
    {
        "cloudName": "AzureCloud",
        "homeTenantId": "********",
        "id": "********",
        "isDefault": true,
        "managedByTenants": [],
        "name": "Pay-As-You-Go",
        "state": "Enabled",
        "tenantId": "********",
        "user": {
        "name": "******@mostly.ai",
        "type": "user"
        }
    }
]

Task 3. Connect to the Azure AKS cluster

You can find the exact Azure CLI commands to connect to your AKS cluster on the cluster page in Azure.

Steps

  1. From the AKS cluster page, click Connect to cluster.
  2. Copy and run the commands to connect to your cluster.
    1. Use the Azure CLI to set the account subscription.
      💡

      Real values are obfuscated with asterisks (*).

      az account set --subscription ********-4a22-414c-a2a6-************
    2. Set the credentials to register your AKS cluster as the current CLI context.
      az aks get-credentials --resource-group MOSTLYAI-AKS --name MOSTLYAI-AKS
      You should see output similar to the following:
      Merged "MOSTLYAI-AKS" as current context in /Users/mostlyzach/.kube/config
  3. (Optional) Check the cluster name of your current CLI context.
    kubectl config view --minify -o jsonpath='{.clusters[].name}'
    Depending on the cluster name you defined (in our case MOSTLYAI-AKS), you should see output similar to the following. Some shells, such as zsh, append the trailing % because the output does not end with a newline.
    MOSTLYAI-AKS%

Result

Your CLI is now set up to use the new Azure AKS cluster as the default Kubernetes context. You can now run kubectl and helm commands against this context.
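
Before you run further kubectl or helm commands, it can help to confirm that the current context really points at the new cluster. The sketch below wraps the optional check from step 3 in a small function. The function name check_context is hypothetical, and the cluster name is the example used on this page.

```shell
# Sketch: confirm that kubectl points at the expected cluster.
# check_context is a hypothetical helper, not part of kubectl.
check_context() {
  expected="$1"
  current="$(kubectl config view --minify -o jsonpath='{.clusters[].name}')"
  if [ "$current" = "$expected" ]; then
    echo "OK: current context uses cluster $current"
  else
    echo "Mismatch: expected $expected, got $current" >&2
    return 1
  fi
}
```

For example, check_context MOSTLYAI-AKS returns a non-zero exit code when another cluster is active, so you can use it as a guard at the top of a deployment script.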

Task 4. Install NGINX

If you do not have an ingress controller running for your cluster, you can install the Ingress NGINX Controller to allow external HTTP or HTTPS traffic to reach your cluster.

For more information, see Basic configuration for an unmanaged ingress controller (opens in a new tab) in the Azure Kubernetes Service (AKS) documentation.

Steps

  1. Set the namespace variable used in the next step and add the Helm chart repository for the Ingress NGINX Controller.
    NAMESPACE=ingress-basic
    helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
    helm repo update
    You should see the following result:
    "ingress-nginx" has been added to your repositories
  2. Install the Ingress NGINX Controller.
    helm install ingress-nginx ingress-nginx/ingress-nginx \
    --create-namespace \
    --namespace $NAMESPACE \
    --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz
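    You should see a result similar to the following. The NAME and NAMESPACE values in your output reflect the release name and namespace that you passed to helm install (here, ingress-nginx and ingress-basic), so they can differ from this sample.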
    NAME: nginx-ingress
    LAST DEPLOYED: Thu Nov  9 10:04:24 2023
    NAMESPACE: default
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    NOTES:
    The ingress-nginx controller has been installed.
    It may take a few minutes for the LoadBalancer IP to be available.
    You can watch the status by running 'kubectl --namespace default get services -o wide -w nginx-ingress-ingress-nginx-controller'
    
    An example Ingress that makes use of the controller:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: example
      namespace: foo
    spec:
      ingressClassName: nginx
      rules:
        - host: www.example.com
          http:
            paths:
              - pathType: Prefix
                backend:
                  service:
                    name: exampleService
                    port:
                      number: 80
                path: /
      # This section is only required if TLS is to be enabled for the Ingress
      tls:
        - hosts:
            - www.example.com
          secretName: example-tls
    
    If TLS is enabled for the Ingress, a Secret containing the certificate and key must also be provided:
    
    apiVersion: v1
    kind: Secret
    metadata:
        name: example-tls
        namespace: foo
    data:
        tls.crt: <base64 encoded cert>
        tls.key: <base64 encoded key>
    type: kubernetes.io/tls
  3. Get the load balancer IP address to assign to your cluster.
    kubectl --namespace ingress-basic get services -o wide -w ingress-nginx-controller
    The output should be similar to the following. The service name in your output matches the name in the command (ingress-nginx-controller).
    💡

    Real values are obfuscated with asterisks (*).

    NAME                                     TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                      AGE    SELECTOR
    nginx-ingress-ingress-nginx-controller   LoadBalancer   10.0.***.***   20.72.***.***   80:30454/TCP,443:32319/TCP   8m8s   app.kubernetes.io/component=controller,app.kubernetes.io/instance=nginx-ingress,app.kubernetes.io/name=ingress-nginx
  4. Copy the value in the EXTERNAL-IP column and assign it to your FQDN through your DNS provider.

Result

Your cluster can now accept incoming connections. If you configured the DNS record, the external IP address of the cluster is also reachable through the FQDN that you use for the MOSTLY AI app.
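
Because the load balancer IP can take a few minutes to appear, you may want to poll for it in a script instead of watching the service. The following is a minimal sketch. The function wait_for_external_ip is a hypothetical helper, and its defaults assume the namespace and service name used in the steps above.

```shell
# Sketch: poll until the ingress controller service has an external IP.
# Arguments (all optional): namespace, service name, attempts, delay in seconds.
wait_for_external_ip() {
  namespace="${1:-ingress-basic}"
  service="${2:-ingress-nginx-controller}"
  attempts="${3:-30}"
  delay="${4:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # The jsonpath expression is empty until Azure assigns the IP.
    ip="$(kubectl --namespace "$namespace" get service "$service" \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "No external IP after $attempts attempts" >&2
  return 1
}
```

A typical call is EXTERNAL_IP="$(wait_for_external_ip)", after which you can use the value when you create the DNS record for your FQDN.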

Deployment

After you complete the pre-deployment tasks, you can configure your deployment in the values.yaml file and use the helm command to start the deployment process.

Task 5. Configure MOSTLY AI Helm chart

The MOSTLY AI Helm chart defines default configurations for your MOSTLY AI Kubernetes deployment. Before you can deploy, you need to configure some of the default values to match your cluster configuration.

As mentioned in the Prerequisites, you need to obtain the MOSTLY AI Helm chart from your Customer Experience Engineer.

Steps

  1. Open the values.yaml file in a text editor.
  2. At the top of the file, set the application domain name to your FQDN. Set the domain name for minio in the same way, as shown below.
    💡

    minio is the shared storage service.

    values.yaml
    _customerInstallation:
      domainNames:
        mostly-ai: &fqdn yourfqdn.com
        minio: &fqdnMinio minio-yourfqdn.com
  3. (Optional) Apply one of the configurations below depending on whether you intend to use TLS-encrypted access to the MOSTLY AI application.

    ➡️ You use a TLS certificate. Replace your-tls-secret with the TLS secret name as defined in your cluster configuration.
    💡

    Your IT department or Kubernetes administrator creates the FQDN and its TLS certificate and adds it to the configuration of your cluster. When added, it comes with a TLS secret name that you can define in the values.yaml file. For details, see Configure your domain TLS certificate.

    values.yaml
    _customerInstallation:
    ...
      deploymentSettings:
        tlsSecretName: &tlsSecretName your-tls-secret
    ...
    ➡️ You do not use a TLS certificate. Replace your-tls-secret with an empty value (shown below as []) and, for global.tls, set enabled to false.
    values.yaml
    _customerInstallation:
    ...
      deploymentSettings:
        tlsSecretName: &tlsSecretName [] # your-tls-secret
    ...
    global:
      ...
      tls:
        enabled: false
    ...
  4. (Optional) If you host third-party container images in an internal repository, replace docker.io in registryFor3rdPartyComponents.
    values.yaml
    _customerInstallation:
    ...
      deploymentSettings:
      ...
        registryFor3rdPartyComponents: &registryFor3rdPartyComponents docker.io
    ...
  5. (Optional) If you need to host MOSTLY AI container images in an internal repository, replace harbor.env.mostlylab.com/mostlyai in mostlyRegistry.
    values.yaml
    _customerInstallation:
    ...
      deploymentSettings:
      ...
        mostlyRegistry: &mostlyRegistry harbor.env.mostlylab.com/mostlyai
    ...
  6. (Optional) If you intend to use the MOSTLY AI image repository at harbor.env.mostlylab.com/mostlyai, set its secret in mostlyRegistryDockerConfigJson.
    💡

    To obtain the secret, contact your MOSTLY AI Customer Experience Engineer.

    values.yaml
    _customerInstallation:
    ...
      deploymentSettings:
      ...
        mostlyRegistryDockerConfigJson: &mostlyRegistryDockerConfigJson <HARBOR_SECRET>
    ...
  7. Define the Default compute pool under mostly-app.

    Use CPU and memory resources that are slightly below the ones defined for your worker nodes. For example, if your worker node has 14 CPU cores and 24 GB of memory, define 10 CPU cores and 20 GB of memory in your Helm chart.
    values.yaml
    mostly-app:
    ...
        mostly:
          defaultComputePool:
            name: Default
            type: KUBERNETES
            toleration: engine-jobs
            resources:
              cpu: 10
              memory: 20
              gpu: 0
    ...
  8. Enable the ingress annotations for NGINX.
    1. Set global.ingress.ingressClassName to nginx.
      values.yaml
      global:
      ...
        ingress:
          annotations:
            route.openshift.io/termination: edge
          ingressClassName: nginx
      ...
    2. Enable the ingress annotations under mostly-app.
      values.yaml
      mostly-app:
          ingress:
              annotations:
                nginx.ingress.kubernetes.io/proxy-body-size: 10240m
                nginx.ingress.kubernetes.io/proxy-buffer-size: 128k
                nginx.org/proxy-connect-timeout: 3000s
                nginx.org/proxy-read-timeout: 3000s
                nginx.org/client-max-body-size: 3000m
              # annotations: {}
    3. Enable the ingress annotations under mostly-keycloak.
      values.yaml
      mostly-keycloak:
          deployment:
              resources: {}
              tolerations: []
              affinity: {}
          ingress:
              annotations:
                  nginx.ingress.kubernetes.io/proxy-body-size: 10240m
                  nginx.ingress.kubernetes.io/proxy-buffer-size: 128k
              # annotations: {}

Result

Your values.yaml file is now configured with the required settings for your MOSTLY AI deployment.
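
Two quick checks can catch mistakes before you deploy. The sketch below first lists the allocatable CPU and memory of the worker nodes, which is the reference for the defaultComputePool values, and then renders the chart locally to confirm that the edited values.yaml still parses. Both function names are hypothetical. The sketch assumes that the worker nodes carry the AKS label agentpool=userpool and that the chart directory is ./mostly-combined, as in the deployment command later on this page.

```shell
# Sketch: pre-deployment checks for the values.yaml edits.

# List allocatable CPU and memory per worker node.
# Assumption: nodes are labeled agentpool=<node pool name> by AKS.
show_allocatable() {
  pool="${1:-userpool}"
  kubectl get nodes --selector "agentpool=$pool" \
    -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}

# Render the chart locally without installing anything.
validate_values() {
  chart="${1:-./mostly-combined}"
  values="${2:-values.yaml}"
  if helm template mostly-ai "$chart" --values "$values" > /dev/null; then
    echo "values render cleanly"
  else
    echo "rendering failed; check $values" >&2
    return 1
  fi
}
```

A clean render only shows that the YAML and templates are syntactically valid. It does not confirm that the domain names, registry, or secrets are correct for your environment.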

Task 6. Add taints to the worker node pool

The MOSTLY AI worker pods can only run on nodes with the engine-jobs taint. You need to add this taint to the worker node pool.

Steps

  1. For your Azure AKS cluster, select Settings > Node pools.
  2. Select the userpool node pool.
  3. For Taints, click edit.
  4. Add the engine-jobs taint with the NoSchedule effect.
    1. For Key, enter scheduling.mostly.ai/node.
    2. For Value, enter engine-jobs.
    3. For Effect, select NoSchedule.
    4. Click Save.

Result

The worker nodes for the generator training and synthetic dataset generation jobs are now tainted with the engine-jobs taint.
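
The same taint can also be expressed on the command line in the key=value:effect form. The following is a sketch only: it assumes that your Azure CLI version supports the --node-taints option on az aks nodepool update, and it reuses the example resource group and cluster name from this page. The function names are hypothetical.

```shell
# Sketch: apply and verify the engine-jobs taint from the command line.

# Format a taint as key=value:effect.
taint_spec() {
  echo "$1=$2:$3"
}

# Assumption: "az aks nodepool update" accepts --node-taints in your CLI version.
taint_userpool() {
  az aks nodepool update \
    --resource-group "${RESOURCE_GROUP:-MOSTLYAI-AKS}" \
    --cluster-name "${CLUSTER_NAME:-MOSTLYAI-AKS}" \
    --name userpool \
    --node-taints "$(taint_spec scheduling.mostly.ai/node engine-jobs NoSchedule)"
}

# Show each node with the keys, values, and effects of its taints.
list_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,KEY:.spec.taints[*].key,VALUE:.spec.taints[*].value,EFFECT:.spec.taints[*].effect'
}
```

Whichever way you add the taint, list_taints lets you confirm that the userpool nodes show scheduling.mostly.ai/node, engine-jobs, and NoSchedule.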

Task 7. Deploy MOSTLY AI

With all required configurations made in the values.yaml file and the worker nodes tainted correctly, you can now create a separate namespace and deploy MOSTLY AI to it.

Steps

  1. Create the mostly-ai namespace.
    kubectl create ns mostly-ai
  2. Deploy the MOSTLY AI Helm chart.
    helm upgrade --install mostly-ai ./mostly-combined --values values.yaml --namespace mostly-ai
    The result from the command should be similar to the following. If you see errors, see the Troubleshooting section.
    Release "mostly-ai" does not exist. Installing it now.
    NAME: mostly-ai
    LAST DEPLOYED: Fri Nov 10 18:45:58 2023
    NAMESPACE: mostly-ai
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
  3. In Azure, go to your AKS cluster and select Workloads.
    Initially, the MOSTLY AI pods are still starting and connecting to storage and to each other's services. Wait a few minutes until AKS provisions the pods and all of them connect to the required services.
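
You can also follow the startup from the command line instead of the Workloads page. The sketch below counts the pods that are not ready yet by parsing the output of kubectl get pods. The function name count_not_ready is hypothetical, and treating Completed pods as done is an assumption about how finished jobs appear in the namespace.

```shell
# Sketch: count pods in the namespace that are not yet ready.
# A pod counts as ready when STATUS is Running and READY shows n/n.
# Pods with STATUS Completed (finished jobs) are ignored.
count_not_ready() {
  namespace="${1:-mostly-ai}"
  kubectl get pods --namespace "$namespace" --no-headers |
    awk '{
      split($2, r, "/")
      if ($3 != "Completed" && !($3 == "Running" && r[1] == r[2])) n++
    }
    END { print n + 0 }'
}
```

For example, until [ "$(count_not_ready)" = "0" ]; do sleep 10; done blocks until every pod reports ready.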

Post-deployment

With the MOSTLY AI pods running, you can now log in to your MOSTLY AI deployment for the first time.

Task 8. Log in to your MOSTLY AI deployment

Log in for the first time to your MOSTLY AI deployment to set a new password for the superadmin user.

Prerequisites

Contact MOSTLY AI to obtain the superadmin credentials. You need them to log in for the first time.

Steps

  1. Open your FQDN in your browser.
    Step result: The Sign in page for your MOSTLY AI deployment opens.
  2. Enter the superadmin credentials and click Sign in.
  3. Provide a new password and click Change password.

Result

Your superadmin password is now changed and you can use it to log in again to your MOSTLY AI deployment.

Uninstall and cleanup

To uninstall, you can delete the Kubernetes namespace that holds all MOSTLY AI pods (by default, we suggest that you name this namespace mostly-ai). To clean up, you can delete your AKS cluster afterwards.

Delete MOSTLY AI namespace

kubectl delete namespace mostly-ai

Delete AKS cluster

Use the Azure CLI to delete your AKS cluster.

Steps

  1. Obtain your cluster name and resource group name in Azure.

  2. Delete your cluster with the following command.

    az aks delete --name MyClusterName --resource-group MyResourceGroup
  3. Enter y and press Enter at the prompt.

    Are you sure you want to perform this operation? (y/n): y
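
If you run the cleanup from a script, the confirmation prompt gets in the way. The following sketch combines both deletions in one function and passes --yes to az aks delete to skip the prompt. The function name cleanup is hypothetical, and the namespace name assumes that you used mostly-ai as suggested above.

```shell
# Sketch: non-interactive cleanup of the namespace and the AKS cluster.
# Usage: cleanup <cluster-name> <resource-group>
cleanup() {
  cluster="$1"
  group="$2"
  if [ -z "$cluster" ] || [ -z "$group" ]; then
    echo "usage: cleanup <cluster-name> <resource-group>" >&2
    return 2
  fi
  # Remove the MOSTLY AI namespace; do not fail if it is already gone.
  kubectl delete namespace mostly-ai --ignore-not-found
  # --yes answers the confirmation prompt automatically.
  az aks delete --name "$cluster" --resource-group "$group" --yes
}
```

Because --yes removes the safety prompt, double-check the cluster name and resource group before you call the function, for example cleanup MyClusterName MyResourceGroup.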