Deployment checklist
The checklist below provides a list of prerequisites to ensure a successful installation process. Before you contact MOSTLY AI to complete the installation or troubleshoot installation issues, make sure to complete the checklist.
General checklist for Kubernetes clusters
Compute (CPU and memory) requirements
- Make sure that your Kubernetes cluster meets the compute and memory requirements.
Storage requirements
- A supported block storage class for shared storage and PostgreSQL.
Networking requirements
- Your Kubernetes cluster can access the container image repository.
- By default, MOSTLY AI serves the container images from the MOSTLY AI image repository at
nexus.test.mostlylab.com
. To deploy the images, your Kubernetes cluster needs to have Internet access. - If your internal IT policies require that you pull the images from an internal repository, ensure that your Kubernetes cluster has access to it. For more information, Configure an internal image repository.
- By default, MOSTLY AI serves the container images from the MOSTLY AI image repository at
- Your Kubernetes cluster has network access to the defined storage classes and to the data sources (databases and cloud object storage providers) from which you want to pull original data.
- Collaborate with your IT department and Customer Experience Engineer to configure an domain SSL certificate for your Kubernetes cluster and for the MOSTLY AI Helm chart.
Access and permissions requirements
- Your Kubernetes cluster user has permissions to read and write into the storage class.
- On the AI worker nodes where MOSTLY AI jobs run, you should have no taints (opens in a new tab) defined that might not allow pods to be created with the minimum and maximum resource requirements specified in the
values.yaml
for the engine. Otherwise, MOSTLY AI jobs will fail to run. - If you already have taints on your worker nodes, you need to add tolerations on the MOSTLY AI pods in the
values.yaml
file, underagent.tolerations
andengine.tolerations
.
values.yaml
...
tolerations:
# Replace with the actual key label of the taint
# For example: `Tainted-worker:NoSchedule`
- key: "Tainted-worker"
operator: "Exists"
effect: "NoSchedule"
...
- On the AI worker nodes where MOSTLY AI jobs run, make sure that no other pods belonging to other applications can run so as not to interfere with MOSTLY AI workloads. You can apply taints (opens in a new tab) on nodes dedicated to MOSTLY AI workloads so as to prevent other workloads from running.
- Verify that any resource quotas (opens in a new tab) created for your namespace allow MOSTLY AI to successfully run worker nodes based on their requirements.
- If you have specific username requirements to access databases or other resources, update the Helm chart
values.yaml
file. In specific cases, due to Oracle security policies you might need to allowlist container users in Oracle.
Other requirements
- Disable any tools or service mesh services in the MOSTLY AI namespace that intercept communications between pods and require manual approval to proceed. If you have enabled such services, such as Linkerd or Istio, MOSTLY AI jobs might be prevented from starting and completing successfully.
- Work with your IT team to enable backups of the PostgreSQL database.
AWS EKS Kubernetes cluster checklist
- AWS EKS cluster running Kubernetes 1.23 or higher. For more information, see Amazon EKS documentation (opens in a new tab).
- Compute resources. The AWS EKS cluster has at least six worker nodes of type
m5.xlarge
. For more information, see Amazon EC2 instance types (opens in a new tab). - Storage (required for single- and multi-node). Integrate with Amazon Elastic Block Store (EBS) with your EKS cluster by installing and configuring the
aws-ebs-csi-driver
. For more information, Amazon EBS CSI driver (opens in a new tab). - Networking. Create a Virtual Private Cloud (VPC) with a
/16
subnet netmask. This provides up to 65,536 private IPv4 addresses. For more information, see the AWS VPC documentation. - Networking. Integrate with Amazon Elastic Load Balancing (ALB) to automatically distribute your incoming traffic across multiple targets, such as EC2 instances, containers, and IP addresses, in one or more Availability Zones. Install the
aws-load-balancer-controller
in your EKS cluster to manage your ALBs and create an Ingress that uses this controller.
For more information, see Installing the AWS Load Balancer Controller add-on (opens in a new tab). - Networking. Create a Domain name in Route 53 that will point to your ALB. Amazon Route 53 is a highly available and scalable Domain Name System (DNS) web service. To configure a domain registered in Route 53 to point to AWS ALB, you can use the Amazon Route 53 Documentation (opens in a new tab). For specific configurations and support, contact AWS Support.
- Security. AWS Certificate Manager (ACM) helps you to provision, manage, and renew publicly trusted TLS certificates on AWS based websites. Create an SSL certificate in ACM that can be used with your ALB. For more information, see AWS Certificate Manager > Requesting a public certificate (opens in a new tab).
MOSTLY AI Helm charts checklist
- Obtain the MOSTLY AI Helm charts from your Customer Experience Engineer.
- Obtain a Docker pull image secret from your Customer Experience Engineer.
- Ensure Internet connectivity to pull the Docker images from the MOSTLY AI repository.
- Get acquainted with the default configuration
values.yaml
file in the MOSTLY AI Helm charts. - Make sure that the nodes in your Kubernetes cluster can accommodate the container resource requirements defined in the
values.yaml
file.