Overview
The MOSTLY AI Platform runs Kubernetes jobs to complete AI tasks. Depending on the size of the original data, a job can require a large amount of memory and CPU to complete successfully. It is therefore important that each job is assigned to a node with sufficient resources. Ideally, the node is dedicated to a single job so that all of its resources are available to that job.
Memory is crucial because insufficient memory causes job failure. The memory required for a job depends on the size and complexity of the original dataset.
CPU is less critical. Fewer CPUs slow down the job but do not cause failures.
Kubernetes node configuration
MOSTLY AI recommends that you run the platform on dedicated high-capacity nodes. An alternative is to implement auto-scaling. By default, the MOSTLY AI Helm chart uses affinity-based node selectors.
Before you deploy, either label your nodes as indicated below or remove the selectors from the `values.yaml` file (not recommended). A sketch of a labeled node follows the list.
- Web application nodes: `mostly-app=yes`
- AI worker nodes: `mostly-worker=yes`
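For reference, here is a minimal sketch of how the worker label appears in a node's metadata once applied (the node name below is a placeholder; labels are typically set with `kubectl label nodes <node-name> mostly-worker=yes`):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: ai-worker-1        # placeholder node name
  labels:
    mostly-worker: "yes"   # matched by the worker affinity-based node selector
```

Web application nodes carry `mostly-app=yes` in the same way.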
Worker node configuration
Assign jobs to nodes where they will succeed and complete quickly. Use the following `values.yaml` parameters to configure the AI workloads on the worker nodes (a combined example follows the list):
- CPU and memory requirements: Kubernetes assigns an AI job to a node that has at least this amount of resources available.
  - `engine.resources.requests.cpu`: CPU cores in millicores (1/1000 of a core). For example, configure 3 cores by setting `cpu: 3000m`.
  - `engine.resources.requests.memory`: Memory in gibibytes (`Gi`). For example, configure 8 GB of memory by setting `memory: 8Gi`.
- CPU and memory limits: Kubernetes does not assign more resources than these amounts to a single AI job, which prevents overloaded and blocked nodes. Unlike requirements, limits are not considered when pods are assigned to nodes; they only cap consumption after a job is scheduled on a node.
  - `engine.resources.limits.cpu`: CPU cores in millicores (1/1000 of a core). For example, configure a limit of 5 cores by setting `cpu: 5000m`.
  - `engine.resources.limits.memory`: Memory in gibibytes (`Gi`). For example, configure a memory limit of 24 GB by setting `memory: 24Gi`.
Guidelines for worker node configuration
For efficient workloads, you might need to adjust the worker node requirements and limits depending on the size of your nodes, the cluster configuration, and the size of the data you want to synthesize.
To run `n` jobs in parallel per node, the general rule of thumb is:
- Set aside 1 CPU and 1-2 GB memory for other workloads (Kubernetes, OS, and so on)
- Set aside 0.5 CPU and 5 GB of memory for each job that runs in parallel on the node
- Assign CPU limits with the formula: `(number of CPUs - n * 0.5 CPU - 1 CPU) / n`
- Assign memory limits with the formula: `(total memory - n * 5 GB - 1 GB) / n`
- Minimize the gap between the CPU and memory requirements and limits. You can also set the requirements equal to the limits.
- If the sizes of the worker nodes vary (not recommended), base the requirements on the smallest node capacity.
Example: Assume a cluster that consists of 3 nodes, each with 32 CPUs and 128 GB of memory. Here are two alternatives to configure the nodes:
- One job per node: This is the safest alternative because it dedicates almost all of a node's resources to a single job. It is less efficient if your original data is not particularly big or complex in terms of the number of columns, rows, and sequences.

  ```yaml
  # values.yaml
  resources:
    limits:
      cpu: 30000m
      memory: 122Gi
    requests:
      cpu: 28000m
      memory: 110Gi
  ```
- Two jobs per node: This alternative makes better use of resources as long as you do not intend to synthesize exceptionally big and complex datasets.

  ```yaml
  # values.yaml
  resources:
    limits:
      cpu: 15000m
      memory: 58Gi
    requests:
      cpu: 13000m
      memory: 50Gi
  ```
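As a cross-check, the limits in both alternatives follow from the rule-of-thumb formulas above. For one job per node, the CPU limit is `(32 - 1 * 0.5 - 1) / 1 = 30.5`, rounded down to `30000m`, and the memory limit is `(128 - 1 * 5 - 1) / 1 = 122`, that is, `122Gi`. For two jobs per node, the CPU limit is `(32 - 2 * 0.5 - 1) / 2 = 15` (`15000m`) and the memory limit is `(128 - 2 * 5 - 1) / 2 = 58.5`, rounded down to `58Gi`. The requests are then set slightly below the limits in each case.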