Compute resources

The MOSTLY AI Platform runs Kubernetes jobs to complete AI tasks. Depending on the size of the original data, a job may need substantial memory and CPU to complete successfully, so each job must be assigned to a node that has enough resources. Ideally, a node is dedicated to a single job and all of its resources are available to that job.

Memory is crucial because insufficient memory causes job failure. The memory required for a job depends on the size and complexity of the original dataset.

CPU is less critical. Fewer CPUs slow down the job but do not cause failures.

Kubernetes nodes configuration

By default, the MOSTLY AI Helm chart uses taints and tolerations to ensure that AI jobs run on dedicated worker nodes. MOSTLY AI recommends that you apply the following taint to worker nodes:

  • Key: scheduling.mostly.ai/node
  • Value: engine-jobs
  • Effect: NoSchedule
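
As a rough illustration, the recommended taint would appear in a node's specification as shown in the first document below, and a pod needs a matching toleration, shown in the second, to be scheduled onto that node. This is a minimal sketch using the standard Kubernetes Node and Pod fields. The names worker-node-1 and example-engine-job and the image name are hypothetical, and the pod manifests that the MOSTLY AI Helm chart actually renders may differ.

```yaml
# Node with the recommended taint (node name is hypothetical)
apiVersion: v1
kind: Node
metadata:
  name: worker-node-1
spec:
  taints:
    - key: scheduling.mostly.ai/node
      value: engine-jobs
      effect: NoSchedule
---
# Pod with a toleration that matches the taint above
# (pod name and image are hypothetical placeholders)
apiVersion: v1
kind: Pod
metadata:
  name: example-engine-job
spec:
  tolerations:
    - key: scheduling.mostly.ai/node
      operator: Equal
      value: engine-jobs
      effect: NoSchedule
  containers:
    - name: job
      image: example-image
```

A NoSchedule taint keeps pods without a matching toleration off the node. The toleration alone only permits scheduling onto the tainted node; it does not force the pod to land there.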

If your worker nodes and node pools already have a taint in place, you can edit the values.yaml file before deploying so that the toleration settings match your existing taint.

values.yaml
  ...
  mostlyApp:
    deployment:
      resources: {}
      tolerations: []
      affinity: {}
      mostly:
        defaultComputePool:
          name: Default
          type: KUBERNETES
          toleration: engine-jobs # replace with your toleration value
  ...
  mostlyCoordinator:
    deployment:
      ...
      coreJob:
        affinity: {}
        tolerationKey: scheduling.mostly.ai/node # replace with your toleration key
        tolerationEffect: NoSchedule
        tolerationOperator: Equal
  ...

Worker node configuration

Assign jobs to nodes where they can succeed and complete quickly. Use the following values.yaml parameters to set the resources for AI workloads on the worker nodes. Kubernetes assigns an AI job only to a node that has at least this amount of resources available.

  • mostlyApp.deployment.mostly.defaultComputePool.resources.cpu: To configure 14 cores, set cpu: 14.
  • mostlyApp.deployment.mostly.defaultComputePool.resources.memory: To configure 24 GB memory, set memory: 24.
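
Written out in values.yaml, the two parameters above would nest roughly as follows. This sketch is derived only from the dotted parameter paths and the example values of 14 cores and 24 GB; other keys under defaultComputePool are omitted.

```yaml
mostlyApp:
  deployment:
    mostly:
      defaultComputePool:
        resources:
          cpu: 14     # cores available to each AI job
          memory: 24  # memory in GB available to each AI job
```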