Troubleshoot common deployment issues

Learn how you can troubleshoot MOSTLY AI deployment issues that might occur in any Kubernetes environment.

Pods stay in Pending after a cluster restart or "hot swap"

Problem

If your policies require you to start the cluster on demand, to move workloads between nodes, or to start them as required, the pods might remain in the Pending status.

You can then obtain more details about one of the pods with the kubectl describe command.

kubectl -n mostly-ai describe pod POD_NAME

You might see the following:

Warning  FailedScheduling    0/8 nodes are available: 5 Insufficient memory, 3 node(s) didn't match node selector.
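
If you do not yet know which pods are affected, the following sketch might help you find them. It assumes the mostly-ai namespace used in the command above. The function names are hypothetical helpers and not part of MOSTLY AI or kubectl.

```shell
# Hypothetical helper: list the pods that are still in the Pending phase.
# The namespace defaults to mostly-ai and can be overridden with the first argument.
list_pending_pods() {
  kubectl -n "${1:-mostly-ai}" get pods --field-selector=status.phase=Pending
}

# Hypothetical helper: list only the FailedScheduling events in the namespace.
list_scheduling_failures() {
  kubectl -n "${1:-mostly-ai}" get events --field-selector=reason=FailedScheduling
}

# Example:
# list_pending_pods
# list_scheduling_failures
```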

Solution

Most cloud providers provision different nodes after a restart. The same happens in large on-premises deployments during procedures such as "hot swapping" or maintenance restarts. MOSTLY AI uses nodeAffinity by default to schedule workloads onto nodes, and your new nodes might not carry the labels that the application requires to schedule its pods.
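
To confirm which labels a pending pod expects, you can print its scheduling rules. This is a minimal sketch that assumes the mostly-ai namespace. The function name is a hypothetical helper, and either field may print as empty if the pod does not set it.

```shell
# Hypothetical helper: print the node affinity rules and the node selector
# of one pod, each on its own line.
show_pod_scheduling_rules() {
  kubectl -n mostly-ai get pod "$1" \
    -o jsonpath='{.spec.affinity.nodeAffinity}{"\n"}{.spec.nodeSelector}{"\n"}'
}

# Example (POD_NAME is a placeholder):
# show_pod_scheduling_rules POD_NAME
```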

To solve this issue, apply the node labels required by MOSTLY AI.

  1. Apply the mostly_app=yes label to your application nodes.
    kubectl label node APP_NODE_NAME mostly_app=yes
  2. Apply the mostly_worker=yes label to your worker nodes.
    kubectl label node WORKER_NODE_NAME mostly_worker=yes
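
If you need to label several nodes at once, the two steps above can be sketched as a small loop, followed by a check that the labels are in place. The function names and the node names are hypothetical placeholders. The --overwrite flag is added here on the assumption that re-running the command on an already labeled node should not fail.

```shell
# Hypothetical helper: apply one label to every node name passed after it.
label_nodes() {
  label="$1"
  shift
  for node in "$@"; do
    kubectl label node "$node" "$label" --overwrite
  done
}

# Hypothetical helper: list all nodes with the two labels as extra columns.
verify_labels() {
  kubectl get nodes -L mostly_app,mostly_worker
}

# Example (node names are placeholders):
# label_nodes mostly_app=yes APP_NODE_1 APP_NODE_2
# label_nodes mostly_worker=yes WORKER_NODE_1 WORKER_NODE_2
# verify_labels
```

Nodes that show an empty value in the MOSTLY_APP or MOSTLY_WORKER column have not been labeled yet.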

Keep in mind

  1. If you use Terraform, CloudFormation, Karpenter, or similar tools to deploy and scale your infrastructure, it is best to apply the labels to your nodes before you deploy MOSTLY AI.
  2. If you provision new nodes in your cluster, make sure they have enough capacity (RAM and CPU) to meet the workload requirements of MOSTLY AI. For more information, see compute and memory requirements.
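
The "Insufficient memory" part of the scheduling warning points to the capacity issue in the second item. As a rough check, you might list the allocatable CPU and memory of the labeled nodes. This is a sketch only: the function name is hypothetical, and the default selector assumes the mostly_worker=yes label from the solution above.

```shell
# Hypothetical helper: show allocatable CPU and memory for nodes that match
# a label selector (defaults to the worker nodes).
show_node_capacity() {
  kubectl get nodes -l "${1:-mostly_worker=yes}" \
    -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
}

# Example:
# show_node_capacity
# show_node_capacity mostly_app=yes
```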