Troubleshoot common deployment issues
Learn how you can troubleshoot MOSTLY AI deployment issues that might occur in any Kubernetes environment.
Pods stay in pending after a cluster restart or "hot swap"
Problem
If your policies require starting the cluster on demand, moving workloads between nodes, or restarting as needed, you might see that the pods remain in Pending status.
You can then obtain more details about one of the pods with the kubectl describe command.
```shell
kubectl -n mostly-ai describe pod POD_NAME
```
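If you first need to find which pods are affected, you can filter the pod list by phase (a quick sketch; the `mostly-ai` namespace is taken from the command above):

```shell
# List only the pods that are currently stuck in Pending
kubectl -n mostly-ai get pods --field-selector status.phase=Pending
```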
You might see the following:
```
Warning  FailedScheduling  0/8 nodes are available: 5 Insufficient memory,
3 node(s) didn't match node selector.
```
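Because the scheduler reports nodes that did not match the node selector, it is worth checking which labels your current nodes carry. A minimal sketch, assuming the two labels that MOSTLY AI uses for scheduling (described in the solution below):

```shell
# Show the MOSTLY AI scheduling labels as columns for every node;
# nodes that are missing a label show <none> in that column
kubectl get nodes -L mostly_app -L mostly_worker
```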
Solution
Most cloud providers provision different nodes after a restart. The same happens in large on-premises deployments with procedures such as "hot swapping" or maintenance restarts. MOSTLY AI uses nodeAffinity by default to schedule workloads to nodes, and your new nodes may not include the labels that the application requires to schedule the pods.
To solve this issue, apply the node labels required by MOSTLY AI.
- Apply the `mostly_app=yes` label to your application nodes.

  ```shell
  kubectl label node APP_NODE_NAME mostly_app=yes
  ```
- Apply the `mostly_worker=yes` label to your worker nodes.

  ```shell
  kubectl label node WORKER_NODE_NAME mostly_worker=yes
  ```
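After applying the labels, you can verify that the scheduler now has matching nodes for both workload types. A quick check:

```shell
# Confirm that at least one node carries each required label;
# an empty result means the label is still missing
kubectl get nodes -l mostly_app=yes
kubectl get nodes -l mostly_worker=yes
```

Once matching nodes exist, the scheduler retries the pending pods automatically; they should leave the Pending status shortly.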
Keep in mind
- If you use Terraform, CloudFormation, Karpenter, or similar tools to deploy and scale your infrastructure, it is best to apply the labels to your nodes before you deploy MOSTLY AI.
- If you provision new nodes in your cluster, make sure they have enough capacity (RAM and CPU) to meet the workload requirements of MOSTLY AI. For more information, see the compute and memory requirements.
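To check whether a node has enough allocatable capacity for the workloads, a quick sketch (`NODE_NAME` is a placeholder for one of your nodes):

```shell
# Print each node's allocatable CPU and memory in one table
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory

# Inspect a single node in detail, including current resource requests
kubectl describe node NODE_NAME
```

Compare the allocatable values against the resource requests of the pending pods; the "Insufficient memory" scheduling warning above means the gap is on the memory side.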