MOSTLY AI runs as a set of containerized applications and services that you can deploy in a Kubernetes cluster to keep the application fault-tolerant and highly available.

MOSTLY AI architecture diagram

Application Nodes

| Pod and image name | Description | Pod lifecycle |
| --- | --- | --- |
| Web frontend (mostly-ui) | Contains the frontend of MOSTLY AI. Reachable over port 8080. | Service |
| Backend (mostly-app) | Contains the backend and public APIs of MOSTLY AI. | Service |
| Coordinator Service (mostly-coordinator) | Component that takes all requests from the web application and coordinates execution of tasks on the main AI engine. | Service |
| Data (mostly-data) | Component that reads metadata and analyzes data sources and destinations. | Service |
| Keycloak | Keycloak is an open-source identity management, authentication, and authorization tool. This container has a pre-configured Keycloak instance for MOSTLY AI. | Service |
| Database | Database instance of the system. Contains databases for app, coordinator, and Keycloak. | Service |
| RabbitMQ (mostly-rabbitmq) | Message queue handling communications between the AI engine and the application. | Service |
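As a rough illustration of how the list of application services might be used in practice, the sketch below checks a set of pod names against the image names given above. It assumes, hypothetically, that each running pod name starts with its image name followed by a Kubernetes-generated suffix (for example `mostly-ui-7c9f`). The function name `missing_services` and the sample pod names are illustrative and not part of MOSTLY AI.

```python
# Image names taken from the Application Nodes section. Keycloak and the
# database are left out because no image name is listed for them.
EXPECTED_SERVICES = {
    "mostly-ui": "Web frontend",
    "mostly-app": "Backend",
    "mostly-coordinator": "Coordinator Service",
    "mostly-data": "Data",
    "mostly-rabbitmq": "RabbitMQ",
}


def missing_services(pod_names):
    """Return the expected image names that no pod name starts with.

    Assumption: a pod name begins with its image name. Verify this
    against the pod names in your own deployment.
    """
    missing = []
    for image in EXPECTED_SERVICES:
        if not any(name.startswith(image) for name in pod_names):
            missing.append(image)
    return missing
```

You could feed this function the pod names reported by your cluster tooling to spot a service that has not come up.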

AI Worker nodes

| Pod and image name | Description | Pod lifecycle |
| --- | --- | --- |
| Task agent job (agent-<task-id>) | Job that runs the steps to synthesize data. | Job |
| Data job (engine-step-<step-id>) | Component that reads from data sources and writes into data destinations. | Job |
| AI job (engine-step-<step-id>) | The main engine component, which does the AI training and data generation. | Job |
| QA job (engine-step-<step-id>) | Engine component that creates the Quality Assurance report on the privacy and accuracy of the generated data compared with the original data. | Job |
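The naming patterns above can be turned into a small parser, for example to group worker pods in logs or dashboards. This is a minimal sketch that assumes the names use plain ASCII hyphens and that the identifier is everything after the prefix. The function name `classify_worker_pod` and the sample identifiers are hypothetical.

```python
import re

# Naming patterns from the AI Worker nodes section.
_AGENT = re.compile(r"^agent-(?P<id>.+)$")
_ENGINE_STEP = re.compile(r"^engine-step-(?P<id>.+)$")


def classify_worker_pod(name):
    """Return (kind, identifier) for a worker pod name, or None if no match."""
    match = _AGENT.match(name)
    if match:
        return ("task-agent", match.group("id"))
    match = _ENGINE_STEP.match(name)
    if match:
        # Data, AI, and QA jobs share this prefix, so the name alone
        # cannot tell them apart.
        return ("engine-step", match.group("id"))
    return None
```

Because the Data, AI, and QA jobs all use the `engine-step-<step-id>` pattern, telling them apart requires information beyond the pod name.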

Third-party integrations and connections

Active Directory

Active Directory is an optional integration that can help you manage the authentication of users to MOSTLY AI. With this integration, end users do not need to create new credentials to log in to MOSTLY AI.

Image repository

The MOSTLY AI image repository contains the deployment images of all containers and makes it easy to deploy MOSTLY AI to various types of Kubernetes clusters.

Corporate databases

MOSTLY AI can connect to your internal databases to read original data and deliver the generated synthetic data to the same or another database.

MOSTLY AI can generate synthetic data that preserves the correlations, structure, and referential integrity of multi-table data. The synthesis of data stored in databases is where MOSTLY AI excels.

Cloud storage buckets and NFS drives

In addition to databases, you can also read original data from and deliver synthetic data to cloud storage buckets (AWS S3, Azure Blob Storage, Google Cloud Storage) as well as NFS drives local to the server where MOSTLY AI is deployed.

Supported Kubernetes storage classes

MOSTLY AI uses two types of storage.

  • Block Storage. Used by individual pods, such as PostgreSQL and RabbitMQ, and for the license file.
  • Shared Storage. Shared by various pods to store models, synthetic data, and so on.
    • For single-node deployments, you can use an RWO storage class for shared storage.
    • For multi-node deployments, you need an RWX storage class for shared storage.
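The rule for shared storage can be sketched as a small helper that picks the minimum access mode for a given node count and builds a PersistentVolumeClaim manifest as a Python dictionary. The manifest fields (`accessModes`, `storageClassName`, `resources.requests.storage`) are standard Kubernetes fields. The claim name `mostly-shared-storage`, the default size, and the storage class `efs-sc` in the usage note are hypothetical placeholders, not values defined by MOSTLY AI.

```python
def shared_storage_access_mode(node_count):
    """Return the minimum access mode for shared storage.

    Single-node deployments can use ReadWriteOnce (RWO);
    multi-node deployments need ReadWriteMany (RWX).
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return "ReadWriteOnce" if node_count == 1 else "ReadWriteMany"


def shared_storage_claim(node_count, storage_class, size="10Gi"):
    """Build a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        # Hypothetical claim name, used here for illustration only.
        "metadata": {"name": "mostly-shared-storage"},
        "spec": {
            "accessModes": [shared_storage_access_mode(node_count)],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }
```

For example, `shared_storage_claim(3, "efs-sc")` produces a claim that requests `ReadWriteMany`, which you could serialize to YAML or JSON and adapt to your cluster.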

For a list of all supported storage classes, see the table below.

| Platform | ReadWriteOnce (RWO) | ReadWriteMany (RWX) |
| --- | --- | --- |
| Amazon AWS | Elastic Block Store (EBS) | Elastic File System (EFS) |
| Google Cloud | standard-rwo | |
| OpenShift | CephFS | CephFS |
| minikube | Local storage | Local storage |

Unlisted storage classes are not officially supported. For more information, see Storage classes in the Kubernetes documentation.
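If you want to check a planned deployment against the supported options, the table can be encoded as a simple lookup. This is a minimal sketch: the dictionary mirrors the table with abbreviated AWS names, the single Google Cloud entry `standard-rwo` is assumed to belong to the RWO column, and the function name `supported_storage` is hypothetical.

```python
# Supported storage options per platform, as listed in the table.
# None marks a combination for which the table gives no entry.
SUPPORTED_STORAGE = {
    "Amazon AWS": {"RWO": "EBS", "RWX": "EFS"},
    "Google Cloud": {"RWO": "standard-rwo", "RWX": None},
    "OpenShift": {"RWO": "CephFS", "RWX": "CephFS"},
    "minikube": {"RWO": "Local storage", "RWX": "Local storage"},
}


def supported_storage(platform, access_mode):
    """Return the listed storage option, or None if the table has no entry."""
    return SUPPORTED_STORAGE.get(platform, {}).get(access_mode)
```

A result of `None` means the combination is not listed, so it should be treated as not officially supported.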