Learn about the hardware, software, and user and data access requirements when installing and deploying MOSTLY AI onto your company’s infrastructure.
MOSTLY AI’s architecture
Below you can see MOSTLY AI’s architecture diagram. It depicts how its different components interact with each other, with services in your company’s server environment, and with clients.

Hardware requirements
Running MOSTLY AI requires a cluster of at least two virtual machines. One of them will function as the application server and the others will function as AI servers.
The application server is responsible for the web-based user interface and the distribution of synthetic data generation tasks across the AI servers. The minimum requirements for the application server are as follows:
- two CPUs
- 32 GB of memory
- should always be up and running
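The minimums above can be checked on a candidate machine before installation. The following is a minimal sketch and not part of MOSTLY AI: the function names are hypothetical, `local_resources` assumes a Linux host, and 1 GB is taken to mean 10^9 bytes.

```python
import os

MIN_CPUS = 2          # from the application server requirements
MIN_MEMORY_GB = 32    # from the application server requirements


def meets_app_server_minimum(cpus: int, memory_gb: float) -> bool:
    """Return True if the given resources satisfy the stated minimums."""
    return cpus >= MIN_CPUS and memory_gb >= MIN_MEMORY_GB


def local_resources() -> tuple[int, float]:
    """Read the CPU count and physical memory of this host (Linux only)."""
    cpus = os.cpu_count() or 0
    # Assumption: 1 GB = 10**9 bytes; these sysconf names exist on Linux.
    mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return cpus, mem_bytes / 10**9
```

On the machine under test, `meets_app_server_minimum(*local_resources())` returns `True` when both minimums are met. The availability requirement cannot be checked this way and remains an operational matter.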
The AI servers are responsible for processing these synthetic data generation tasks. The hardware requirements for the AI servers are classified into tiers depending on the dataset size:
- Tier 1 refers to a dataset of up to 1 million subjects and 100 columns.
- Tier 2 refers to a dataset of up to 10 million subjects and 250 columns.
- Tier 3 refers to a dataset larger than 10 million subjects and 250 columns.
For on-premises deployment, these tiers refer to the following machine types:
| | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|
| Data size | up to 100k subjects and 100 columns | up to 10m subjects and 250 columns | more than 10m subjects and 250 columns |
| CPU | 32 cores | 64 cores | 64 cores |
| Memory | 128 GB | 256 GB | 512 GB |
| Disk storage (SSD) | 500 GB | 1 TB | 1 TB |
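The tier selection can be sketched as a small lookup. This is an illustration under stated assumptions, not MOSTLY AI's own logic: the names `TIER_LIMITS`, `AI_SERVER_SPECS`, and `pick_tier` are hypothetical, a dataset is assigned to the smallest tier whose subject and column limits it both fits, and the Tier 1 subject limit uses the table's 100k rather than the 1 million given in the tier list.

```python
# Upper limits per tier as (tier, max subjects, max columns).
# Assumption: the table's 100k is used for Tier 1; the tier list states
# 1 million, so adjust this value to whichever applies to your release.
TIER_LIMITS = [
    (1, 100_000, 100),
    (2, 10_000_000, 250),
]

# On-premises machine types per tier, from the table (1 TB written as 1000 GB).
AI_SERVER_SPECS = {
    1: {"cpu_cores": 32, "memory_gb": 128, "ssd_gb": 500},
    2: {"cpu_cores": 64, "memory_gb": 256, "ssd_gb": 1000},
    3: {"cpu_cores": 64, "memory_gb": 512, "ssd_gb": 1000},
}


def pick_tier(subjects: int, columns: int) -> int:
    """Return the smallest tier whose subject and column limits both fit."""
    for tier, max_subjects, max_columns in TIER_LIMITS:
        if subjects <= max_subjects and columns <= max_columns:
            return tier
    return 3  # anything beyond the Tier 2 limits


print(AI_SERVER_SPECS[pick_tier(5_000_000, 200)])
```

Treating a dataset that exceeds either limit as belonging to the next tier is the cautious reading; the text does not say how mixed cases (few subjects, many columns) are classified.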
For cloud deployment, some examples of VMs are provided below; however, any VM with similar memory, CPU and storage capacities would be sufficient:
| | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|
| AWS | m5.8xlarge | m5.16xlarge | r5a.16xlarge |
| Google | n1-standard-32 | n1-standard-64 | n1-highmem-64 |
| Azure | Standard_D32s_v3 | Standard_D64_v3 | Standard_E64a_v4 |
Machine management capabilities
The application server can suspend or stop AI servers when idle so that resources are not wasted and operational costs are minimized.
The AI servers can also have different CPU, memory, and disk configurations. However, this release of MOSTLY AI does not route tasks to AI servers based on their processing capabilities.
Data source and destination configuration requirements
Databases
MOSTLY AI works with source and destination databases. These can be on the same server.
- The databases should be accessible by the application and AI servers.
- The source database can be read-only.
- The destination database should be empty and have write access as synthetic data will be written to it.
- The database user should have access privileges to create tables and write to the database and schema.
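The first requirement, network accessibility, can be verified from the application server and each AI server with a plain TCP check. This is a minimal sketch using only Python's standard library; the host names and port in `DATABASES` are hypothetical placeholders, and the function names are not part of MOSTLY AI.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical endpoints; replace with your own source and destination.
DATABASES = {
    "source": ("source-db.mycompany.com", 5432),
    "destination": ("destination-db.mycompany.com", 5432),
}


def check_databases(databases: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Map each database name to whether its port is reachable."""
    return {
        name: is_reachable(host, port)
        for name, (host, port) in databases.items()
    }
```

Calling `check_databases(DATABASES)` on each server shows which endpoints accept connections. A reachable port confirms network access only; read, write, and table-creation privileges still need to be checked with the database user's credentials.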
System administration
Installing MOSTLY AI requires an administrative user who meets at least one of the following conditions:
- is a member of the docker group, OR
- has sudo rights for docker, OR
- is root, OR
- has sudo rights
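Whether the current user qualifies can be checked before starting the installation. The sketch below is an assumption-laden illustration for Unix hosts, not part of the MOSTLY AI installer: the function names are hypothetical, the two sudo conditions are collapsed into a single flag, and `sudo -n true` only detects sudo rights that work without a password prompt.

```python
import grp
import os
import subprocess


def is_eligible(uid: int, groups: set[str], has_sudo: bool) -> bool:
    """True if the user is root, in the docker group, or has sudo rights."""
    return uid == 0 or "docker" in groups or has_sudo


def current_groups() -> set[str]:
    """Names of the groups the current process belongs to (Unix only)."""
    names = set()
    for gid in os.getgroups():
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            pass  # gid without a name entry in the group database
    return names


def has_passwordless_sudo() -> bool:
    """True if `sudo -n true` succeeds, i.e. sudo works without a prompt."""
    try:
        result = subprocess.run(["sudo", "-n", "true"], capture_output=True)
    except FileNotFoundError:
        return False  # sudo is not installed
    return result.returncode == 0
```

A typical call is `is_eligible(os.geteuid(), current_groups(), has_passwordless_sudo())`. A user whose sudo rights require a password will be reported as lacking sudo by this check, so a `False` result should be confirmed manually.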
User access requirements
MOSTLY AI’s web UI
Users can operate MOSTLY AI using its web UI, which they can access using a web browser. Admins can configure a specific port and a domain name for user access. This port needs to be white-listed in the firewall settings, and the domain name needs to be certified.
The web UI can also be accessed via localhost or the IP address of the MOSTLY AI server.
Identity and Access Management
MOSTLY AI uses the Keycloak Identity and Access Management service to manage users and configure their access permissions. It is part of MOSTLY AI’s installation and needs its own port to be white-listed in the firewall settings.
Administrators can manage users within MOSTLY AI’s web UI by synchronizing its user database with your company’s Active Directory.
They can also add users via Keycloak’s web UI, which is accessed through the browser as well. It uses the same domain name that’s configured for MOSTLY AI, with the /auth path appended, for example https://mostlyai.mycompany.com/auth.
Data access requirements
MOSTLY AI requires access to the data sources with which it will generate synthetic data.
For database data sources, ensure that the cluster’s network configuration allows access to them, in addition to providing valid credentials.
For CSV and Parquet files, you can grant access in the following ways:
- Store them locally on the server where MOSTLY AI is running
- Use a client that accesses the administrative console of MOSTLY AI
- Store them in a cloud bucket on AWS, Google, or Azure
  - AWS S3: Requires Access Key and Secret Access Key of the user profile in AWS IAM.
  - Google: Requires a service account with permission to operate within the storage bucket.
  - Azure: Requires Access Key and Secret Access Key of the storage account.
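The per-provider credential requirements can be captured as a small checklist to run before configuring a bucket as a data source. This is only a sketch: the dictionary keys and the `missing_fields` helper are hypothetical names chosen for illustration and do not correspond to MOSTLY AI configuration fields.

```python
# Credential fields per provider, following the list above.
# Assumption: key names are illustrative, not MOSTLY AI settings.
REQUIRED_FIELDS = {
    "aws": ("access_key", "secret_access_key"),
    "google": ("service_account",),
    "azure": ("access_key", "secret_access_key"),
}


def missing_fields(provider: str, credentials: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or empty."""
    required = REQUIRED_FIELDS[provider.lower()]
    return [name for name in required if not credentials.get(name)]
```

An empty list from `missing_fields("aws", credentials)` means every listed item has been supplied; it does not confirm that the credentials are valid or that they grant access to the bucket.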
License management
Activating or renewing the license doesn’t require internet access. MOSTLY AI’s synthetic data platform operates solely within your company’s intranet and has no interaction with the internet. We also don’t have any access to your usage or data.
To activate your company’s license, the administrator tasked with installing MOSTLY AI needs to contact your account manager.