MOSTLY AI Python SDK
The MOSTLY AI Python SDK provides full programmatic access to MOSTLY AI Platform features, either in a local environment or by connecting to a remote MOSTLY AI Platform.
Intent | Primitive |
---|---|
Train a generator on tabular or language data | g = mostly.train(config) |
Generate any number of synthetic data records | sd = mostly.generate(g, config) |
Live probe the generator on demand | df = mostly.probe(g, config) |
Connect to any data source within your org | c = mostly.connect(config) |
For a complete API reference, see the Python SDK package documentation.
Installation
Use pip install to install the latest version of mostlyai.
# CPU
pip install -U "mostlyai[local]"
# GPU
pip install -U "mostlyai[local-gpu]"
GPU support in Local mode is available on Linux only.
If you need to use one of the supported connectors in Local mode, install with the optional dependencies: databricks, googlebigquery, hive, mssql, mysql, oracle, postgres, snowflake.
pip install -U "mostlyai[local,databricks]"
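When the set of connectors differs between environments, the requirement string can be assembled programmatically. The sketch below is a hypothetical helper (`pip_requirement` is not part of the SDK) that checks connector names against the list above and builds the string to pass to pip. Combining `local-gpu` with connector extras is an assumption of this example.

```python
# Hypothetical helper, not part of the mostlyai package.
SUPPORTED_CONNECTORS = {
    "databricks", "googlebigquery", "hive", "mssql",
    "mysql", "oracle", "postgres", "snowflake",
}


def pip_requirement(connectors=(), gpu=False):
    """Build a requirement string such as 'mostlyai[local,databricks]'."""
    unknown = sorted(set(connectors) - SUPPORTED_CONNECTORS)
    if unknown:
        raise ValueError(f"unsupported connector extras: {unknown}")
    # assumption: connector extras can be combined with either base extra
    base = "local-gpu" if gpu else "local"
    # keep the caller's order and drop duplicates
    extras = [base] + list(dict.fromkeys(connectors))
    return f"mostlyai[{','.join(extras)}]"


print(pip_requirement(["databricks", "postgres"]))
```

The resulting string can then be passed to `pip install -U`, quoted as in the commands above.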
Local and Client modes
The Python SDK is designed to work in a local environment (your computer or any supported Python environment) or by connecting to a remote MOSTLY AI Platform (such as https://app.mostly.ai). See the comparison below.
| | Local mode | Client mode |
|---|---|---|
| Prerequisites | Local Python installation (in Local mode) | • Remote MOSTLY AI Platform • Platform API key • Local Python installation (in Client mode) |
| Installation | Install the Python SDK in Local mode | 1. Deploy MOSTLY AI Platform in a Kubernetes cluster 2. Connect to the Platform with Python SDK in Client mode |
| Service | Use a locally running server that provides the REST API | Connect to the MOSTLY AI platform REST API |
| Compute | Uses local compute resources (CPU, GPU) | Uses compute resources available on the MOSTLY AI Platform |
The same API is available in Local and Client modes. The only difference is how you instantiate the MostlyAI client depending on the mode you need.
from mostlyai.sdk import MostlyAI
mostly = MostlyAI(local=True)
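For Client mode, the quick start on this page passes `api_key` and `base_url` to `MostlyAI`, or reads them from the `MOSTLYAI_API_KEY` and `MOSTLYAI_BASE_URL` environment variables. A minimal sketch of choosing the mode at runtime might look like the following. `client_kwargs` and `make_client` are hypothetical helpers, and falling back to Local mode when no key is set is an assumption of this example, not SDK behavior.

```python
import os


def client_kwargs(env=None):
    """Return keyword arguments for MostlyAI() based on environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("MOSTLYAI_API_KEY")
    if not api_key:
        # assumption of this sketch: no API key means Local mode
        return {"local": True}
    return {
        "api_key": api_key,
        "base_url": env.get("MOSTLYAI_BASE_URL", "https://app.mostly.ai"),
    }


def make_client(env=None):
    """Instantiate the SDK client in whichever mode the environment selects."""
    # imported lazily so client_kwargs() works without the SDK installed
    from mostlyai.sdk import MostlyAI
    return MostlyAI(**client_kwargs(env))


# show which mode an empty environment would select
print(client_kwargs({}))
```

Keeping the API key in an environment variable rather than in source code also avoids committing it to version control.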
Get an API key
Get your API key for the REST API or Python SDK from your user profile menu in the web application.
Steps
- Hover over the profile menu in the upper right and select API key.
- Click Generate API Key.
What’s next
Your key is immediately copied to your clipboard. You can now use it for the REST API or to instantiate your Python SDK in Client mode.
Examples
As you explore the Generators, Synthetic datasets, and Connectors pages, you will find Python code snippets that show how to accomplish a task with the Python SDK. Use the UI and Python SDK tabs to switch between the UI steps in the MOSTLY AI Platform and the equivalent Python SDK code.
Quick start
Use the Python SDK quick start below to train a generator locally on a tabular dataset, probe it, generate synthetic data, and export it to a file. Then, import the generator into a remote MOSTLY AI Platform.
import pandas as pd
from mostlyai.sdk import MostlyAI
# initialize client (locally or remotely)
mostly = MostlyAI(local=True)
mostly_remote = MostlyAI(
    api_key='INSERT_YOUR_API_KEY',     # or set env var `MOSTLYAI_API_KEY`
    base_url='https://app.mostly.ai',  # or set env var `MOSTLYAI_BASE_URL`
)
# train a generator
df = pd.read_csv('https://github.com/mostly-ai/public-demo-data/raw/dev/census/census.csv.gz')
g = mostly.train(data=df)
# probe for some samples
syn = mostly.probe(g, size=10)
# generate a synthetic dataset
sd = mostly.generate(g, size=2_000)
# start using it
sd.data()
# export a local generator
g.export_to_file('generator_census.zip')
# import into a remote platform
g_remote = mostly_remote.generators.import_from_file('path/to/generator_census.zip')