Synthesize a cloud bucket dataset

Teams often share datasets with colleagues and other collaborators through cloud object storage. MOSTLY AI integrates with cloud object storage providers, so you can use datasets uploaded to a cloud bucket as a data source for synthetic data.

Steps

  1. Create a connector for your cloud bucket.

    In a cloud storage connector, you define the connection details and credentials to access files in your cloud buckets. In the list below, you can find instructions to create a connector to one of the supported cloud object storage providers.
  2. Create a cloud storage catalog.
  3. With the cloud storage catalog open, click Next.
  4. (Optional) On the Synthetic datasets / Start job screen, review the synthetic dataset configuration.

    For more information, see Configure a synthetic dataset.

  5. Configure a data destination.
    1. Select Output settings.
    2. For Data destination, select Download as CSV/Parquet.
      Tip: If you want to deliver the generated synthetic dataset to the same or another cloud bucket (or even a database), see Configure a data destination.

  6. To start generating the synthetic dataset, click Create a synthetic dataset.

Result

The Synthetic datasets tab opens, where you can track the progress of the synthetic dataset being generated from your cloud storage bucket.

What's next

After the job completes, you can preview and download the generated synthetic dataset.
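
If you chose Download as CSV/Parquet and downloaded a CSV file, you may also want a quick look at it outside the UI. The following is a minimal sketch that uses only the Python standard library and is not part of MOSTLY AI. The function name `preview_csv`, the file name in the comment, and the sample rows are all hypothetical and serve only as illustration.

```python
import csv
import io
from itertools import islice


def preview_csv(stream, limit=5):
    """Return (header, rows) for the first `limit` data rows of a CSV text stream."""
    reader = csv.reader(stream)
    header = next(reader, [])           # empty list if the stream has no lines
    rows = list(islice(reader, limit))  # read at most `limit` data rows
    return header, rows


# Made-up sample standing in for a downloaded file. For a real download,
# open the file instead (the file name here is an assumption):
#   with open("synthetic_dataset.csv", newline="") as f:
#       header, rows = preview_csv(f)
sample = io.StringIO("id,age,city\n1,34,Vienna\n2,51,Graz\n3,27,Linz\n")
header, rows = preview_csv(sample, limit=2)
print(header)
for row in rows:
    print(row)
```

Reading through a stream rather than loading the whole file keeps the preview cheap even for large synthetic datasets.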