Use S3 storage for synthetic data
If you keep datasets in S3 storage (Amazon S3 or any S3-compatible storage service), you can synthesize them via an S3 connector in MOSTLY AI.
If you want to store the generated synthetic data in a separate S3 bucket, you need to create a second S3 connector, configured as a destination, that points to that bucket.
Prerequisites
When you use AWS S3, take into account the prerequisites listed below.
- Use only "long-term" credentials that include an access key and a secret key. "Short-term" credentials also require a session token, which is not supported.
- To use AWS S3 paths containing partitioned Parquet datasets, your AWS credentials must have the s3:ListBucket permission.
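Both prerequisites can be checked before you create the connector. The sketch below is an illustration only and is not part of MOSTLY AI: the helper names `is_long_term` and `list_bucket_policy`, the dictionary keys, and the bucket name `my-source-bucket` are all hypothetical. It assumes that short-term credentials can be recognized by the presence of a session token, and that the s3:ListBucket permission is granted on the bucket ARN itself.

```python
import json


def is_long_term(credentials: dict) -> bool:
    """Return True if an access key and secret key are present and no session token is set."""
    has_keys = bool(credentials.get("access_key")) and bool(credentials.get("secret_key"))
    return has_keys and not credentials.get("session_token")


def list_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document that grants s3:ListBucket on a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                # ListBucket is a bucket-level action, so the resource is the bucket ARN.
                "Resource": [f"arn:aws:s3:::{bucket}"],
            }
        ],
    }


if __name__ == "__main__":
    # "my-source-bucket" is a placeholder name.
    print(json.dumps(list_bucket_policy("my-source-bucket"), indent=2))
```

A real policy would likely also need object-level permissions to read or write the data. Those are not listed in the prerequisites above, so they are left out of this sketch.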
Steps
- From the Connectors tab, click Create connector.
- From the Create a new connector window, select S3 Storage.
- From the New connector window, configure the connector.
- For Name, enter a name that you can distinguish from other connectors.
- For Access type, select whether you want to use the connector as a source or destination.
- For Access key, enter your AWS access key.
- For Secret key, enter your AWS secret key.
- For Endpoint URL (optional), enter the endpoint URL of your S3-compatible storage service.
💡
If you use Amazon S3, you can leave this field empty.
If you use a different S3-compatible storage service, enter the endpoint URL of the service. For example: https://play.min.io:9000
- (Optional) To use an encrypted connection, select Use SSL and upload your certificate in the CA certificate field.
- Click Save to save your new S3 storage connector.
MOSTLY AI tests the connection. If you see an error, check the connection details, update them, and click Save again.
You can also click Save anyway to save the connector despite the errors.
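If the connection test fails, it can help to check the values you entered before saving again. The following is a minimal sketch of such a check, not the validation that MOSTLY AI performs: the class `S3ConnectorSettings`, its field names, and the `problems` method are hypothetical and only mirror the form fields described in the steps above.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


@dataclass
class S3ConnectorSettings:
    """Hypothetical container mirroring the fields of the New connector window."""
    name: str
    access_type: str                    # "source" or "destination"
    access_key: str
    secret_key: str
    endpoint_url: Optional[str] = None  # leave empty for Amazon S3

    def problems(self) -> List[str]:
        """Return a list of human-readable issues; an empty list means none were found."""
        found = []
        if not self.name.strip():
            found.append("Name is empty")
        if self.access_type not in ("source", "destination"):
            found.append("Access type must be 'source' or 'destination'")
        if not self.access_key or not self.secret_key:
            found.append("Access key and secret key are both required")
        if self.endpoint_url:
            parsed = urlparse(self.endpoint_url)
            # Require a scheme and a host, as in https://play.min.io:9000
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                found.append("Endpoint URL must be a full http(s) URL")
        return found


if __name__ == "__main__":
    settings = S3ConnectorSettings(
        name="minio-source",
        access_type="source",
        access_key="EXAMPLE_ACCESS_KEY",
        secret_key="EXAMPLE_SECRET_KEY",
        endpoint_url="https://play.min.io:9000",
    )
    print(settings.problems() or "No problems found")
```

A check like this only catches malformed input. Wrong keys, missing permissions, or an unreachable endpoint still surface only when the connection itself is tested.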
What's next
Depending on whether you created a source or a destination connector, you can use the connector as:
- a data source for a new generator
- a data destination for a new synthetic dataset