Use BigQuery for synthetic data
With MOSTLY AI, you can connect to BigQuery and use it as a data source or destination for your synthetic data.
Prerequisites
Create and download a JSON key file for a Google Cloud service account.
Recommendations
If you plan to use BigQuery as both a data source and a destination, MOSTLY AI recommends the following best practices.
- Store original (production) and synthetic data in separate Google Cloud projects.
- Create two Google Cloud service accounts with specific permissions:
  - One service account should have the Viewer role in the project containing production data.
  - The other service account should have the Editor role in the project where you deliver synthetic data.
  For more information, see Control access to resources with IAM in the BigQuery documentation.
- Create separate BigQuery connectors for the source and destination:
  - For the source connector, use the account key with the Viewer role for your production data project.
  - For the destination connector, use the account key with the Editor role for your synthetic data project.
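The separation recommended above can be spot-checked from the two key files themselves. The following is a minimal sketch, not part of MOSTLY AI: the function name `check_separation` and the sample values are hypothetical. It relies on the `project_id` and `client_email` fields that Google Cloud service account keys contain, and it assumes each service account was created in the project it accesses. A key file does not list IAM roles, so the Viewer and Editor roles still need to be verified in Google Cloud.

```python
from typing import List


def check_separation(source_key: dict, destination_key: dict) -> List[str]:
    """Return warnings if two parsed key files do not look separated.

    Hypothetical helper: it only compares fields stored in the key files
    and cannot see which IAM roles the service accounts hold.
    """
    warnings = []
    if source_key.get("project_id") == destination_key.get("project_id"):
        warnings.append("source and destination keys belong to the same project")
    if source_key.get("client_email") == destination_key.get("client_email"):
        warnings.append("source and destination keys use the same service account")
    return warnings


# Made-up example values for illustration only.
source = {
    "project_id": "prod-data",
    "client_email": "reader@prod-data.iam.gserviceaccount.com",
}
destination = {
    "project_id": "synthetic-data",
    "client_email": "writer@synthetic-data.iam.gserviceaccount.com",
}
print(check_separation(source, destination))
```

An empty list means the two keys differ in both project and service account.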
Download a Google Cloud service account key
- In Google Cloud BigQuery, open the main sidebar menu and select APIs & Services > Enabled APIs & services.
- From the sidebar, select Credentials.
- Click your service account.
- Select the KEYS tab.
- Click ADD KEY and select Create new key.
- In the prompt, select JSON and click Create.
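Before you paste the downloaded key into a connector, it can help to confirm that the text is intact JSON and looks like a service account key. The sketch below is one way to do that with the Python standard library. The function name `check_key_file_text` and the choice of required fields are assumptions made for illustration; the check does not contact Google Cloud and does not prove that the key is still valid.

```python
import json
from typing import List

# Fields that a Google Cloud service account JSON key normally contains.
# Treating exactly these as required is an assumption of this sketch.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file_text(text: str) -> List[str]:
    """Return a list of problems found in the key text (empty if none)."""
    try:
        key = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]
    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type}")
    return problems


# Made-up key content for illustration only.
sample = json.dumps({
    "type": "service_account",
    "project_id": "prod-data",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "reader@prod-data.iam.gserviceaccount.com",
})
print(check_key_file_text(sample))
```

To check a real file, read it first, for example with `open(path, encoding="utf-8").read()`, and pass the resulting text to the function.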
Create a BigQuery connector
If you use the web application, create a new BigQuery connector from the Connectors page.
Steps
- From the Connectors tab, click New connector.
- On the Connect to database tab, select Google BigQuery.
- On the Create BigQuery connector page, configure the connector:
  - For Name, enter a name that you can distinguish from other connectors.
  - For Access type, select whether you want to use the connector as a source or destination.
  - In Key file, paste the contents of your BigQuery key file.
- Click Save to save your new BigQuery connector.
MOSTLY AI tests the connection. If you see an error, check the connection details, update them, and click Save again.
To save the connector despite any errors, click Save anyway.
What's next
Depending on whether you created a source or a destination connector, you can use the connector as:
- a data source for a new generator
- a data destination for a new synthetic dataset