Connect to Databricks
With MOSTLY AI, you can connect to a Databricks SQL Warehouse and use it as a data source or destination for your synthetic data.
Prerequisites
To create a Databricks connector, you need to obtain your SQL Warehouse connection details, a Databricks catalog name, and a personal access token for Databricks. The linked sections below provide step-by-step guidance on how to complete the prerequisites.
- Database connection details in Databricks
- Catalog name in Databricks
- Personal access token in Databricks
Get connection details for your Databricks SQL Warehouse
-
In Databricks, open the workspace that contains the SQL Warehouse you want to use.
-
Open the sidebar and from the main menu, select SQL.
-
Open the sidebar menu again and select SQL Warehouses.
-
From the list, open the SQL warehouse you want to use for synthetic data.
-
Select the Connection details tab.
-
Copy the necessary connection details (hostname, port, protocol, and HTTP path) for the MOSTLY AI Databricks connector.
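Values copied from the Connection details tab sometimes pick up a pasted scheme or stray whitespace. The following is a minimal sketch, in Python, of tidying them before you enter them in the connector form. `WarehouseDetails` and `normalize_details` are hypothetical helpers and not part of MOSTLY AI or Databricks. The default port of 443 and the leading slash on the HTTP path are assumptions based on how Databricks typically displays these values.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class WarehouseDetails:
    """Connection details copied from the Connection details tab."""
    hostname: str
    http_path: str
    port: int = 443  # assumption: use the port that Databricks actually shows


def normalize_details(hostname: str, http_path: str, port: int = 443) -> WarehouseDetails:
    """Strip a pasted scheme, whitespace, and trailing slashes from the values."""
    host = hostname.strip()
    if "://" in host:
        # urlparse only fills .hostname when a scheme is present
        host = urlparse(host).hostname or ""
    host = host.rstrip("/")
    if not host:
        raise ValueError("hostname is empty")

    path = http_path.strip()
    if not path:
        raise ValueError("HTTP path is empty")
    if not path.startswith("/"):
        # assumption: the HTTP path is written with a leading slash
        path = "/" + path

    return WarehouseDetails(hostname=host, http_path=path, port=port)
```

For example, `normalize_details("https://example.cloud.databricks.com/", "sql/1.0/warehouses/abc123")` keeps only the bare hostname and a slash-prefixed path. Both values here are placeholders.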
Get Databricks catalog name
-
From the Databricks sidebar menu, select Data.
-
Copy the name of the catalog you want to use in MOSTLY AI.
Create a Databricks personal access token
-
In Databricks, open your account menu and select User Settings.
-
On the Access tokens tab, click Generate new token.
-
In the dialog window, enter a name that identifies where you intend to use the token.
[NOTE] Adjust the expiration of the token in the Lifetime (days) box.
-
Click Generate.
-
Copy the access token and save it in a secure location.
⚠️ Do not close the dialog window until you have saved the token in a location that you can access later.
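Before you enter the prerequisites in MOSTLY AI, you may want to confirm that they work together. The sketch below is one possible check and not part of the MOSTLY AI product. It assumes the token is stored in an environment variable named `DATABRICKS_TOKEN` (the name is an assumption) and that the third-party `databricks-sql-connector` package is installed. `load_token`, `mask`, and `check_prerequisites` are hypothetical helper names.

```python
import os


def load_token(env_var: str = "DATABRICKS_TOKEN") -> str:
    """Read the token from an environment variable (the variable name is an assumption)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to your Databricks personal access token")
    return token


def mask(token: str, visible: int = 4) -> str:
    """Return a log-safe form of the token that shows only its last few characters."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]


def check_prerequisites(hostname: str, http_path: str, catalog: str, token: str) -> bool:
    """Open a session against the SQL Warehouse and run a trivial query.

    Requires: pip install databricks-sql-connector
    The import is done lazily so the helpers above work without the package.
    """
    from databricks import sql

    with sql.connect(
        server_hostname=hostname,
        http_path=http_path,
        access_token=token,
        catalog=catalog,
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            return cursor.fetchone() is not None
```

Calling `check_prerequisites(hostname, http_path, catalog, load_token())` with your own values raises an error if any of them is rejected by Databricks. Use `mask` when you need to print the token for troubleshooting.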
Create a Databricks connector
-
From the Connectors tab, click Create connector.
The Create connector drawer appears on the right.
-
On the Connect to database tab, select Databricks.
-
Configure the Databricks connector.
-
For Connector name, enter a name for the Databricks connector.
A name that combines Databricks with your catalog name might help you identify this connector among other Databricks connectors.
-
For Connection type, select whether you want to use the connector as a source or destination.
You can select only data source connectors when you create a new catalog.
Similarly, you can select only data destination connectors when you configure a destination for the new synthetic dataset.
-
For Hostname, enter your SQL warehouse server hostname. For more information, see the Prerequisites above.
-
For HTTP path, enter your SQL warehouse HTTP path.
-
For Access token, enter your Databricks personal access token.
-
For Catalog, enter the name of your Databricks catalog.
-
For Schema, enter the schema you want to use.
If you leave Schema empty, MOSTLY AI uses the schema named default.
-
Click Save.
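As a recap of the form fields above, the sketch below models them as a small Python data class. It is illustrative only: `DatabricksConnectorForm` is a hypothetical name, not a MOSTLY AI API. Treating the four connection fields as required and joining the suggested name with an underscore are both assumptions.

```python
from dataclasses import dataclass


@dataclass
class DatabricksConnectorForm:
    """Hypothetical container that mirrors the Create connector form fields."""
    hostname: str
    http_path: str
    access_token: str
    catalog: str
    connection_type: str = "source"  # "source" or "destination"
    schema: str = ""                 # empty falls back to the schema named "default"
    name: str = ""                   # empty falls back to a name built from the catalog

    def __post_init__(self) -> None:
        if self.connection_type not in ("source", "destination"):
            raise ValueError("connection_type must be 'source' or 'destination'")
        # assumption: these four fields must be filled in
        for label in ("hostname", "http_path", "access_token", "catalog"):
            if not getattr(self, label).strip():
                raise ValueError(f"{label} is required")
        if not self.schema.strip():
            self.schema = "default"
        if not self.name.strip():
            # assumption: the underscore separator is an illustrative choice
            self.name = f"Databricks_{self.catalog}"
```

Creating the object with only the four connection fields leaves `schema` set to `default` and suggests a connector name based on the catalog.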
Result
Your Databricks connector is now saved.
What's next
You can now use the Databricks connector as a data source when you create a new catalog.
You can also use the Databricks connector as a destination.
You can use different types of data sources and destinations for a synthetic dataset. For example, if your data source is a Databricks database, you can deliver the generated synthetic data to any of the supported databases or cloud storage providers.