Use Apache Hive for synthetic data

MOSTLY AI can use Apache Hive as a source for original data as well as a destination to deliver synthetic data. To do so, you need to create Apache Hive connectors.

Create an Apache Hive connector

For each Apache Hive data source or destination, you need a separate connector.

Prerequisites

Obtain the following Apache Hive connection details:

  • host
  • port
  • credentials

To use Kerberos, see Use Kerberos for authentication.

If you use the web application, create a new Apache Hive connector from the Connectors page.

Steps

  1. From the Connectors page, click New connector.
  2. From the Create a new connector window, select Apache Hive.
  3. From the New connector page, configure the connector.
    1. For Name, enter a name that you can distinguish from other connectors.
    2. For Access type, select whether you want to use the connector as a source or destination.
    3. For Host, enter the Apache Hive hostname.
    4. For Port, enter the port number.

      The default port for Apache Hive is 10000.

    5. For Username and Password, enter your Apache Hive credentials.
  4. Click Save to save your new Apache Hive connector.

    MOSTLY AI tests the connection. If you see an error, check the connection details, update them, and click Save again.

    To save the connector regardless of any errors, click Save anyway.
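If the connection test fails, one way to narrow down the cause is to check whether the Hive host and port accept TCP connections at all. The following Python sketch does only that: it opens and closes a TCP connection and reports whether it succeeded. It does not check your credentials or speak the Hive protocol, and because MOSTLY AI connects from its own environment, a result from your workstation may differ from what MOSTLY AI sees. The function name hive_port_reachable is hypothetical.

```python
import socket


def hive_port_reachable(host: str, port: int = 10000, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This is only a network reachability check; it does not authenticate
    or send any Hive traffic. 10000 is the default Apache Hive port.
    """
    try:
        # create_connection resolves the hostname and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution errors, refused connections, and timeouts.
        return False
```

A False result points to a DNS, firewall, or port problem rather than a credentials problem. A True result means only that something is listening on that port.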

Use Kerberos for authentication

To create an Apache Hive connector with Kerberos authentication, contact your Kerberos system administrator to obtain the information listed below.

  • Kerberos principal. A unique identity to which Kerberos can assign tickets.
  • Kerberos krb5.conf configuration file. The contents of a krb5.conf configuration file, such as the one listed below. For more information, see krb5.conf in the MIT Kerberos Documentation.
    [libdefaults]
    default_realm = INTERNAL
    dns_lookup_realm = false
    dns_lookup_kdc = false
    forwardable = true
    rdns = false
    
    [realms]
    INTERNAL = {
    	kdc = ip-172-***-***-222
    	kdc = ip-172-***-***-222.hive-kerberized.domain.com
    	kdc = 172.***.***.222
    	kdc = 3.***.***.111
    	kdc = hive-kerberized.domain.com
    	admin_server = hive-kerberized.domain.com
    }
    
    [domain_realm]
    .hive-kerberized.domain.com = INTERNAL
    hive-kerberized.domain.com = INTERNAL
  • Kerberos keytab. Short for "key table", a keytab file stores the keys needed to log in to Kerberos-aware services. Keytab files let automated processes, such as scripts and services, authenticate using Kerberos without a person entering a password.
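Before you paste the krb5.conf contents into the connector, you might sanity-check that the file is consistent with your principal. The following Python sketch assumes a simple krb5.conf layout like the example above: it reads [section] headers and key = value relations, treats REALM = { ... } as one level of nesting, and then checks that default_realm and the principal's realm (the part after @) both have an entry in [realms] with at least one kdc. It is not a full krb5.conf parser (for example, it ignores include directives and deeper nesting), and the function names parse_krb5_conf and check_krb5 are hypothetical.

```python
def parse_krb5_conf(text: str) -> dict:
    """Parse a simple krb5.conf into {section: {key: values}}.

    Flat relations map to a list of values; a 'NAME = {' block maps to a
    dict of its own relations (one nesting level only).
    """
    sections: dict = {}
    section = None  # dict for the current [section]
    block = None    # dict for the current NAME = { ... } block, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = sections.setdefault(line[1:-1].strip(), {})
            block = None
            continue
        if section is None:
            continue  # relations before the first section header are ignored
        if line == "}":
            block = None  # end of the current NAME = { ... } block
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value relation
        key, value = key.strip(), value.strip()
        if value == "{":
            block = section.setdefault(key, {})
            continue
        target = block if block is not None else section
        target.setdefault(key, []).append(value)
    return sections


def check_krb5(conf_text: str, principal: str) -> list:
    """Return a list of problems; an empty list means none were detected."""
    conf = parse_krb5_conf(conf_text)
    realms = conf.get("realms", {})
    problems = []

    default_realm = conf.get("libdefaults", {}).get("default_realm", [None])[0]
    if default_realm is None:
        problems.append("[libdefaults] has no default_realm")
    elif default_realm not in realms:
        problems.append(f"default_realm {default_realm} has no entry in [realms]")

    # A principal looks like primary/instance@REALM; without @REALM,
    # Kerberos uses the default realm.
    _, at, realm = principal.rpartition("@")
    principal_realm = realm if at else default_realm
    if principal_realm is not None and principal_realm not in realms:
        problems.append(f"principal realm {principal_realm} has no entry in [realms]")

    for name, relations in realms.items():
        if not isinstance(relations, dict) or not relations.get("kdc"):
            problems.append(f"realm {name} lists no kdc")
    return problems
```

An empty list means only that the sketch found no inconsistency; it does not confirm that the KDC is reachable or that the principal exists.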

If you use the web application, enable the Authenticate with Kerberos checkbox in your Apache Hive connector and provide the required information.

Steps

  1. To use Kerberos for authentication in your Apache Hive connector, select the Authenticate with Kerberos checkbox.
  2. Provide the Kerberos authentication details.
    1. For Kerberos principal, type or paste the Kerberos principal.
    2. For Kerberos krb5 config, paste the contents of a krb5.conf file.
    3. For Kerberos keytab, click the Upload button and select a Kerberos keytab file from your local file system.
  3. Click Save.
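If the connector rejects the keytab, it may help to confirm that you uploaded an actual keytab file. In the MIT Kerberos keytab format, the first byte of the file is 5 and the second byte is the format version (1 or 2). The following Python sketch checks only those two bytes; it does not read the keytab entries or confirm that they match your principal. The function name keytab_format_version is hypothetical. If the MIT Kerberos client tools are installed, klist -k -t followed by the keytab path lists the principals the file contains.

```python
def keytab_format_version(path: str):
    """Return 1 or 2 if the file starts with an MIT keytab header, else None."""
    with open(path, "rb") as f:
        header = f.read(2)
    # MIT keytab files start with the byte 0x05 followed by the format version.
    if len(header) == 2 and header[0] == 0x05 and header[1] in (0x01, 0x02):
        return header[1]
    return None
```

A None result suggests the file is not an MIT-format keytab, for example a text file or a keytab damaged by a text-mode copy.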

What's next

Depending on whether you created a source or a destination connector, you can use the connector as: