
Integrating Amazon Glue with ClickHouse and Spark

Amazon Glue is a fully managed, serverless data integration service provided by Amazon Web Services (AWS). It simplifies the process of discovering, preparing, and transforming data for analytics, machine learning, and application development.

Installation

To integrate your Glue code with ClickHouse, you can use our official Spark connector in Glue in one of two ways:

  • Installing the ClickHouse Glue connector from the AWS Marketplace (recommended).
  • Manually adding the Spark connector's JARs to your Glue job.

To install the connector from the AWS Marketplace, follow these steps:

  1. Subscribe to the Connector

    To access the connector in your account, subscribe to the ClickHouse AWS Glue Connector in the AWS Marketplace.

  2. Grant Required Permissions

    Ensure your Glue job’s IAM role has the necessary permissions, as described in the minimum privileges guide.

  3. Activate the Connector & Create a Connection

    You can activate the connector and create a connection directly by clicking this link, which opens the Glue connection creation page with key fields pre-filled. Give the connection a name and click Create. You do not need to provide the ClickHouse connection details at this stage.

  4. Use in Glue Job

    In your Glue job, open the Job details tab and expand the Advanced properties section. Under Connections, select the connection you just created. The connector automatically injects the required JARs into the job runtime.
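If you define Glue jobs programmatically instead of in the console, the connection chosen in step 4 corresponds to the `Connections` field of the Glue `CreateJob` API, and for a manual installation the connector JARs can be passed as S3 paths through the `--extra-jars` job argument. The following is a minimal sketch, not an official tool: `build_job_request` and `create_job` are hypothetical helpers, the job name, role ARN, S3 paths, and connection name are placeholders, and `glue_version` must be set to a Glue version whose runtime matches the Spark, Scala, and Python versions given in the note on connector builds.

```python
"""Sketch: define a Glue Spark job that uses the ClickHouse connector.

Hypothetical helpers; all names, ARNs, and S3 paths are placeholders.
"""
from typing import Optional, Sequence


def build_job_request(
    job_name: str,
    role_arn: str,
    script_location: str,
    glue_version: str,
    connection_name: Optional[str] = None,
    extra_jars: Sequence[str] = (),
) -> dict:
    """Return keyword arguments for the Glue CreateJob API.

    Pass connection_name for the AWS Marketplace connector, or extra_jars
    (S3 paths to the connector JARs) for a manual installation.
    """
    if not connection_name and not extra_jars:
        raise ValueError("provide a connection name or at least one JAR path")

    request = {
        "Name": job_name,
        "Role": role_arn,
        # Must match the runtime the connector JARs were built for.
        "GlueVersion": glue_version,
        # "glueetl" is the Spark job type; the script runs on Python 3.
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "DefaultArguments": {},
    }
    if connection_name:
        # Same effect as selecting the connection under Advanced properties.
        request["Connections"] = {"Connections": [connection_name]}
    if extra_jars:
        # --extra-jars expects a comma-separated list of S3 paths.
        request["DefaultArguments"]["--extra-jars"] = ",".join(extra_jars)
    return request


def create_job(request: dict) -> str:
    """Submit the request with boto3 (needs AWS credentials); returns the job name."""
    import boto3  # imported lazily so build_job_request works without boto3

    glue = boto3.client("glue")
    response = glue.create_job(**request)
    return response["Name"]
```

For example, `create_job(build_job_request("clickhouse-etl", role_arn, "s3://my-bucket/scripts/job.py", glue_version, connection_name="my-clickhouse-connection"))` would register a job with the connection attached. Keeping request construction separate from the boto3 call lets you review or version the job definition before submitting it.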

Glue Notebook connections config
Note

The JARs used in the Glue connector are built for Spark 3.2, Scala 2, and Python 3. Make sure to select these versions when configuring your Glue job.

Examples

For more examples and configuration options, see our Spark documentation.
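As a minimal sketch of a Glue PySpark job that uses the connector through its Spark SQL catalog, the script below writes two rows to ClickHouse and reads them back. It assumes the connector JARs are already available to the job (through either installation method), that a table `default.example_table` with `id` and `name` columns exists, and that your connector release registers the catalog class `com.clickhouse.spark.ClickHouseCatalog` (older releases used `xenon.clickhouse.ClickHouseCatalog`). The host and password are placeholders, port 8443 assumes ClickHouse Cloud over HTTPS, and `clickhouse_catalog_conf` is a hypothetical helper; check the Spark documentation for the configuration keys your connector version supports.

```python
"""Sketch of a Glue PySpark job that writes to and reads from ClickHouse.

Assumes the connector JARs are on the classpath and that the table
default.example_table (id, name) already exists in ClickHouse.
"""
import importlib.util
import sys


def clickhouse_catalog_conf(host, port, user, password, database="default",
                            catalog="clickhouse", use_ssl=True):
    """Return Spark settings that register ClickHouse as a Spark SQL catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    conf = {
        # Catalog class for recent connector releases (older: xenon.clickhouse.ClickHouseCatalog).
        prefix: "com.clickhouse.spark.ClickHouseCatalog",
        f"{prefix}.host": host,
        f"{prefix}.protocol": "https" if use_ssl else "http",
        f"{prefix}.http_port": str(port),
        f"{prefix}.user": user,
        f"{prefix}.password": password,
        f"{prefix}.database": database,
    }
    if use_ssl:
        conf[f"{prefix}.option.ssl"] = "true"
    return conf


def run_job():
    # These modules are provided by the Glue runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from pyspark.sql import Row

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Placeholder credentials: replace with your own (e.g. from AWS Secrets Manager).
    conf = clickhouse_catalog_conf("<your-clickhouse-host>", 8443, "default", "<your-password>")
    for key, value in conf.items():
        spark.conf.set(key, value)

    df = spark.createDataFrame([Row(id=11, name="John"), Row(id=12, name="Doe")])
    # Append the rows to the existing ClickHouse table through the catalog.
    df.writeTo("clickhouse.default.example_table").append()

    # Read the table back and log up to 10 rows.
    rows = spark.sql("SELECT * FROM clickhouse.default.example_table").take(10)
    glue_context.get_logger().info(str(rows))
    job.commit()


# Run only when the Glue libraries are importable, i.e. inside a Glue job.
if __name__ == "__main__" and importlib.util.find_spec("awsglue") is not None:
    run_job()
```

Setting the catalog options with `spark.conf.set` inside the script keeps the job self-contained; the same keys could instead be supplied as job-level Spark configuration if you prefer to keep credentials out of the script.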