Setting Up SQL to Fetch Kafka Messages Using Apache Pinot
This document outlines the steps to query Kafka messages with SQL using Apache Pinot, optionally integrating with Trino for more advanced SQL queries. It also covers Kafka authentication with a username and password over SASL/SCRAM-SHA-256.
1. Install Apache Pinot and Dependencies Using Podman
Pull the Apache Pinot Image
- Pull the official Apache Pinot image:
podman pull apachepinot/pinot:latest
- Verify the image:
podman images
Run Pinot Containers
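Run ZooKeeper and the Pinot components (Controller, Broker, Server, and optionally Minion) using Podman. The commands below use --link, a legacy Docker option. If your Podman version does not accept it, create a network with podman network create, pass --network to each container, and use pinot-zookeeper:2181 as the ZooKeeper address, since containers on the same network can typically reach each other by container name: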
- Start Zookeeper
podman run -d \
  --name pinot-zookeeper \
  -p 2181:2181 \
  zookeeper:3.5
- Start Pinot Controller
podman run -d \
  --name pinot-controller \
  -p 9000:9000 \
  --link pinot-zookeeper:zookeeper \
  apachepinot/pinot:latest StartController -zkAddress zookeeper:2181
- Start Pinot Broker
podman run -d \
  --name pinot-broker \
  -p 8099:8099 \
  --link pinot-zookeeper:zookeeper \
  apachepinot/pinot:latest StartBroker -zkAddress zookeeper:2181
- Start Pinot Server
podman run -d \
  --name pinot-server \
  -p 8097:8097 \
  --link pinot-zookeeper:zookeeper \
  apachepinot/pinot:latest StartServer -zkAddress zookeeper:2181
- Start Pinot Minion (Optional)
podman run -d \
  --name pinot-minion \
  --link pinot-zookeeper:zookeeper \
  apachepinot/pinot:latest StartMinion -zkAddress zookeeper:2181
Verify Pinot Setup
- Access the Pinot Web UI at http://localhost:9000.
- Ensure all components (Controller, Broker, Server, Zookeeper) are running.
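The check that all components are running can also be scripted. The following is a minimal sketch, assuming the ports published by the podman commands above and assuming each Pinot component answers on a /health path; the names HEALTH_URLS, http_status, and check_components are made up for this example.

```python
import urllib.request
from typing import Callable, Dict

# Assumption: ports match the podman commands in this guide and each
# component exposes a /health endpoint. Adjust for your deployment.
HEALTH_URLS: Dict[str, str] = {
    "controller": "http://localhost:9000/health",
    "broker": "http://localhost:8099/health",
    "server": "http://localhost:8097/health",
}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status for url, or 0 on a connection or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except OSError:  # URLError and HTTPError are both subclasses of OSError
        return 0


def check_components(
    urls: Dict[str, str] = HEALTH_URLS,
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each component name to True if its health URL returned 200."""
    return {name: fetch(url) == 200 for name, url in urls.items()}
```

Calling check_components() with no arguments performs the real HTTP requests; the fetch parameter exists only so the logic can be exercised without a running cluster.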
2. Set Up Pinot for Kafka Integration
Prepare Pinot to Read Kafka Messages
- Define the Schema
Create a schema JSON file (schema.json) for your Kafka topic messages:
{
  "schemaName": "KafkaMessageSchema",
  "dimensionFieldSpecs": [
    { "name": "field1", "dataType": "STRING" },
    { "name": "field2", "dataType": "LONG" }
  ],
  "metricFieldSpecs": [
    { "name": "metric1", "dataType": "DOUBLE" }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "event_time",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
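Before uploading the schema, it can help to catch simple mistakes locally. The sketch below is not Pinot's own validation; it is a hypothetical helper (schema_problems) that checks only a few things implied by the example above: a schema name, a name and dataType on every field, no duplicate field names, and at least one date-time field.

```python
from typing import List

FIELD_GROUPS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")


def schema_problems(schema: dict) -> List[str]:
    """Return human-readable problems found in a Pinot-style schema dict."""
    problems: List[str] = []
    names: List[str] = []

    if not schema.get("schemaName"):
        problems.append("schemaName is missing or empty")

    for group in FIELD_GROUPS:
        for spec in schema.get(group, []):
            if "name" not in spec or "dataType" not in spec:
                problems.append(f"{group}: entry missing name or dataType: {spec}")
            else:
                names.append(spec["name"])

    # A field name may appear only once across all groups.
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        problems.append(f"duplicate field names: {', '.join(duplicates)}")

    # Assumption: the realtime table in this guide relies on a time column.
    date_specs = schema.get("dateTimeFieldSpecs", [])
    if not date_specs:
        problems.append("no dateTimeFieldSpecs entry found")
    for spec in date_specs:
        for key in ("format", "granularity"):
            if key not in spec:
                problems.append(f"dateTimeFieldSpecs: {spec.get('name')} lacks {key}")

    return problems
```

A typical use would be to load schema.json with json.load and print whatever schema_problems returns; an empty list means none of these particular checks failed.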
- Set Up Table Configuration
Define a table configuration JSON (table-config.json) to link Kafka with Pinot:
{
  "tableName": "KafkaMessageTable",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "replication": "1"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "ingestionConfig": {
    "streamIngestionConfig": {
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.broker.list": "localhost:9092",
        "stream.kafka.consumer.type": "lowlevel",
        "stream.kafka.topic.name": "your-topic-name",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaProtobufMessageDecoder",
        "stream.kafka.consumer.security.protocol": "SASL_PLAINTEXT",
        "stream.kafka.consumer.sasl.mechanism": "SCRAM-SHA-256",
        "stream.kafka.consumer.sasl.jaas.config": "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"your-username\" password=\"your-password\";"
      }
    }
  }
}
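The JAAS string in the table configuration nests quotes inside a JSON string, which is easy to get wrong by hand. The following sketch builds the three authentication properties programmatically and lets json.dumps handle the JSON escaping. The function names are hypothetical, the property keys are copied from the configuration above, and the backslash escaping of quotes inside the JAAS values is an assumption about how the Kafka client parses them.

```python
import json


def scram_jaas_config(username: str, password: str) -> str:
    """Build a ScramLoginModule JAAS line with quoted credentials."""

    def esc(value: str) -> str:
        # Assumption: backslash-escape backslashes and double quotes.
        return value.replace("\\", "\\\\").replace('"', '\\"')

    return (
        "org.apache.kafka.common.security.scram.ScramLoginModule required "
        f'username="{esc(username)}" password="{esc(password)}";'
    )


def stream_auth_configs(username: str, password: str) -> dict:
    """Return the SASL/SCRAM entries used in streamConfigs in this guide."""
    return {
        "stream.kafka.consumer.security.protocol": "SASL_PLAINTEXT",
        "stream.kafka.consumer.sasl.mechanism": "SCRAM-SHA-256",
        "stream.kafka.consumer.sasl.jaas.config": scram_jaas_config(username, password),
    }


def to_json_fragment(configs: dict) -> str:
    """Serialize the entries; json.dumps adds the JSON-level escaping."""
    return json.dumps(configs, indent=2)
```

Printing to_json_fragment(stream_auth_configs("your-username", "your-password")) yields entries that can be pasted into streamConfigs. Reading the credentials from environment variables rather than hard-coding them would keep secrets out of the file.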
- Protobuf Schema Configuration
  - Create a .desc file from your Protobuf schema using the protoc compiler:
    protoc --descriptor_set_out=your-schema.desc --proto_path=path/to/your/proto/file your-file.proto
  - Ensure the descriptor file is accessible in the Pinot container.
- Add the Schema and Table
Run the following commands inside the Pinot container. The -exec flag is needed because pinot-admin.sh otherwise performs only a dry run:
bin/pinot-admin.sh AddSchema -schemaFile /path/to/schema.json -exec
bin/pinot-admin.sh AddTable -tableConfigFile /path/to/table-config.json -exec
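As an alternative to running pinot-admin.sh inside the container, the schema and table configuration can be sent to the controller over HTTP from the host. This is a sketch under two assumptions: that the controller is reachable at http://localhost:9000, and that it accepts JSON bodies on POST /schemas and POST /tables. The helper build_upload_requests is hypothetical and only constructs the requests; nothing is sent until they are passed to urllib.request.urlopen.

```python
import json
import urllib.request
from typing import List


def build_upload_requests(
    schema: dict,
    table_config: dict,
    controller: str = "http://localhost:9000",  # assumed controller address
) -> List[urllib.request.Request]:
    """Build POST requests for the schema first, then the table config."""

    def post(path: str, payload: dict) -> urllib.request.Request:
        return urllib.request.Request(
            url=f"{controller}{path}",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    # Order matters: the table refers to a schema that must already exist.
    return [post("/schemas", schema), post("/tables", table_config)]
```

To actually upload, iterate over the returned list and call urllib.request.urlopen(request) on each, checking the response status before moving on to the next.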
3. Query Kafka Messages in Pinot
- Access the Pinot Web UI (http://localhost:9000) and verify the table is created.
- Use the built-in query console to query Kafka messages:
SELECT field1, field2, event_time FROM KafkaMessageTable LIMIT 10;
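The same query can be issued outside the web console by posting it to the broker. The sketch below assumes the broker published on port 8099 earlier in this guide and its /query/sql endpoint, and it assumes the response carries a resultTable with dataSchema.columnNames and rows. Both helper names are invented for this example.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_query_request(
    sql: str, broker: str = "http://localhost:8099"  # assumed broker address
) -> urllib.request.Request:
    """Build a POST request carrying the SQL text as a JSON body."""
    return urllib.request.Request(
        url=f"{broker}/query/sql",
        data=json.dumps({"sql": sql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_from_response(body: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a broker response into a list of {column: value} dicts."""
    table = body.get("resultTable") or {}
    columns = table.get("dataSchema", {}).get("columnNames", [])
    return [dict(zip(columns, row)) for row in table.get("rows", [])]
```

A caller would pass the request to urllib.request.urlopen, decode the body with json.load, and hand it to rows_from_response. Checking the response's exceptions field before reading rows is advisable, since a failed query may still return HTTP 200.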
4. Set Up Trino/Presto for Advanced SQL Queries (Optional)
Install Trino on Podman
- Pull the Trino Docker image:
podman pull trinodb/trino
- Run the Trino container:
podman run -d --name trino -p 8080:8080 trinodb/trino
Connect Trino to Pinot
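- Create a pinot.properties file:
connector.name=pinot
pinot.controller-urls=http://pinot-controller:9000
- Mount this file into Trino's catalog directory. A mount cannot be added to an existing container with podman restart, so remove the container and start it again with the file mounted:
podman rm -f trino
podman run -d --name trino -p 8080:8080 -v "$(pwd)/pinot.properties":/etc/trino/catalog/pinot.properties:Z trinodb/trino
The hostname pinot-controller must resolve from inside the Trino container, for example by running Trino on the same Podman network as the Pinot containers.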
Query Pinot Using Trino SQL
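- Trino's web UI (http://localhost:8080) primarily shows cluster and query status. To run queries, open the Trino CLI bundled in the container:
podman exec -it trino trino
- Query Pinot tables:
SELECT field1, COUNT(*) FROM pinot.default.KafkaMessageTable GROUP BY field1;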
Notes
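- Ensure your Kafka setup is configured to accept SASL/SCRAM-SHA-256 authentication.
- Replace placeholders like your-username, your-password, and your-topic-name with actual values.
- Inside a container, localhost refers to the container itself, so stream.kafka.broker.list must point to an address at which the Pinot server can reach Kafka.
- For Protobuf messages, ensure the KafkaProtobufMessageDecoder is used, and the .desc file is correctly linked in the Pinot configuration. Confirm the decoder class name and the stream property names against the documentation for the Pinot version you run.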