Thursday, 1 August 2024

Trino in Podman


Setting Up SQL to Fetch Kafka Messages Using Apache Pinot

This post outlines the steps to set up SQL querying for Kafka messages using Apache Pinot, optionally integrating with Trino for more advanced SQL queries. It also covers configuring Kafka authentication with username and password over SASL/SCRAM-SHA-256.


1. Install Apache Pinot and Dependencies Using Podman

Pull the Apache Pinot Image

  1. Pull the official Apache Pinot image:

    podman pull apachepinot/pinot:latest
    
  2. Verify the image:

    podman images
    

Run Pinot Containers

Run Pinot components (Controller, Broker, Server, Zookeeper) using Podman:

  1. Start Zookeeper

    podman run -d \
      --name pinot-zookeeper \
      -p 2181:2181 \
      zookeeper:3.5
    
  2. Start Pinot Controller

    podman run -d \
      --name pinot-controller \
      -p 9000:9000 \
      --link pinot-zookeeper:zookeeper \
      apachepinot/pinot:latest StartController -zkAddress zookeeper:2181
    
  3. Start Pinot Broker

    podman run -d \
      --name pinot-broker \
      -p 8099:8099 \
      --link pinot-zookeeper:zookeeper \
      apachepinot/pinot:latest StartBroker -zkAddress zookeeper:2181
    
  4. Start Pinot Server

    podman run -d \
      --name pinot-server \
      -p 8097:8097 \
      --link pinot-zookeeper:zookeeper \
      apachepinot/pinot:latest StartServer -zkAddress zookeeper:2181
    
  5. Start Pinot Minion (Optional)

    podman run -d \
      --name pinot-minion \
      --link pinot-zookeeper:zookeeper \
      apachepinot/pinot:latest StartMinion -zkAddress zookeeper:2181
    
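
Note: --link is a legacy Docker feature and its support varies across Podman versions. If the flag is rejected, a user-defined network is the portable alternative (the name pinot-net is arbitrary): create it once, start each container with --network pinot-net instead of --link, and use pinot-zookeeper:2181 as the -zkAddress.

    podman network create pinot-net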

Verify Pinot Setup

  1. Access the Pinot Web UI at http://localhost:9000.
  2. Ensure all components (Controller, Broker, Server, Zookeeper) are running.
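
You can also verify from the host's command line (assuming curl is available; /health is the controller's health endpoint):

    podman ps --filter name=pinot
    curl http://localhost:9000/health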

2. Set Up Pinot for Kafka Integration

Prepare Pinot to Read Kafka Messages

  1. Define the Schema. Create a schema JSON file (schema.json) for your Kafka topic messages:

    {
      "schemaName": "KafkaMessageSchema",
      "dimensionFieldSpecs": [
        { "name": "field1", "dataType": "STRING" },
        { "name": "field2", "dataType": "LONG" }
      ],
      "metricFieldSpecs": [
        { "name": "metric1", "dataType": "DOUBLE" }
      ],
      "dateTimeFieldSpecs": [
        {
          "name": "event_time",
          "dataType": "LONG",
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:MILLISECONDS"
        }
      ]
    }
    
  2. Set Up Table Configuration. Define a table configuration JSON (table-config.json) that links the Kafka topic to Pinot; replace the broker list, topic name, credentials, descriptor path, and Protobuf class name with your actual values:

    {
      "tableName": "KafkaMessageTable",
      "tableType": "REALTIME",
      "segmentsConfig": {
        "schemaName": "KafkaMessageSchema",
        "timeColumnName": "event_time",
        "replication": "1"
      },
      "tableIndexConfig": {
        "loadMode": "MMAP"
      },
      "ingestionConfig": {
        "streamIngestionConfig": {
          "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.broker.list": "localhost:9092",
            "stream.kafka.consumer.type": "lowlevel",
            "stream.kafka.topic.name": "your-topic-name",
            "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.protobuf.ProtoBufMessageDecoder",
            "stream.kafka.decoder.prop.descriptorFile": "/path/to/your-schema.desc",
            "stream.kafka.decoder.prop.protoClassName": "YourMessage",
            "stream.kafka.consumer.security.protocol": "SASL_PLAINTEXT",
            "stream.kafka.consumer.sasl.mechanism": "SCRAM-SHA-256",
            "stream.kafka.consumer.sasl.jaas.config": "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"your-username\" password=\"your-password\";"
          }
        }
      }
    }
    
  3. Protobuf Schema Configuration

    • Create a .desc descriptor file from your Protobuf schema using the protoc compiler (--include_imports makes the descriptor self-contained):
      protoc --include_imports --descriptor_set_out=your-schema.desc --proto_path=path/to/your/proto/file your-file.proto
      
    • Ensure the descriptor file is accessible in the Pinot container at the path referenced by stream.kafka.decoder.prop.descriptorFile (see the sketch after this list).
  4. Add the Schema and Table. Run pinot-admin inside the Pinot controller container; the -exec flag actually submits the request (a full copy-and-register sketch follows this list):

    bin/pinot-admin.sh AddSchema -schemaFile /path/to/schema.json -exec
    bin/pinot-admin.sh AddTable -tableConfigFile /path/to/table-config.json -exec
    
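
As referenced in steps 3 and 4, here is a minimal end-to-end sketch. The .proto body, the /tmp paths, and the message name YourMessage are illustrative assumptions, not fixed names; whatever descriptor path you choose must match stream.kafka.decoder.prop.descriptorFile in table-config.json.

    // your-file.proto -- a hypothetical message matching schema.json
    syntax = "proto3";
    message YourMessage {
      string field1 = 1;
      int64 field2 = 2;
      double metric1 = 3;
      int64 event_time = 4;
    }
    
Copy the artifacts into the controller container and register them (pinot-admin.sh sits in the image's working directory, /opt/pinot):

    podman cp schema.json pinot-controller:/tmp/schema.json
    podman cp table-config.json pinot-controller:/tmp/table-config.json
    podman cp your-schema.desc pinot-controller:/tmp/your-schema.desc
    podman exec -it pinot-controller bin/pinot-admin.sh AddSchema -schemaFile /tmp/schema.json -exec
    podman exec -it pinot-controller bin/pinot-admin.sh AddTable -tableConfigFile /tmp/table-config.json -exec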

3. Query Kafka Messages in Pinot

  1. Access the Pinot Web UI (http://localhost:9000) and verify the table is created.
  2. Use the built-in query console to query Kafka messages:
    SELECT field1, field2, event_time 
    FROM KafkaMessageTable 
    LIMIT 10;
    
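
The same query can also be sent over HTTP to the broker's SQL endpoint, which is handy for scripting (a sketch assuming the broker is published on port 8099 as above):

    curl -X POST http://localhost:8099/query/sql \
      -H "Content-Type: application/json" \
      -d '{"sql": "SELECT field1, field2, event_time FROM KafkaMessageTable LIMIT 10"}'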

4. Set Up Trino/Presto for Advanced SQL Queries (Optional)

Install Trino on Podman

  1. Pull the Trino container image:
    podman pull trinodb/trino
    
  2. Run the Trino container:
    podman run -d --name trino -p 8080:8080 trinodb/trino
    

Connect Trino to Pinot

  1. Create a pinot.properties catalog file:
    connector.name=pinot
    pinot.controller-urls=http://pinot-controller:9000
    
  2. Copy the file into the container's catalog directory (/etc/trino/catalog in the trinodb/trino image) and restart the container:
    podman cp pinot.properties trino:/etc/trino/catalog/pinot.properties
    podman restart trino
    
  3. Note that the hostname pinot-controller resolves only if the Trino and Pinot containers share a user-defined network or pod; otherwise, point pinot.controller-urls at an address reachable from the Trino container.
    

Query Pinot Using Trino SQL

  1. Connect to Trino. The web UI at http://localhost:8080 shows cluster and query status; to run SQL, use the Trino CLI bundled in the image (see the example after this list).
  2. Query Pinot tables:
    SELECT field1, COUNT(*) 
    FROM pinot.default.KafkaMessageTable 
    GROUP BY field1;
    
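
The trinodb/trino image bundles the Trino CLI, so you can run queries interactively inside the container (the catalog and schema names assume the pinot.properties file above):

    podman exec -it trino trino --catalog pinot --schema default
    trino:default> SELECT field1, COUNT(*) FROM KafkaMessageTable GROUP BY field1;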

Notes

  • Ensure your Kafka brokers are configured to accept SASL/SCRAM-SHA-256 authentication (see the sketch below for creating the credential).
  • Replace placeholders like your-username, your-password, and your-topic-name with actual values.
  • Because the Pinot components run in containers, the localhost:9092 broker list in the stream config must be replaced with an address reachable from inside the Pinot server container.
  • For Protobuf messages, ensure the ProtoBufMessageDecoder is used and that the .desc file referenced by the table config is present inside the container.
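
For reference, creating the SCRAM-SHA-256 credential on the Kafka side typically looks like this (a sketch using Kafka's bundled kafka-configs tool; managing SCRAM over --bootstrap-server requires Kafka 2.7+):

    kafka-configs.sh --bootstrap-server localhost:9092 --alter \
      --add-config 'SCRAM-SHA-256=[password=your-password]' \
      --entity-type users --entity-name your-username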

Let me know if additional steps or clarifications are needed!
