Monday, 17 February 2025

Trino with Redpanda

Step 1: Prepare Installation Files on a Machine with Internet Access

  1. Download Trino Server

    • Visit Trino Releases and download the latest Trino server .tar.gz file.
    • Example (for version 433):
      wget https://repo1.maven.org/maven2/io/trino/trino-server/433/trino-server-433.tar.gz
      
  2. Download Trino CLI

    • Example:
      wget https://repo1.maven.org/maven2/io/trino/trino-cli/433/trino-cli-433-executable.jar -O trino
      chmod +x trino
      
  3. Download Required Connectors

    • Download the Kafka connector:
      wget https://repo1.maven.org/maven2/io/trino/trino-kafka/433/trino-kafka-433.tar.gz
      
    • Download the Protobuf decoder:
      wget https://repo1.maven.org/maven2/io/trino/trino-kafka-protobuf-decoder/433/trino-kafka-protobuf-decoder-433.tar.gz
      
  4. Transfer the Files to AWS EC2 Instance

    • Use scp or SFTP to copy the downloaded files to the EC2 instance.
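All four downloads can be scripted so nothing is missed before the instance goes offline. A minimal sketch; TRINO_VERSION, MAVEN, ARTIFACTS, and DRY_RUN are illustrative names, and the script defaults to a dry run that only prints what it would fetch:

```shell
#!/usr/bin/env bash
# Sketch: fetch every artifact from Step 1 in one pass so nothing is
# forgotten before the files are copied to the offline EC2 instance.
# TRINO_VERSION, MAVEN, and DRY_RUN are illustrative names.
set -euo pipefail

TRINO_VERSION=433
MAVEN="https://repo1.maven.org/maven2/io/trino"
DRY_RUN=${DRY_RUN:-1}   # set DRY_RUN=0 to actually download

ARTIFACTS=(
  "$MAVEN/trino-server/$TRINO_VERSION/trino-server-$TRINO_VERSION.tar.gz"
  "$MAVEN/trino-cli/$TRINO_VERSION/trino-cli-$TRINO_VERSION-executable.jar"
  "$MAVEN/trino-kafka/$TRINO_VERSION/trino-kafka-$TRINO_VERSION.tar.gz"
  "$MAVEN/trino-kafka-protobuf-decoder/$TRINO_VERSION/trino-kafka-protobuf-decoder-$TRINO_VERSION.tar.gz"
)

for url in "${ARTIFACTS[@]}"; do
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "would fetch: $url"
  else
    wget -q "$url"
  fi
done
```

Pinning the version in one variable also makes a later upgrade a one-line change.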

Step 2: Install and Configure Trino on AWS EC2

  1. Extract Trino

    tar -xvzf trino-server-433.tar.gz
    mv trino-server-433 /opt/trino
    
    • Note: Trino 433 requires Java 17 or newer; verify with java -version before continuing.
    
  2. Create Trino Configuration Directories

    mkdir -p /opt/trino/etc
    
  3. Create config.properties

    coordinator=true
    node-scheduler.include-coordinator=true
    http-server.http.port=8080
    discovery.uri=http://localhost:8080
    
    • Note: discovery-server.enabled, seen in older guides, has been removed in recent Trino releases; Trino refuses to start when it finds unknown configuration properties.
    
  4. Create node.properties

    node.environment=production
    node.id=trino-node-1
    node.data-dir=/var/trino/data
    
  5. Create jvm.config

    -server
    -Xmx4G
    -XX:+UseG1GC
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+ExitOnOutOfMemoryError
    -XX:+HeapDumpOnOutOfMemoryError
    -Djdk.attach.allowAttachSelf=true
    
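The three files from steps 3-5 can be written in one pass with here-documents. A sketch that writes into a temporary directory so it is safe to run anywhere; point ETC at /opt/trino/etc on the real instance:

```shell
#!/usr/bin/env bash
# Sketch: write config.properties, node.properties, and jvm.config in one pass.
# ETC is a temp dir here for illustration; use /opt/trino/etc on the instance.
set -euo pipefail
ETC=$(mktemp -d)

# config.properties -- note that discovery-server.enabled, which appears in
# older guides, has been removed from recent Trino releases.
cat > "$ETC/config.properties" <<'EOF'
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://localhost:8080
EOF

cat > "$ETC/node.properties" <<'EOF'
node.environment=production
node.id=trino-node-1
node.data-dir=/var/trino/data
EOF

cat > "$ETC/jvm.config" <<'EOF'
-server
-Xmx4G
-XX:+UseG1GC
-XX:+ExplicitGCInvokesConcurrent
EOF
```

Quoting the here-document delimiter ('EOF') keeps the shell from expanding anything inside the file bodies.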
  6. Configure Kafka Connector for Redpanda

    mkdir -p /opt/trino/etc/catalog
    
    • Create /opt/trino/etc/catalog/kafka.properties:
      connector.name=kafka
      kafka.nodes=<Redpanda_Broker>:9092
      kafka.table-names=<topic_name>
      kafka.default-schema=default
      kafka.hide-internal-columns=false
      
    • kafka.table-names takes a comma-separated list of topics to expose (kafka.table-names-matching and kafka.messages-format are not valid connector properties; the message format is defined per table). Setting kafka.hide-internal-columns=false exposes the raw _message and _key columns, which is handy before the Protobuf schema is wired up in Step 4.
      
  7. Move Downloaded Connectors

    tar -xvzf trino-kafka-433.tar.gz -C /opt/trino/plugin
    tar -xvzf trino-kafka-protobuf-decoder-433.tar.gz -C /opt/trino/plugin
    
    • Note: the trino-server tarball already bundles the Kafka plugin (including its Protobuf decoder) under plugin/kafka, so this step is only needed if you are replacing the bundled plugin with a different build.
    

Step 3: Set Up Local Schema Registry

  1. Download and Transfer Schema Registry

    wget https://packages.confluent.io/archive/7.5/confluent-community-7.5.0.tar.gz
    
    • Transfer this file to the EC2 instance.
  2. Install Schema Registry on EC2

    tar -xvzf confluent-community-7.5.0.tar.gz
    mv confluent-7.5.0 /opt/schema-registry
    
  3. Configure Schema Registry

    • Create /opt/schema-registry/etc/schema-registry/schema-registry.properties:
      listeners=http://0.0.0.0:8081
      kafkastore.bootstrap.servers=<Redpanda_Broker>:9092
      kafkastore.topic=_schemas
      debug=false
      
  4. Start Schema Registry

    nohup /opt/schema-registry/bin/schema-registry-start /opt/schema-registry/etc/schema-registry/schema-registry.properties > /tmp/schema-registry.log 2>&1 &
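A process started with a trailing & dies with the login shell. On a systemd-based AMI (Amazon Linux, Ubuntu), a unit file keeps the registry running across logouts and reboots. A minimal sketch; the unit name and User are assumptions to adapt:

```ini
# /etc/systemd/system/schema-registry.service  (sketch; adjust user and paths)
[Unit]
Description=Confluent Schema Registry
After=network.target

[Service]
User=ec2-user
ExecStart=/opt/schema-registry/bin/schema-registry-start /opt/schema-registry/etc/schema-registry/schema-registry.properties
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Then enable it with: sudo systemctl daemon-reload && sudo systemctl enable --now schema-registry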
    
  5. Verify Schema Registry

    curl http://localhost:8081/subjects
    

Step 4: Configure Trino to Use Local Schema Registry

  1. Update /opt/trino/etc/catalog/kafka.properties

    connector.name=kafka
    kafka.nodes=<Redpanda_Broker>:9092
    kafka.default-schema=default
    kafka.table-description-supplier=confluent
    kafka.confluent-schema-registry-url=http://localhost:8081
    
    • With the confluent table-description supplier, Trino discovers tables and their Protobuf schemas from the registry automatically. (kafka.messages-format and kafka.schema-registry-url are not recognized connector properties.)
    
  2. Start (or Restart) Trino

    /opt/trino/bin/launcher restart
    
  3. Test Query on Redpanda

    ./trino --server http://localhost:8080 --catalog kafka --schema default
    SHOW TABLES;
    SELECT * FROM "your_protobuf_topic" LIMIT 10;
    
    • With --catalog and --schema already set, the topic can be referenced unqualified; double-quote the name if the topic contains dots or dashes.
    

Trino should now be connected to Redpanda, and you can run SQL queries against your Protobuf topics.