Thursday, 1 August 2024

Trino on Linux


Installing Trino on an AWS Linux EC2 Instance Without Internet Access and Configuring the Kafka Connector


1. Prepare the AWS Linux EC2 Instance

  1. Access the EC2 Instance:

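    • Connect to your AWS Linux EC2 instance via SSH (if it has no public IP, which is common for instances without internet access, use its private IP through a bastion host or VPN):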
      ssh -i your-key.pem ec2-user@your-ec2-public-ip
      
  2. Check System Details:

    • Confirm the Linux version (e.g., Amazon Linux 2):
      cat /etc/os-release
      
  3. Ensure Required Ports Are Open:

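    • Add an inbound rule to the instance's security group for TCP port 8080 (Trino's default HTTP port), limited to the networks that need to reach Trino. This is only needed for access from outside the instance.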
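The same inbound rule can be added with the AWS CLI. This is a minimal sketch wrapped in a hypothetical shell function, open_trino_port; run it from a workstation that has the AWS CLI and credentials, not from the offline instance, which typically cannot reach the EC2 API. The security group ID and CIDR are placeholders you supply.

```shell
#!/usr/bin/env bash
# Sketch: open Trino's HTTP port (8080) in an EC2 security group.
# open_trino_port is a hypothetical helper; it only wraps the AWS CLI call.
set -euo pipefail

open_trino_port() {
  local group_id="$1"   # security group of the Trino instance, e.g. sg-0123456789abcdef0
  local cidr="$2"       # client range allowed to reach Trino, e.g. 10.0.0.0/16
  aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" \
    --protocol tcp \
    --port 8080 \
    --cidr "$cidr"
}
```

For example, open_trino_port sg-0123456789abcdef0 10.0.0.0/16 (both values hypothetical). Keeping the CIDR narrow avoids exposing an unauthenticated Trino endpoint.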

2. Download Required Files Locally

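Since the EC2 instance does not have internet access, download the files on a machine that does and then transfer them:

  1. Download Trino Tarball:

    • Download the Trino 414 server tarball, trino-server-414.tar.gz, from Maven Central:
      https://repo1.maven.org/maven2/io/trino/trino-server/414/trino-server-414.tar.gz
    • The server tarball does not include the command-line client, so also download the executable CLI JAR:
      https://repo1.maven.org/maven2/io/trino/trino-cli/414/trino-cli-414-executable.jar

  2. Download Required Dependencies:

    • Trino 414 needs a 64-bit Java 17 runtime; Java 11 does not work. Download a JDK 17 tarball for Linux x64, for example Amazon Corretto 17:
      https://corretto.aws/downloads/latest/amazon-corretto-17-x64-linux-jdk.tar.gz
    • bin/launcher is a Python script, so the instance also needs Python (2.6, 2.7, or 3.x). Amazon Linux normally ships with it; check with python3 --version.

  3. Prepare Kafka Connector Configuration:

    • Create a kafka.properties file locally with the following content, pointing kafka.nodes at your Kafka brokers:
      connector.name=kafka
      kafka.nodes=localhost:9092
      kafka.table-names=test_topic
      kafka.default-schema=default
      kafka.table-description-dir=/opt/trino/etc/kafka
      
    • The catalog file has no message-format setting. Protobuf decoding is declared per table, in a JSON table description file placed in kafka.table-description-dir.
      
  4. Transfer Files to EC2:

    • Use scp to copy files to the EC2 instance:
      scp -i your-key.pem trino-server-414.tar.gz trino-cli-414-executable.jar kafka.properties amazon-corretto-17-x64-linux-jdk.tar.gz ec2-user@your-ec2-public-ip:/home/ec2-user
      

3. Install Trino on the EC2 Instance

  1. Install Java:

    • Extract the JDK into a fixed directory (writing to /usr/local needs sudo) and put it on the PATH:
      sudo mkdir -p /usr/local/jdk-17
      sudo tar -xzf amazon-corretto-17-x64-linux-jdk.tar.gz -C /usr/local/jdk-17 --strip-components=1
      export JAVA_HOME=/usr/local/jdk-17
      export PATH=$JAVA_HOME/bin:$PATH
      
    • The exports only apply to the current shell; add both lines to ~/.bashrc so later sessions still find Java.
      
    • Verify Java installation:
      java -version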
      
  2. Install Trino:

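    • Extract the Trino tarball into /opt (which needs sudo) and give ec2-user ownership so Trino can run as that user:
      sudo tar -xzf trino-server-414.tar.gz -C /opt/
      sudo mv /opt/trino-server-414 /opt/trino
      sudo chown -R ec2-user:ec2-user /opt/trino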
      
  3. Configure Trino:

    • Create configuration directories:

      mkdir -p /opt/trino/etc/catalog /opt/trino/etc/kafka
      
    • Add Kafka configuration to the catalog:

      mv kafka.properties /opt/trino/etc/catalog/
      
    • Create config.properties for Trino in /opt/trino/etc:

      coordinator=true
      node-scheduler.include-coordinator=true
      http-server.http.port=8080
      discovery.uri=http://localhost:8080
      
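    • Create node.properties in /opt/trino/etc. node.id must be unique for every node and stay the same across restarts; uuidgen can generate a suitable value:

      node.environment=production
      node.id=unique-node-id
      node.data-dir=/opt/trino/data
      
    • Create jvm.config in /opt/trino/etc, one JVM option per line; the launcher will not start without it. A small single-node starting point (-Xmx4G is only an example; size it to the instance's memory):

      -server
      -Xmx4G
      -XX:+UseG1GC
      -XX:+ExitOnOutOfMemoryError
      -XX:+HeapDumpOnOutOfMemoryError
      -Djdk.attach.allowAttachSelf=true
      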
  4. Start Trino:

    • Run Trino in the background:
      /opt/trino/bin/launcher start
      
  5. Verify Trino:

    • Confirm Trino is running with /opt/trino/bin/launcher status, then check that http://localhost:8080 responds on the instance.
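The server keeps initializing for a while after launcher start returns. A small polling sketch, assuming curl is installed on the instance: it calls Trino's /v1/info endpoint and waits until the response reports "starting": false. The function name wait_for_trino and its retry defaults are illustrative choices, not part of Trino.

```shell
#!/usr/bin/env bash
# Sketch: wait until Trino's /v1/info endpoint reports that startup is done.
# wait_for_trino is a hypothetical helper; assumes curl is available.
set -euo pipefail

wait_for_trino() {
  local url="${1:-http://localhost:8080}"
  local attempts="${2:-30}"   # number of polls before giving up
  local i body
  for ((i = 1; i <= attempts; i++)); do
    # -s silences progress output, -f treats HTTP errors as failures;
    # failures are expected while the server is still coming up.
    body="$(curl -sf "$url/v1/info" || true)"
    if grep -Eq '"starting"[[:space:]]*:[[:space:]]*false' <<<"$body"; then
      echo "Trino is up at $url"
      return 0
    fi
    sleep 2
  done
  echo "Trino did not become ready at $url" >&2
  return 1
}
```

Running wait_for_trino right after /opt/trino/bin/launcher start waits up to about a minute with the defaults; a non-zero exit is a cue to look at the server log.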

4. Configure Kafka Connector

  1. Prepare Kafka:

    • Ensure Kafka is set up and running on your EC2 instance or another accessible node.
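    • If Kafka requires SASL authentication, do not put the Kafka client's security and SASL keys into kafka.properties: Trino refuses to load a catalog with properties it does not recognize. Put them in a separate Kafka client file on the instance, for example /opt/trino/etc/kafka-client.properties:
      security.protocol=SASL_PLAINTEXT
      sasl.mechanism=PLAIN
      sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="your_username" password="your_password";
      
      Then reference it from kafka.properties and restart Trino (/opt/trino/bin/launcher restart):
      kafka.config.resources=/opt/trino/etc/kafka-client.properties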
      
  2. Test the Kafka Connector:

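    • Access the Trino CLI. It is not part of the server tarball; it ships separately as trino-cli-414-executable.jar, which must be on the instance (copied with scp like the other files). Then:
      mv trino-cli-414-executable.jar trino
      chmod +x trino
      ./trino --server localhost:8080 --catalog kafka --schema default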
      
    • List catalogs:
      SHOW CATALOGS;
      
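    • List the tables the Kafka catalog exposes. The Kafka connector does not support CREATE SCHEMA: its tables are defined by kafka.table-names and table description files, and they live in the default schema unless a name is schema-qualified:
      SHOW TABLES FROM kafka.default;
      
    • Query data:
      SELECT * FROM kafka.default.test_topic LIMIT 10;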
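The Kafka connector only decodes Protobuf when a table description file says so. A sketch that writes one for test_topic, assuming the default schema and a hypothetical message with an id and a name field; replace the .proto with your producers' real schema and keep the directory in line with kafka.table-description-dir (etc/kafka by default). It assumes the topic carries plain serialized Protobuf values; if your producers go through Confluent Schema Registry, check the connector documentation for its schema-registry-based table description supplier instead. The function name write_protobuf_table is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: write a Protobuf schema plus a Kafka table description for test_topic.
# The message layout (id, name) is hypothetical; use your producers' .proto.
set -euo pipefail

write_protobuf_table() {
  local dir="${1:-/opt/trino/etc/kafka}"   # should match kafka.table-description-dir
  mkdir -p "$dir"
  dir="$(cd "$dir" && pwd)"                # dataSchema needs an absolute path

  # Protobuf schema read by the connector's decoder (hypothetical message).
  cat > "$dir/test_topic.proto" <<'EOF'
syntax = "proto3";

message TestTopic {
  int64 id = 1;
  string name = 2;
}
EOF

  # Table description: exposes topic test_topic as kafka.default.test_topic
  # and decodes message values as Protobuf using the schema above.
  cat > "$dir/test_topic.json" <<EOF
{
  "tableName": "test_topic",
  "schemaName": "default",
  "topicName": "test_topic",
  "message": {
    "dataFormat": "protobuf",
    "dataSchema": "$dir/test_topic.proto",
    "fields": [
      { "name": "id", "type": "BIGINT", "mapping": "id" },
      { "name": "name", "type": "VARCHAR", "mapping": "name" }
    ]
  }
}
EOF
}
```

After write_protobuf_table /opt/trino/etc/kafka, restart Trino with /opt/trino/bin/launcher restart so the connector reads the new description before querying kafka.default.test_topic.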
      

5. Additional Steps for Maintenance

  • Automate Trino Start:

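    • Create a systemd service to start Trino on boot:
      sudo nano /etc/systemd/system/trino.service
      
      Add the following content. launcher start returns after forking the server into the background, so the unit uses Type=forking with the launcher's PID file (under node.data-dir), and it sets JAVA_HOME and PATH itself because systemd does not read your shell profile:
      [Unit]
      Description=Trino Service
      After=network.target
      
      [Service]
      Type=forking
      User=ec2-user
      # Point these at the JDK 17 directory from the "Install Java" step
      Environment=JAVA_HOME=/usr/local/jdk-17
      Environment=PATH=/usr/local/jdk-17/bin:/usr/local/bin:/usr/bin:/bin
      PIDFile=/opt/trino/data/var/run/launcher.pid
      ExecStart=/opt/trino/bin/launcher start
      ExecStop=/opt/trino/bin/launcher stop
      Restart=always
      
      [Install]
      WantedBy=multi-user.target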
      
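    • Stop any instance started manually with launcher start, then reload systemd, enable the service, and start it:
      /opt/trino/bin/launcher stop
      sudo systemctl daemon-reload
      sudo systemctl enable trino
      sudo systemctl start trino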
      
  • Monitor Logs:

    • Logs are written under node.data-dir, so with node.data-dir=/opt/trino/data they are in /opt/trino/data/var/log. Use tail to monitor:
      tail -f /opt/trino/data/var/log/server.log
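For a quick check beyond tail -f, this sketch reports whether server.log contains Trino's "SERVER STARTED" marker and how many ERROR lines it holds. The function name trino_log_summary is hypothetical, and the default path assumes node.data-dir=/opt/trino/data. Because server.log keeps entries from earlier runs, treat the result as a rough indicator rather than proof of the current run's state.

```shell
#!/usr/bin/env bash
# Sketch: summarize a Trino server.log (startup marker and ERROR count).
# trino_log_summary is a hypothetical helper, not part of Trino.
set -euo pipefail

trino_log_summary() {
  local log="${1:-/opt/trino/data/var/log/server.log}"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 2
  fi
  local started="no"
  # Trino logs "======== SERVER STARTED ========" once startup completes.
  grep -q 'SERVER STARTED' "$log" && started="yes"
  # The log level is its own tab-separated column, so a whole-word match works.
  local errors
  errors="$(grep -cw 'ERROR' "$log" || true)"
  echo "started=$started errors=$errors"
}
```

Calling trino_log_summary with no argument checks the default log; pass another path to inspect a different file.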
      

This guide sets up Trino on an AWS Linux EC2 instance without internet access and connects it to Kafka; Protobuf messages can be queried once each topic has a table description file that points to its .proto schema.
