Tuesday, 3 December 2024

Java code with Protobuf and Gradle

 

Kafka Protobuf Consumer

This project demonstrates how to consume messages from a Kafka topic using a Java application. The messages are serialized using Protocol Buffers (Protobuf) and deserialized in the application.

Prerequisites

  1. Java Development Kit (JDK): Install JDK 11 or later.
  2. Apache Kafka: Ensure you have access to a Kafka cluster.
  3. Protocol Buffers Compiler (protoc): Install protoc and verify it is available on your PATH:

bash

 

protoc --version

  4. Maven or Gradle: Install Maven or Gradle to manage project dependencies.

Setup Instructions

Step 1: Clone or Download the Project

bash

 

git clone <repository-url>

cd KafkaProtobufConsumer


Step 2: Compile Protobuf File

  1. Place your .proto file (e.g., PPP_Cloud_Streaming.proto) in the src/main/proto directory.
  2. Use protoc to generate Java files:

bash

 

protoc --java_out=src/main/java src/main/proto/PPP_Cloud_Streaming.proto

  3. The generated Java files will be placed in the src/main/java directory under the specified package.

Step 3: Configure Project Dependencies

Maven

Add the following dependencies to your pom.xml:

xml

 

<dependencies>

    <!-- Kafka Client -->

    <dependency>

        <groupId>org.apache.kafka</groupId>

        <artifactId>kafka-clients</artifactId>

        <version>3.5.1</version>

    </dependency>

    <!-- Protocol Buffers -->

    <dependency>

        <groupId>com.google.protobuf</groupId>

        <artifactId>protobuf-java</artifactId>

        <version>3.24.0</version>

    </dependency>

</dependencies>

Gradle

Add the following dependencies to your build.gradle file:

gradle

 

dependencies {

    implementation 'org.apache.kafka:kafka-clients:3.5.1'

    implementation 'com.google.protobuf:protobuf-java:3.24.0'

}


Step 4: Configure Kafka Properties

Update the Kafka properties in the Java code as per your setup:

  • Bootstrap servers: Replace with your Kafka broker addresses.
  • Authentication: Configure the security protocol (e.g., SASL_PLAINTEXT), SASL mechanism, username, and password.

Example:

java

 

props.setProperty("bootstrap.servers", "broker.lt.use1.:9094");

props.setProperty("sasl.jaas.config", "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"SCRAM\" password=\"password\";");


Step 5: Build the Project

Maven

bash

 

mvn clean install

Gradle

bash

 

gradle build


Step 6: Run the Application

Using Maven:

bash

 

mvn exec:java -Dexec.mainClass="KafkaProtobufConsumer"

Using Gradle:

bash

 

gradle run


Expected Output

  1. The application connects to the Kafka topic specified in the code.
  2. It continuously polls messages from the topic.
  3. For each message, it deserializes the Protobuf payload and prints the details (see the sketch below).

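A rough sketch of what that consumer loop might look like is shown below. The generated message class PPPCloudStreaming.Trade is a hypothetical placeholder (use whatever classes protoc generated from your .proto), and the broker, topic, and group ID are placeholders too; add the security properties from Step 4 if your cluster needs them.

java

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaProtobufConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker1:9094");   // placeholder broker
        props.setProperty("group.id", "protobuf-consumer");
        props.setProperty("auto.offset.reset", "earliest");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // Add security.protocol / sasl.* settings here (see Step 4) if required.

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("your-topic"));  // placeholder topic
            while (true) {
                ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, byte[]> record : records) {
                    try {
                        // Hypothetical generated class; substitute the one produced from PPP_Cloud_Streaming.proto.
                        PPPCloudStreaming.Trade message = PPPCloudStreaming.Trade.parseFrom(record.value());
                        System.out.println("Received message: " + message);
                    } catch (com.google.protobuf.InvalidProtocolBufferException e) {
                        // Usually means the producer's .proto and these generated classes differ.
                        System.err.println("Failed to deserialize payload: " + e.getMessage());
                    }
                }
            }
        }
    }
}
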
Example output:

plaintext

 

Received message: { sourceTime: 123456789, tradeId: "T12345" }

Received message: { sourceTime: 987654321, tradeId: "T54321" }


Troubleshooting

  1. Error: protoc not found
    • Ensure protoc is installed and available in your PATH.
    • Verify with protoc --version.
  2. Dependency errors:
    • Run mvn dependency:resolve (Maven) or gradle dependencies (Gradle) to verify dependencies.
  3. Connection issues:
    • Check Kafka broker details and ensure your system has network access to the brokers.
  4. Protobuf deserialization fails:
    • Ensure the .proto file used to generate the Java classes matches the producer's .proto.

 

Installing Gradle

On Linux (Ubuntu/Debian)

  1. Install Gradle using APT:

bash

 

sudo apt update

sudo apt install gradle

  2. Verify Installation:

bash

 

gradle -v

  3. Manual Installation (Optional):
    • Download Gradle from the Gradle Download Page.
    • Extract the archive:

bash

 

unzip gradle-X.X.X-bin.zip

    • Move it to /opt:

bash

 

sudo mv gradle-X.X.X /opt/gradle

    • Add Gradle to your PATH:

bash

 

echo 'export PATH=/opt/gradle/bin:$PATH' >> ~/.bashrc

source ~/.bashrc


On Windows

  1. Download Gradle:
    • Visit the Gradle download page and download the binary .zip file.
  2. Extract Gradle:
    • Extract the archive to a directory (e.g., C:\Program Files\Gradle).
  3. Set Environment Variables:
    • Go to Control Panel > System > Advanced System Settings > Environment Variables.
    • Add a new system variable:
      • Name: GRADLE_HOME
      • Value: C:\Program Files\Gradle
    • Edit the Path variable and add:

plaintext

 

C:\Program Files\Gradle\bin

  4. Verify Installation: Open a new terminal and run:

bash

 

gradle -v


On macOS

  1. Install Gradle with Homebrew:

bash

 

brew install gradle

  2. Verify Installation:

bash

 

gradle -v

 

Monday, 25 November 2024

Check which network interface my Python code is using when it produces data

import psutil

import socket

from scapy.all import sniff, IP, TCP, raw

import logging

 

# Logging setup

logging.basicConfig(

    filename="network_interface_usage.log",  # Log file

    level=logging.INFO,

    format="%(asctime)s - %(message)s",

    datefmt="%Y-%m-%d %H:%M:%S"

)

 

def log_and_print(message):

    """Logs the message to a file and prints it to the console."""

    print(message)

    logging.info(message)

 

def get_network_interface():

    """Returns a list of network interfaces and their IP addresses."""

    interfaces = psutil.net_if_addrs()

    interface_info = {}

    for interface, addrs in interfaces.items():

        for addr in addrs:

            if addr.family == socket.AF_INET:  # Filter for IPv4 addresses

                interface_info[interface] = addr.address

    return interface_info

 

def packet_callback(packet):

    """Callback function to process captured packets."""

    if IP in packet and TCP in packet:

        ip_src = packet[IP].src

        ip_dst = packet[IP].dst

        tcp_sport = packet[TCP].sport

        tcp_dport = packet[TCP].dport

 

        # Log packet details

        log_and_print(f"Packet from {ip_src}:{tcp_sport} -> {ip_dst}:{tcp_dport}")

        log_and_print(f"  Raw Packet Data: {raw(packet).hex()}")

        log_and_print("-" * 50)

 

def capture_packets(interface):

    """Start sniffing packets on a specific interface."""

    log_and_print(f"Starting packet capture on {interface}...")

    sniff(iface=interface, prn=packet_callback, store=False)

 

def main():

    # Get all network interfaces and IPs

    interfaces = get_network_interface()

    log_and_print("Detected Network Interfaces and IPs:")

    for interface, ip in interfaces.items():

        log_and_print(f"Interface: {interface} - IP: {ip}")

 

    # Capture packets on each interface; note that sniff() blocks, so only the first interface is captured unless each capture runs in its own thread

    for interface in interfaces.keys():

        capture_packets(interface)

 

if __name__ == "__main__":

    main()


Sunday, 3 November 2024

CloudFormation, Terraform, and CDK example

AWSTemplateFormatVersion: "2010-09-09"

Description: "CloudFormation Template for VPC Endpoints and Route 53 with VPC and Subnets as parameters."

 

Parameters:

  SelectedRegion:

    Description: "Select the region for deployment (us-east-1 or ap-east-1)."

    Type: String

    AllowedValues:

      - us-east-1

      - ap-east-1

    Default: us-east-1

 

  VPCID:

    Description: "The VPC ID where the resources will be deployed."

    Type: String

 

  Subnet1ID:

    Description: "The ID of the first subnet."

    Type: String

 

  Subnet2ID:

    Description: "The ID of the second subnet."

    Type: String

 

  Subnet3ID:

    Description: "The ID of the third subnet."

    Type: String

 

Resources:

  # VPC Endpoints

  VPCEndpointS3:

    Type: AWS::EC2::VPCEndpoint

    Properties:

      VpcId: !Ref VPCID

      ServiceName: !Sub "com.amazonaws.${SelectedRegion}.s3"

      VpcEndpointType: Gateway

 

  VPCEndpointEC2:

    Type: AWS::EC2::VPCEndpoint

    Properties:

      VpcId: !Ref VPCID

      ServiceName: !Sub "com.amazonaws.${SelectedRegion}.ec2"

      VpcEndpointType: Interface

      SubnetIds:

        - !Ref Subnet1ID

        - !Ref Subnet2ID

        - !Ref Subnet3ID

 

  VPCEndpointSSM:

    Type: AWS::EC2::VPCEndpoint

    Properties:

      VpcId: !Ref VPCID

      ServiceName: !Sub "com.amazonaws.${SelectedRegion}.ssm"

      VpcEndpointType: Interface

      SubnetIds:

        - !Ref Subnet1ID

        - !Ref Subnet2ID

        - !Ref Subnet3ID

 

  VPCEndpointSecretsManager:

    Type: AWS::EC2::VPCEndpoint

    Properties:

      VpcId: !Ref VPCID

      ServiceName: !Sub "com.amazonaws.${SelectedRegion}.secretsmanager"

      VpcEndpointType: Interface

      SubnetIds:

        - !Ref Subnet1ID

        - !Ref Subnet2ID

        - !Ref Subnet3ID

 

  VPCEndpointCloudWatchLogs:

    Type: AWS::EC2::VPCEndpoint

    Properties:

      VpcId: !Ref VPCID

      ServiceName: !Sub "com.amazonaws.${SelectedRegion}.logs"

      VpcEndpointType: Interface

      SubnetIds:

        - !Ref Subnet1ID

        - !Ref Subnet2ID

        - !Ref Subnet3ID

 

  # Route 53 Hosted Zone

  Route53HostedZone:

    Type: AWS::Route53::HostedZone

    Properties:

      Name: !Sub "example-${SelectedRegion}.com"

 

  # Route 53 Record Set

  Route53RecordSet:

    Type: AWS::Route53::RecordSet

    Properties:

      HostedZoneId: !Ref Route53HostedZone

      Name: !Sub "app.example-${SelectedRegion}.com."

      Type: A

      AliasTarget:

        DNSName: !Sub "vpce.${SelectedRegion}.amazonaws.com"

        HostedZoneId: !GetAtt Route53HostedZone.Id

 

Outputs:

  SelectedRegionOutput:

    Description: "The selected region."

    Value: !Ref SelectedRegion

 

  VPCIDOutput:

    Description: "The VPC ID used for this deployment."

    Value: !Ref VPCID

 

  Subnet1IDOutput:

    Description: "The Subnet ID for Subnet 1."

    Value: !Ref Subnet1ID

 

  Subnet2IDOutput:

    Description: "The Subnet ID for Subnet 2."

    Value: !Ref Subnet2ID

 

  Subnet3IDOutput:

    Description: "The Subnet ID for Subnet 3."

    Value: !Ref Subnet3ID

 

  HostedZoneIDOutput:

    Description: "The Route 53 Hosted Zone ID."

    Value: !Ref Route53HostedZone
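
For comparison, the same endpoints can be sketched with the AWS CDK in Java (CDK v2). This is a rough sketch under assumptions, not a tested stack: the VPC ID, construct IDs, and zone name are placeholders, and Vpc.fromLookup requires the stack's account and region to be set in its environment.

java

import software.amazon.awscdk.Stack;
import software.amazon.awscdk.StackProps;
import software.amazon.awscdk.services.ec2.GatewayVpcEndpointAwsService;
import software.amazon.awscdk.services.ec2.GatewayVpcEndpointOptions;
import software.amazon.awscdk.services.ec2.IVpc;
import software.amazon.awscdk.services.ec2.InterfaceVpcEndpointAwsService;
import software.amazon.awscdk.services.ec2.InterfaceVpcEndpointOptions;
import software.amazon.awscdk.services.ec2.Vpc;
import software.amazon.awscdk.services.ec2.VpcLookupOptions;
import software.amazon.awscdk.services.route53.PrivateHostedZone;
import software.constructs.Construct;

public class VpcEndpointsStack extends Stack {
    public VpcEndpointsStack(final Construct scope, final String id, final StackProps props) {
        super(scope, id, props);

        // Look up the existing VPC by ID (placeholder value).
        IVpc vpc = Vpc.fromLookup(this, "Vpc",
                VpcLookupOptions.builder().vpcId("vpc-0123456789abcdef0").build());

        // Gateway endpoint for S3.
        vpc.addGatewayEndpoint("S3Endpoint",
                GatewayVpcEndpointOptions.builder()
                        .service(GatewayVpcEndpointAwsService.S3)
                        .build());

        // Interface endpoints for EC2, SSM, Secrets Manager, and CloudWatch Logs
        // (these default to the VPC's private subnets).
        vpc.addInterfaceEndpoint("Ec2Endpoint",
                InterfaceVpcEndpointOptions.builder()
                        .service(InterfaceVpcEndpointAwsService.EC2).build());
        vpc.addInterfaceEndpoint("SsmEndpoint",
                InterfaceVpcEndpointOptions.builder()
                        .service(InterfaceVpcEndpointAwsService.SSM).build());
        vpc.addInterfaceEndpoint("SecretsManagerEndpoint",
                InterfaceVpcEndpointOptions.builder()
                        .service(InterfaceVpcEndpointAwsService.SECRETS_MANAGER).build());
        vpc.addInterfaceEndpoint("LogsEndpoint",
                InterfaceVpcEndpointOptions.builder()
                        .service(InterfaceVpcEndpointAwsService.CLOUDWATCH_LOGS).build());

        // Private hosted zone associated with the VPC (zone name is a placeholder).
        PrivateHostedZone.Builder.create(this, "HostedZone")
                .zoneName("example.internal")
                .vpc(vpc)
                .build();
    }
}

A Terraform equivalent would use the aws_vpc_endpoint and aws_route53_zone resources, but is not shown here.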

Sunday, 27 October 2024

Python virtual environment in Windows and copying it to a server

1. Set Up a Python Virtual Environment on Windows (CMD Compatible)

  1. Install Python (if not already installed).
  2. Open Command Prompt and Navigate to Your Project Directory:

cmd

 cd path\to\your\project

  3. Create a Virtual Environment:

cmd

 python -m venv venv

  4. Activate the Virtual Environment:

In CMD, use:

cmd

 venv\Scripts\activate

If using PowerShell, the command would be slightly different:

powershell

 .\venv\Scripts\Activate.ps1

  5. Install Dependencies:

cmd

 pip install -r requirements.txt

2. Copy the Virtual Environment to the Server

If native tar and scp commands are not available in your Windows CMD (recent Windows 10/11 builds do ship with them), you'll need some workarounds:

  1. Compress the Virtual Environment Using a Tool Like 7-Zip:
    • Right-click on the venv folder and compress it into a .zip file using 7-Zip or a similar tool.
    • Name the file venv.zip.
  2. Transfer the Archive to the Server:

Use an FTP client (e.g., FileZilla) or, if you have installed the Windows Subsystem for Linux (WSL), you can use scp in a WSL terminal:

bash

 scp venv.zip user@server_ip:/path/to/server/directory

  3. Decompress on the Server:

Log into your server and navigate to the directory where you copied venv.zip, then unzip it:

bash

 unzip venv.zip

  4. Activate the Virtual Environment on the Server:

bash

 source /path/to/server/directory/venv/bin/activate

  5. Verify Dependencies:

Run pip freeze to confirm all required packages are present and install any missing ones. Note that a virtual environment created on Windows has a Windows-specific layout (Scripts\ instead of bin/) and Windows-built packages, so in practice it is usually more reliable to copy requirements.txt to the server and recreate the environment there (python -m venv venv, then pip install -r requirements.txt).

 


Saturday, 19 October 2024

 How to install rpk and test latency? 


Since you have Redpanda installed on AWS Linux, you can proceed with the steps mentioned earlier to test latency and throughput directly on your AWS setup. Here's a more tailored guide for your environment:

1. Install rpk on AWS Linux (if not already installed)

To ensure that you have rpk (Redpanda's CLI), you can install it by running:

bash

 

curl -LO https://packages.vectorized.io/rpk/ubuntu_20.04/amd64/latest/rpk.tar.gz

tar -xzvf rpk.tar.gz

sudo mv rpk /usr/local/bin/

Ensure that Redpanda is running before proceeding with tests:

bash

 

sudo systemctl start redpanda

2. Testing Throughput on AWS Linux

a) Using rpk

Test producer and consumer throughput directly:

  • Producer Throughput Test:

bash

 

rpk topic produce --brokers localhost:9092 --key test-key --value test-value -n 10000 --rate 500

    • -n 10000: Send 10,000 messages.
    • --rate 500: Produce at 500 messages per second.
  • Consumer Throughput Test: Consume messages and observe processing rates:

bash

 

rpk topic consume test-topic --offset oldest --num 10000

This will consume 10,000 messages from the topic and provide throughput results.

b) Using Kafka Tools (if needed)

If you have Kafka tools installed, you can use them for detailed throughput benchmarking.

  • Producer Throughput (Kafka):

bash

 

kafka-producer-perf-test.sh \

    --topic test-topic \

    --num-records 100000 \

    --record-size 1024 \

    --throughput -1 \

    --producer-props bootstrap.servers=localhost:9092

  • Consumer Throughput (Kafka):

bash

 

kafka-consumer-perf-test.sh \

    --broker-list localhost:9092 \

    --topic test-topic \

    --messages 100000

This will consume 100,000 messages from the topic and report the throughput.

3. Testing Latency on AWS Linux

a) Using rpk

For latency, you can use rpk to measure how fast Redpanda is processing your messages.

  • Producer Latency:

bash

 

rpk topic produce --brokers localhost:9092 --key test-key --value test-value -n 10000 --latency

This will measure the time it takes to deliver each message to the broker.

  • End-to-End Latency:
    1. Produce 10,000 messages:

bash

 

rpk topic produce test-topic -n 10000 --rate 100 --value "Message with latency test"

    2. At the same time, consume the messages:

bash

 

rpk topic consume test-topic --offset oldest

By comparing timestamps of message production and consumption, you can calculate the end-to-end latency.
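
If you prefer to measure this programmatically, here is a rough Java sketch using the standard Kafka clients (Redpanda is Kafka API-compatible); the broker address, topic, and message count are placeholders:

java

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class EndToEndLatencyTest {
    public static void main(String[] args) {
        String brokers = "localhost:9092";   // placeholder broker
        String topic = "test-topic";         // placeholder topic
        int messages = 1000;

        Properties prod = new Properties();
        prod.put("bootstrap.servers", brokers);
        prod.put("key.serializer", StringSerializer.class.getName());
        prod.put("value.serializer", StringSerializer.class.getName());

        Properties cons = new Properties();
        cons.put("bootstrap.servers", brokers);
        cons.put("group.id", "latency-test");
        cons.put("auto.offset.reset", "latest");
        cons.put("key.deserializer", StringDeserializer.class.getName());
        cons.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(prod);
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cons)) {
            consumer.subscribe(Collections.singletonList(topic));
            consumer.poll(Duration.ofSeconds(2));   // join the group before producing

            // Embed the send time in each payload, then compute the delta on receipt.
            for (int i = 0; i < messages; i++) {
                producer.send(new ProducerRecord<>(topic, Long.toString(System.currentTimeMillis())));
            }
            producer.flush();

            int received = 0;
            while (received < messages) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    long latencyMs = System.currentTimeMillis() - Long.parseLong(record.value());
                    System.out.printf("end-to-end latency: %d ms%n", latencyMs);
                    received++;
                }
            }
        }
    }
}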

4. Benchmarking Multiple Brokers (if applicable)

If your Redpanda cluster has multiple brokers, you can test performance across the cluster by producing to and consuming from multiple brokers.

bash

 

rpk topic produce --brokers broker1:9092,broker2:9092 --topic test-topic -n 100000

This will stress-test Redpanda across brokers in AWS and provide insights into scalability and throughput.


You can monitor system performance (CPU, memory, disk I/O) during the tests using AWS CloudWatch or the built-in top and htop tools.



 Test latency and throughput in Redpanda using Redpanda CLI (rpk):


To test latency and throughput in Redpanda, follow these steps. You'll use tools like rpk (Redpanda’s CLI) or existing Kafka benchmarking tools (since Redpanda is Kafka API-compatible). Below are approaches to measure both metrics effectively:

Prerequisites

  1. Install Redpanda:
    • Follow the installation instructions from Redpanda’s official website to set up a Redpanda cluster (single-node or multi-node).
  2. Install Redpanda CLI (rpk):
    • rpk (Redpanda's CLI) is essential for running benchmarks and managing the cluster. Install it as per the official instructions:

            Install Redpanda:

            bash

            sudo apt install redpanda

  3. Kafka-compatible Tools:
    • Since Redpanda is compatible with Kafka, tools like Kafka Producer Performance (kafka-producer-perf-test.sh) and Kafka Consumer Performance (kafka-consumer-perf-test.sh) can be used.

1. Testing Throughput

Throughput measures the rate of data transfer in terms of messages per second or megabytes per second.

a) Using rpk to Measure Throughput

  • rpk has built-in benchmarking capabilities to test the producer and consumer throughput.
  • Producer Throughput Test: You can generate test data and measure the throughput of producing messages to a Redpanda topic.

bash

 rpk topic produce --brokers localhost:9092 --key test-key --value test-value -n 10000 --rate 500

Here:

    • --brokers: The address of your Redpanda broker.
    • -n 10000: Number of messages to send.
    • --rate 500: Send messages at a rate of 500 messages per second.

  • Consumer Throughput Test: Consume messages from a topic to measure how fast consumers can process them.

bash

 rpk topic consume test-topic --offset oldest --num 10000

This will consume 10,000 messages and show you the processing speed.
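
A rough Java alternative for measuring consumer throughput with the standard Kafka clients (broker, topic, and message count are placeholders):

java

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerThroughputTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "throughput-test");
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        long target = 10_000;   // number of messages to consume
        long count = 0;
        long bytes = 0;

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic"));  // placeholder topic
            long start = System.currentTimeMillis();
            while (count < target) {
                ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, byte[]> record : records) {
                    count++;
                    bytes += record.value() == null ? 0 : record.value().length;
                }
            }
            double seconds = (System.currentTimeMillis() - start) / 1000.0;
            System.out.printf("Consumed %d messages (%.2f MB) in %.2f s -> %.0f msg/s%n",
                    count, bytes / 1e6, seconds, count / seconds);
        }
    }
}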

b) Using Kafka Performance Test Scripts

If you want to simulate heavy traffic and measure throughput:

  • Producer Throughput (Kafka):

bash

 

kafka-producer-perf-test.sh \

    --topic test-topic \

    --num-records 100000 \

    --record-size 1024 \

    --throughput -1 \

    --producer-props bootstrap.servers=localhost:9092

Here:

    • --num-records 100000: Sends 100,000 messages.
    • --record-size 1024: Each message is 1024 bytes.
    • --throughput -1: No limit on throughput (send as fast as possible).
    • --producer-props: Kafka producer properties, including the Redpanda broker address.

  • Consumer Throughput (Kafka):

bash

 

kafka-consumer-perf-test.sh \

    --broker-list localhost:9092 \

    --topic test-topic \

    --messages 100000

This will consume 100,000 messages from the topic and provide throughput results.


2. Testing Latency

Latency measures the time taken to deliver a message from producer to consumer.

a) Using rpk to Measure Latency

To test the latency of messages, you can produce and consume messages while observing the latency of message delivery.

  • Producer Latency Test: Measure the time it takes for each message to be produced:

bash

 rpk topic produce --brokers localhost:9092 --key test-key --value test-value -n 10000 --latency

This command will measure the time each message takes to be delivered to the broker.

  • End-to-End Latency Test: You can measure end-to-end latency by producing and consuming messages in real-time. This is done by observing the time when a message is produced and when it's consumed.
    • Produce messages to a topic:

bash

 rpk topic produce test-topic -n 10000 --rate 100 --value "Message with latency test"

    • At the same time, start a consumer:

bash

 rpk topic consume test-topic --offset oldest

  • Compare the timestamps of when messages were produced and when they were consumed.

b) Using Kafka Tools

To perform a detailed latency test using Kafka’s producer performance tool, you can look at how long it takes to acknowledge a sent message.

  • Producer Latency (Kafka):

bash

 

kafka-producer-perf-test.sh \

    --topic test-topic \

    --num-records 10000 \

    --record-size 1024 \

    --throughput 500 \

    --producer-props bootstrap.servers=localhost:9092 \

    --print-metrics

    • --print-metrics: This will print out detailed producer metrics, including message send latency.
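
For a quick programmatic check of producer acknowledgment latency, here is a small Java sketch using the producer's send callback (broker, topic, and message count are placeholders):

java

import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerAckLatencyTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("acks", "all");                            // wait for full acknowledgment
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                final long sentAt = System.nanoTime();
                producer.send(new ProducerRecord<>("test-topic", "payload-" + i), (metadata, error) -> {
                    if (error == null) {
                        long ackMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - sentAt);
                        System.out.println("ack latency: " + ackMs + " ms");
                    } else {
                        error.printStackTrace();
                    }
                });
            }
            producer.flush();
        }
    }
}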

3. Benchmarking with Multiple Brokers

If you're using a multi-node Redpanda cluster, you can stress-test the system by producing/consuming from multiple nodes.

  • Modify the --brokers argument to list all the brokers in your Redpanda cluster:

bash

 

rpk topic produce --brokers broker1:9092,broker2:9092 --topic test-topic -n 100000

This helps to measure latency and throughput across multiple brokers in a real-world distributed setup.


4. Monitoring Performance Metrics

  • rpk metrics: Use rpk to observe performance and resource usage metrics in real-time.

bash

 

rpk cluster info

rpk metrics stream

This gives you detailed statistics like message throughput, disk usage, and network metrics.


5. Cloud-Based Testing

If you're testing Redpanda in a cloud environment, consider using monitoring solutions like Prometheus and Grafana to track latency, throughput, and system metrics (CPU, memory, disk I/O) during the test.

Conclusion:

  • Throughput can be measured using rpk or Kafka’s producer/consumer performance scripts by stressing the cluster with a high volume of messages and measuring message rates.
  • Latency can be measured using tools like rpk to observe end-to-end message delivery times or producer acknowledgment times.

Make sure to run tests in a production-like environment to get accurate insights into how Redpanda performs under load.


Monday, 26 August 2024

rpk S3 connector

 Here’s a step-by-step guide to downloading the S3 Sink Connector package and its dependencies for Redpanda, specifically designed for environments without internet access.

Step 1: Check Compatibility

Before downloading the S3 Sink Connector, make sure to check the version of Redpanda you are using and the compatible version of the S3 Sink Connector.

Step 2: Download the S3 Sink Connector Package

  1. Go to Confluent Hub: Visit the Confluent Hub to find the S3 Sink Connector.

  2. Search for S3 Sink Connector: In the search bar, type “S3 Sink Connector” and select the relevant result.

  3. Select the Version: Choose the version that matches your Redpanda installation.

  4. Download the Connector: Look for a download option, typically provided as a .zip or .tar.gz file. Click to download the file.

    For example, you might download something like:

    plaintext
    confluent-kafka-connect-s3-<version>.zip

Step 3: Extract the Connector Package

Once the package is downloaded, extract the contents:

  1. Using Command Line (Linux/macOS):

    bash
    unzip confluent-kafka-connect-s3-<version>.zip

    or

    bash
    tar -xzvf confluent-kafka-connect-s3-<version>.tar.gz
  2. Using GUI: Right-click on the downloaded file and choose “Extract Here” or a similar option.

Step 4: Identify Required JAR Files

Navigate to the extracted folder and identify the JAR files. You typically need the main connector JAR and any dependencies. For example:

  • kafka-connect-s3-<version>.jar
  • Additional JAR files related to the S3 Sink functionality.

Step 5: Download Dependencies

Since your Redpanda server has no internet access, you will also need to download any dependencies manually. Use a tool like Maven or Gradle on a machine with internet access to gather these dependencies:

  1. Using Maven: Create a simple pom.xml file:

    xml
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <groupId>local</groupId>
        <artifactId>s3-connector-deps</artifactId>
        <version>1.0.0</version>
        <repositories>
            <repository>
                <id>confluent</id>
                <url>https://packages.confluent.io/maven/</url>
            </repository>
        </repositories>
        <dependencies>
            <dependency>
                <groupId>io.confluent</groupId>
                <artifactId>kafka-connect-s3</artifactId>
                <version>VERSION</version>
            </dependency>
        </dependencies>
    </project>

    Replace VERSION with the actual connector version you downloaded; declaring the connector as a dependency lets Maven pull the connector JAR and everything it needs.

  2. Build the Project: Run the following command to download all dependencies:

    bash
    mvn dependency:copy-dependencies
  3. Locate Dependencies: The dependencies will be in the target/dependency folder.

Step 6: Transfer Files to Redpanda Server

Use scp or another method to transfer the JAR files from your local machine (or the one with internet access) to your Redpanda server.

Step 7: Install the Connector on Redpanda

  1. Create a Directory for Connectors:

    bash
    mkdir -p /path/to/connectors/s3
  2. Move JAR Files: Transfer all downloaded JAR files to the created directory:

    bash
    mv /path/to/your/jars/*.jar /path/to/connectors/s3/

Step 8: Configure and Load the Connector

  1. Create a Configuration File: Create a file named s3-sink-config.json with the following content (adjust the values accordingly):

    json
    {
      "name": "s3-sink",
      "connector.class": "io.confluent.connect.s3.S3SinkConnector",
      "tasks.max": "1",
      "topics": "your_topic_name",
      "s3.bucket.name": "your_bucket_name",
      "s3.region": "your_region",
      "s3.access.key.id": "your_access_key_id",
      "s3.secret.access.key": "your_secret_access_key",
      "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
      "flush.size": "1000",
      "storage.class": "io.confluent.connect.s3.storage.S3Storage",
      "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
      "partitioning.strategy": "time",
      "time.interval": "3600"
    }
  2. Load the Connector: Use the rpk command to load the connector configuration:

    bash
    rpk connector load s3-sink /path/to/s3-sink-config.json

Step 9: Verify the Connector

Check the status of the connector:

bash
rpk connector list

You should see the connector listed with its status.

Conclusion

You have now downloaded, transferred, and installed the S3 Sink Connector on your Redpanda server without internet access.
