Thursday, 7 December 2023

logs to sns

 

This guide walks through setting up monitoring for a Redpanda server with Amazon CloudWatch. It covers shipping the server's logs to CloudWatch Logs, creating a metric filter that counts service-down messages, configuring an alarm on that metric, and sending a notification to an SNS topic when the alarm fires.

Step 1: Create an SNS Topic

First, you need to create an SNS topic to receive notifications.

  1. Open the Amazon SNS console.
  2. Create a Topic:
    • Click on Topics in the left navigation pane.
    • Click the Create topic button.
    • Select Standard. FIFO topics cannot deliver to email or SMS subscribers, so Standard is the right type for alert notifications.
    • Fill in the required details:
      • Name: Enter a name for your topic (e.g., RedpandaAlerts).
    • Click Create topic.
  3. Subscribe to the Topic:
    • Click on your newly created topic to view its details.
    • Click Create subscription.
    • Select a protocol (e.g., Email, SMS, Lambda, etc.) and enter the necessary endpoint (e.g., email address).
    • Click Create subscription.
    • If you chose Email, check your inbox and confirm the subscription.
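If you prefer the command line, the same topic and email subscription can be created with the AWS CLI. The sketch below assumes AWS CLI v2 is installed and configured with permission to call sns:CreateTopic and sns:Subscribe. The function name create_alert_topic and the email address are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Sketch: create the alert topic and an email subscription with the AWS CLI.
# Assumes AWS CLI v2 with sns:CreateTopic and sns:Subscribe permissions.
# create_alert_topic is a hypothetical helper name.

create_alert_topic() {
  local topic_name="${1:-}"   # e.g. RedpandaAlerts
  local email="${2:-}"        # placeholder address, use your own
  local topic_arn

  if [ -z "$topic_name" ] || [ -z "$email" ]; then
    echo "usage: create_alert_topic TOPIC_NAME EMAIL" >&2
    return 1
  fi

  # Creates a Standard topic; if it already exists, its ARN is returned.
  topic_arn=$(aws sns create-topic --name "$topic_name" \
    --query TopicArn --output text) || return 1

  # The subscription stays pending until the recipient confirms it by email.
  aws sns subscribe --topic-arn "$topic_arn" \
    --protocol email --notification-endpoint "$email" >/dev/null || return 1

  # Print the ARN so the caller can reuse it as the alarm action.
  echo "$topic_arn"
}
```

Usage: `TOPIC_ARN=$(create_alert_topic RedpandaAlerts ops@example.com)`. Keep the printed ARN, since the alarm in Step 4 needs it as its notification target.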

Step 2: Configure CloudWatch Agent to Send Logs

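Ensure that the CloudWatch Agent is configured to send logs from /var/log/messages to CloudWatch Logs. (On distributions where the system log is /var/log/syslog instead, such as Ubuntu, use that path.) The instance's IAM role must also allow the agent to write to CloudWatch Logs, for example through the CloudWatchAgentServerPolicy managed policy.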

  1. Modify the CloudWatch Agent configuration file (e.g., /opt/aws/amazon-cloudwatch-agent/bin/config.json) to include the following:

```json
{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/messages",
            "log_group_name": "RedpandaLogs",
            "log_stream_name": "{instance_id}",
            "retention_in_days": 14
          }
        ]
      }
    }
  }
}
```

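  2. Load the edited configuration and restart the agent. Restarting the systemd service alone does not pick up changes made to bin/config.json; the agent's control script fetches the file, applies it, and restarts the agent (-s):

```bash
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 -s \
  -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json
```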
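To confirm the agent is running and logs are actually arriving, a quick check along these lines can help. It is a sketch that assumes the default agent install path under /opt/aws/amazon-cloudwatch-agent and AWS CLI v2 (aws logs tail is not available in v1). check_log_delivery is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: confirm the CloudWatch agent is running and shipping logs.
# Assumes the default agent install path and AWS CLI v2.
# check_log_delivery is a hypothetical helper name.

AGENT_CTL=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl

check_log_delivery() {
  local log_group="${1:-RedpandaLogs}"
  # Prints the agent status as JSON; look for "status": "running".
  sudo "$AGENT_CTL" -a status || return 1
  # Print events received in the last 10 minutes, then exit (no --follow).
  aws logs tail "$log_group" --since 10m
}
```

If the tail prints nothing after a few minutes, check the agent's own log and the instance's IAM permissions before moving on to the metric filter.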

Step 3: Create a Metric Filter

Set up a metric filter to count the occurrences of specific error messages indicating that the Redpanda server is down.

  1. Open the CloudWatch console.
  2. Create Metric Filter:
    • Under Logs, choose Log groups and select your log group (e.g., RedpandaLogs).
    • Click Create metric filter.
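    • Define the filter pattern. Filter patterns are case-sensitive, and putting ? before each term matches log events that contain any one of the terms (a bare OR is not an operator here; it would be treated as another required term). For example:

      ?"Redpanda service failed to start" ?"Unable to connect to Redpanda" ?"timeout error"

      These messages are only examples; replace them with the messages your Redpanda version and systemd actually write to the log.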

    • Click Next.
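  3. Assign Metric Details:
    • Give your metric a name (e.g., RedpandaServiceDown) and assign it to a namespace (e.g., Redpanda/Metrics).
    • Set the metric value to 1. Optionally, set the default value to 0 so the metric reports 0, rather than no data, for periods in which logs arrive but none match.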
    • Click Create Filter.
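The same metric filter can be created from the AWS CLI with put-metric-filter. This sketch assumes AWS CLI v2 with logs:PutMetricFilter permission; the filter name and the helper name create_metric_filter are hypothetical, and the three quoted messages are the example patterns from this step. In CloudWatch Logs filter syntax, a ? before each quoted term matches events containing any of them.

```bash
#!/usr/bin/env bash
# Sketch: create the RedpandaServiceDown metric filter with the AWS CLI.
# Assumes AWS CLI v2 with logs:PutMetricFilter permission.
# create_metric_filter and the filter name are hypothetical.

create_metric_filter() {
  local log_group="${1:-RedpandaLogs}"
  # "?" before each quoted term means: match events containing ANY of them.
  local pattern='?"Redpanda service failed to start" ?"Unable to connect to Redpanda" ?"timeout error"'

  # metricValue=1 counts each matching event; defaultValue=0 reports 0
  # for periods in which logs arrive but nothing matches.
  aws logs put-metric-filter \
    --log-group-name "$log_group" \
    --filter-name RedpandaServiceDown \
    --filter-pattern "$pattern" \
    --metric-transformations \
      metricName=RedpandaServiceDown,metricNamespace=Redpanda/Metrics,metricValue=1,defaultValue=0
}
```

Quoting the pattern in single quotes keeps the shell from interpreting the ? and the inner double quotes, so the pattern reaches CloudWatch exactly as typed in the console.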

Step 4: Create a CloudWatch Alarm

Now, create an alarm based on the metric filter to notify you if the Redpanda service is down.

  1. Open the CloudWatch Console:
    • Navigate to the Alarms section.
  2. Create Alarm:
    • Click on Create Alarm.
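    • Browse to the Redpanda/Metrics namespace and select the metric you just created (RedpandaServiceDown). It appears only after the filter has published at least one data point.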
    • Click Select metric.
  3. Define Alarm Conditions:
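    • Set the statistic to Sum and the period to 5 minutes, then set the condition to trigger the alarm when the metric is greater than 0.
    • Under Additional configuration, set missing data treatment to treat missing data as good (not breaching), so periods with no matching log events don't leave the alarm in INSUFFICIENT_DATA.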
    • Click Next.
  4. Configure Actions:
    • In the Notification section, select the SNS topic you created earlier (RedpandaAlerts).
    • Optionally, configure an action for the OK state to be notified when the alarm clears, that is, when a full period passes with no matching error messages.
  5. Name and Create the Alarm:
    • Name your alarm (e.g., "Redpanda Service Down Alarm").
    • Review your settings and click Create Alarm.
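For a scripted setup, put-metric-alarm creates an equivalent alarm: Sum over 300-second periods, alarming when the count is greater than 0, treating missing data as not breaching, and notifying the topic on both ALARM and OK. This is a sketch assuming AWS CLI v2 and the names used in this guide; create_alarm is a hypothetical helper that takes the topic ARN as its only argument.

```bash
#!/usr/bin/env bash
# Sketch: create the "Redpanda Service Down Alarm" with the AWS CLI.
# Assumes AWS CLI v2 and the metric/namespace names used in this guide.
# create_alarm is a hypothetical helper name.

create_alarm() {
  local topic_arn="${1:-}"   # ARN of the RedpandaAlerts topic
  if [ -z "$topic_arn" ]; then
    echo "usage: create_alarm TOPIC_ARN" >&2
    return 1
  fi

  # Sum of matches per 5-minute period; any match (> 0) fires the alarm.
  aws cloudwatch put-metric-alarm \
    --alarm-name "Redpanda Service Down Alarm" \
    --namespace Redpanda/Metrics \
    --metric-name RedpandaServiceDown \
    --statistic Sum \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 0 \
    --comparison-operator GreaterThanThreshold \
    --treat-missing-data notBreaching \
    --alarm-actions "$topic_arn" \
    --ok-actions "$topic_arn"
}
```

Usage: `create_alarm "$TOPIC_ARN"`. Drop the --ok-actions line if you only want to hear about failures.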

Step 5: Test Your Setup

To ensure everything is working correctly, you can simulate an error:

  1. Simulate an Error:
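    • Stop the Redpanda service or write a log entry that matches one of your error patterns. A clean stop may not log any of the example messages, so a deliberately matching entry is the more predictable test.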
  2. Check Alarm Status:
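    • Navigate back to the Alarms section in CloudWatch and check whether your alarm has entered the ALARM state. Allow time for log delivery plus one 5-minute evaluation period before concluding that it failed.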
  3. Verify SNS Notifications:
    • Check if the subscribers receive the notifications sent to the SNS topic.
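The checks in this step can also be run from a shell on the instance. This sketch assumes AWS CLI v2, rsyslog forwarding logger output to /var/log/messages, and the alarm name from Step 4; the helper names and the redpanda-test tag are hypothetical. test-metric-filter checks a sample line against the pattern without touching your log group, and set-alarm-state forces the alarm into ALARM so you can confirm SNS delivery without waiting for a real failure. CloudWatch returns the alarm to its actual state at the next evaluation.

```bash
#!/usr/bin/env bash
# Sketch: exercise the alerting pipeline from the Redpanda host.
# Assumes AWS CLI v2 and rsyslog writing logger output to /var/log/messages.
# Helper names and the "redpanda-test" tag are hypothetical.

ALARM_NAME="Redpanda Service Down Alarm"
FILTER_PATTERN='?"Redpanda service failed to start" ?"Unable to connect to Redpanda" ?"timeout error"'

simulate_redpanda_error() {
  local msg="Unable to connect to Redpanda (test entry)"
  # Dry run: report whether the sample line matches the pattern.
  aws logs test-metric-filter \
    --filter-pattern "$FILTER_PATTERN" \
    --log-event-messages "$msg" || return 1
  # Real run: write the line to syslog so the agent ships it.
  logger -t redpanda-test "$msg"
}

force_alarm_notification() {
  # Temporarily sets the alarm to ALARM, which triggers its SNS action.
  aws cloudwatch set-alarm-state \
    --alarm-name "$ALARM_NAME" \
    --state-value ALARM \
    --state-reason "Manual test of SNS notification" || return 1
  # Print the current state (ALARM, OK or INSUFFICIENT_DATA).
  aws cloudwatch describe-alarms \
    --alarm-names "$ALARM_NAME" \
    --query 'MetricAlarms[0].StateValue' --output text
}
```

Running simulate_redpanda_error tests the whole path from log line to metric to alarm; force_alarm_notification isolates the SNS leg, which is useful when subscribers report nothing arriving.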

Summary of Steps

  • Create an SNS Topic: Set up a topic for notifications.
  • Configure CloudWatch Agent: Ensure logs are sent to CloudWatch Logs.
  • Create a Metric Filter: Count occurrences of specific error messages.
  • Create a CloudWatch Alarm: Trigger an alarm when the service is down and notify via SNS.
  • Test the Setup: Simulate a failure and verify notifications.

By following these steps, you'll have a log-based alarm that notifies your team through SNS whenever the Redpanda server logs one of the error messages your metric filter matches.

 
