I have been writing about best practices for monitoring cloud infrastructure. For the full context of this series, I suggest reading my previous posts.
Today, we will look at another popular way of monitoring cloud infrastructure: eG Innovations’ SaaS solution, applied to the same infrastructure.
To begin this tutorial, sign up at apac.eginnovations.com for a three-week trial; you do *not* need a credit card. Enter your name, email, and password in the registration tab.
You will receive a confirmation email, and then you are ready to begin your evaluation.
Step 1: Discover AWS cloud instances
Before proceeding, you need to create an access key and a secret key. The steps are given here. These keys are required for any tool to interact with the AWS cloud.
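The exact IAM permissions the tool needs are listed in the vendor’s documentation. As a rough illustration only (this action list is my assumption, not eG’s official requirement), a read-only policy along these lines is the sort of thing a monitoring tool typically needs to pull metrics and inventory:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "cloudwatch:ListMetrics",
        "ec2:Describe*",
        "rds:Describe*",
        "s3:ListAllMyBuckets"
      ],
      "Resource": "*"
    }
  ]
}
```

Attaching a scoped read-only policy like this to a dedicated IAM user keeps the monitoring credentials from having write access to your environment.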
Log in to your monitoring console at apac.eginnovations.com with your username and password.
You will see the discovery option on the admin home page. Click the magnifying glass icon on the left to discover the infrastructure, then select the cloud infrastructure.
You need to download the agent and install it in your environment. Ensure the machine has internet access to reach both the AWS infrastructure and apac.eginnovations.com, either directly or through a proxy. Click Download & Install a remote agent.
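Before running the installer, it is worth confirming that outbound HTTPS actually works from that machine. A quick sketch (host names as used in this post; add your proxy flags to curl if you go through one):

```shell
# Report whether a host is reachable over HTTPS within 5 seconds.
check_reachable() {
  if curl -s --max-time 5 -o /dev/null "https://$1"; then
    echo "$1 reachable"
  else
    echo "$1 unreachable"
  fi
}

check_reachable apac.eginnovations.com   # the eG SaaS console
check_reachable ec2.amazonaws.com        # an AWS endpoint
```

If either host comes back unreachable, fix the routing or proxy configuration first; the agent installation below will fail without it.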
There are several ways to install the agent, such as downloading the agent binary and running a shell script. A more convenient option is to choose Command Line (One-liner) as the installation method. The console then provides equivalent commands using curl or wget; pick one and paste it into the SSH console of the machine to be monitored.
After the agent installation completes, the agent name (server name) will appear on the screen. Click the agent you just installed.
Since I am using the AWS cloud, I click the AWS Cloud option for discovery.
I enter my access key and secret key for monitoring. To create them, you may follow the steps given here.
After waiting 5 to 10 minutes for the agent, switch to the Monitor tab at the top and click Dashboards > Cloud > AWS.
You will see the EC2, EBS, RDS, and S3 dashboards. I do not have any workload running right now, so all values are 0.
Cloud resource monitoring begins from here. If you have EC2 VMs, you now get basic monitoring (AWS CloudWatch metrics). Let’s move to Step 2, where we monitor the VMs in depth to achieve full MELT coverage.
Step 2: Converged Monitoring – MELT
Agent Installation
Click the download icon (down arrow) in the top-right corner of the window to download the agent that will monitor the Lightsail VM. Since I have a Linux VM, I follow the Linux installation steps.
Connect to the Linux VM using a PuTTY console.
As usual, I copy the curl command from the eG agent download screen and run it in my PuTTY SSH session.
$ cd / && mkdir -p eGAgent && cd eGAgent && rm -f ./* && curl -o eGAgent_Linux_x64.tar.gz "https://apac.eginnovations.com:443/final/ega?rf=***&gp=***&ak=***" && chmod 750 eGAgent_Linux_x64.tar.gz && gunzip eGAgent_Linux_x64.tar.gz && tar -xvf eGAgent_Linux_x64.tar && chmod 750 ./setup.sh && ./setup.sh
mkdir: cannot create directory ‘eGAgent’: Permission denied
We got a permission error because the installation needs root access, so I switch to the superuser first.
$ sudo su -
# cd / && mkdir -p eGAgent && cd eGAgent && rm -f ./* && curl -o eGAgent_Linux_x64.tar.gz "https://apac.eginnovations.com:443/final/ega?rf=***&gp=***&ak=****" && chmod 750 eGAgent_Linux_x64.tar.gz && gunzip eGAgent_Linux_x64.tar.gz && tar -xvf eGAgent_Linux_x64.tar && chmod 750 ./setup.sh && ./setup.sh
You will see the agent installation start on the VM.
*******************************************************
The eG Agent Version : 7.2.4 has been started …
Please check the file: /opt/egurkha/agent/logs/error_log
for any errors while executing the agent.
*******************************************************
eG Agent installation and start-up completed.
This process installs the agent and starts it. Soon, you will see that the server is discovered and monitoring has started.
Metrics
To view the data, switch to the Monitor tab at the top. A Linux server has been discovered; click on it.
You will see the performance dashboard with key metrics displayed. So metrics are already being collected.
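Under the hood, OS-level metrics like these come from the same counters you can inspect by hand. As a rough illustration of the kind of numbers the dashboard is built from (standard Linux commands, not eG’s actual collectors):

```shell
# Load averages for the last 1, 5 and 15 minutes
uptime

# Percentage of the root filesystem in use
df -P / | awk 'NR==2 {print "disk_used_pct=" $5}'

# Total vs available memory, in kB (Linux-specific /proc interface)
grep -E 'MemTotal|MemAvailable' /proc/meminfo
```

The agent samples counters like these on a schedule, ships them to the SaaS console, and the dashboard charts the resulting time series.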
Events
Every metric has a permissible range of accepted values, and every breach of that range is considered an event. In addition, notable activity in the business systems, such as a successful login, a failed login, a process error, a server reboot, an application restart, or a hung process, is also treated as an event. Events are raised as and when they happen. (Configuring log monitoring and trace monitoring involves additional steps that are out of scope for this post, so I skip them here.)
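As a minimal sketch of the idea (illustrative only; in eG the thresholds are configured in the console, not in a script): a metric sample is compared against its permissible limit, and a breach produces an event record.

```shell
# Hypothetical metric sample and its permissible upper limit
cpu_usage=92
cpu_limit=85

if [ "$cpu_usage" -gt "$cpu_limit" ]; then
  # A breach becomes an event with a timestamp and a severity
  echo "$(date -u +%FT%TZ) EVENT severity=major metric=cpu_usage value=${cpu_usage}% limit=${cpu_limit}%"
fi
```

A real monitoring platform adds hysteresis and repeat-count rules on top of this so that a single noisy sample does not raise an alarm.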
An event was identified in the server’s syslog and an alarm was raised; the alert has already been emailed to my inbox.
The alarms window provides more information about the events. In this case, there was a continuous error related to unauthorized access to MariaDB.
Logs
Log files are one of the languages business systems use to talk to SRE experts. They are textual records of a process, with timestamps and detailed information. Logs are helpful for troubleshooting issues and identifying anomalies.
For this demonstration, syslog is monitored, showing each entry with its timestamp. The trend line shows how often such entries occur. (Configuration steps are skipped.)
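The same kind of filtering can be reproduced by hand. Here is a small self-contained sketch: a sample syslog fragment (entries fabricated for this example) and the pattern match a log monitor would apply to flag failed or denied access:

```shell
# Write a tiny sample log (entries are made up for illustration)
cat > /tmp/sample_syslog <<'EOF'
Feb  7 10:01:12 host sshd[1411]: Accepted password for admin from 10.0.0.9
Feb  7 10:02:45 host sshd[1412]: Failed password for invalid user root from 10.0.0.7
Feb  7 10:03:10 host mariadbd[900]: Access denied for user 'app'@'10.0.0.5'
EOF

# Keep only the lines a monitor would raise as log events
grep -Ei 'failed|denied' /tmp/sample_syslog
```

A log monitoring tool does essentially this continuously, tailing the file, matching configured patterns, and turning each hit into an event with its timestamp.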
Trace
As business systems have become more complex, user complaints have become more difficult to resolve. To address this issue, SRE practitioners have begun to employ trace analysis, which allows them to track the execution of a request or transaction through a business system. Traces record critical information such as the path a request takes, the functions it calls, the remote connectivity it makes, and the time it spends at each step. This information can be invaluable in troubleshooting user experience problems in a complex IT system.
I have a Tomcat app with MariaDB as the backend. User calls are displayed in a graphical format that the ITOps team can easily understand. (Configuration steps are skipped.)
For application experts, call graphs are traced and displayed in a waterfall model; the SQL queries executed by each user call, along with any unexpected errors down to the line of code, are collected and presented for in-depth analysis.
In summary, a converged observability platform should reduce the time it takes to set up and configure monitoring. It should also let you correlate data across the four pillars of observability (metrics, events, logs, and traces) to provide correlated diagnoses. I hope this post has been written in line with that goal.
One challenge here is that no single tool is well understood and appreciated by all stakeholders. For example, a tool that is beloved by application experts may be seen as too technical by an ITOps engineer. Similarly, a tool that is favored by a Windows and Intel engineer may not be useful to application experts. Therefore, a converged observability solution must strike a balance between providing deep technical data and presenting it in a user-friendly way.
—
This post is written as part of #WriteAPageADay campaign of BlogChatter (Day 7 and 8)