Wednesday 29 June 2016

VMware vSphere Syslog options - Part 2



To prepare, I deployed a VM running Red Hat Enterprise Linux using the new Red Hat Developer subscription, which Red Hat now offers free of charge for development and testing. I can register lab machines against it and there's no longer a $99 yearly charge. I highly recommend getting hold of it to brush up on your Linux skills.

Note: You may hit a console bug where the Red Hat lock screen freezes. Turn off the screensaver and automatic screen lock behaviour in Red Hat before you switch to PuTTY to prevent this. It may be down to particular VMware patches, but I've found nothing conclusive yet; it's just annoying when you're in the middle of a lab and worth avoiding. Production will obviously need a different approach.

This is the main reference I'm following to get this up and running as it contains exactly the outcome I was after:
http://docs.fluentd.org/articles/free-alternative-to-splunk-by-fluentd

You'll need a couple of firewall ports open, so start by adding these in the firewall GUI to both the runtime and permanent configurations:
TCP port 5601 for Kibana web access
UDP port 514 for syslog-ng to receive syslog from the ESXi hosts and pass it on to Fluentd
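If you'd rather script the firewall than click through the GUI, here's a minimal sketch using firewall-cmd. It assumes firewalld is running with its default zone; the function name is made up for this example, and FIREWALL_CMD is a hypothetical override so the commands can be previewed with FIREWALL_CMD=echo before running them for real.

```shell
# Open the two lab ports in both the runtime and permanent firewalld configuration.
# Assumes firewalld's default zone. FIREWALL_CMD is a hypothetical override
# (default "sudo firewall-cmd"); FIREWALL_CMD=echo just prints the arguments.
open_lab_ports() {
    fw="${FIREWALL_CMD:-sudo firewall-cmd}"
    for port in 5601/tcp 514/udp; do        # Kibana web UI, then syslog from ESXi
        $fw --add-port="$port"               # runtime rule: takes effect immediately
        $fw --permanent --add-port="$port"   # permanent rule: survives a reload or reboot
    done
}
```

After running open_lab_ports, sudo firewall-cmd --list-ports should include 5601/tcp and 514/udp.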

Note: ensure you've subscribed RHEL to the Update Repository at this point.

Next, open a PuTTY session to the new VM and check the Java version is OK:

[dufus@syslog5 ~]$ java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

Elasticsearch needs Java 7 or higher, so version 8, as shown above, is fine.
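If you build several of these VMs, the same check can be scripted. This is a small sketch (the function name is made up for this example) that pulls the major version out of the java -version banner, assuming the usual version "x.y.z" format shown above.

```shell
# Print the Java major version, e.g. 8 for: openjdk version "1.8.0_91"
# With no argument it reads the first line of `java -version`, which Java
# writes to stderr (hence 2>&1).
java_major_version() {
    line="${1:-$(java -version 2>&1 | head -n 1)}"
    ver=$(printf '%s\n' "$line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$ver" in
        1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;   # old scheme: 1.8.0_91 -> 8
        *)   printf '%s\n' "$ver" | cut -d. -f1 ;;   # newer scheme: 9.0.1 -> 9
    esac
}
# Usage: [ "$(java_major_version)" -ge 7 ] && echo "Java is new enough"
```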

Prerequisites:
Set up NTP so the server's clock stays accurate.
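On RHEL 7 the default NTP implementation is chrony, so a minimal sketch of this step, assuming you're happy with chrony's default Red Hat pool servers, looks like this. The function name is made up for this example, and SUDO is a hypothetical override so the commands can be previewed with SUDO=echo.

```shell
# Install, enable and start chronyd (RHEL 7's default NTP implementation).
# SUDO is a hypothetical override: default "sudo", SUDO= when already root,
# SUDO=echo to just print the commands.
enable_time_sync() {
    run="${SUDO-sudo}"
    $run yum -y install chrony            # usually already installed on RHEL 7
    $run systemctl enable chronyd.service
    $run systemctl start chronyd.service
}
```

Afterwards, chronyc tracking shows whether the clock is synchronised.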

Check the open file limit:
ulimit -n
(mine returned 1024, which needs to be raised)

Next, edit /etc/security/limits.conf. I changed the file permissions temporarily to do this (running vi with sudo instead avoids the chmod steps).
Check the current permissions on limits.conf:
stat --format '%a' /etc/security/limits.conf
This gives me "644".
Make the file writable so you can edit it:
sudo chmod 777 /etc/security/limits.conf
vi /etc/security/limits.conf

Add these lines to the file (if you're connected with PuTTY, you can copy and paste the whole lot straight into vi):
root soft nofile 65536
root hard nofile 65536
* soft nofile 65536
* hard nofile 65536

Set permissions back:
sudo chmod 644 /etc/security/limits.conf
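If you'd rather not open limits.conf up with chmod 777 at all, another option is to append the lines through sudo tee -a. The sketch below (the function name is made up for this example) prints only the lines that aren't already in the file, so running it twice won't add duplicates. It does an exact whole-line match, so an equivalent entry written with tabs would not be recognised.

```shell
# Print whichever of the four nofile entries are missing from the given file.
# Exact, whole-line, fixed-string match (-x -F), so "*" is not treated as a regex.
missing_nofile_limits() {
    file="$1"
    for entry in 'root soft nofile 65536' 'root hard nofile 65536' \
                 '* soft nofile 65536' '* hard nofile 65536'; do
        grep -qxF -- "$entry" "$file" 2>/dev/null || printf '%s\n' "$entry"
    done
}
# Usage: missing_nofile_limits /etc/security/limits.conf | sudo tee -a /etc/security/limits.conf
```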

I'm not running a high-load environment, so that's all I changed. Reboot to apply the new limits before going any further.
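After the reboot, ulimit -n in a new session should report 65536. Worth knowing: limits.conf is applied by PAM to login sessions, while services started by systemd take their limit from LimitNOFILE in their unit file instead, so it's worth checking what a running daemon actually got. A small sketch that reads it from /proc (Linux-specific; the function name is made up for this example):

```shell
# Print the soft "Max open files" limit of a running process by reading
# /proc/<pid>/limits. Defaults to the current shell.
max_open_files() {
    pid="${1:-$$}"
    # Line format: "Max open files  <soft>  <hard>  files" -> field 4 is the soft limit
    awk '/^Max open files/ { print $4 }' "/proc/$pid/limits"
}
# Usage (once the services are installed later on):
#   max_open_files "$(pgrep -o -f td-agent)"
#   max_open_files "$(pgrep -o -f elasticsearch)"
```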

Note: This is a good time to shut down the VM and take a snapshot, which you can fall back on until the remaining changes are working.

The article may refer to older versions of Elasticsearch and Kibana. I'm using the latest versions currently available, so check the download pages below and adjust the download links to match:

Elasticsearch

Check the url for the latest version here:
https://www.elastic.co/downloads/elasticsearch

You can install Elasticsearch via yum, but that doesn't give you the right version: the Kibana and Elasticsearch versions need to match (Kibana 4.5.x, for example, requires Elasticsearch 2.3.x).

I found installing Elasticsearch from the RPM was the best option, and it made running it as a service much easier:

cd /tmp
curl -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/2.3.3/elasticsearch-2.3.3.rpm
sudo rpm -Uvh elasticsearch-2.3.3.rpm
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl start elasticsearch.service
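Elasticsearch can take a few seconds to start. To confirm it's up, and which version you actually got (it needs to pair with your Kibana version), you can query its root endpoint on the default HTTP port 9200. A small sketch for pulling out the version number (the function name is made up for this example; it assumes the JSON layout Elasticsearch 2.x returns, with the version under "number"):

```shell
# Extract the "number" field (the Elasticsearch version) from the JSON returned
# by the root endpoint, read on stdin. Works for pretty-printed or compact JSON.
es_version() {
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
# Usage: curl -s http://localhost:9200/ | es_version
```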

Done! Now Elasticsearch is running as a service.

Kibana

Check the url for the latest version here:
https://www.elastic.co/downloads/kibana

Now open another PuTTY session and run the following commands, adjusting the filename to match the latest version available:

curl -O https://download.elastic.co/kibana/kibana/kibana-4.5.1-1.x86_64.rpm
sudo rpm -Uvh kibana-4.5.1-1.x86_64.rpm
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable kibana.service
sudo systemctl start kibana.service
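Since the filenames change with every release, here's a small sketch that builds both download URLs from version numbers. It assumes the URL patterns used above stay the same for other Elasticsearch 2.x and Kibana 4.x releases; check the download pages before relying on it, as the download locations may differ for later major versions. The function name is made up for this example.

```shell
# Print the Elasticsearch and Kibana RPM URLs for the given versions, following
# the same URL patterns as the commands above (an assumption for other releases).
elastic_rpm_urls() {
    es="$1"; kb="$2"
    printf '%s\n' \
        "https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/${es}/elasticsearch-${es}.rpm" \
        "https://download.elastic.co/kibana/kibana/kibana-${kb}-1.x86_64.rpm"
}
# Usage (from /tmp): elastic_rpm_urls 2.3.3 4.5.1 | xargs -n 1 curl -O
```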

With the Kibana server running, you can check it from the VM console (if it hasn't locked up!) at http://localhost:5601, or from another machine at http://<VM IP address>:5601 once port 5601 is open.


There's no data yet, so that's it for now.

Fluentd (td-agent)

Now open a third PuTTY session; next we'll install Fluentd and get its configuration file edited and working.

http://docs.fluentd.org/articles/install-by-rpm

curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh

The script above installs td-agent 2, the packaged distribution of Fluentd.

Now install the Elasticsearch plugin:

sudo /usr/sbin/td-agent-gem install fluent-plugin-elasticsearch

And next modify the td-agent.conf file (back it up first!):

sudo cp /etc/td-agent/td-agent.conf /etc/td-agent/td-agent.conf.bak
stat --format '%a' /etc/td-agent/td-agent.conf
644
sudo chmod 777 /etc/td-agent/td-agent.conf
vi /etc/td-agent/td-agent.conf

Modify it as follows. Find this section:
<source>
  type forward
</source>
Insert a few blank lines above it and paste in the configuration below, then delete the original section (the new configuration includes its own forward source) and you're good to go!

<source>
  @type syslog
  port 42185
  tag syslog
  format /^(?<pid>[^ ]*) (?<time>[^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*) (?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
  time_format %Y-%m-%dT%H:%M:%S%z
</source>

<source>
  @type forward
</source>

<match syslog.**>
  @type elasticsearch
  logstash_format true
  flush_interval 10s # for testing
</match>

sudo chmod 644 /etc/td-agent/td-agent.conf
sudo /etc/init.d/td-agent start
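Before bringing syslog-ng into the picture, you can check the Fluentd side on its own by sending a hand-made message straight to UDP port 42185. The sketch below builds a line shaped like the ESXi sample in the td-agent.log error further down; the host name, ident and message text are placeholders, and the function names are made up for this example. It uses bash's /dev/udp redirection, so it needs bash rather than plain sh, and whether the message parses depends on your format line matching it.

```shell
# Build a syslog line shaped like the ESXi sample:
# "<PRI>1 <ISO 8601 time> <host> <ident> - - - <text>". Host and text are placeholders.
make_test_line() {
    ts="$(date -u +%Y-%m-%dT%H:%M:%S+00:00)"   # current UTC time with an explicit offset
    printf '<167>1 %s %s Vpxa - - - %s' "$ts" "${1:-testhost}" "${2:-fluentd test message}"
}

# Send one test line to Fluentd's syslog input on this machine (bash-only /dev/udp).
send_test_line() {
    make_test_line "$@" > /dev/udp/127.0.0.1/42185
}
```

Run send_test_line, then check /var/log/td-agent/td-agent.log for errors; sudo ss -ulnp | grep 42185 confirms td-agent is listening on the port.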

Syslog-NG

This section covers installing syslog-ng. I found the built-in rsyslog engine doesn't easily preserve the source host address, so I used the more powerful syslog-ng instead. I hit a dependency problem when installing syslog-ng but found a suitable package via www.rpmfind.net, so watch out for this if versions change.

Error: Package: syslog-ng-3.5.6-3.el7.x86_64 (epel)
           Requires: libnet.so.1()(64bit)

My solution to getting syslog-ng installed on RHEL 7.2:

cd /tmp
curl -O https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo rpm -Uvh epel-release-latest-7.noarch.rpm
yum repolist
curl -O ftp://195.220.108.108/linux/fedora/linux/releases/23/Everything/x86_64/os/Packages/l/libnet-1.1.6-10.fc23.x86_64.rpm
sudo rpm -Uvh libnet-1.1.6-10.fc23.x86_64.rpm
sudo yum -y install syslog-ng
sudo vi /etc/syslog-ng/syslog-ng.conf

Replace the existing contents with the configuration below, substituting your own IP address: the local IP of the system you're installing on, where Fluentd runs.

@version: 3.7
@include "scl.conf"

    options {
        time-reap(30);
        mark-freq(10);
        keep-hostname(yes);
        chain-hostnames(no);
        };
    source s_network {
        syslog(transport(udp));
        };
    destination d_syslog_udp {
        syslog("192.168.10.104" transport("udp") port(42185));
    };

    log { source(s_network);
          destination(d_syslog_udp);
        };
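If you'd rather not hand-edit the IP address, a sketch like the one below can substitute it. It assumes the file contains exactly one destination written as syslog("x.x.x.x" ...) as in the configuration above; the function name is made up for this example, and SUDO is a hypothetical override (default sudo). hostname -I lists the machine's addresses and the usage line takes the first, which may not be the right one on a multi-homed VM.

```shell
# Replace the IPv4 address in the syslog("x.x.x.x" ...) destination of a syslog-ng
# config file. SUDO is a hypothetical override: default "sudo", SUDO= for none.
set_syslog_ng_dest_ip() {
    ip="$1"; file="${2:-/etc/syslog-ng/syslog-ng.conf}"
    ${SUDO-sudo} sed -i "s/syslog(\"[0-9.]*\"/syslog(\"$ip\"/" "$file"
}
# Usage:
#   set_syslog_ng_dest_ip "$(hostname -I | awk '{print $1}')"
#   sudo syslog-ng --syntax-only   # parse the config without starting the daemon
```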

sudo systemctl stop rsyslog.service
sudo systemctl start syslog-ng.service
sudo systemctl disable rsyslog.service
sudo systemctl enable syslog-ng.service
sudo yum -y remove rsyslog

Now check Kibana and you should see messages appear.
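You can also confirm from the command line that documents are arriving. This sketch assumes the Elasticsearch plugin's default Logstash-style index names (logstash-YYYY.MM.DD) and Elasticsearch on localhost:9200. The _cat/count API prints "epoch timestamp count", and the helper below (name made up for this example) pulls out the count; run it a couple of times and the number should climb as hosts send logs.

```shell
# Print the document count from `_cat/count` output ("epoch timestamp count"),
# read on stdin. The header line added by ?v, and any blank lines, are skipped.
es_doc_count() {
    awk 'NF == 3 && $1 != "epoch" { print $3 }'
}
# Usage: curl -s 'http://localhost:9200/_cat/count/logstash-*' | es_doc_count
```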


The host, ident, message and @timestamp fields should all line up. I found I had to use some online parsers to get the format line in td-agent.conf just right, or I'd be missing data or, worse, get errors in Fluentd like this:

tail -f /var/log/td-agent/td-agent.log

2016-06-02 14:12:32 +0100 [error]: "<167>1 2016-06-02T13:12:30+00:00 bdragon.lab.local Vpxa - - - verbose vpxa[FF88DAC0] [Originator@6876 sub=PropertyProvider opID=HB-host-9@1193-44773711-1e] RecordOp ASSIGN: info.state, session[5270aadd-231b-3521-e317-dbd316cd75e4]52c63ee9-fe3d-d55a-6128-51ec302ec66d. Applied change to temp map." error="invalid time format: value =  2016-06-02T13:12:30+00:00, error_class = ArgumentError, error = invalid strptime format - `%b %d %H:%M:%S'"

The frustrating thing is that the error above was caused by the "<167>1" prefix, not the date format! Even after long hours getting the time format right, I was still left with the error. The websites below held the key: I'm no Ruby expert, but they let me quickly try out different expressions until I finally arrived at the one above. I've deliberately left the full link so it shows the final expression I tested that worked.

https://fluentular.herokuapp.com/parse?regexp=%5E%28%3F<pid>%5B%5E+%5D*%29+%28%3F<time>%5B%5E+%5D*%29+%28%3F<host>%5B%5E+%5D*%29+%28%3F<ident>%5Ba-zA-Z0-9_%5C%2F%5C.%5C-%5D*%29+%28%3F%3A%5C%5B%28%3F<pid>%5B0-9%5D%2B%29%5C%5D%29%3F%28%3F%3A%5B%5E%5C%3A%5D*%5C%3A%29%3F+*%28%3F<message>.*%29%24&input=<167>1+2016-06-02T12%3A40%3A38%2B00%3A00+bdragon.lab.local+Vpxa+-+-+-+verbose+vpxa%5BFFCA7B70%5D+%5BOriginator%406876+sub%3DVpxaHalCnxHostagent+opID%3DWFU-564af32b%5D+%5BVpxaHalCnxHostagent%3A%3AProcessUpdate%5D+Applying+updates+from+6463+to+6464+%28at+6463%29&time_format=%25Y-%25m-%25dT%25H%3A%25M%3A%25S%25z

http://rubular.com

So, you send all syslog from your ESXi hosts over UDP port 514 to the central RHEL server, where syslog-ng listens for it and forwards it to UDP port 42185. Fluentd listens there and passes the messages on to Elasticsearch in Logstash-style daily indices (logstash_format true; Logstash itself isn't involved). Easy!! At least you can view the results in a nice web interface and do all the searching you need.
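On the ESXi side, each host's syslog target and outbound syslog firewall rule need setting. Here's a sketch that prints the esxcli commands for one host, assuming UDP to port 514 on the RHEL server as described above. The function name is made up for this example; run it on the RHEL box and paste the output into an SSH session on each host, or set the Syslog.global.logHost advanced setting in the vSphere client instead.

```shell
# Print the esxcli commands that point one ESXi host's syslog at the RHEL server
# over UDP 514 and open the host's outbound syslog firewall ruleset.
esxi_syslog_commands() {
    ip="$1"
    printf '%s\n' \
        "esxcli system syslog config set --loghost='udp://${ip}:514'" \
        "esxcli system syslog reload" \
        "esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true" \
        "esxcli network firewall refresh"
}
# Usage: esxi_syslog_commands 192.168.10.104
```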

I haven't looked at long-term storage requirements here, so you'll need to monitor disk space and make sure Elasticsearch doesn't run out. It's best to test with one or two hosts in a lab before sizing the server and deciding on log retention.
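For retention, one simple approach is to delete whole daily indices once they pass a certain age. The sketch below assumes the default logstash-YYYY.MM.DD index names, dated in UTC, and GNU date; the 30-day figure is only an example, not a recommendation, and the function name is made up. Elastic's Curator tool is the more complete option for this job.

```shell
# Print the name of the daily Logstash-style index from N days ago (UTC, GNU date).
index_days_ago() {
    printf 'logstash-%s\n' "$(date -u -d "$1 days ago" +%Y.%m.%d)"
}
# Usage:
#   curl -s 'http://localhost:9200/_cat/allocation?v'            # disk used per node
#   curl -XDELETE "http://localhost:9200/$(index_days_ago 30)"   # drop one old index
```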

It's not a particularly simple combination. It may be possible to use rsyslog instead of syslog-ng, but this is the combination I found that works and enables more powerful searching and indexing of VMware syslog data. That, or just use VMware's own vRealize Log Insight product if you prefer!