Wednesday 29 June 2016

VMware vSphere Syslog options - Part 2


To prepare, I deployed a VM running Red Hat Enterprise Linux using the new Developer subscription that Red Hat has made free for testing purposes. I can now subscribe lab machines without the $99 yearly charge. I highly recommend getting hold of this to brush up on Linux skills.

Note: You may hit a console bug where the Red Hat lock screen freezes. To prevent this, turn off the screensaver and automatic lock screen behaviour in Red Hat before you start using PuTTY. It may be down to particular VMware patches, but I've found nothing conclusive yet. It's just annoying when you're in the middle of a lab and worth avoiding - obviously production will need a different approach.

This is the main reference I'm following to get this up and running as it contains exactly the outcome I was after:
http://docs.fluentd.org/articles/free-alternative-to-splunk-by-fluentd

You'll need a few firewall ports open, so start by adding these in the firewall GUI to both the runtime and permanent configurations:
TCP Port 5601 for Kibana web access
UDP Port 514 for syslog-ng to receive syslog from the ESXi hosts and redirect it to Fluentd
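
If you'd rather open the ports from the shell than from the GUI, here is a minimal sketch. It assumes firewalld is the active firewall (the RHEL 7 default) and that the default zone is the one in use; the PORTS variable is just my own shorthand.

```shell
# Sketch only: assumes firewalld (the RHEL 7 default) and its default zone.
PORTS="5601/tcp 514/udp"   # Kibana web access, syslog-ng listener

if command -v firewall-cmd >/dev/null 2>&1; then
    for p in $PORTS; do
        sudo firewall-cmd --add-port="$p"               # runtime
        sudo firewall-cmd --permanent --add-port="$p"   # permanent
    done
    sudo firewall-cmd --list-ports
else
    echo "firewall-cmd not found; open these ports another way: $PORTS"
fi
```

The runtime rule takes effect immediately and the permanent one survives a reboot, which mirrors ticking both options in the GUI.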

Note: ensure you've subscribed RHEL to the Update Repository at this point.

Next, open a PuTTY session to the new VM and check that the Java version is OK:

[dufus@syslog5 ~]$ java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

Version 7 or higher is needed, so I'm ok with version 8 as shown above.

Prerequisites:
Setup for NTP

Check ulimit value:
ulimit -n
(mine returned 1024 which needs to be changed)

Edit /etc/security/limits.conf
Some changes to the file permissions are required to carry this out:
Check permissions on limits.conf file:
stat --format '%a' /etc/security/limits.conf
This gives me "644"
Set permissions to allow editing of limits.conf file:
sudo chmod 777 /etc/security/limits.conf
vi /etc/security/limits.conf

Add these lines to this file (copy and paste the whole lot directly into vi if connected with the PuTTY client):
root soft nofile 65536
root hard nofile 65536
* soft nofile 65536
* hard nofile 65536

Set permissions back:
sudo chmod 644 /etc/security/limits.conf

I'm not running a high load environment so that's it for me. Time to reboot to apply before going further.
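
After the reboot it's worth confirming that the new limit actually took effect. This is a small sketch rather than anything from the Fluentd article: check_nofile is a hypothetical helper name, and 65536 is simply the value set in limits.conf above.

```shell
# Compare a file-descriptor limit against the required minimum.
# check_nofile is a hypothetical helper; 65536 matches limits.conf above.
check_nofile() {
    current=$1
    required=${2:-65536}
    if [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]; then
        echo "ok: nofile is $current"
    else
        echo "too low: nofile is $current, want at least $required"
        return 1
    fi
}

# Check the soft limit of the current shell session.
check_nofile "$(ulimit -n)" || true
```

As an aside, running the editor with sudo (sudo vi /etc/security/limits.conf) should avoid the need to change the file permissions at all.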

Note: This is a good time to shut down the VM and take a snapshot to fall back on until the remaining changes are successfully implemented.

Now, the article might refer to older versions of Elasticsearch and Kibana. I'm going to use the latest versions currently available, so adjust the download links after checking the download pages listed below:

Elasticsearch

Check the url for the latest version here:
https://www.elastic.co/downloads/elasticsearch

You can install Elasticsearch via Yum, but that doesn't necessarily give you the right version, and the Kibana and Elasticsearch versions need to match.

I found installing Elasticsearch as an RPM was the best option, and it allowed me to run it as a service much more easily:

cd /tmp
curl -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/2.3.3/elasticsearch-2.3.3.rpm
sudo rpm -Uvh elasticsearch-2.3.3.rpm
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl start elasticsearch.service

Done! Now Elasticsearch is running as a service.
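
A quick way to confirm the service is really answering is to query it over HTTP. This is a sketch that assumes Elasticsearch is listening on its default HTTP port, 9200, on the local machine; ES_URL is just my own variable name.

```shell
# Sketch: assumes Elasticsearch listens on its default HTTP port 9200 locally.
ES_URL="http://localhost:9200"

if command -v curl >/dev/null 2>&1; then
    # Elasticsearch can take a few seconds to come up, so a failure right
    # after "systemctl start" may just mean "try again shortly".
    curl -s --max-time 5 "$ES_URL" || echo "no answer from $ES_URL yet"
fi
```

A healthy node replies with a short JSON document that includes its version number, which is handy for checking it against the Kibana version you are about to install.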

Kibana

Check the url for the latest version here:
https://www.elastic.co/downloads/kibana

Now open another PuTTY session and run the following commands, adjusting the filename according to the new version you see available:

curl -O https://download.elastic.co/kibana/kibana/kibana-4.5.1-1.x86_64.rpm
sudo rpm -Uvh kibana-4.5.1-1.x86_64.rpm
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable kibana.service
sudo systemctl start kibana.service

Now that the Kibana server is running, you can check it from a browser on the VM console (if it hasn't locked up!) at http://0.0.0.0:5601

There's no data yet, so that's it for now.

Fluentd (td-agent)

Now open a third PuTTY session and we'll install Fluentd and get the configuration file edited and working.

http://docs.fluentd.org/articles/install-by-rpm

curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh

The script above downloads and installs td-agent 2.

Now install the Elasticsearch plugin:

sudo /usr/sbin/td-agent-gem install fluent-plugin-elasticsearch

And next modify the td-agent.conf file (back it up first!):

sudo cp /etc/td-agent/td-agent.conf /etc/td-agent/td-agent.conf.bak
stat --format '%a' /etc/td-agent/td-agent.conf
644
sudo chmod 777 /etc/td-agent/td-agent.conf
vi /etc/td-agent/td-agent.conf

Modify as follows:
Find the section:
<source>
  type forward
</source>
Now insert a few blank lines above this section and paste in the block below, then delete the original section and you're good to go!

<source>
  @type syslog
  port 42185
  tag syslog
  format /^(?<pid>[^ ]*) (?<time>[^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*) (?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
  time_format %Y-%m-%dT%H:%M:%S%z
</source>

<source>
  @type forward
</source>

<match syslog.**>
  @type elasticsearch
  logstash_format true
  flush_interval 10s # for testing
</match>

sudo chmod 644 /etc/td-agent/td-agent.conf
sudo /etc/init.d/td-agent start
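
Before moving on to syslog-ng, you can push a hand-made message straight at the Fluentd syslog source to see whether the format line copes with it. The sketch below needs bash (for /dev/udp), and build_test_msg and send_udp are hypothetical helper names of my own. The message layout is copied from the shape of the ESXi lines shown later in this post, so treat it as an approximation.

```shell
# Sketch (bash): build an ESXi-shaped syslog line and send it over UDP.
# build_test_msg and send_udp are hypothetical helper names.
build_test_msg() {
    local host=$1 text=$2
    # <167> is the syslog priority, "1" the version field that follows it.
    printf '<167>1 %s %s Vpxa - - - %s' \
        "$(date -u +%Y-%m-%dT%H:%M:%S+00:00)" "$host" "$text"
}

send_udp() {
    local target=$1 port=$2 msg=$3
    # /dev/udp/<host>/<port> is a bash feature, not a real device file.
    printf '%s\n' "$msg" > "/dev/udp/$target/$port"
}

# Usage (run on the syslog server itself):
#   send_udp 127.0.0.1 42185 "$(build_test_msg testhost 'hello from bash')"
```

After the 10 second flush interval the message should either appear in Kibana or show up as a parse error in /var/log/td-agent/td-agent.log.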

Syslog-NG

This section deals with installing Syslog-NG. I found that the built-in rsyslog engine doesn't easily preserve the source host address, so I decided to use the more powerful syslog-ng tool for this purpose instead. I hit a dependency problem when trying to install syslog-ng but found a suitable binary using www.rpmfind.net, so watch out for this if versions change.

Error: Package: syslog-ng-3.5.6-3.el7.x86_64 (epel)
           Requires: libnet.so.1()(64bit)

My solution to getting syslog-ng installed on RHEL 7.2:

cd /tmp
curl -O https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo rpm -Uvh epel-release-latest-7.noarch.rpm
yum repolist
curl -O ftp://195.220.108.108/linux/fedora/linux/releases/23/Everything/x86_64/os/Packages/l/libnet-1.1.6-10.fc23.x86_64.rpm
sudo rpm -Uvh libnet-1.1.6-10.fc23.x86_64.rpm
sudo yum -y install syslog-ng
sudo vi /etc/syslog-ng/syslog-ng.conf

Replace the existing contents with the configuration below, substituting the IP address with the local IP of the system you are installing on, which is also where Fluentd resides.

@version: 3.7
@include "scl.conf"

    options {
        time-reap(30);
        mark-freq(10);
        keep-hostname(yes);
        chain-hostnames(no);
        };
    source s_network {
        syslog(transport(udp));
        };
    destination d_syslog_udp {
        syslog("192.168.10.104" transport("udp") port(42185));
    };

    log { source(s_network);
          destination(d_syslog_udp);
        };

sudo systemctl stop rsyslog.service
sudo systemctl start syslog-ng.service
sudo systemctl disable rsyslog.service
sudo systemctl enable syslog-ng.service
sudo yum -y remove rsyslog

Now check Kibana and you should see messages appear.


The host, ident, message and @timestamp fields should all line up. I found I had to use some online parsers to get the format line in td-agent.conf just right, or I would be missing data or, worse, get errors in Fluentd as follows:

tail -f /var/log/td-agent/td-agent.log

2016-06-02 14:12:32 +0100 [error]: "<167>1 2016-06-02T13:12:30+00:00 bdragon.lab.local Vpxa - - - verbose vpxa[FF88DAC0] [Originator@6876 sub=PropertyProvider opID=HB-host-9@1193-44773711-1e] RecordOp ASSIGN: info.state, session[5270aadd-231b-3521-e317-dbd316cd75e4]52c63ee9-fe3d-d55a-6128-51ec302ec66d. Applied change to temp map." error="invalid time format: value =  2016-06-02T13:12:30+00:00, error_class = ArgumentError, error = invalid strptime format - `%b %d %H:%M:%S'"

The frustrating thing is that the issue above was caused by the "<167>1" prefix and not the format of the date! Even after long hours figuring out the correct date format I was still left with an error. The websites below held the key. I'm no Ruby expert, but they let me quickly try out different expressions until I finally arrived at the ones above. I've left the full link in deliberately, as it reflects the final expression I tested that worked.
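
For anyone wondering what that "<167>1" actually is: the number in angle brackets is the syslog priority, which packs the facility and severity into one value (priority = facility x 8 + severity), and as far as I can tell the "1" after it is the version field used by the newer syslog message format. A quick sketch to decode it, where decode_pri is a hypothetical helper name of my own:

```shell
# Decode a syslog PRI value: PRI = facility * 8 + severity.
# decode_pri is a hypothetical helper name.
decode_pri() {
    pri=$1
    echo "facility=$((pri / 8)) severity=$((pri % 8))"
}

decode_pri 167   # prints: facility=20 severity=7
```

Facility 20 is local4 and severity 7 is debug, which fits the verbose vpxa messages in the log line above.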

https://fluentular.herokuapp.com/parse?regexp=%5E%28%3F<pid>%5B%5E+%5D*%29+%28%3F<time>%5B%5E+%5D*%29+%28%3F<host>%5B%5E+%5D*%29+%28%3F<ident>%5Ba-zA-Z0-9_%5C%2F%5C.%5C-%5D*%29+%28%3F%3A%5C%5B%28%3F<pid>%5B0-9%5D%2B%29%5C%5D%29%3F%28%3F%3A%5B%5E%5C%3A%5D*%5C%3A%29%3F+*%28%3F<message>.*%29%24&input=<167>1+2016-06-02T12%3A40%3A38%2B00%3A00+bdragon.lab.local+Vpxa+-+-+-+verbose+vpxa%5BFFCA7B70%5D+%5BOriginator%406876+sub%3DVpxaHalCnxHostagent+opID%3DWFU-564af32b%5D+%5BVpxaHalCnxHostagent%3A%3AProcessUpdate%5D+Applying+updates+from+6463+to+6464+%28at+6463%29&time_format=%25Y-%25m-%25dT%25H%3A%25M%3A%25S%25z

http://rubular.com

So, you fire all syslogs out of your ESXi hosts over UDP port 514 to the central RHEL server, where syslog-ng listens for them. Syslog-ng forwards them to UDP port 42185, where Fluentd listens, and Fluentd then sends them on to Elasticsearch in Logstash-format indices. Easy!! At least you can view the results in a cool web interface and do all the searching you need.
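
For the ESXi side of that chain, the sketch below shows the esxcli commands as I understand them from KB2003322. It is meant to be run in an ESXi shell, and 192.168.10.104 is the syslog server IP used in the syslog-ng.conf above, so substitute your own.

```shell
# Sketch for the ESXi side, based on my reading of KB2003322.
# 192.168.10.104 is the syslog server IP from the syslog-ng.conf above.
SYSLOG_SERVER="192.168.10.104"
LOGHOST="udp://${SYSLOG_SERVER}:514"

if command -v esxcli >/dev/null 2>&1; then
    esxcli system syslog config set --loghost="$LOGHOST"
    esxcli system syslog reload
    # Allow outbound syslog traffic through the ESXi firewall.
    esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true
    esxcli network firewall refresh
else
    echo "esxcli not found; run this in an ESXi shell (target: $LOGHOST)"
fi
```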

I've not looked at long term storage requirements here, so you'll need to monitor disk space and ensure Elasticsearch doesn't run out. It's best to test this with one or two hosts in a lab before sizing and deciding on log retention.
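
As a starting point for retention, the sketch below works out the name of the daily index that is a given number of days old. It assumes the logstash-YYYY.MM.DD index naming that logstash_format true produces and needs GNU date (as found on RHEL); index_for_days_ago is a hypothetical helper name, and the 30 day figure is only an example.

```shell
# Sketch: name of the daily index that is N days old, assuming the
# logstash-YYYY.MM.DD naming produced by "logstash_format true".
# index_for_days_ago is a hypothetical helper; needs GNU date (as on RHEL).
index_for_days_ago() {
    days=$1
    base=${2:-$(date -u +%Y-%m-%d)}   # optional base date, default today (UTC)
    date -u -d "$base $days days ago" +logstash-%Y.%m.%d
}

index_for_days_ago 30 2016-06-29   # prints: logstash-2016.05.30

# Usage on the Elasticsearch server (DELETE removes data, so check first):
#   curl -s 'http://localhost:9200/_cat/indices?v'
#   curl -XDELETE "http://localhost:9200/$(index_for_days_ago 30)"
#   df -h /var/lib/elasticsearch
```

Listing the indices first shows how much space each day of logs takes, which gives a rough basis for sizing before anything gets deleted.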

It's not a particularly easy combination. It may be possible to use rsyslogd instead of syslog-ng, but this is the combination I found to work, and it enables more powerful searching and indexing of syslog files from VMware. That, or just use VMware's own Log Insight product if you prefer!


Tuesday 14 June 2016

VMware vSphere Syslog options - Part 1


The troubleshooting facilities in VMware vCenter and vSphere are fairly good in my opinion. Through my work I can access a support dump analyzer for ESXi hosts that provides very useful information. I can crack open the vCenter dump to access specific logs and see what's been going on. I can tail the vmkernel logs to monitor real-time activity and, with the introduction of the Syslog Collector service in vCenter, look back in time - just add the logs or the vCenter VM to the backup schedule and you can see whether a particular problem existed before a patch or upgrade.

Where I don't get involved is in deep investigation of issues. Support typically go deeper in their analysis of particular problems, and the built-in syslog server uses flat log files, which makes tracing a particular fault more difficult.

I heard about a few options in this area and wanted to explore them here and try them out.

Pay versions:

VMware vRealize Log Insight
https://www.vmware.com/products/vrealize-log-insight/
Commercial, free for up to 25 OSI (Operating System Instances) when you own a supported vCenter license but $$ beyond that. This might just do for some smaller businesses, and from what I've seen it is a great product with content packs to extend monitoring beyond VMware products. They don't charge for the storage of large amounts of log data. The product is deployed as a virtual appliance.

Splunk
http://www.splunk.com/en_us/products/splunk-enterprise.html
This is the main competitor to VMware's product and has been out there a while with a good knowledge base. They have cloud options and a Splunk Lite. The pricing is per GB of logs, so you need to know the volume of logs generated, and this could spike when you're experiencing an issue, which is not so good! It installs on Windows, Linux, Solaris and Mac OS.

Kiwi Syslog Server
http://www.kiwisyslog.com/products/kiwi-syslog-server/product-overview.aspx
I've used this myself in the past with their CatTools to back up Cisco switches. The cost is a flat fee of €240 and it installs on Windows only.

Free/Pay versions:

PRTG Free Syslog Server
https://www.paessler.com/free_syslog_server
Well, it's free up to 100 sensors. They estimate each device will use 5-10 sensors, but I'm going to test it with just syslog on an ESXi host to see if I could get 100 ESXi hosts out of it. If you want more, of course, you pay more and it's not free at that point. But it could be just enough for you...

Free versions:

Syslog-NG
https://www.balabit.com/network-security/syslog-ng
https://syslog-ng.org
https://github.com/balabit/syslog-ng
Well, this is completely free and open source but requires Linux. You can use the VMware virtual management appliance (vMA) to get this up and running. I'm interested to see how much work there is to get it capturing logs from multiple ESXi hosts and then to query the product. As with most open source solutions, it's not easy to use or get working out of the box. But if you're up for a challenge then so am I!!

My ideal is an open source, free product that's easy to set up, with a Kibana web front end, Elasticsearch and little to no configuring for vSphere logs!! I can dream, right?!

PRTG Free Syslog Server

So this one is interesting. Once installed and loaded up, it began discovering my network and picked up everything running on the same subnet. This included vCenter, ESXi, network devices etc. All I was interested in was the syslog server, but it's not enabled by default. You have to add a new sensor called "Syslog Receiver" to the local probe device and choose the settings if you want them different from the default; mine worked fine as is. Once I had configured ESXi following KB2003322, I got a few messages through and could see how it worked.

So it's getting messages, but how easy is it to retrieve specific message levels? There is a messages tab that lists all recent ones.
Then you can select a specific severity. There is some rudimentary text searching.
The historic data tab allows exporting of specific errors to html or csv.

That's as exciting as it gets!
The trial license stays for 30 days even after putting in the free key, and after that you are limited to 100 sensors. I was able to remove a lot of the discovered entries and sensors to reduce the number in use. Windows Firewall rules were added automatically. There was a web interface but I didn't spend too much time with it.

Kiwi Syslog Server

This one is also Windows based and requires .NET 3.5, so add that feature to Server 2012 R2 if required to allow the install to proceed. After install I pointed ESXi at the new syslog IP and off it went. You can immediately see the new logs hitting the server.
There is no real search or filter in the main screen. You can schedule archive options to deal with historical records, but it just stores things in a flat file much like the VMware equivalent. You can of course do more filtering over what it captures and output it to different "screens", but it's no better than when I last looked at it over 10 years ago... The web interface failed on install so I didn't get a look at that.

Splunk Enterprise

Simple to install. I went with the Enterprise version to see what features it provided. There is a VMware App you can add in, and while this was heading in the direction I wanted, I gave up after 10 minutes: it appears very powerful but complex, and is not so easy to get to grips with. Very well worth checking out though...

VMware vRealize Log Insight
Erm, I downloaded the appliance and found that you need a supported version of vCenter to get a free 25 OSI license, but they then say go to the downloads page and click on Read More to get the 5.5 and lower version key which is listed right there?! The OVA deployment was standard, I let DHCP do everything as I was only going to test it for a few minutes. 
Blog entry here from VMware:
Now, how does it stack up? Well, I couldn't connect to it with the browser at first. After a few minutes it rebooted and was doing some configuration by itself, so I left it alone. Wait until the console shows that it has finished.
Now you'll be able to connect with the web interface.


I clicked New Deployment. Once you finish all the initial setup questions you can configure integration. As it's a VMware product primarily for monitoring VMware, I know this is going to be easy!
To start with I just configured my ESXi host to send logs to the Log Insight appliance and didn't configure vCenter integration, just to see what happened. There are content packs for vSAN and more.
I was definitely getting logs in from the host, so I decided to configure vCenter integration next.

The VMware specific dashboards are interesting.
And you can drill into these to get the interactive view, which was absent in the other products I tested.
So, I'd be fairly happy to give HPE Support access to VMware's Log Insight and be sure it would help them out. PRTG would also tick the box for me, but I'm less sure about Kiwi Syslog Server. Splunk, with some effort in setting it up, would most likely beat all of these, but cost is a factor, as is the slight complexity, which isn't workable when you've limited time for a lab session. I'm sure there are good blogs out there that would help you set it up correctly and test it out.

Syslog-NG

This one troubles me as there doesn't appear to be any web interface or search engines so what's the point!?! I did find some good articles using other open source software so I'm going to give them a try, document the results and add them as my next post so stay tuned. If it works I just need to find a suitable Linux distribution to deploy and see how manageable that becomes over time. 

Disclaimer: I work for HPE as a Consultant.