Thursday 30 January 2020

Auditing Complex vSphere Network Environments

Sometimes customers specify a particular network configuration they want to use in their environment. Despite warnings that complexity comes at a cost, they proceed. I'm usually the one who bears the brunt, in two ways. One, I have to build the environment and match up hundreds of vmnics with the correct uplinks. Two, if I get any wrong, I have to troubleshoot and fix it!

The simplest way to set up HPE Blades is to NOT use loads of FlexNICs to control bandwidth etc. We do use CNAs (Converged Network Adapters) to carry Ethernet and storage traffic, and this is OK as we'll split maybe 8Gb to storage and 12Gb to Ethernet, but that's it. We present two HBAs and two "20Gb" NICs to the blade. We connect the Ethernet NICs to a single DVS and use Network I/O Control (NIOC) to prioritize management, VM and vMotion traffic accordingly. It's simple and works every time.

Now, take a customer who prefers to carve up the 20Gb NIC into multiple Ethernet sub-NICs, called FlexNICs in HPE's world. We've taken 8Gb for storage first, and then they want, say, 2Gb for Backup, 2.5Gb for vMotion and the rest for Management. You are now presenting SIX NICs per host instead of two. You will still use a single DVS but it has to have six uplinks. You should rename those uplinks so that each one clearly states what it is to be used for. Now when you join a host to the DVS you MUST assign each pair of vmnics to the CORRECT uplinks. Yeah, it's fun!

Problem: In about 1 blade in 20, in my experience, the PCI order is reversed and you will experience weird symptoms such as:

  • Can ping the blade but can't open an SSH session to it, even though lockdown mode is off and the SSH service is running
  • Can't vMotion to or from this host even though the VLAN and IP are correct

That was some of the weirdness I saw when I rang VMware about an issue, and we found a mismatch between the vmnics and uplinks. So I spent a while working out how I would audit this. Is there a script I could use to extract information from HPE OneView and vSphere that would let me see if I had an issue with a host in this way? It's not good enough to just dump the MAC addresses from each side; we need the vmnic number and the uplink name as well to do a proper comparison.

So, I'm assuming here you have an environment that is not connected to the internet. It's easier if it is. Otherwise there is a bit of work to get PowerShell prepared so that it can execute VMware and HPE OneView commands correctly. You can just download the PowerCLI installer if that's easier, although it is getting out of date. Otherwise google how to install the VMware modules on an offline computer. You can transfer the install from one computer to another that way.
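For the offline route, one common approach is to download the modules with Save-Module on a connected computer and copy them across. This is a rough sketch rather than a tested procedure for every version; the staging folder C:\temp\PowerCLI is a made-up example, and the target path assumes Windows PowerShell 5.x.

```powershell
# On the internet-connected computer: download PowerCLI and its dependent modules.
# C:\temp\PowerCLI is a hypothetical staging folder; create it first.
Save-Module -Name VMware.PowerCLI -Path C:\temp\PowerCLI

# Copy the contents of that folder to the offline computer, into
# C:\Program Files\WindowsPowerShell\Modules, then on the offline computer:

# Files copied from another machine may be flagged as blocked; clear the flag.
Get-ChildItem 'C:\Program Files\WindowsPowerShell\Modules\VMware*' -Recurse | Unblock-File

# Load the module and confirm which versions were picked up.
Import-Module VMware.PowerCLI
Get-Module -Name VMware.* -ListAvailable | Select-Object Name, Version
```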

The HPE OneView PowerShell module can be installed locally on a connected computer and the module folder transferred to the desired computer. You may decide to deploy a dedicated VM for this alongside the customer's environment. I had trouble installing .NET 4.7.2 on an existing host and didn't want to mess with it for too long. A fresh system might work best. More info on OneView POSH here: 

Now do this in PowerShell on an internet-connected computer:

Install-Module HPOneView.500

(or Save-Module -Name HPOneView.500 -Path <folder> if you only want to download it; Save-Module needs a -Path.)

Grab the folder "C:\Program Files\WindowsPowerShell\Modules\HPOneView.500" and copy this to the target along with the .NET 4.7.2 offline installer. Once there, install .NET 4.7.2 and import the module into PowerShell (reboot as required):

Import-Module HPOneView.500

Verify the various versions and dependencies, which will change and are captured in the GitHub article above.

Once you load the module, you run these two commands to get the info we need from the HPE side:

Connect-HPOVMgmt -AuthLoginDomain local -Hostname 192.168.0.10 -UserName administrator -Password hahahaha

Get-HPOVServerProfileConnectionList | Out-File -FilePath .\connections.txt

(Use the Windows on-screen keyboard if connecting via a VMware console to get the | and \ characters.)
You may need to import any root / intermediate certificates to connect if you have replaced the self-signed one from HPE.

Now use PowerShell or PowerCLI to run two scripts against the desired vCenter (connect to it first with Connect-VIServer):

getmac.ps1:

Get-VMHostNetworkAdapter | Select-Object VMHost, Name, IP, SubnetMask, Mac, PortGroupName, VMotionEnabled, Mtu, FullDuplex, BitRatePerSec | Export-Csv -Path c:\temp\vmhostnetworkdetails.txt -NoTypeInformation

report.ps1:

# Lists every physical NIC on each distributed switch together with the uplink it is bound to.
$report = @()

foreach ($sw in (Get-VirtualSwitch -Distributed)) {
    $uuid = $sw.ExtensionData.Summary.Uuid
    # Fetch the switch's port list once per switch rather than once per host
    $portStates = $sw.ExtensionData.FetchDVPorts($null)

    $sw.ExtensionData.Config.Host | ForEach-Object {
        $esx = Get-View $_.Config.Host
        $netSys = Get-View $esx.ConfigManager.NetworkSystem

        # Only the host proxy switch that belongs to this distributed switch
        $netSys.NetworkConfig.ProxySwitch | Where-Object { $_.Uuid -eq $uuid } | ForEach-Object {
            foreach ($pnicSpec in $_.Spec.Backing.PnicSpec) {
                # The uplink port this vmnic is attached to, and the host's own record of the NIC
                $port = $portStates | Where-Object { $_.Key -eq $pnicSpec.UplinkPortKey }
                $pnic = $netSys.NetworkConfig.Pnic | Where-Object { $_.Device -eq $pnicSpec.PnicDevice }

                $row = "" | Select-Object Host,dvSwitch,PNic,PortLinkUp,DvUplink,confvlans,SpeedLink,DuplexLink
                $row.Host = $esx.Name
                $row.dvSwitch = $sw.Name
                $row.PNic = $pnicSpec.PnicDevice
                $row.PortLinkUp = $port.State.RuntimeInfo.LinkUp
                $row.DvUplink = $port.Config.Name
                $row.confvlans = ($port.State.RuntimeInfo.VlanIds | Format-Table -HideTableHeaders | Out-String).Trim()
                $row.SpeedLink = $pnic.Spec.LinkSpeed.SpeedMb
                $row.DuplexLink = $pnic.Spec.LinkSpeed.Duplex
                $report += $row
            }
        }
    }
}

$report | Export-Csv -Path .\report.csv -NoTypeInformation -UseCulture

(Credit & Thanks again to LucD for the script above)

Now take the 3 files and open Excel!

Import the connections.txt, vmhostnetworkdetails.txt and finally report.csv files into a single Excel file. Move them around until they are in adjacent worksheets. You will need to use the "Text to Columns" option under the Data ribbon to convert and then format the data correctly, using Fixed Width for the connections.txt data and comma separation for vmhostnetworkdetails. The report data is fine. Choose Format, AutoFit Column Width under the Home ribbon.
Remove any spurious top lines from the worksheets so you're just left with the headings and the data.

Next, on the Report worksheet, remove the rows we're not interested in. I have additional Mezzanine Ethernet adapters connected to LAG groups, which I'm not interested in as these have no permutations to validate; they are straight connections between the vmnics and the DVS uplink groupings. Now we're left with 6 lines per host.
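If you'd rather trim the report before it ever reaches Excel, the same filtering can be sketched in PowerShell. This is only an illustration: the sample rows (including the "Lag1-0" uplink name) are invented, and the six names kept are the uplink names from my own naming scheme listed further down, so substitute your own.

```powershell
# The six uplink names we want to audit (adjust to your own naming).
$wanted = 'Management-1','Management-2','Backup-1','Backup-2','vMotion-1','vMotion-2'

# Hypothetical sample rows; in practice use: $report = Import-Csv .\report.csv -UseCulture
$report = @(
    [pscustomobject]@{ Host = 'esx01'; PNic = 'vmnic0'; DvUplink = 'Management-1' }
    [pscustomobject]@{ Host = 'esx01'; PNic = 'vmnic1'; DvUplink = 'Management-2' }
    [pscustomobject]@{ Host = 'esx01'; PNic = 'vmnic6'; DvUplink = 'Lag1-0' }
)

# Keep only rows whose uplink is one of the six; everything else (e.g. LAG uplinks) is dropped.
$filtered = @($report | Where-Object { $_.DvUplink -in $wanted })

$filtered | Format-Table Host, PNic, DvUplink
```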

We need to move some stuff around, because VLOOKUP only searches the first column of the range you give it, so the "KEY" we're going to look up has to sit in the leftmost column. Go to the connections worksheet and move or copy the MAC column so it's the FIRST column in that worksheet.

Now we create a key in one of the sheets. Go to the vmhostnetworkdetails worksheet and create a new Column A in front of everything else. For the first host line on row 2, enter the formula "=B2&C2" so that it concatenates the values of VMHost and Name into a single value we'll use as the lookup key. Copy and paste that formula right down Column A wherever there's data in the corresponding rows.
Now go to the Report worksheet and delete the D/F/G/H columns, which we don't need. That leaves Host, dvSwitch, PNic and DvUplink in columns A to D. In an empty column to the right of the data we'll repeat the previous step. So in my case in J2 I have the formula "=A2&C2". Copy and paste this as before all the way down Column J where there's data to the left.

Now we start pulling data in from the other two worksheets. 
In the Report worksheet, enter the following formula into F2 and copy it down Column F:
"=VLOOKUP(J2,vmhostnetworkdetails!A:F,6,FALSE)"
That should populate all the MACs that VMware reported on.
In the Report worksheet, enter the following formula into G2 and copy it down Column G as before:
"=VLOOKUP(F2,'connections'!A:G,7,FALSE)"
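
For anyone who prefers to stay in PowerShell, the same two lookups can be sketched with a couple of hashtables. Treat this as a rough illustration only: the sample rows are invented, and the property names Mac and Network on the OneView side are assumptions, since connections.txt is fixed-width text and you would need to parse it into objects yourself first.

```powershell
# Hypothetical sample data standing in for the three exported files.
$vmhostNics = @(   # from vmhostnetworkdetails.txt (Import-Csv)
    [pscustomobject]@{ VMHost = 'esx01'; Name = 'vmnic0'; Mac = '00:11:22:33:44:01' }
    [pscustomobject]@{ VMHost = 'esx01'; Name = 'vmnic1'; Mac = '00:11:22:33:44:02' }
)
$connections = @(  # from connections.txt; property names are assumed
    [pscustomobject]@{ Mac = '00:11:22:33:44:01'; Network = 'A-Net_Set' }
    [pscustomobject]@{ Mac = '00:11:22:33:44:02'; Network = 'B-Net_Set' }
)
$report = @(       # from report.csv (Import-Csv -UseCulture)
    [pscustomobject]@{ Host = 'esx01'; PNic = 'vmnic0'; DvUplink = 'Management-1' }
    [pscustomobject]@{ Host = 'esx01'; PNic = 'vmnic1'; DvUplink = 'Management-2' }
)

# First lookup table: host name + vmnic name -> MAC (the "=B2&C2" key).
$macByKey = @{}
foreach ($nic in $vmhostNics) { $macByKey["$($nic.VMHost)$($nic.Name)"] = $nic.Mac }

# Second lookup table: MAC -> OneView network name. Upper-cased so that
# differences in MAC letter case between the two sources don't matter.
$netByMac = @{}
foreach ($c in $connections) { $netByMac[$c.Mac.ToUpper()] = $c.Network }

# Join: report row -> MAC -> OneView network (the two VLOOKUPs).
$joined = foreach ($row in $report) {
    $mac = $macByKey["$($row.Host)$($row.PNic)"]
    $net = if ($mac) { $netByMac[$mac.ToUpper()] } else { $null }
    [pscustomobject]@{
        Host = $row.Host; PNic = $row.PNic; DvUplink = $row.DvUplink
        Mac = $mac; Network = $net
    }
}

$joined | Format-Table
```

A row whose vmnic or MAC can't be found ends up with an empty Mac or Network, which plays the same role as the #N/A you'd see from VLOOKUP.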
Now we need a way to see if we've aligned the OneView Connection with the right DVS Uplink. Here's how we do that. 

There are three pairs of repeating uplinks per host left in the report worksheet. My uplink naming is as follows:
  • Management-1
  • Management-2
  • Backup-1
  • Backup-2
  • vMotion-1
  • vMotion-2
I created a formula in Column K to say Correct or Wrong depending if I've a mismatch or not. There is a set of 6 repeating formulas as follows:

K2: "=IF((AND(D2="Management-1", G2="A-Net_Set")), "Correct", "Wrong")"
K3: "=IF((AND(D3="Management-2", G3="B-Net_Set")), "Correct", "Wrong")"
K4: "=IF((AND(D4="Backup-1", G4="A-Backup")), "Correct", "Wrong")"
K5: "=IF((AND(D5="Backup-2", G5="B-Backup")), "Correct", "Wrong")"
K6: "=IF((AND(D6="vMotion-1", G6="A-Vmware_vMotion")), "Correct", "Wrong")"
K7: "=IF((AND(D7="vMotion-2", G7="B-Vmware_vMotion")), "Correct", "Wrong")"

Once you get these 6 entered, you can copy the set of 6 and paste it a few times. Then copy several sets of six and you'll be able to quickly paste right the way down Column K until all hosts have this field populated. Note that this relies on every host having its six rows in the same order.
Next, select Column K and use conditional formatting to highlight in RED any cell where the text is "Wrong", then scroll down the report to see whether you have a problem with any hosts. Done!!
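
The Correct/Wrong check itself is easy to sketch in PowerShell too. The uplink and network names below are the ones from my environment; the sample rows are made up, with the second one deliberately wrong, and the Network property name is an assumption about how you've parsed the OneView data. Because the expected mapping is keyed by uplink name, this version doesn't depend on the rows being in any particular order.

```powershell
# Expected OneView network for each DVS uplink (from the naming above).
$expected = @{
    'Management-1' = 'A-Net_Set'
    'Management-2' = 'B-Net_Set'
    'Backup-1'     = 'A-Backup'
    'Backup-2'     = 'B-Backup'
    'vMotion-1'    = 'A-Vmware_vMotion'
    'vMotion-2'    = 'B-Vmware_vMotion'
}

# Hypothetical joined rows: uplink from vSphere, network from OneView.
$rows = @(
    [pscustomobject]@{ Host = 'esx01'; PNic = 'vmnic0'; DvUplink = 'Management-1'; Network = 'A-Net_Set' }
    [pscustomobject]@{ Host = 'esx01'; PNic = 'vmnic1'; DvUplink = 'Management-2'; Network = 'A-Net_Set' }  # deliberately wrong
)

# Mark each row Correct or Wrong, like Column K in the spreadsheet.
$checked = foreach ($r in $rows) {
    $ok = $r.DvUplink -and $expected.ContainsKey($r.DvUplink) -and ($expected[$r.DvUplink] -eq $r.Network)
    [pscustomobject]@{
        Host = $r.Host; PNic = $r.PNic; DvUplink = $r.DvUplink; Network = $r.Network
        Status = if ($ok) { 'Correct' } else { 'Wrong' }
    }
}

# Show only the problem rows.
$wrong = @($checked | Where-Object { $_.Status -eq 'Wrong' })
$wrong | Format-Table
```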

If all this seems confusing grab my sample spreadsheet here:

You can then replace the data with your own and expand the formulas right down as far as you need to go based on the 1 host example I've shown. 

Hope this helps. It's a bit of a pain at first, but at least you can be assured that all your vmnics and uplinks are lined up correctly when deploying hundreds of hosts!!