VMware Snapshots: 2017

Thursday, 7 December 2017

Browser errors accessing VMware web clients

I've been hitting a strange error when using my Lab at home that I thought I'd share here. The IPMI and VCSA appliance interfaces both work, but the ESXi UI and the vCenter web interface both fail on multiple browsers:

Here is the result "Secure Connection Failed" from Firefox:

Here is the result "This page isn't working" from Chrome:

This was confusing the heck out of me. I checked that my Hosts file was pointing at the right locations and that the URLs were good, and switching browsers made no difference. I thought perhaps some older encryption method wasn't supported in my browsers anymore, but then I tried disabling my antivirus. I'm using Bitdefender and after disabling it the URLs started working. Until today I couldn't figure out how to add exceptions/exclusions, so here is how I did that:

I added the vCenter URL and the IP of the ESXi host and straight away everything was fine again.

We recently saw a case where a Flash problem stopped IE working, but this was slightly different. This might help someone else out there stumbling on the same thing!

Wednesday, 6 September 2017

Migrating to VCSA 6.5 U1

I've been working with a customer migrating vCenter 5.5 U3 to VCSA 6.5 U1. They had a lab to test against and the upgrade there went fine but when we got to the first Production vCenter it turned out to be a very different story!

I'm writing this post to gather my thoughts and offer tips that will help you when it comes to your turn!!

The first stumbling block was that the vCenter SSL certs had expired. These were the ones that had replaced the original self-signed ones. We tried a number of methods but ran into problems re-registering the inventory service with SSO. Time to reinstall and keep the old database. We backed up the license keys and permissions just in case and uninstalled ALL the vCenter components. The simple install failed. The advanced install also failed on SSO. It just kept rolling back. We logged a call with support and they guided us through cleaning out the %temp% folder, as it turns out VMware likes to reuse the crap that's in there. We then installed the prerequisites by hand - see KB2059481 for info. There were also 2 x CIS folders to delete, see the same KB. Once we'd done that SSO installed fine and we ran through the other services up to Update Manager. Oh, how I wished we'd stopped before that though!!

We ended up re-running the migration EIGHT times in total at 1-2 hours a pop before we finally got it to work all the way through. One of THOSE migrations!! The first issue showed up when we could see it zipping up some VUM files on the old vCenter, where the migration assistant runs. Shortly after that it would fail with "The compressed (zipped) folder is invalid or corrupted". A Google search found lots of others with similar issues but no relevant KB article. The guidance was to uninstall Update Manager, take the database offline and try again. That didn't work. During the next migration it still found & zipped Update Manager components from somewhere, despite Update Manager being uninstalled and the database no longer being online!! Turns out the migration assistant works off another directory and keeps the Update Manager components there, and on later attempts it grabs the old files and tries to use them, where it fails....again!!!! Crapola.

Here's the process we worked out to get you the best chance of pulling off the migration. Repeat these steps between each migration attempt as necessary.

Uninstall Update Manager, get rid of it - take notes or whatever but seriously, kill the damn thing
Take the database offline
Delete all files in the %temp% folder - go up a level if it drops you into %temp%\2 or somewhere like that. The user's temp folder will be somewhere like "C:\Users\Bozo\AppData\Local\Temp" - remove everything BELOW this directory
Delete the C:\Users\Michael\AppData\VMware folder (substitute your own user profile) and everything below it - this is where the migration-assistant folder lives and its contents are going to screw you every time until you castrate 'em!
Create a local admin account with admin privs to get you out of a hole if you have to roll back, as the server's computer account is transferred to the VCSA and you will have to rejoin AD. If you know your local admin account password, great, but TEST it!! Or you'll get locked out. Good luck with that.
You need a VCSA port group that is configured for ephemeral binding (ephemeral ports). Get this ready or you won't see any valid port groups listed in the migration wizard!
Configure the security settings "MAC Address Changes" and "Forged Transmits" to Accept on this Port Group. You can move the VM to another port group once the migration is finished, but if you run the migration process enough times your feckin' VCSA network port will end up BLOCKED, as it keeps appearing with a new MAC address each time you try. If the port gets blocked, guess what, you have to start all over again as it fails the migration process!!
Disable HA/DRS on your clusters temporarily if you can
Expect to sail through on your first attempt or face hours and hours of pain and frustration, it's gonna go one way or the other!
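
The two folder clean-outs in the steps above are easy to get wrong by hand between attempts, so here is a minimal Python sketch of them. It's only an illustration: the function names and the dry-run default are my own invention, the example paths are the ones from the steps above and will differ on your server, and none of this is a VMware tool. Do a dry run first and check the list before deleting anything.

```python
import shutil
from pathlib import Path


def clear_below(folder, dry_run=True):
    """Remove everything BELOW folder but keep the folder itself.

    Returns the entries that were removed (or would be, in a dry run).
    """
    folder = Path(folder)
    if not folder.is_dir():
        return []
    entries = sorted(folder.iterdir())
    if dry_run:
        return entries
    removed = []
    for entry in entries:
        try:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
            removed.append(entry)
        except OSError:
            # Locked or in-use files are common in a temp folder; skip them.
            pass
    return removed


def remove_folder(folder, dry_run=True):
    """Remove folder and everything below it.

    Returns True if the folder exists and was (or would be) removed.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return False
    if not dry_run:
        shutil.rmtree(folder)
    return True


# Example paths taken from the steps above (assumptions, adjust to your server):
#   clear_below(r"C:\Users\Bozo\AppData\Local\Temp")
#   remove_folder(r"C:\Users\Michael\AppData\VMware")
# Both calls only report what would go; pass dry_run=False to actually delete.
```

Keeping the temp folder itself and only removing what is below it matches the advice above, and skipping locked files means one stubborn file doesn't abort the whole clean-out.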

If you reinstall vCenter like we did, once you've removed everything from Add/Remove Programs it's probably a good idea to remove the VMware folders from both Program Files folders and ProgramData for good measure. You'll need to reconnect your ESXi Hosts after the reinstall; there's a way to script this out there somewhere.

You'll find once the migration works that your hosts are still connected and your vCenter license is set to expire 59 days later, so get to that. Check your SSO Admin password expiry is appropriate. Move on to your next vCenter and hope it gets easier!! Best of luck, and did I mention that if the starting Postgres database message takes a few hours, just wait a bit longer!! Good Luck!!

Sunday, 20 August 2017

HPE Blade Host not Booting

I was building a Blade Chassis this week and after the OneView profile was applied and I installed the ESXi 6.5 OS I encountered a strange issue upon the first reboot. The server stuck at the initial BIOS check stage with Code 0700 and went no further. Forcing a cold boot got the ESXi server up and running, but a further warm reboot produced the same issue. I'd just finished building a few DL360 Gen9 management servers and they weren't having this issue, so what was wrong?

After checking the BIOS settings in the Profile and testing a small change with no result, I hit Google with the search "hpe blade dual sd usb warm boot" and found the following advisory:

http://h20565.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=7271227&docLocale=en_US&docId=emr_na-a00016609en_us

The Blades were using dual microSD cards on an internal USB device, while the DL360s were using local 600GB disk drives. The fix is a firmware update you can apply via the ESXi command line interface. Sure enough, I applied it to a host, rebooted and it worked fine.

Note you should check other advisories on your hardware from time to time:

http://h20565.www2.hpe.com/portal/site/hpsc/public/psi/advisoriesResults?sp4ts.oid=7271227

Like this one, referring to the same device, where the server stops with an All Paths Down error after 17 to 30 days of runtime on particular part numbers:
http://h20565.www2.hpe.com/portal/site/hpsc/public/psi/advisoriesResults?sp4ts.oid=7271227

It's not always evident so watch out and safe hunting!

Thursday, 4 May 2017

HPE Recovery Manager Central

This post deals with installing RMC-V, which is a vCenter-driven integration for backups using HPE Storage. It leverages snapshots as a backup resource, allowing VMware admins to recover individual VMDK disks or VMs directly from the storage array snapshot.

The product is on version 4.0.0 currently and requires licensing which is tied to the serial number of the array. Once you extract the big Zip you'll see three subfolders:

We'll go into the VMware Image and fire up the exe below:

HPE_RMC_Installer_4.0.0-40.exe is the one we want. Then there are a lot of install screens to go through:

So, there's a licensing error to resolve. First I rebooted my vCenter Appliance as there was no way I could find in the admin console to just restart the web interface!! RMC has its own web interface as shown below:

I was able to fix a StoreVirtual Management Cluster issue by removing it and adding the IP Address of a NODE from each StoreVirtual Management Group. It says they are "Not Licensed" at this point:

I logged into the old web client and saw the plugin registered but still initializing:

It was ready shortly after:

Here are two menu paths that expose the new options from the plugin:

This is via a Datastore or VM. There is no integration with the newer HTML 5 UI yet, only the old web client.
So, the licensing for StoreVirtual is not done in vCenter or RMC but back in the CMC for StoreVirtual:

If I try to take a snapshot I'll get this:

So, I eventually got my license key sorted out and could play with the software. I can do everything from within vCenter Web Interface ok (the old one) but there are caveats worth remembering:

You can create a Recovery Set at the VM or Datastore level. You can restore VMs from either, but only the Datastore Recovery Set lets you restore the entire Datastore.
Primarily you will be taking scheduled Recovery Sets at an agreed frequency and throwing them away after a set retention time.
RMC without StoreOnce is still half a solution but if you get the RMC license for free as part of your new 3PAR I'd throw it in there for sure.
Your primary use for RMC-V is mounting a selected Recovery Set Snapshot to copy out a particular VMDK Disk and overwrite a corrupt one, or to mount it to a different VM and get at particular files.
Recovering a VM is done by deleting the old one, attaching the VM from the snapshot to the vCenter inventory and carrying out a storage vMotion. I don't see a way to do this directly with the RMC interface.

The error you get when trying to restore an entire VM Recovery Set is as follows:

Use Datastore Recovery Sets if this is your intention. You can get at everything from those, and it's the same volume snap on the storage anyway.

You have to Mount a Recovery Set before you can choose the Copy/Attach/Detach options:

Note: Go to "Global inventory lists" on the main vCenter menu, then select Recovery Sets near the bottom. All the restore operations except for the full Datastore restore are triggered from here.

Once mounted you get the option to Copy:

Or Attach:

The Recovery Set process creates a VM snapshot, then the storage snapshot, before removing the VM snapshot. Be vigilant that the temporary snapshots are cleaned up, as trying to restore a Datastore Recovery Set on top of snapshots could leave your VM files corrupted, or at least make the recovery less than straightforward. A VMware alert for snapshot growth over a particular size might be appropriate to warn you.

Here is a manual Datastore Recovery Set process:

You can see the snapshot below while this is being generated. This should be removed once the operation completes:

You then start to build up a choice of snapshots:

You can select any of these to restore from.

Warning: If your Datastore has a high rate of churn, does your storage array have sufficient space to cope? Monitor the space used by the snapshots closely until you get through a few cycles of a full schedule / retention / deletion to make sure. The last thing you want is to run out of Array space. Bring in an HPE partner / HPE on this. Snapshots are useful but can cause serious issues in both VMware and your Storage if badly implemented or managed.
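
As a rough way to frame that question, here is a back-of-envelope sketch in Python. It assumes the worst case, where every changed block is unique and stays pinned by a snapshot for the whole retention period, so the space needed is roughly daily change times retention days. The function names, the safety factor and the example numbers are all made up for illustration, and real arrays with thin provisioning and dedupe will behave differently, so treat it as a sanity check and not as sizing guidance.

```python
def worst_case_snapshot_gb(daily_change_gb, retention_days):
    """Upper-bound estimate: each day's changed blocks are held until retention expires."""
    return daily_change_gb * retention_days


def has_headroom(free_gb, needed_gb, safety_factor=1.25):
    """True if free space covers the estimate plus a margin (the factor is an assumption)."""
    return free_gb >= needed_gb * safety_factor


# Hypothetical example: 50 GB of daily churn kept for 14 days, 1000 GB free.
needed = worst_case_snapshot_gb(50, 14)
print(needed, has_headroom(free_gb=1000, needed_gb=needed))  # prints: 700 True
```

The numbers you actually see on the array after a few full schedule cycles should replace the guesswork here.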

To restore an entire Datastore use the Datastore View under "More Objects":

Well, that should give you an intro to the RMC-V product. Left to its own devices it should provide additional recovery options for vCenter Admins right from their web interface. You can also use it to set up Remote Copy array replication, but not configurations using Peer Persistence. You may wish to limit the permissions on the 3PAR account to save the Storage admin some sleepless nights, but this type of close integration is becoming common, as HPE has many other plugins and integrations with VMware for this purpose. VVOLs aren't supported yet. To get the best from RMC-V, consider adding StoreOnce to the mix and leverage the Express Backup menu options you'll see above. They provide a demo OVF for StoreOnce for you to try out. You will need a virtual fibre adapter for the backup path, so read the documentation and work with a trusted advisor before heading down that path.

Friday, 21 April 2017

Azure - Pricing

This post is a quick one on Azure Pricing. There's an Azure Pricing Calculator that helps you estimate the bill for a new VM. One element that was confusing me was the disk size included in the VM. What the heck was this? Free storage? The OS Disk, Data disk, what?!! See the 382GB below!

I ordered a few VM sizes that came with different-sized disks and, from what I can figure out, the figure relates to the size of the TEMP disk, that's all. The huge VM above has a 384GB temp disk!! Haha. See below from the RDP session I checked it with:

So, while it's nice, it's not good for anything. The disk sizes don't even correlate to RAM:

Beats me why Microsoft are giving away temp disk storage?! The temp disk is not persistent storage, in case you didn't know! It normally survives a plain reboot, but the data can be lost if the Host fails. Or if you resize the VM. Or if you shut down (deallocate) the VM......you get the idea?!!
Now this is for Windows. It's slightly different for Linux. But in any case, if you're working out pricing for a customer, this might help explain things that don't make sense to you!!
The price is included with the OS disk in the base VM price, so you add on any additional Data Disks after that....

Hope this helps someone who is similarly confused!!

Friday, 14 April 2017

Azure: Create Managed Disk from a Snapshot

So, today I was looking at Managed Disks and using one to store SQL backups from inside a VM. Then I looked at backups, but I'd prefer not to back up the whole VM just to capture the SQL BAK file. I could set a weekly backup job to capture the OS and Application, but how to get a daily backup of the SQL BAK files?

I'm using Managed Disks and while you can create a snapshot through the web interface, you can't do anything with the snapshots currently, except list them!

So, I read over and over how it was possible to create a Managed Disk using a snapshot. Great I thought, but how?!! So I got this to work with the new CLI 2.0 as follows:

Uninstall any old Python versions on Windows. Grab the latest 64-bit version from the Python website.
https://www.python.org/downloads/windows/
Install and choose to add it to your path. After I did this I ran into an issue installing azure-cli. This is when I found an old version of pip & Python - I uninstalled this and the new Python, reinstalled the latest 64-bit one, and then azure-cli installed fine
https://docs.microsoft.com/en-us/cli/azure/install-azure-cli
Now it wouldn't invoke the command "az". I'd to add the following to my path in Windows 10:
"C:\Users\<username>\AppData\Roaming\Python\Python36\Scripts"
Now I could invoke az and log into Azure using "az login"
After that I created a managed disk from one of the snapshots I'd previously created:
https://azure.microsoft.com/en-us/blog/azure-cli-managed-disks/

az disk create -n myDisk -g MYPRICINGRG --source /subscriptions/<------------>/resourceGroups/MYPRICINGRG/providers/Microsoft.Compute/snapshots/140417

You can get the source by copying the RESOURCE ID from the screen below.
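
The resource ID follows the pattern visible in the command above, so you can also build it (and the command) from its parts. This is a small Python sketch of that: the function names are mine, the all-zero subscription ID is just a placeholder, and it only builds the argument list rather than running az.

```python
def snapshot_resource_id(subscription_id, resource_group, snapshot_name):
    """Build a snapshot resource ID in the same shape as the one passed to --source."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Compute/snapshots/{snapshot_name}"
    )


def disk_create_args(disk_name, resource_group, source_id):
    """Argument list for 'az disk create', using only the flags from the command above."""
    return ["az", "disk", "create", "-n", disk_name, "-g", resource_group, "--source", source_id]


# Placeholder subscription ID (an assumption); use your own.
source = snapshot_resource_id("00000000-0000-0000-0000-000000000000", "MYPRICINGRG", "140417")
print(" ".join(disk_create_args("myDisk", "MYPRICINGRG", source)))
```

If you want to run it from Python, you could hand the list to subprocess.run once "az login" has been done. On Windows that may need shell=True, as I believe az is a batch wrapper there, so test it before relying on it.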

You can add snapshots to your Azure service bar on the left for handy retrieval.

Now you can attach this new managed disk to your SQL or other VM as an additional disk and away you go!

Looks like CLI 2.0 is becoming the de facto standard for Azure?

Monday, 13 March 2017

VCSA HA Stuff

There are already loads of great posts out there on the VCSA 6.5 HA option but I wanted to cover three things here:

DNS issues with vCenter HA setup
Failover Times
vCenter HA Footprint

When I tried to deploy HA for my Lab VCSA it gave me an error message:
"Failed to get management network information. Verify if management interface (NIC0) is configured correctly and is reachable"

Check this Forum post for a fix:
https://communities.vmware.com/thread/547117?start=0&tstart=0

Basically you should ensure your vCenter name and DNS records are in the SAME CASE. I had my vCenter called "Labvc.lab.local" so I'd to edit /etc/vmware/systemname_info.json on the VCSA appliance to update the name to "labvc.lab.local" and then delete / recreate my DNS records to match the same case. After rebooting the VCSA I was able to deploy HA. Not critical but annoying!
Note: You will need three Hosts in your Lab (no less)!
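
A case mismatch like this is easy to miss by eye, so a tiny check can help. This Python sketch just compares two names you supply; the function name is made up, and it doesn't read the JSON file or query DNS for you.

```python
def differs_only_by_case(system_name, dns_name):
    """True when the two names are the same apart from upper/lower case."""
    return system_name != dns_name and system_name.lower() == dns_name.lower()


# The names from my Lab, before and after the fix.
print(differs_only_by_case("Labvc.lab.local", "labvc.lab.local"))  # True
print(differs_only_by_case("labvc.lab.local", "labvc.lab.local"))  # False
```

A True result is the situation described above, where the names resolve fine but the HA deployment still complains.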

I set up a vCenter HA configuration (the terminology is a bit confusing alongside VMware HA?!) using three nested ESXi servers, each with local storage and an extra port group set up on the default standard switch.

From power-on, my Lab VCSA VM takes 5 minutes until the old Web Client has initialized and is ready for logging on.

The vCenter HA Failover feature, when initiated manually, takes 6.5 minutes to perform a failover until the web interfaces are ready for logging on. (The newer UI is ready 20 seconds or so sooner than the older Flash-based legacy web interface).

So, would I use traditional VMware HA to recover vCenter or deploy this vCenter HA to perform a service failover instead? With an embedded PSC, vCenter HA has its merits, it's only three VMs. If you are using it for VDI or other heavily vCenter dependent services it could be of use.
Once you get to an external PSC and load balancers, I think it's too complicated. Maybe the next version of the VCSA will improve things, but for now it means 7 VMs for vCenter:

2 x PSCs
2 x Load Balancers
3 x vCenter VMs (A/P/W)

The old VMware HA to protect against host failures still delivers and can recover a VCSA in less time. Postgres corruption will still cause both VCSAs, Active & Passive, to fail. A good VM backup strategy alongside the inbuilt VCSA backup option should provide sufficient recovery options for all scenarios. If you need to go beyond this then vCenter HA is the next obvious choice, but you had better have a load balancer or three handy! vCenter HA with two PSCs using manual repointing of vCenter still presents potential downtime between discovery and remediation.

As for footprint, this is for a new build with no data or significant inventory:

So, some things to think about anyway. It's always good to have options and it's better / cheaper than the previous Heartbeat solution. If you're using Enhanced Linked Mode / have scaling requirements, then the number of VMs makes me think twice.....

These are just my thoughts, so evaluate for yourself and your environment before coming to any conclusions!