Friday 4 October 2013

Securing vCenter to SQL communication - event id 36870

Hi,

Well, I'm currently rebuilding my Lab using Server 2012. This gives me a chance to use SQL 2012 and vCenter 5.5 (Note to self: 2012 R2 is fine EXCEPT for vCenter! It gets stuck at "Installing Directory Services" and is not supported anyway). One thing I'd read about was configuring SQL communication encryption (as opposed to database encryption), and I wanted to use that. I held off installing vCenter until I could get the certificate installed and the SQL services running, but I ran into a roadblock: every time I applied the Certificate and started the SQL Server service, it failed to start.

I suspected my Certificate, but I'd generated it with OpenSSL and AD Certificate Services, just like the other VMware certs I'd deployed previously. I came back to the issue a week later and found this post, which solved my problem:
http://sqlblog.com/blogs/greg_low/archive/2013/05/30/sql-server-service-won-t-start-after-changing-service-account-service-specific-error-2146885628.aspx
Now, I'm no SQL guru, and a permissions problem wasn't what I expected. What I did remember was that I'd been trying out Managed Service Accounts to see how they work. The SQL Server service was running under one of them, and that account didn't have sufficient permissions on the certificate's private key.
Here is SQL Server Configuration Manager showing the accounts I'm running the services under:
And here is the Certificate I had previously selected:
So once I had changed the permissions on the folder C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys, I could start the SQL Server service just fine. I did get an error applying permissions for the msvc_Labsql55 account at the folder level. There were 2 files in the folder: one would not let me edit its permissions, but the other accepted the update fine. I think one of the files corresponds to my SSL cert and the other is a system key that shouldn't be meddled with! Test it in a Lab yourself to make sure!
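If you'd rather script the fix than click through folder properties, the sketch below builds an icacls command that grants Read on a single key file in MachineKeys, which avoids touching the system key at folder level. It is a sketch under assumptions: the key file name is a placeholder you have to find yourself, and the account format `LAB\msvc_Labsql55$` assumes the usual trailing `$` of a managed service account. The script only prints the command for you to run from an elevated prompt; the Certificates (Local Computer) console's "Manage Private Keys" option is the GUI route to the same result.

```python
MACHINE_KEYS = r"C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys"


def build_grant_command(key_file: str, account: str) -> list[str]:
    """Build an icacls command granting Read (R) on one private-key file.

    key_file: file name inside MachineKeys (placeholder; identify your cert's key).
    account:  service account, e.g. r"LAB\\msvc_Labsql55$" (assumed MSA format).
    """
    path = MACHINE_KEYS + "\\" + key_file
    return ["icacls", path, "/grant", f"{account}:R"]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be reviewed first.
    cmd = build_grant_command("<key-file-name>", r"LAB\msvc_Labsql55$")
    print(" ".join(cmd))
```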
So now I can create an ODBC connection for the vCenter 5.5 Database and tick the box to encrypt the connection:
 
And it tests successfully:
Job Done! Now I just have to get building!
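For anyone testing the same thing without the ODBC GUI, ticking the encryption box corresponds to the `Encrypt=yes` keyword in a DSN-less connection string. The sketch below assumes the SQL Server Native Client 11.0 driver and Windows authentication; the server and database names are hypothetical lab values.

```python
def build_vcenter_odbc_string(server: str, database: str) -> str:
    """Build a DSN-less ODBC connection string that requests an encrypted link.

    Assumes SQL Server Native Client 11.0 and Windows (trusted) authentication.
    TrustServerCertificate=no means the SQL certificate must chain to a trusted CA.
    """
    parts = {
        "DRIVER": "{SQL Server Native Client 11.0}",
        "SERVER": server,
        "DATABASE": database,
        "Trusted_Connection": "yes",
        "Encrypt": "yes",
        "TrustServerCertificate": "no",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


if __name__ == "__main__":
    # Hypothetical server and database names for a lab.
    print(build_vcenter_odbc_string("labsql55.lab.local", "VCDB"))
```

If pyodbc is installed, passing the string to `pyodbc.connect()` is a quick way to repeat the "tests successfully" check from a script.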

Saturday 14 September 2013

SRM Lab - No Array Pair in Protection Group

Well, this was a weird one. I spent a day labbing an SRM configuration and, after fighting my way through certificate and domain controller issues, I ended up unable to create a protection group. This is taken from the SRM logs:

2013-09-06T13:30:09.385+01:00 [05420 warning 'DatastoreGroupManager' opID=33174abf] No read-write devices found for array pair 'array-pair-2038'
2013-09-06T13:30:09.385+01:00 [05420 verbose 'DatastoreGroupManager' opID=33174abf] Recomputed datastore groups for array pair 'array-pair-2038': 0 replicated datastores, 0 replicated RDMs, 0 free devices, 0 datastore groups

It's just not detecting anything it can work with even though everything looks fine and is replicating at the VSA end...!
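When there are several array pairs, a quick scan of the SRM log for recomputes that found nothing helps. The sketch below matches the "Recomputed datastore groups" message exactly as it appears in the log lines above; it assumes that wording stays the same in other SRM builds, which I haven't checked.

```python
import re

# Matches the DatastoreGroupManager summary line quoted in the log above.
RECOMPUTE = re.compile(
    r"Recomputed datastore groups for array pair '(?P<pair>[^']+)': "
    r"(?P<datastores>\d+) replicated datastores, (?P<rdms>\d+) replicated RDMs, "
    r"(?P<free>\d+) free devices, (?P<groups>\d+) datastore groups"
)


def empty_array_pairs(lines):
    """Return array pair IDs whose recompute found no datastores, RDMs or groups."""
    empty = []
    for line in lines:
        match = RECOMPUTE.search(line)
        if match and all(int(match.group(k)) == 0 for k in ("datastores", "rdms", "groups")):
            empty.append(match.group("pair"))
    return empty
```

Feeding it the two log lines above returns `['array-pair-2038']`; only the second line matches the pattern.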

I got talking to a great guy in HP Support about it, and he came back and told me to check the IQN for upper-case characters. In the HP StoreVirtual Centralized Management Console I had added the VSAs using upper-case characters, and this was reflected in the IQN seen in VMware.

I unexported the volumes, deleted the servers in the HP console and recreated them, refreshed the ESX iSCSI config and rebooted the nested ESXi hosts for good measure. The screen shots below show what the configuration I got working looks like and how it SHOULD have been configured!
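A small check like the one below would have caught this before any volumes were exported. My understanding is that the iSCSI naming RFCs (3720/3722) normalise names to lower case, so a mixed-case IQN can be seen as a mismatch by one side; treat that as background rather than a confirmed HP or VMware requirement. The IQNs in the usage note are hypothetical examples.

```python
def iqn_problems(iqn: str) -> list[str]:
    """Return a list of reasons an IQN looks risky (empty list means it looks fine)."""
    problems = []
    if not iqn.startswith("iqn."):
        problems.append("does not start with 'iqn.'")
    # iSCSI names are normally compared in their lower-case form.
    if iqn != iqn.lower():
        problems.append("contains upper-case characters")
    return problems
```

For example, `iqn_problems("iqn.1998-01.com.vmware:ESX01-1a2b")` flags the upper-case host name, while the all-lower-case version returns an empty list.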

 
There should be no upper case in the Name shown below:
 
 
Then, after rescanning the SRA and devices, I was able to create my first protection group (my issue had been that I was shown NO array pairs to choose from):


So that's one I'll remember! Thanks to HP Support for spotting my mistake!

Mike

Friday 6 September 2013

Upgrading DC to 2012 throws VMware SSO a loop

Hi Folks,

Well, I decided to prep my Lab for vSphere 5.5 and get Server 2012 and SQL 2012 out there so I can have fun figuring out where everything has moved to! I started with my single Lab DC, which was on Server 2008 R2, and installed a second DC with Server 2012. Darn, I just remembered 2012 R2 is around the corner; I could've waited, oh well! I know there are some weird things about changing the name of your DC affecting vCenter, where you have to edit the database, but I wanted to basically replace my Lab DC with a new one and end up keeping the same IP and name.

Note: None of this may be supported so follow at your own risk, this is my Lab, not a Production environment!

My problem: when I booted up my full lab, with the new DC running away fine under the old name and IP, the vCenter Server service on my two Linked Mode vCenter VMs would not start. The vpxd.log showed the following errors:

2013-09-06T09:11:58.472+01:00 [04896 warning 'Default'] Warning, existence of group "LAB\Domain Admins" unknown, permission may not be effective until it is resolved.
2013-09-06T09:11:58.534+01:00 [04896 error 'Default'] The group account "LAB\Domain Admins" could not be successfully resolved.  Check network connectivity to domain controllers and domain membership.  Users may not be able to log in until connectivity is restored.

I also got Netlogon errors in the System event log. To start with, I thought the vCenter VMs were having some difficulty talking to the new DC, even though it was using the same name and IP address. I used netdom to rejoin them to the domain:

netdom.exe remove IEDUBDC1VC01 /Domain:lab.local /userd:lab\administrator /passwordd:XXXXXXX

netdom.exe join IEDUBDC1VC01 /Domain:lab.local /userd:lab\administrator /passwordd:XXXXXXX

I was able to enter both commands one after the other and then reboot once, saving some time.
This fixed the logon errors but the vCenter service still had the same issue.

So maybe SSO was the issue - the errors above were wrapped by SSO entries:
[SSO][SsoAdminFacadeImpl]'] [FindGroup]

I found a great post here that would deal with the issue:
http://www.gabesvirtualworld.com/vcenter-sso-changes-when-demoting-domain-controller/
except when I went to launch the web client this is what I got:
 
Catch 22?!! Looked like it. I knew SSO had command-line tools but I'd never used them. I was able to list the current SSO configuration and, from there, tried removing and re-adding my domain identity source. Once I'd done that, the vCenter service started up fine. Here are the commands:
Note: the account to use appears to be just "admin", I'd tried admin@System-Domain and other variations before I figured this out!
 
This lists the current configuration:
 
C:\Program Files\VMware\Infrastructure\SSOServer\utils>rsautil manage-identity-sources -a list
Super Administrator's name: admin
Super Administrator's Password: ********
External identity sources:
Name: lab.local
ID: ims.8dd6788d2b0aa8c05a7226e3edcef61b
Type: Active Directory
Primary URL: ldaps://LABDC.lab.local:3269
Failover URL:
Domain: lab.local
Alias: LAB
Principal Base DN: DC=lab,DC=local
Group Base DN: DC=lab,DC=local
Successfully executed action: 'list'
 
This removes the current configuration, copy and paste the ID from the output above:
 
C:\Program Files\VMware\Infrastructure\SSOServer\utils>rsautil manage-identity-sources -a delete
Super Administrator's name: admin
Super Administrator's Password: ********
ID of the identity source to delete. Use list action to retrieve the ID: ims.8dd6788d2b0aa8c05a7226e3edcef61b
Successfully executed action: 'delete'
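Copying the ID by hand is easy to get wrong, especially when the console wraps it. The sketch below pulls each identity source's name and ID out of saved `rsautil manage-identity-sources -a list` output. It assumes the "Name:" / "ID:" layout shown in the list output in this post; other SSO builds may format it differently.

```python
def identity_source_ids(output: str) -> dict[str, str]:
    """Map identity source name -> ID from saved 'rsautil ... -a list' output."""
    ids = {}
    name = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("ID:") and name is not None:
            # The ID line follows its Name line in the list output.
            ids[name] = line.split(":", 1)[1].strip()
            name = None
    return ids
```

Run against the first list output above, it returns `{'lab.local': 'ims.8dd6788d2b0aa8c05a7226e3edcef61b'}`, the value to paste into the delete prompt.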
 
This one adds the domain back. I've used the domain name rather than specifying a particular domain controller:
 
C:\Program Files\VMware\Infrastructure\SSOServer\utils>rsautil manage-identity-sources -a create
Super Administrator's name: admin
Super Administrator's Password: ********
Primary URL: ldap://lab.local
Secondary failover URL:
Domain name: lab.local
Domain alias: LAB
Base DN for users: cn=Users,dc=lab,dc=local
Base DN for groups: cn=Users,dc=lab,dc=local
LDAP administrative account user (read operations): lab\administrator
LDAP administrative account password: ********
Identity Source added. ID: ims.32fa4eb12b0aa8c0073ec51929b46c33
Successfully executed action: 'create'
 
This confirms the new settings:
 
C:\Program Files\VMware\Infrastructure\SSOServer\utils>rsautil manage-identity-sources -a list
Super Administrator's name: admin
Super Administrator's Password: ********
External identity sources:
Name: lab.local
ID: ims.32fa4eb12b0aa8c0073ec51929b46c33
Type: Active Directory
Primary URL: ldap://lab.local
Failover URL:
Domain: lab.local
Alias: LAB
Principal Base DN: cn=Users,dc=lab,dc=local
Group Base DN: cn=Users,dc=lab,dc=local
Successfully executed action: 'list'
 
At this point vCenter functioned perfectly and the service started up fine. I also had to manually reselect the Update Manager service account and re-enter its password, but after that the Update Manager service worked fine too, even before the SSO steps above.
 
I'm looking forward to the overhaul of SSO in 5.5, hoping it makes life a little easier, but we'll see! This might help those of you whose Wintel team changes AD without notice, leaving you stuck the next time you bounce vCenter! Make sure you warn them how intimately VMware components are tied to AD, so they give you plenty of time to plan ahead and test after the upgrade.
 



Thursday 20 June 2013

Snapshot Conundrum
VMware have a KB article dealing with understanding Snapshots:
In it they provide a diagram that shows how a VM with 3 snapshots against it operates. I've copied the diagram below.

I just finished reading a VMware book, and it made a statement which piqued my curiosity: "If you have three snapshots, any new data is written to all three". This conflicted with what I had learned and assumed from my VCP days. The diagram above appears to support this statement (apart from the VM write to the Parent Disk, for which the Authors have issued an erratum), but the KB article itself makes no similar assertion.

My interest was in performance. If I create multiple snapshots, how much does that degrade performance? I understand that reads may have to traverse the whole chain of snapshots back to the Parent disk to retrieve a file and ensure it's the most up-to-date version, but writes?

My own understanding was that writes only happen to the active snapshot disk; the others are essentially frozen. Confused by this contradiction, I wondered whether it would be possible to test this in my lab. If I could use ESXTOP to view the disk statistics for the parent file and each snapshot disk, I could tell straightaway where the reads and writes were occurring. Easy? It goes to show I need to use ESXTOP more!

The disk statistics in VMware's ESXTOP aggregate all disks for the VM under one heading, even if they are stored on different datastores. This means I can't see "inside" each disk file to tell where the writes are occurring based on performance counters in ESXi. I can see individual disk latency in the ViClient, but nothing more granular than that.
So, maybe we can use the modified timestamp and the disk size after a change to determine where the writes are happening. We can view the directory in PuTTY and use "ls -alh" to see what the directory looks like for a single VM with one virtual disk and 3 snapshots against it.

This is the VM's folder view in Putty with no snapshots, our baseline:
This is the VM's folder view in Putty after 3 Snapshots have been created, a few minutes apart:
Now, we're interested in which disk gets modified and grows if we drop a file into the VM. I'm going to copy the vSphere Client (a 110MB file) onto the C: drive and run the ls -alh command again to see what happens. We will either see the 000003 disk's modified time change and its size grow accordingly, or we'll see all three snapshot disks change the same way. The results are as follows:
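Comparing two `ls -alh` screens by eye works, but the same before/after comparison can be scripted. The sketch below records size and modification time for every file in a folder and reports which files are new or changed. It assumes the VM folder is reachable from wherever Python runs (for example over an NFS export); the file names in the usage note are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    size: int     # bytes
    mtime: float  # modification time, seconds since the epoch


def listing(folder: str) -> dict[str, FileInfo]:
    """Snapshot of name -> (size, mtime) for the regular files in a folder."""
    result = {}
    for entry in os.scandir(folder):
        if entry.is_file():
            st = entry.stat()
            result[entry.name] = FileInfo(st.st_size, st.st_mtime)
    return result


def changed_files(before: dict[str, FileInfo], after: dict[str, FileInfo]) -> list[str]:
    """Names of files that are new, or whose size or mtime differ between listings."""
    return sorted(name for name, info in after.items() if before.get(name) != info)
```

Take `before = listing(path)`, copy the file inside the guest, take `after = listing(path)`, then `changed_files(before, after)` lists the disks that were written to.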
And the result? Only the most recent Snapshot disk in the chain is affected by the file copy. The Modified Time and Size adjust appropriately but there is no associated change to the previous snapshots in the chain. The VM Parent Disk "Snapshot-flat.vmdk" is also unaffected. 

[Edit 21/6/13: Note the snapshot file grows in 16MB chunks, hence its size is a multiple of 16MB and may be larger than the file copied into the VM. Thanks to VMware's Rick Blyths Post on Snapshots for that nugget!]
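As a worked example of the 16MB rule: 110MB of new data needs ceil(110 / 16) = 7 chunks, so the delta grows by 7 × 16 = 112MB. The sketch assumes every written block is new to the delta and ignores guest file-system metadata and other background writes, so real growth can be larger.

```python
import math

GROW_CHUNK_MB = 16  # snapshot delta growth increment, per the note above


def delta_growth_mb(written_mb: float) -> int:
    """Round new data written into the delta up to whole 16MB chunks."""
    return math.ceil(written_mb / GROW_CHUNK_MB) * GROW_CHUNK_MB


print(delta_growth_mb(110))  # 112
```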

A ViClient Datastore Browser view of the same thing is here:

I must admit I love reading something which challenges my preconceptions and motivates me to investigate. I found the rest of the book a great read, almost too short, as the case studies and experiences relayed are rarely found outside blogs and live events. As writing a book and getting it past editors and published is a significant achievement, I've nothing but respect for the Authors. I'm going to deliberately hold off naming anyone, as my interest is from a technology perspective, but if the Authors leave a comment on my Blog I'm more than happy to publish it.

So, that's it for my very first blog post! I'll be posting again as soon as I find interesting topics to share. If you have any suggestions on the article above, can suggest alternative tests or spot obvious faults in my methodology please let me know. I'm still learning!

Mike

[Update: 21/6/13] Well, I found a way to monitor which snapshot files are being touched when a write occurs. How did I do it? Windows Server 2012! I enabled NFS and was able to view the contents of the folder with Windows Explorer and, more importantly, view the individual file writes with Resource Monitor. I copied the ViClient as before and then copied the vCenter install folder (1.2GB) to get a longer reading and captured the results. This is the Windows Folder "E:\NFS\Snapshot" Explorer view after both file copies:
This is the output from Windows Resource Monitor, Disk View, scoped to the System process. The NFS Root Share was created in E:\NFS and the VM name was called "Snapshot" as shown in the path below:
 
So that's it, Myth-Busted as they say! There are NO writes to the Parent or first two snapshot files, only the most recent Snapshot file (000003). If I do a command line directory read of the Windows Folder files & subdirectories (dir /s) which should all come from the parent I get the following result:
The read traverses the third and first snapshots before hitting the Parent disk where the files are. So there is a read overhead but no write overhead. Again, the only writes occur on the 000003 snapshot disk.
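The behaviour measured above can be summarised in a toy model: writes always land in the newest delta, while a read walks back through the chain until it finds a disk holding that block. This is a simplified sketch of copy-on-write redo logs, not VMware's actual on-disk logic. In particular it visits every delta on the way back, so it doesn't explain why Resource Monitor showed no reads against 000002 here. The delta file names are hypothetical.

```python
class SnapshotChain:
    """Toy model of a parent disk plus a chain of delta (snapshot) disks."""

    def __init__(self, parent_blocks: dict[int, str]):
        # Each entry is (file name, {block number: data}); index 0 is the parent.
        self.disks = [("Snapshot-flat.vmdk", dict(parent_blocks))]

    def take_snapshot(self, delta_name: str) -> None:
        # A new, empty delta becomes the active disk; older disks are frozen.
        self.disks.append((delta_name, {}))

    def write(self, block: int, data: str) -> str:
        # Writes only ever go to the newest (active) delta disk.
        name, blocks = self.disks[-1]
        blocks[block] = data
        return name

    def read(self, block: int) -> tuple[str, list[str]]:
        # Walk from the newest delta back to the parent; return the data
        # and the list of disks consulted before it was found.
        visited = []
        for name, blocks in reversed(self.disks):
            visited.append(name)
            if block in blocks:
                return blocks[block], visited
        raise KeyError(block)
```

With three snapshots taken, `write()` always reports the 000003 delta, and reading a block that only exists in the parent consults all four files.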

Note: I initially tried the Windows 2012 iSCSI target, but it uses a VHD file which you can't get inside with Resource Monitor. It had been a while since I'd attempted to use UNIX services for Windows, and I'm glad it's come a long way since then! Now that's over, I can get some sleep!