Monday, 5 October 2015

Installing HP Cloudsystem 9.0 - Part 3



This post covers the high-level steps for deploying CS 9. It took me more than a few attempts before I got it working in my lab, so this post also covers how I cracked it.

The Steps to deploying CS9 at a high level are as follows:

  • Run the csstartgui.bat to launch an internal web browser
  • Enter the information required to connect to vCenter and stand up the first Management Appliance
  • Upload the MySQL_Connector_J_JDBC_driver to 6 folders on the Management Appliance and change the permissions
  • Browse to the new Management Appliance web interface and enter all the configuration information used to build out the rest of CS9
  • Click Begin Installation - there's no turning back now! 
  • Monitor the install by tailing the cs-avm-manager.log file and watching vCenter spin up 12 more appliances (in a typical deployment, there can be more!)
  • Edit each of the 9 initial appliances to knock down their vCPU from 8 to 4
  • Perform a few post-install checks and one or two more configuration steps
  • You're done! Add a vCenter, activate the Compute Cluster and deploy your VMs
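For the log-monitoring step, a plain `tail -f` on the appliance does the job. If you'd rather watch a copy of the log from a script, here is a minimal Python sketch of the same idea. The function name `follow`, its parameters and the example path are illustrative assumptions, not anything shipped with CS9, and the real location of cs-avm-manager.log on the appliance isn't given in this post.

```python
import os
import time


def follow(path, poll_seconds=1.0, max_idle_polls=None, from_start=False):
    """Yield lines appended to `path`, roughly like `tail -f`.

    max_idle_polls=None follows forever; a number stops after that many
    consecutive polls with no new data (handy for scripts and tests).
    """
    with open(path, "r", errors="replace") as fh:
        if not from_start:
            fh.seek(0, os.SEEK_END)  # skip what is already in the file
        idle = 0
        while max_idle_polls is None or idle < max_idle_polls:
            line = fh.readline()
            if line:
                idle = 0
                yield line.rstrip("\n")
            else:
                idle += 1
                time.sleep(poll_seconds)


# Example usage (hypothetical path, adjust to wherever the log actually lives):
# for line in follow("/path/to/cs-avm-manager.log"):
#     if "ERROR" in line or "fail" in line.lower():
#         print(line)
```

Filtering on "ERROR" or "fail" is only a guess at what is worth highlighting; the log's actual message format isn't shown here.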
You should end up with the following:

This comes to just over 92 vCPU, 256GB of RAM and a 300GB disk footprint across all 13 appliances. Feeling the pain yet?! Memory isn't as much of an issue, as it settles down after deployment:

Disk latency hit 3 ms a few times and 10 ms once, so that's not a problem.

CPU, however, was a massive issue for me.

This shows the deployment of the 12 remaining appliances from 1:27pm. CPU pegged at 100% a few times for short intervals, but more worrying was the CO-STOP value, which I've never seen so high: 2355 milliseconds. For a CPU, that's death!! BUT... it worked, I got my deployment. Note that I'm NOT running 92 vCPU, I'm running 50, and I think I could have shaved that down further. Before I got that far I kept running into deployment failures, mainly caused by using a password with a "!" in it.
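To put that CO-STOP number in perspective, vCenter's summation counters (in milliseconds) are usually converted to a percentage of the chart's sample interval. The sketch below assumes the 2355 ms figure was read from a real-time chart with its 20-second sample interval; the post doesn't say which chart it came from, so treat the result as indicative only. The function name is made up for illustration.

```python
def summation_to_percent(summation_ms, interval_seconds=20):
    """Convert a vCenter summation value (ms) to a percentage of the interval.

    interval_seconds=20 is the real-time chart sample interval; this is an
    assumption, since the source of the 2355 ms figure isn't stated.
    """
    return summation_ms / (interval_seconds * 1000) * 100


costop_pct = summation_to_percent(2355)
print(f"CO-STOP: {costop_pct:.1f}% of the sample interval")
```

Under that assumption the value works out to roughly 11.8% of each interval spent co-stopped.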

Once I figured that out, the 9 biggest appliances were stood up, but the install would then stall with a failed Deployment message. I was hoping for a way to NOT deploy HA (3 of everything), but the best I could come up with was to intercept the appliances as they were being built and change their configuration from 8 vCPU back down to 4 vCPU, then power them on quickly before anything noticed! My config is as follows:

Appliance          Template Size   Cloned Size     My Target Size
Management 1       4 vCPU, 4GB     8 vCPU, 16GB    4 vCPU, 16GB
Management 2 & 3   4 vCPU, 4GB     8 vCPU, 8GB     4 vCPU, 8GB
Cloud Controller   4 vCPU, 4GB     8 vCPU, 32GB    4 vCPU, 32GB
Enterprise         4 vCPU, 4GB     8 vCPU, 32GB    4 vCPU, 32GB
Update             4 vCPU, 4GB     2 vCPU, 8GB     2 vCPU, 8GB
Monitor            4 vCPU, 4GB     4 vCPU, 8GB     4 vCPU*, 8GB

*Could be reduced further?
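As a quick sanity check on the footprint, the sizes in the table can be totalled with a few lines of Python. The per-type counts below are an assumption based on the "3 of everything" HA layout (three each of Management, Cloud Controller, Enterprise and Monitor, plus one Update appliance, 13 in total); the table itself doesn't list counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Appliance:
    name: str
    count: int        # ASSUMED from the "3 of everything" HA layout
    cloned_vcpu: int  # vCPU per appliance as cloned by the installer
    target_vcpu: int  # vCPU per appliance after the manual edit
    mem_gb: int       # memory per appliance (unchanged by the edit)


APPLIANCES = [
    Appliance("Management 1", 1, 8, 4, 16),
    Appliance("Management 2 & 3", 2, 8, 4, 8),
    Appliance("Cloud Controller", 3, 8, 4, 32),
    Appliance("Enterprise", 3, 8, 4, 32),
    Appliance("Update", 1, 2, 2, 8),
    Appliance("Monitor", 3, 4, 4, 8),
]


def totals(appliances):
    """Sum appliance count, vCPU (cloned and target) and memory."""
    return {
        "appliances": sum(a.count for a in appliances),
        "cloned_vcpu": sum(a.count * a.cloned_vcpu for a in appliances),
        "target_vcpu": sum(a.count * a.target_vcpu for a in appliances),
        "mem_gb": sum(a.count * a.mem_gb for a in appliances),
    }


print(totals(APPLIANCES))
```

With these assumed counts the target comes to 50 vCPU and the memory to 256GB, matching the figures quoted earlier. The cloned vCPU sum comes out at 86, slightly below the 92 quoted, so the assumed counts may not capture every appliance exactly.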

So, you need to catch each appliance as it's built.

The example above is the third Management Appliance. Make sure you've gone to the toilet (!), as you'll need to babysit things for 30-45 minutes to catch all the large 8-core appliances. As soon as you see a clone power on, power it off using the old C# client connected directly to vCenter, edit the number of vCPU from 8 down to 4, and power the clone on again.
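The steps above were done by hand in the C# client. If you wanted to script the same power-off, edit, power-on sequence, a rough sketch might look like the following. This is untested against CS9 and is not the method used in this post. The function is written against a pyVmomi-style VirtualMachine object; `shrink_vcpu`, `make_spec` and `wait` are names invented here, and with pyVmomi you would typically pass something like `lambda n: vim.vm.ConfigSpec(numCPUs=n)` and `pyVim.task.WaitForTask`.

```python
def shrink_vcpu(vm, make_spec, wait, target_vcpu=4):
    """Power off a clone, reduce its vCPU count, and power it back on.

    vm        -- pyVmomi-style VirtualMachine (PowerOffVM_Task, ReconfigVM_Task,
                 PowerOnVM_Task, config.hardware.numCPU)
    make_spec -- callable building a config spec for a vCPU count (assumption:
                 with pyVmomi, lambda n: vim.vm.ConfigSpec(numCPUs=n))
    wait      -- callable that blocks until a task completes
    Returns True if the VM was changed, False if it was already small enough.
    """
    if vm.config.hardware.numCPU <= target_vcpu:
        return False  # nothing to do, leave the appliance running

    wait(vm.PowerOffVM_Task())                       # hard power-off of the clone
    wait(vm.ReconfigVM_Task(spec=make_spec(target_vcpu)))  # 8 vCPU -> 4 vCPU
    wait(vm.PowerOnVM_Task())                        # back on before the installer notices
    return True
```

Passing `make_spec` and `wait` in as arguments keeps the sequence itself free of any vCenter dependency, so it can be dry-run against a stand-in object before being pointed at a real clone. Finding the right clone at the right moment (the babysitting part) is left out entirely.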

My lab has an E5-2618L v3 CPU, which has 8 cores @ 2.3 GHz with Turbo up to 3.4 GHz. This was pushed pretty hard and a second CPU wouldn't have gone amiss. The 128GB of RAM performed beautifully and I didn't see any pressure there; I think I could have done it with 96GB. I'm SSD-only, so storage wasn't a problem from a performance or space perspective (it's a 1TB Samsung 850 EVO SSD). So if your lab has less than this, you'll need to watch CPU most carefully, followed by memory and disk. It may also be possible to edit the memory to reduce that footprint. I certainly think the monitoring appliances could do with less vCPU, but I've not tested any further reductions at this point.
The graph above shows the memory allocation. The peaks are each of the 32GB appliances starting up, but as you can see it quickly trails off. The most critical CPU activity happens once the 9 appliances are stood up, and again once the last of the 13 appliances is finished and the final configuration is applied.

My Lab:
Supermicro MBD-X10DRI-T-O (Dual Socket, 1 populated)
Intel E5-2618L v3 (8 Core x 2.3GHz)
128GB DDR4 RAM
LSI Megaraid 9271-8i
Samsung 1TB 850 EVO SSD
Cisco SG300-10 12 port switch with layer 3 support