Thursday 1 October 2015

Installing HP Cloudsystem 9.0 - Part 1

Well, here we are: version 9.0 is just out and I've upgraded the hardware in my Lab system so that it will (hopefully!) take this on!

The downloads are large, so I hope you have good broadband at home or can leave your laptop in work over the weekend! Once you get hold of them you need to upload the various OVF files until you get something like the following in vCenter:

Ignore my Windows images; the ones you want begin with "cs". Keep the template names unchanged. You can see how much space you need with Thin Provisioning (which they recommend as it uploads faster): about 23.87GB in fact!

The basic overview of steps to get CloudSystem 9.0 up and running is as follows:
(From Top to Bottom):



  1. Extract the Tools Zip file, edit the csstartgui.bat and then execute it
  2. Fill out the CS Management Appliance Installer
  3. When the Master Management Appliance is deployed add a MySql file to 6 folders
  4. Fill out the CS 9.0 First-Time Installer
  5. Wait for the additional 12 Appliances to spool up and configure themselves
  6. Perform a few small post install checks and configurations & Activate the Compute Cluster
The whole process will take 1-2 hours for an experienced person on Production Hardware. Allow double that for your first time on Lab Hardware. You should also have done some planning for the various Networks and other choices. My posts here should give you a start, but you may need to tweak things for your own environment. Again, this is a Lab setup I'm describing, so it may help with a POC, but consult a trusted advisor before taking things further.

Now, a deployment with the defaults will need 92 vCPU, 256GB Ram and just over 300GB Disk Space (this is the Thin Provisioned initial footprint). If your hardware can't cope with that you're not completely out of luck, but you'll have a fight on your hands to get it to work. Assuming you have the disk space, I've found memory usage peaks at around 110GB but the max active was 85GB. The biggest issue for me was vCPU. With 6 x 8 vCPU VMs running, my CO-STOP was crazy high, which indicates vSphere is finding it very hard to get 8 free cores at once to schedule all of a VM's vCPUs together. The more 8 vCPU VMs you put on there, the worse it gets!
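To put a rough number on that vCPU pressure, here is a minimal sketch of the overcommit arithmetic. The function name and the idea of using a plain vCPU-to-core ratio as a proxy are my own assumptions; the ratio only hints at scheduling contention, and CO-STOP itself still has to be read from the vSphere performance data.

```python
def vcpu_overcommit(vm_count: int, vcpus_per_vm: int, physical_cores: int) -> float:
    """Return the ratio of allocated vCPUs to physical cores."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return (vm_count * vcpus_per_vm) / physical_cores


# Six 8-vCPU appliances on a single 8-core CPU (hyperthreading ignored).
default_ratio = vcpu_overcommit(6, 8, 8)

# The same six appliances trimmed to 4 vCPU each.
trimmed_ratio = vcpu_overcommit(6, 4, 8)

print(default_ratio, trimmed_ratio)
```

Halving the vCPU count per appliance halves the ratio, which is the reasoning behind aiming for 4 vCPU per appliance in a small lab.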

My Minimum Lab recommendation is as follows:
1 x 8 Core Cpu, More Cores or a second CPU preferably!
96GB Ram, 128GB preferably
350GB Disk, 500GB preferably

My Lab is described in a future part of this series for comparison. This kind of hardware is not cheap and exceeds what I need for any other solution, but if you want to try this at home there is no option to avoid HA, so you get 3 of each of the main appliances as a result.

The rest of this post picks up after you've uploaded those templates. Let's kick off the Master Management Appliance installer:

The zip file HP_Helion_CloudSystem_Tools_9.0_Sept_2015_Z7550-96140 contains the file "windows-csstart.zip" in it - extract this to a folder and then extract the contents. Open a command prompt as administrator (just in case) and edit the csstartgui.bat file as follows:
Original:
REM (C) Copyright 2015 HP Development Company, L.P.csstart gui --start-browser
Change to this:
csstart gui --start-browser
Yes, I guess someone used an editor that doesn't write Windows line endings! You can also just type the command directly into the command prompt after changing to the same directory; csstart is just an EXE after all, so no big deal.
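If you would rather script the fix than edit the file by hand, the sketch below shows one way to split a comment that has a command fused onto its end. The function name is hypothetical, and unlike the manual edit above it keeps the REM comment on its own line instead of dropping it.

```python
from typing import List


def split_fused_rem_line(line: str, command: str = "csstart gui --start-browser") -> List[str]:
    """Split a REM comment that has a command fused onto its end."""
    stripped = line.rstrip("\r\n")
    is_comment = stripped.upper().startswith("REM")
    if is_comment and stripped.endswith(command):
        # Everything before the command is the original comment text.
        comment = stripped[: -len(command)].rstrip()
        return [comment, command]
    # Nothing to repair: return the line unchanged.
    return [stripped]


fused = "REM (C) Copyright 2015 HP Development Company, L.P.csstart gui --start-browser"
for fixed in split_fused_rem_line(fused):
    print(fixed)
```

Writing the returned lines back with "\r\n" separators gives a batch file that the command prompt reads as a comment followed by the command.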

You may see a warning:
WARNING: file already exists but should not: C:\Users\DarthV\AppData\Local\Temp\_MEI90642\Include\pyconfig.h
You can ignore this. I saw it on both Windows 10 and Windows 8.1 so I'm not worried; the file doesn't exist outside of the install, so it sounds like a bit of code looping back on itself.

Now you'll hopefully get your default browser open with a link to "http://localhost:5000/main" and see the following:

So where did I get all this?! Well, we need to look at our vCenter configuration and ensure that when we click "Install" it doesn't throw everything back in our faces! First there is a network design to understand, and then you need to configure your Distributed vSwitch on the Management Cluster. So let's pause the install and work on those elements.

I must note the PDFs are worth a read; I found I could get through them easily and they actually made sense! Kudos to the Cloud Team for them, as I've seen other manuals (won't say where) that just dump the help file! They have a whole planning guide, but this Blog is a real quick and dirty stand up for a Lab, so I'll be taking liberal skips and jumps. Watch out!

Note: the default 8 vCPU for this appliance needs to be avoided. This is covered briefly at the bottom of this post and expanded in a later post from this series. We're aiming for 4 vCPU max on each appliance. 

Now, if you have problems you can track the deploy.log file in the folder you launched csstart from. You can also now use PuTTY to SSH to the new appliance (this is a vast improvement on 8.1!!!). Log in with the following credentials, then switch to root and read the logs:
username: cloudadmin
password: cloudadmin
sudo -i
cat /var/log/cs-avm-manager/cs-avm-manager.log
cat /var/log/pavmms/pavmms_api.log
Once finished, browse to the new appliance over http and continue the process from there.

Errors encountered:

I found the following error in my Deploy.log when using a Static IP Address:
[2015-09-18 13:31:43,184] DEBUG    csstart.get_keystone_token about to exec command: 'curl' '-k' '--silent' '--show-error' '-X' 'GET' '-H' 'X-Auth-Token: ********' '-H' 'User-Agent: csstart' 'http://ma1.hpiscmgmt.local:6666/rest/pavmms/v1.0/cs-mgmt1'
[2015-09-18 13:31:53,335] DEBUG    csstart.get_keystone_token stdout: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request.  Either the server is overloaded or there is an error in the application.</p>

I've been told this can be ignored for now.

I also tested using DHCP:
[2015-09-18 13:32:08,454] INFO     start Timed out waiting for appliance upstart to finish.
This may be normal in DHCP environments in which DNS is not configured to auto-register the node. Check the csstart deploy.log file and appliance logs for any errors, then ssh to the management appliance and manually check that upstart has completed successfully.

This also appears normal and can be ignored. I prefer to use a Static address until I understand things a bit better, so I've carried on with the settings used in the screenshot above. A window opens up which tracks the Deploy.log file. See sample output below (you wait for the error mentioned above, which means a 10 minute timeout):

Sample Output:

The VM name will be: cs-mgmt1
Config file - passed basic tests, moving to advanced tests.
Trying to connect to Vcenter Server labvc.lab.local ...
Starting new HTTPS connection (1): labvc.lab.local
Connected to Vcenter Server labvc.lab.local
Default load balancing mode is loadbalance_loadbased
DCM network will be configured on DSwitch01 with VLAN 0
Management trunk has been provisioned successfully!
Creating the Management Appliance.
Creating clone req
Appliance (cs-mgmt1) successfully reconfigured
Dryrun is False
Starting new HTTPS connection (1): labvc.lab.local
One or both of cert and key filepaths is None - not sending SSL files.
Starting the appliance.
This step could take between 1 and 3 minutes to complete.
CloudSystem Management Appliance was started successfully.
Waiting for appliance to finish upstart.
This step could take up to 10 minutes to complete.
Timed out waiting for appliance upstart to finish. This may be normal in DHCP environments in which DNS is not configured to auto-register the node. Check the csstart deploy.log file and appliance logs for any errors, then ssh to the management appliance and manually check that upstart has completed successfully.

Continue the CloudSystem 9 setup by browsing to http://192.168.10.50/
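Since both of the messages I quoted from deploy.log turned out to be ignorable, it can help to flag them automatically when re-reading a long log. This is only a sketch: the line pattern is inferred from the two timestamped lines quoted in this post, not from any documented log format, and the function names are my own.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    message: str


# Matches lines such as "[2015-09-18 13:32:08,454] INFO     start Timed out ..."
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")

# Phrases taken from the two messages quoted in this post.
KNOWN_PHRASES = ("500 Internal Server Error", "Timed out waiting for appliance upstart")


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one timestamped line; return None for continuation lines."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    return LogEntry(match["ts"], match["level"], match["msg"])


def flag_known_messages(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, phrase) for each line containing a known phrase."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for phrase in KNOWN_PHRASES:
            if phrase in line:
                hits.append((number, phrase))
    return hits


sample = [
    "[2015-09-18 13:31:53,335] DEBUG    csstart.get_keystone_token stdout: <!DOCTYPE HTML>",
    "<title>500 Internal Server Error</title>",
    "[2015-09-18 13:32:08,454] INFO     start Timed out waiting for appliance upstart to finish.",
]
print(parse_line(sample[2]))
print(flag_known_messages(sample))
```

To run it against the real file, pass an open file handle for deploy.log to flag_known_messages instead of the sample list.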

The Cloud Management and DC Management Port Groups are created during the install:


The management VM itself has this footprint (=Default - we want to tweak this a bit):
It is using 16.2GB of disk space, but is this all swap? Let's check:

So, the Datastore which contains the Templates is being used to deploy the Clones. That's fine for 1 appliance, but if I deploy the next set I could run out of space. I ended up migrating the templates to my largest Datastore to ensure this wouldn't become an issue later. They are all Thin Provisioned but your Lab could be tight on space. The VM by default gets 8 vCPU and 16GB Ram. One of the later Cloud appliances wants 16 vCPU!! So watch out. I'm trying to fit this onto a small lab system!

Network Design:

So...the network design required to make your Lab work. I used the same networks as before in 8.1 on my Cisco switch:

vLAN ID  Subnet            Gateway          Purpose                      Range used
1        192.168.10.0/24   192.168.10.254   vLAN_Cloud_DC_Mgmt           51-69
50       192.168.11.0/24   N/A (see below)  vLAN50_Cloud_Mgmt
51       192.168.12.0/24   192.168.12.254   vLAN51_Cloud_CAN             10 to 20
52       192.168.1.0/24    192.168.1.1      vLAN_Cloud_External
55       192.168.13.0/24   192.168.13.254   vLAN4095_Cloud_Data_Trunk
56       192.168.14.0/24   192.168.14.254   vLAN4095_Cloud_Data_Trunk
57       192.168.15.0/24   192.168.15.254   vLAN4095_Cloud_Data_Trunk
58       192.168.16.0/24   192.168.16.254   vLAN4095_Cloud_Data_Trunk
59       192.168.17.0/24   192.168.17.254   vLAN4095_Cloud_Data_Trunk
60       192.168.18.0/24   N/A              iSCSI Block Storage Network
61       192.168.19.0/24   N/A              Object Proxy Network
62       192.168.20.0/24   N/A              PXE Network
63       tbc               tbc              VxLAN underlay
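Before committing a plan like this to the switch, a quick sanity check of the addressing can save some head scratching. The sketch below uses Python's standard ipaddress module to confirm that none of the subnets in the VLAN list above overlap and that each gateway sits inside its own subnet. The function name is my own, and vLAN 63 is left out because its subnet is still "tbc".

```python
import ipaddress
from itertools import combinations

# (vLAN ID, subnet, gateway or None) copied from the VLAN list above.
VLANS = [
    (1, "192.168.10.0/24", "192.168.10.254"),
    (50, "192.168.11.0/24", None),
    (51, "192.168.12.0/24", "192.168.12.254"),
    (52, "192.168.1.0/24", "192.168.1.1"),
    (55, "192.168.13.0/24", "192.168.13.254"),
    (56, "192.168.14.0/24", "192.168.14.254"),
    (57, "192.168.15.0/24", "192.168.15.254"),
    (58, "192.168.16.0/24", "192.168.16.254"),
    (59, "192.168.17.0/24", "192.168.17.254"),
    (60, "192.168.18.0/24", None),
    (61, "192.168.19.0/24", None),
    (62, "192.168.20.0/24", None),
]


def find_problems(vlans):
    """Return a list of human-readable addressing problems (empty = none found)."""
    problems = []
    nets = {vid: ipaddress.ip_network(subnet) for vid, subnet, _ in vlans}
    # Any two subnets sharing addresses would confuse routing between vLANs.
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"VLAN {a} overlaps VLAN {b}")
    # A gateway outside its own subnet is unreachable from that vLAN.
    for vid, _, gateway in vlans:
        if gateway and ipaddress.ip_address(gateway) not in nets[vid]:
            problems.append(f"VLAN {vid} gateway {gateway} is outside {nets[vid]}")
    return problems


print(find_problems(VLANS) or "no overlaps, all gateways inside their subnets")
```

Swap in your own subnets and gateways; an empty result only means the addressing is self-consistent, not that the switch is configured to match.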

There are a few new ones in relation to Block and Object Storage. All my Lab Hosts, VMs and Management IPs are on 192.168.10.x, so I'm deploying the initial management appliance there first. Upon creating the appliance, the install deploys the Cloud Management and Cloud CAN port groups without vLAN IDs into my Distributed vSwitch where the vmkernel resides. Note my management vLAN 50 is isolated and can't route as it has no gateway; this solved an issue with 8.1 so I'm keeping it the same here. There is a section in the install guide to be used when deploying the Management Appliances onto the same Hosts where vCenter resides. I've not gone down that road and not experienced any issues, but take note if you're building a real Lab to do more than play around, as you may encounter issues.

The biggest changes I had to make to my Lab configuration were:
  • Placed my Management Host into a Cluster - it's the only member but keeps CS9.0 happy!
  • Enable HA and DRS but disable HA admission control
  • Migrated my management host vmkernel management port into the DVS (which I hate doing, but CS9.0 won't install if it sees you hiding your vmkernel in a standard switch for simplicity).
That's it for now. If you need to restart the process, power down the management appliance and delete it, also delete the two new port groups shown above, then close your browser and cancel the script running in the command prompt. The deploy.log will be appended to, not overwritten, and you can do this as many times as you like before beginning the next step. There is no option to create a lab sized deployment, so you'll need 3 of every appliance, which will stretch everything in terms of resources.....!

Finally, to keep the Appliance size under control: as soon as the appliance is deployed and finished configuring itself in vCenter, it will be powered on. Power it off immediately, edit the number of vCPU from 8 to 4 and power it back on. We'll repeat this procedure later but it keeps the footprint down. This is not supported by HP and obviously won't allow the solution to scale as the design intended. However for a Lab, unless you're super rich or have a Company Lab available, it is the only way to get this to deploy correctly that I've found so far. Ideally I'd like a non HA option with 4 appliances total, or a config file to edit the virtual hardware so you don't have to intercept the build in vCenter!

Before I forget, and I'll repeat this later, watch out for the Password used in the next deploy step (The First-Time Installer). Make sure it meets the following requirements (Taken from Release Notes):

Specify a password for admin that is eight characters or less and is a combination of uppercase
and lowercase letters and numerals. Symbols and special characters found on the keyboard
are not supported.
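A quick check like the one below would have saved me the trouble. It is a sketch of my reading of the quoted rules, and it assumes that "a combination of" means at least one uppercase letter, one lowercase letter and one numeral; the release notes don't spell that out. The function name and sample passwords are made up for illustration.

```python
import string
from typing import List

ALLOWED = set(string.ascii_letters + string.digits)


def password_problems(password: str) -> List[str]:
    """Return the reasons a password breaks the quoted rules (empty list = OK)."""
    problems = []
    if len(password) > 8:
        problems.append("longer than eight characters")
    if any(ch not in ALLOWED for ch in password):
        problems.append("contains a symbol or special character")
    # Assumption: "a combination of" means at least one of each class.
    if not any(ch in string.ascii_uppercase for ch in password):
        problems.append("no uppercase letter")
    if not any(ch in string.ascii_lowercase for ch in password):
        problems.append("no lowercase letter")
    if not any(ch in string.digits for ch in password):
        problems.append("no numeral")
    return problems


for candidate in ("Passw0rd", "Passw0r!", "LongPassw0rd"):
    print(candidate, password_problems(candidate) or "OK")
```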

I'll cover it in a later post but this set me back two days until I figured it out.....I used a "!" in CS 8.1 without an issue, it kicked my ass here....

Update: Note, if you intend to use Signed CA Certificates, deploy them NOW, during install. It's much trickier to do this afterwards, so take the time to get this right up front.