To prepare for the eventual decommissioning of the current SCELSE compute cluster, we are implementing a new cluster. It will initially be provisioned as additional compute nodes and will take over cluster operations once the current cluster is decommissioned, reusing the current storage if necessary.
The new nodes will be controlled by the current head node and share the current storage.
Design and Requirements
The cluster design is as follows:
New Cluster Nodes (Compute Nodes 1-1 to 1-4/Future Head Node)
- 2 x Intel Xeon 2.2 GHz
- Each Intel Xeon to have 22 cores
- 512 GB of RAM
- 2 x 600GB hard disks in RAID 1
- 2 x 10Gb network ports
- 1 x 1Gb network port
- Necessary transceivers and/or cables
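The aggregate capacity implied by the node specification above can be worked out with a short calculation. This is only a sketch: it assumes the listed figures are per node, that "Compute Nodes 1-1 to 1-4" means four nodes, and that only physical cores are counted (no hyper-threading). The function and constant names are illustrative and not part of the tender.

```python
# Per-node figures taken from the specification list above.
NODES = 4                 # compute nodes 1-1 to 1-4 (assumed to mean four nodes)
SOCKETS_PER_NODE = 2      # 2 x Intel Xeon
CORES_PER_SOCKET = 22     # physical cores per Xeon
RAM_GB_PER_NODE = 512


def cluster_totals(nodes=NODES):
    """Return (physical cores, RAM in GB) for the given number of nodes."""
    cores = nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET
    ram_gb = nodes * RAM_GB_PER_NODE
    return cores, ram_gb


# All four nodes used for compute.
total_cores, total_ram_gb = cluster_totals()
print(f"4 nodes: {total_cores} cores, {total_ram_gb} GB RAM")

# One node set aside as the future head node, leaving three for compute.
compute_cores, compute_ram_gb = cluster_totals(NODES - 1)
print(f"3 nodes: {compute_cores} cores, {compute_ram_gb} GB RAM")
```

Under these assumptions, the four nodes provide 176 physical cores and 2048 GB of RAM in total, dropping to 132 cores and 1536 GB for compute if one node is repurposed as the head node.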
Setup of new cluster private network
- 2 x 24-port 10Gb network switches for redundancy
- Necessary cables and/or transceivers
- One of the new compute nodes must be able to transition easily to a head node for the new cluster when the current cluster is decommissioned (process must be documented).
- New cluster must be able to easily connect to a new storage solution when the current storage is decommissioned (process must be documented).
- All equipment to be housed in a 42U rack
- Rack-mountable Keyboard Video Mouse (KVM) switch and cables
- Rack-mountable 18.5″ Keyboard Monitor Mouse (KMM) and cables
- Rack must include 2 x Power Distribution Units (PDUs) with 32A Cee-form connectors
- 10Gb network card, transceivers and/or cables for the existing Dell R710 storage node to connect to the new cluster private network
- Users are to be able to transition between both clusters seamlessly without the need for multiple user IDs or passwords.
- Vendor to supply cluster management software for new cluster
- Vendor to supply, install, test and commission the new cluster
- Vendor to ensure old cluster and new cluster are both in working order once completed
- Vendor to configure the network to ensure both clusters work and are not in conflict
- Vendor to install the head and compute nodes with CentOS
In addition, all professional services, hardware and/or software licensing must be provided on top of a 24x7x4 onsite service warranty for not less than 3 years from the date of commissioning.
The tender was called on 10th Feb 2017, and a mandatory site briefing was conducted for interested parties on 15th Feb 2017. The tender closed on 28th Feb 2017 with 3 submitted proposals and was awarded to Cxrus Solutions Pte Ltd in April 2017.
- After discussions with the appointed vendor, the design spec will be changed to have 2 clusters running simultaneously accessing the same shared storage. This implies that one of the compute nodes will transition immediately as a head node, unless a new head node can be purchased.
- Work will start on 15th May 2017, with completion expected by 15th July 2017.
Project started 19th September 2016.
- Cluster is up and running
- UAT completed 14th Sept 2017
- Project completed 16th Nov 2017