To support large compute jobs on large datasets, the National Supercomputing Centre (NSCC) has proposed a test bed of connections from NSCC to various departments within NUS.
The network topology is as follows:
The current test bed proposal is as follows (Angie Lim, email summary dated 11/03/2016):
- landing point for the proposed interim connection (from NSCC) would be at MD6, Basement 1.
- there are spare fiber optic connections between NUH and MD6 which can be used to extend the interim connection from MD6 to the NUH Main Building data centre at Level 6.
- a separate equipment rack must be set up to house the new network equipment
- a line will need to be pulled from CeLS to MD6 B1, and the necessary switches will have to be purchased and installed
- all parties sharing the NSCC link would jointly fund the common switches/fiber optic lines to support the proposed interim connection from NSCC.
- the test bed will run for 3 months and will not use InfiniBand, as the telcos are using XGPON
On the security side, we have been informed that “dual-home” connectivity is not allowed, as it would violate the NUS AUP. NSCC and NUS networks must therefore be kept separate at all times.
Long Term Implementation
In the long term, the link will be an InfiniBand link, and services from NSCC will be available via a portal which will bridge NUS logins and NSCC services. The underlying infrastructure for the long term solution will consist of 40 to 100Gbps switches at the various RIs/RCs (in our case CeLS), connected to designated tap points which all link back to com cen, whose central switch connects to NSCC via InfiniBand. Fiber laid for the test bed can be reused for the long term implementation.
Current plan as agreed with com cen is (Tan Chee Chiang, email dated 15/04/2016):
- 40G switches will be used for the interconnect for the time being, as the 100G connection to NSCC is not ready (the link is considered ‘long range’ and there are no 100G long range InfiniBand switches yet)
- once the 100G connection to NSCC is ready, NUS will cover the cost of the upgrade; central funding will cover the cost down to the network ports that our equipment connects to
- the upgrade will not depend on all users requesting it, but on the 100G connection to NSCC being ready: NUS/NSCC will upgrade the switch to 100G automatically when the 100G link between NUS and NSCC becomes available (Tan Chee Chiang noted points raised by LSI director, email dated 25/04/2016)
Project started 3rd Feb 2016.
- issues running the fiber link between CeLS and MD6
- OED has no current drawings to determine the actual free conduits running between the buildings
- only a very rough costing for the cable run is available at S$300,000
- OED contractors who know how the conduits are run are unwilling to share any information on them, as they want to be in the best position to win the bid if a tender is called to run the fiber
- a meeting between the LSI Director, OED and com cen reps will take place to resolve this matter if possible. Failing that, the LSI Director will inform the relevant NUS higher management that it is not possible for us to be part of this test bed (which is highly beneficial to the research community in NUS), because the administration expects the RIs/RCs to pay for what will eventually be campus wide infrastructure
- on a parallel track, NSCC will be made aware of the situation to see what they can contribute towards getting the fiber installed
- (Updated 18th Oct 2016) agreed that NSCC will go ahead with deployment of the testbed with the other participants (CSI, NUHS, etc) first while LSI works out the cabling issues (email with NSCC and Lawrence Wong dated 18/10/2016).
- (Updated 25th Oct 2016) OFM/OED have plotted an exposed route for cabling from CeLS to MD6. There is also an alternative route mentioned by one of the vendors – site survey will be done on the 31st Oct 2016 for these two routes.
- (Updated 25th Oct 2016) OFM have confirmed the cabling path to be as follows (email from OFM Neo Heng Hau dated 25th Oct 2016):
- CeLS building basement
- overhead chilled water pipe mounting between CeLS and MD2
- above MD2 side staircase leading to MD3
- Multi-Cable Transit (MCT) pipes leading to MD6 chilled plant room
- MD6 basement carpark
- MD6 server room MD6-B1-WIRECEN
- (Updated 31st Oct 2016) Potential networking vendors have confirmed the cabling path proposed by OFM, pending detailed inspection of the access from the MCT into the MD6 chilled plant room. Expected length of work is approximately 3 months, cost to be updated.
- (Updated 6th Jan 2017) Tender for cabling installation has been awarded.
- (Updated 16th Jan 2017) M1 uplink activated at MD6. Test bed to run for 3 months until mid April 2017; any extension will cost S$12,000 a month, shared among the various departments using the line.
- (Updated 18th Jan 2017) Cabling installation scheduled to start week of 20th Feb 2017.
- (Updated 13th Feb 2017) Cabling installation re-scheduled to start 3rd Mar 2017.
- Scaffolding set up and scissor lift brought in on 3rd Mar 2017.
- Cabling completed 6th Mar 2017.
- If the test bed period is missed and there is no extension due to the monthly costs, the cabling will still benefit inter-building communications between CIRC and CMIF.
- Testing of compute resources showed a significant speed up in processing: jobs completed at NSCC in approximately half the time taken on the 2000-core cluster in NTU (email from Krithika dated 5th May 2017, 2.38pm).
- Data transfer tests over SFTP and FTP between NUS and NSCC showed very poor speeds given the 2.5Gbps up, 10Gbps down link, averaging around 120MB/s to NSCC and 300MB/s from NSCC (email dated 4th May 2017 at 1.10pm from Mark De Silva to Peter Little and Rohan).
- The test link has been extended for a further 3 months according to NSCC (verbal confirmation).
- According to Gong Wei and Prof Lawrence Wong, there will be a new fiber link to a datanode connected to NSCC via a 100Gbps link, and the distribution point for the Science and Medicine faculties will be at the CeLS building, in one of the 4 wirecenters (meeting at T-Labs, 24th April 2017). This link will run separately from the initial XGPON link currently being tested (verbal discussion at NSCC on 8th May 2017 during SICS meeting with Tin Wee and Guan Sin).
- NUS-NSCC ITF (implementation task force) met with LSI reps on 19th Sept 2017 to inform us of the implementation plan, costs and timeline.
- Test link terminated in October 2017 (LSI was not informed).
- NSCC loan switch for the test link collected by NSCC on 18th January 2018.
- New 100Gbps link to distribution switch (wall rack outside CeLS DC) January 2018.
- Management switch installed in CeLS DC rack by On Demand Systems 3rd May 2018.
- Fiber patch cables from distribution switch to CeLS DC installed 15th May 2018.
- Patching cables to the management switch done 7th June 2018.
- Link is ready for deployment – LSI to purchase login node to specifications given by NUS-NSCC ITF during 19th Sept 2017 meeting.
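To put the May 2017 transfer test figures in context, the measured SFTP/FTP rates can be compared with the nominal link capacity. The following is only a rough sketch: it assumes decimal units (1 Gbps = 1000 Mbps, 1 MB = 10^6 bytes), ignores protocol overhead, and assumes that transfers to NSCC used the 2.5Gbps direction and transfers from NSCC the 10Gbps direction. The function name is illustrative only.

```python
def link_utilisation(measured_mb_per_s, link_gbps):
    """Return measured throughput as a fraction of nominal link capacity.

    Assumes decimal units and no protocol overhead (a simplification).
    """
    link_mb_per_s = link_gbps * 1000 / 8  # Gbps -> MB/s
    return measured_mb_per_s / link_mb_per_s


# Figures from the 4th May 2017 transfer tests.
to_nscc = link_utilisation(120, 2.5)    # assumed to use the 2.5Gbps uplink
from_nscc = link_utilisation(300, 10)   # assumed to use the 10Gbps downlink

print(f"To NSCC:   {to_nscc:.1%} of nominal capacity")
print(f"From NSCC: {from_nscc:.1%} of nominal capacity")
```

Under these assumptions, the transfers used well under half of the nominal capacity in either direction, which is consistent with the speeds being described as very poor.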
Estimated Completion Date