System Notices
Mar 8, 2010 - Snowpatch Upgraded
Last Updated March 9th, 2010
The Snowpatch cluster has been upgraded to a newer operating system. All software packages that were previously available have been recompiled and, where a newer version exists, upgraded. In all likelihood users must recompile their own programs as well. Please send email to support@westgrid.ca if you need help.
Mar 8, 2010 - Bugaboo /global/scratch available again
Last Updated March 9th, 2010
Bugaboo has been rebooted and the /global/scratch directory is available again.
Monday, March 8, 2010 - Bugaboo /global/scratch unavailable due to file system problem
Last Updated March 8th, 2010
There is a problem with the /global/scratch file system on Bugaboo. Until the problem is resolved you cannot access files in that file system. Sorry for the inconvenience.
Mar 21 6 AM MST - Power outage UofA WestGrid site.
Last Updated February 26th, 2010
On Sunday, March 21 at 6 AM MST, all UofA WestGrid resources will be unavailable due to a power outage in our data center.
The Checkers cluster, the IBM SMPs (Cortex, Dendrite, Synapse, Adenine, Guanine, Bigfoot), and the SGI SMPs (Nexus, Arcturus, Australis, Borealis, Helios, Corona) will all be affected.
Mar 8, 2010 - Snowpatch system upgrade
Last Updated February 26th, 2010
snowpatch.westgrid.ca will be unavailable on Mar 8, 2010.
Snowpatch will be completely reinstalled: the operating system will be upgraded to Scientific Linux 5.3 and all applications will be recompiled.
All running jobs will be terminated. Applications that users have compiled themselves on Snowpatch will most likely need to be recompiled after Snowpatch comes back into production. Please contact support@westgrid.ca if you require help with recompiling your software.
We expect that Snowpatch will be available again on Mar 9.
Sunday, Feb. 21: UBC Orcinus - Data Center Cooling Issue
Last Updated February 21st, 2010
At ~11:30PM the Chemistry Datacenter suffered a chiller issue which was
not resolved by UBC Plant Operations until ~1:00AM. As a result, many
compute nodes in the cluster rebooted and many jobs running at that time
were lost. We are working to restore normal operations as quickly as
possible. In the meantime, please resubmit your jobs. We are sorry for
this inconvenience.
Sat. Feb. 20: Bugaboo /global/scratch file system unavailable
Last Updated February 21st, 2010
The /global/scratch filesystem currently cannot be accessed from the head node (bugaboo) of the Bugaboo facility. Any process that attempts to access that filesystem, or files in it, will hang. The vendor has been contacted and is looking at the issue. However, the /global/scratch filesystem can still be accessed from all the compute nodes and from bugaboo-fs. As a workaround, bugaboo-fs can be used to access files. Running jobs that access the /global/scratch filesystem will continue to run as usual.
February 12 - UofA IBM P4 SMP machines Adenine & Guanine are back in production
Last Updated February 13th, 2010
Jan 27 2010 - IBM Power 4 SMP machines Guanine and Adenine suffered hardware problems and are currently powered off.
Last Updated January 27th, 2010
Due to a power failure, Guanine and Adenine suffered hardware problems.
The UPS feeding Adenine and Guanine failed on Nov 26 2009, and
both machines have been running on utility power since.
Guanine and Adenine were powered off on Jan 27.
Jan 19 2:00 - 10:00 AM MST - Australis hardware failure, now back in production
Last Updated January 19th, 2010
Australis experienced a hardware failure.
Jobs running on Borealis, Helios, and Corona, which mount a file system from
Australis, were affected.
Australis is back in production with 32 processors and 14 GB RAM available.
