System Notices

Mar 8, 2010 - Snowpatch Upgraded

The Snowpatch cluster has been upgraded to a newer operating system. All software packages that were previously available have been recompiled and, where a newer version exists, upgraded. In all likelihood users will need to recompile their own programs as well. Please send email to support@westgrid.ca if you need help.

Mar 8, 2010 - Bugaboo /global/scratch available again

Bugaboo has been rebooted and the /global/scratch directory is available again.

Monday, March 8, 2010 - Bugaboo /global/scratch unavailable due to file system problem

There is a problem with the /global/scratch file system on Bugaboo. Until the problem is resolved you cannot access
files in that file system.  Sorry for the inconvenience.

 

Mar 21 6 AM MST - Power outage UofA WestGrid site.

On Sunday, March 21, at 6 AM MST, all UofA WestGrid resources will be unavailable due to a power outage in our data center.

The Checkers cluster, the IBM SMPs (Cortex, Dendrite, Synapse, Adenine, Guanine, Bigfoot) and the SGI SMPs (Nexus, Arcturus, Australis, Borealis, Helios, Corona) will all be affected.

Mar 8, 2010 - Snowpatch system upgrade

snowpatch.westgrid.ca will be unavailable on Mar 8, 2010.

Snowpatch will be completely reinstalled: the operating system will be upgraded to Scientific Linux 5.3 and all applications will be recompiled.

All running jobs will be terminated. Applications that users have compiled themselves on Snowpatch will most likely have to be recompiled after Snowpatch comes back into production. Please contact support@westgrid.ca if you require help with recompiling your software.

We expect that Snowpatch will be available again on Mar 9.

Sunday, Feb. 21: UBC Orcinus - Data Center Cooling Issue

At ~11:30PM the Chemistry Datacenter suffered a chiller issue which was
 not resolved by UBC Plant Operations until ~1:00AM. As a result, many
 compute nodes in the cluster rebooted and many jobs running at that time
 were lost. We are working to restore normal operations as quickly as
 possible. In the meantime, please resubmit your jobs. We are sorry for
 this inconvenience.

Sat. Feb. 20: Bugaboo /global/scratch file system unavailable

The /global/scratch filesystem currently cannot be accessed from the head node (bugaboo) of the Bugaboo facility. Any process that attempts to access that filesystem or files in it will hang. The vendor has been contacted and is looking at the issue. However, the /global/scratch filesystem can still be accessed from all the compute nodes and from bugaboo-fs. As a workaround, bugaboo-fs can be used to access files. Running jobs that access the /global/scratch filesystem will continue to run as usual.

Jan 27 2010, IBM Power 4 SMP machines Guanine and Adenine suffered hardware problems and are currently powered off.

Guanine and Adenine suffered hardware problems due to a power failure.

The UPS feeding Adenine and Guanine failed on Nov 26 2009, and
Guanine and Adenine have been running on utility power since then.

Guanine and Adenine were powered off on Jan 27.

Jan 19 2:00 - 10:00 AM MST - Australis hardware failure, now back in production

Australis experienced a hardware failure.

Jobs running on Borealis, Helios and Corona, which mount file systems from
Australis, were affected.

Australis is back in production with 32 processors and 14 GB RAM available.
 
