When downtime is required, users are informed by email.
The system "Message-of-the-Day" also posts downtime schedules and other information.
During downtime, some batch jobs may have to be removed. Each user with a job in this state will receive a separate email.
We have two job scheduling systems in use: the SGI systems use the LSF scheduler from Platform Computing (IBM), and the Mercury cluster uses PBS Torque/Maui from Adaptive Computing. All work is handled through these scheduling systems so that resources are managed effectively.
LSF is a workload management system. It uses a set of configurable queues and dispatches each job according to the job's resource requirements and the availability of system resources.
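As an illustration of how a job states its resource requirements to LSF, the following minimal Python sketch assembles a bsub command line. The helper name build_bsub_command, the queue name "normal", and the program and file names are assumptions made for this example and are not taken from this site's configuration; the bsub options used (-q, -n, -W, -J, -o) are standard LSF options. The sketch only builds and prints the command and does not submit anything.

```python
import shlex


def build_bsub_command(command, queue, slots=1, runtime_minutes=None,
                       job_name=None, output_file=None):
    """Return a bsub argument list for the given job (hypothetical helper)."""
    # -q selects the queue, -n the number of job slots requested.
    args = ["bsub", "-q", queue, "-n", str(slots)]
    if runtime_minutes is not None:
        # -W sets the run time limit, given here in minutes.
        args += ["-W", str(runtime_minutes)]
    if job_name:
        args += ["-J", job_name]
    if output_file:
        # LSF replaces %J in the file name with the job ID.
        args += ["-o", output_file]
    # The program to run and its arguments come last.
    return args + list(command)


# Example values only: the queue and program names are placeholders.
cmd = build_bsub_command(["./my_program", "input.dat"], queue="normal",
                         slots=4, runtime_minutes=120, job_name="demo",
                         output_file="demo.%J.out")
print(shlex.join(cmd))
```

Use the bqueues command to see which queues are actually defined on the system before choosing a queue name.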
A limit of 30 CPU minutes has been placed on all processes started interactively during a session. When an interactive task reaches this limit, it will be killed by the system. Jobs run via LSF are not affected by this limit and are controlled by the LSF queue definition.
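To get a feel for the interactive limit, the minimal Python sketch below reports how much CPU time the current process has consumed and how much would remain under a 30 minute limit. It assumes a Unix system where the standard resource module is available; the constant and function names are made up for this example, and the sketch does not show how the system itself enforces the limit.

```python
import resource

# 30 CPU minutes, as stated in the policy above, expressed in seconds.
INTERACTIVE_CPU_LIMIT_SECONDS = 30 * 60


def cpu_seconds_used():
    """CPU time (user + system) consumed so far by this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_utime + usage.ru_stime


def cpu_seconds_remaining(used, limit=INTERACTIVE_CPU_LIMIT_SECONDS):
    """CPU seconds left before the limit is reached (never negative)."""
    return max(0.0, limit - used)


used = cpu_seconds_used()
print(f"CPU time used: {used:.2f} s")
print(f"CPU time remaining: {cpu_seconds_remaining(used):.2f} s")
```

Work that is expected to need more than 30 CPU minutes should be submitted through LSF rather than run interactively.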
Useful LSF files on Zeus are in the directory /usr/local/lsf/. It contains several Adobe Acrobat (PDF) files that can be downloaded and printed for your use:
PBS Torque is the open source version of PBS Pro and is a standard in a large number of HPC environments. The Maui scheduler is the open source version of Adaptive’s MOAB product. Together these programs manage scheduling and resource allocation across the entire cluster: Torque manages the resources and Maui schedules the jobs.
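As an illustration of how a job describes its resource needs to Torque, the minimal Python sketch below generates a job script with #PBS directives. The helper name build_pbs_script, the queue name "batch", and the program name are assumptions made for this example and do not reflect the Mercury configuration; the directives (-N, -l nodes=...:ppn=..., -l walltime=..., -q) and the PBS_O_WORKDIR variable are standard in Torque. The sketch only prints the script text.

```python
def build_pbs_script(command, job_name, nodes=1, ppn=1,
                     walltime="01:00:00", queue=None):
    """Return the text of a Torque job script (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                # job name
        f"#PBS -l nodes={nodes}:ppn={ppn}",   # nodes and processors per node
        f"#PBS -l walltime={walltime}",       # wall clock limit, HH:MM:SS
    ]
    if queue:
        lines.append(f"#PBS -q {queue}")      # target queue
    # Torque starts jobs in the home directory; change to the
    # directory the job was submitted from before running the program.
    lines.append("cd $PBS_O_WORKDIR")
    lines.append(command)
    return "\n".join(lines) + "\n"


# Example values only: the queue and program names are placeholders.
script = build_pbs_script("./my_program input.dat", job_name="demo",
                          nodes=2, ppn=8, walltime="02:00:00",
                          queue="batch")
print(script)
```

The generated text can be saved to a file and submitted with qsub; qstat -q lists the queues that actually exist on the cluster.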