2007 CHPC Downtimes and History

CHPC DOWNTIME: Thursday January 3, 2008

Posted: December 19, 2007

Event date: January 3, 2008

Duration: Downtime starts at 3pm and will last until sometime in the early morning of January 4, 2008.

Systems affected:

All of Arches and CHPC/INSCC Network

After this downtime all users will be using the campus uNID and password for authentication on all HPC systems (and other Linux systems administered by CHPC). Windows users will use the uNID and current password for authentication.

Arches:

All clusters will be down from 3pm to allow for updates to the OS and for the other changes outlined below. The batch queues will be drained of all running jobs. Reservations are in place so that jobs will not be started if they cannot finish before the start of the downtime. Jobs that are queued but not running will be started after the downtime ends. The one exception is if you are being moved to using your uNID for authentication during this downtime (see below); in this case any queued jobs you have will need to be deleted. The clusters will be down until sometime the following morning.
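
The reservation rule described above can be illustrated with a small sketch. This is not CHPC's scheduler configuration: the function name and the example job times are hypothetical, and the code only shows the comparison a reservation effectively makes, namely that a job may start only if its requested walltime ends at or before the start of the downtime.

```python
from datetime import datetime, timedelta

# Start of the downtime, taken from the announcement (3pm, January 3, 2008).
DOWNTIME_START = datetime(2008, 1, 3, 15, 0)


def can_start_before_downtime(now, walltime, downtime_start=DOWNTIME_START):
    """Return True if a job started at `now` would finish before the downtime.

    Hypothetical helper: a real scheduler enforces this through a system
    reservation rather than through a user-side check.
    """
    return now + walltime <= downtime_start


if __name__ == "__main__":
    now = datetime(2008, 1, 3, 9, 0)  # example submission time (assumption)
    # A 4-hour job would end at 1pm, before the downtime begins.
    print(can_start_before_downtime(now, timedelta(hours=4)))
    # A 12-hour job would end at 9pm, so it stays queued.
    print(can_start_before_downtime(now, timedelta(hours=12)))
```

In practice this means that requesting a shorter walltime can let a job run before the downtime instead of waiting in the queue.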

**MIGRATION TO NEW FILESERVER: Some CHPC users, namely those on the CHPC-owned home directory filesystems (i.e., those with home directories at /uufs/inscc.utah.edu/common/home/USERID), will be migrated to a new, larger fileserver during this downtime. If you are one of these users, your new home directory path will be /uufs/chpc.utah.edu/common/home/UNID.
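
For scripts that hard-code the old home directory path, the change can be sketched as a prefix substitution. The helper name and the example identifiers below are made up for illustration; only the two path prefixes come from the announcement. Because the last component changes from the old USERID to the uNID, both have to be supplied.

```python
OLD_HOME_ROOT = "/uufs/inscc.utah.edu/common/home"
NEW_HOME_ROOT = "/uufs/chpc.utah.edu/common/home"


def migrated_path(path, old_userid, unid):
    """Rewrite a path under the old home directory to the new location.

    Paths outside the old home directory are returned unchanged.
    """
    old_home = f"{OLD_HOME_ROOT}/{old_userid}"
    new_home = f"{NEW_HOME_ROOT}/{unid}"
    if path == old_home or path.startswith(old_home + "/"):
        return new_home + path[len(old_home):]
    return path


if __name__ == "__main__":
    # "jdoe" and "u0123456" are placeholder identifiers, not real accounts.
    print(migrated_path(f"{OLD_HOME_ROOT}/jdoe/runs/input.dat", "jdoe", "u0123456"))
```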

**CHANGE TO UNID: All CHPC users who are not already using their UNID as the CHPC login will be changed to doing so. If you do not have a UNID you will need to get one BEFORE this downtime. All University of Utah students and employees automatically have a UNID. If you are not part of the University of Utah, you need to fill out a Person of Interest (PoI) form to be assigned a UNID. This form can be found at http://www.hr.utah.edu/forms/lib/u-affiliate-poi-form.pdf.

Network Outage:

All networking in CHPC/INSCC will be down from about 5pm to 7pm.


CHPC DOWNTIME: Thursday January 3, 2008

Posted: November 29, 2007

Duration: Times and Scope to be determined

Systems affected:

After this downtime all users will be using the campus uNID and password for authentication on all HPC systems (and other Linux systems administered by CHPC). Windows users will use the uNID and current password for authentication.

Arches:

To Be Determined. (Queues will be drained at a minimum.)

Network Outage:

To Be Determined


CHPC DOWNTIME: September 13, 2007

Posted: September 4, 2007

Systems Affected/Downtime Timelines:

***** /scratch/serial-old will be flushed and rebuilt *****

***** Network outage, including wireless, from 5:30 p.m. until approximately 7:00 p.m. *****

Systems affected: All of arches, chpc network

Arches: CHPC will be taking down the sanddunearch cluster on Thursday September 13th at 11 a.m. to continue the maintenance on the Infiniband fabric. The remainder of arches will be taken down at 5 p.m. Jobs idle in the queue will remain in the queue, but a system reservation is in place so that jobs which cannot complete before the beginning of the downtime will not start.

Network Outage: This portion will affect all of arches as well as desktop access to filesystems and to the rest of the world. Wireless will also be affected. Connectivity will be intermittent during this time period.

This downtime is scheduled to perform maintenance on the KOMAS cooling system. OS updates will be applied to arches. /scratch/serial-old will be rebuilt to increase the file space available. All data on /scratch/serial-old will be deleted. Please copy any data you need off /scratch/serial-old before the start of the downtime.
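
One way to save data before the flush is to copy a directory tree from scratch to permanent storage. The following is a minimal sketch, not a CHPC-provided tool: the function name is hypothetical and both example paths are placeholders to be replaced with your own directories.

```python
import shutil
from pathlib import Path


def copy_out_of_scratch(src_dir, dest_dir):
    """Copy every file under src_dir into dest_dir, keeping relative paths.

    Returns the list of destination paths that were written. Nothing is
    deleted from the source; this is a copy, not a move.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    copied = []
    for item in sorted(src.rglob("*")):
        if item.is_file():
            target = dest / item.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(item, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied


if __name__ == "__main__":
    # Both paths are placeholders; substitute your own directories.
    for path in copy_out_of_scratch("/scratch/serial-old/USERID",
                                    "/path/to/permanent/storage"):
        print(path)
```

Copying rather than moving leaves the original files in place until the flush, so the copy can be checked before the data disappears.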


CHPC Downtime: Arches and Network, August 16th, 2007 beginning at 5 p.m.

Posted: August 14, 2007

Systems affected:

All arches clusters will go down at 5 p.m. to replace critical hardware which is failing. The duration of the arches downtime is unknown. We apologize for the short notice. Any running jobs will be killed but jobs waiting in the queue should ride out this downtime.

Connectivity to the INSCC building, including wireless, will be intermittent from 5-7 p.m.

Duration: Thursday August 16th:

  • Network from 5:00 p.m. - approximately 7:00 p.m.
  • Arches from 5:00 p.m. - until repairs are completed

CHPC Downtime: Sand Dune Arch down June 4th, 2007 from 9 a.m. until 9 p.m.

Posted: June 4, 2007

Systems affected:

CHPC will be taking down the sanddunearch cluster on Monday morning, June 4th at 9:00 a.m. to perform maintenance on the Infiniband fabric. The downtime is expected to last until about 9:00 p.m. Jobs in the queue will remain in the queue, but jobs which cannot complete before the beginning of the downtime will not start.

Duration: Monday June 4th from 9:00 a.m. - approximately 9:00 p.m.

Scope: This downtime will only affect sanddunearch.


Major CHPC Downtime: Thursday May 3, 2007, 5:00 p.m. - about Midnight

Posted: April 28, 2007

Systems affected: All arches clusters down for maintenance for the entire downtime. Network connectivity to all servers will be unavailable from about 5pm to 7pm. After 7pm home directory space, except for the IGrid space, should be available if you have that space mounted on your desktop.

After this downtime the /scratch/serial-old, /scratch/da, and /scratch/mm systems will be mounted as READ ONLY and only on the interactive nodes. They will remain available in this manner for a few weeks. During this time users are encouraged to move data off of these systems to other space such as /scratch/serial or /scratch/parallel.

All arches batch queues will be drained of running jobs leading up to the start of the downtime. Jobs waiting to be run will be kept and started after the downtime; however, if these waiting jobs write to /scratch/serial-old, /scratch/mm, or /scratch/da they will die when started due to the change of these scratch file systems back to READ ONLY.
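
To find waiting jobs that might be affected, a job script can be scanned for the three scratch paths listed above. The sketch below is an illustration only: the function name and the sample script lines are hypothetical, and the check flags any mention of these paths, whether or not the job actually writes there.

```python
import re

# Scratch paths named in the announcement as becoming read-only.
READ_ONLY_SCRATCH = ("/scratch/serial-old", "/scratch/mm", "/scratch/da")

# Match a listed path only as a whole path component, so that for example
# /scratch/data is not mistaken for /scratch/da.
_PATTERN = re.compile(
    "(" + "|".join(re.escape(p) for p in READ_ONLY_SCRATCH) + r")(?![\w.-])"
)


def find_read_only_references(script_text):
    """Return (line_number, path) pairs for each read-only scratch path found."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((number, match.group(1)))
    return hits


if __name__ == "__main__":
    # Made-up job script contents used only to exercise the function.
    sample = (
        "cd /scratch/serial-old/run1\n"
        "cp out.dat /scratch/data/\n"
        "cp log.txt /scratch/mm\n"
    )
    for number, path in find_read_only_references(sample):
        print(f"line {number}: references {path}")
```

Jobs flagged this way would need their output location changed, for example to /scratch/serial or /scratch/parallel, before they start.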

Duration: Thursday May 3rd from 5:00 p.m. - approximately Midnight

Scope: CHPC will be doing system maintenance on the arches clusters and on the CHPC networks.

Please note that /scratch is not backed up and is not intended for the storage of important or permanent data. CHPC reserves the right to clean up old files from all scratch space. The user is responsible for moving important data to a permanent location such as a home department file server.


/scratch/serial is currently down. Please use /scratch/parallel, /scratch/serial-old, /scratch/mm or /scratch/da.

Posted: April 5, 2007

The new /scratch/serial filesystem is down for troubleshooting by the vendor. We expect it to be available on Monday April 9th. Please use the PVFS space (/scratch/parallel). We have also temporarily mounted the "old /scratch/serial" at /scratch/serial-old on all of the clusters, /scratch/mm on marchingmen, and /scratch/da on delicatearch. When we bring up the new /scratch/serial space we will return the old /scratch spaces to read-only. Please let us know if you have questions.


Major CHPC Downtime: Thursday March 29, 2007, 4:00 p.m. - Midnight

Posted: March 14, 2007

Systems affected: All arches clusters down for maintenance. Network connectivity to all servers and from INSCC to campus will be intermittent from 5-7 p.m. Access to home directories will be intermittent throughout the downtime. All jobs will be drained and queued jobs will be flushed.

Duration: Thursday March 29th from 4:00 p.m. - Midnight

Scope: CHPC will be doing system maintenance on the arches clusters, and on the CHPC networks. The /scratch filesystems will be arranged as follows:

  1. /scratch/serial-beta will be moved to /scratch/serial
  2. The "old" (current) /scratch/serial will be moved to /scratch/serial-old and will only be mounted READ ONLY on the interactive nodes and available at that path for a few weeks.
  3. /scratch/da, /scratch/mm and /scratch/serial-pio will also be mounted READ ONLY on the interactive nodes for a few weeks.
  4. Please note that data in /scratch are never backed up and you should not store important or permanent data there. CHPC reserves the right to clean up old files from this space. Please move important data to a permanent location such as a home department file server. We plan to take the space recovered from the "old" /scratch/serial, /scratch/da, /scratch/mm and /scratch/serial-pio and add it to PVFS (/scratch/parallel).


Major CHPC Network Downtime: Thursday February 15th, 2007 5-8 p.m.

Posted: January 30, 2007

Systems affected: All CHPC Network connectivity. There will be a reservation on ALL Arches Clusters preventing jobs from running during this timeframe.

Duration: Thursday February 15, 2007 from 5-8 p.m.

Scope: CHPC will be updating to a new router. This should fix some of the networking problems we've had over the past several weeks.


Major CHPC Downtime: Thursday January 25th, 2007 5 p.m. - duration to be determined

Posted: January 16, 2007

Systems affected: All arches clusters down for maintenance. Network connectivity to all servers in INSCC will be off-line for part of the downtime.

Duration: Thursday January 25th, 2007 from 5:00 p.m. - duration not yet determined but will be posted when we have an estimate.

Scope: CHPC will be doing system maintenance on the arches clusters. PVFS will have maintenance performed requiring all files in /scratch/parallel to be purged. PLEASE migrate important data to another file system such as a home department file server prior to this downtime!!! Maintenance will also be performed on several of the CHPC switches.

More information will be posted as details are confirmed.
