2012 CHPC News Announcements

Unexpected Power Outage in Komas Data Center - November 1st, 2012 6:30 p.m. to November 2nd, 5:15 p.m.

Posted: November 1, 2012

***UPDATE*** As of 5:15 p.m. November 2nd, all clusters are back up and scheduling work. We appreciate your patience as we worked through this failure.

***UPDATE*** 10:00 a.m. November 2nd: Expect the clusters to remain down through the day today. Failed equipment is still undergoing repairs. Once repairs are complete, additional time will be needed to validate the infrastructure and bring networking and the clusters back online. We will keep you posted as we learn more. We apologize for the inconvenience.

***UPDATE*** 9:00 p.m. November 1st: CHPC systems staff have ascertained that power will not be restored before mid-morning Friday, November 2nd. Another update will be posted as better estimates become known. We hope to have power back sometime tomorrow.

Around 6:30 p.m. on November 1st, power was unexpectedly lost in the CHPC Komas data center, taking down our main HPC clusters, including ember, updraft and sanddunearch.

CHPC staff are onsite and working with Campus and Rocky Mountain Power to determine the cause and effect repairs. We will keep you updated on the progress as we learn more.


CHPC Major Downtime: Tuesday October 9th, 2012 beginning at 7:00 AM - Unknown

Posted: October 3, 2012

Event date: October 9, 2012

Duration: From 7 a.m. October 9th - Clusters will be down most of the day

Systems Affected/Downtime Timelines: During this downtime, maintenance will be performed in the datacenters, requiring the clusters to be down most of the day, starting at 7 a.m. The network, virtual machines, and the home directory file servers will NOT be affected, with the exception of two groups that were already notified via email.


Unexpected Power Outage in Komas Data Center - September 27th approximately 10:00 a.m.

Posted: September 27, 2012

Duration: Unknown

Systems Affected/Downtime Timelines:
* Ember Cluster: ~10:00 a.m. - unknown
* Updraft Cluster: ~10:00 a.m. - ~4:20 p.m.
* Sanddunearch Cluster: ~10:00 a.m. - ~4:20 p.m.

Around 10 a.m. on September 27th, a breaker tripped in the CHPC Komas data center, taking down half of the power to the room and affecting our main HPC clusters, including ember, updraft and sanddunearch. Maintenance was being performed on the UPS at the time, which was not expected to cause any problems.

Sanddunearch and Updraft are back up (as of about 4:15 p.m.), but Ember is still having some issues. CHPC staff are troubleshooting as of 5:25 p.m.


CHPC recommends users update login scripts from time to time

Posted: September 19, 2012

CHPC provides users with standard login scripts for both the bash and tcsh shells. These files, named .tcshrc and .bashrc, are placed in your home directory when your account is created. From time to time you should consider updating these files to the most current versions.

The easiest way to do so is to use the wget command, making sure you are in your home directory. The example below is for .tcshrc; the equivalent commands for .bashrc, obtained by changing all instances of tcsh to bash, are shown after it.

Issue the following 3 commands:

1. mv .tcshrc .tcshrc-save
2. wget http://www.chpc.utah.edu/docs/manuals/getting_started/code/chpc.tcshrc
3. mv chpc.tcshrc .tcshrc
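
The equivalent commands for .bashrc follow the tcsh-to-bash substitution described above (run them from your home directory):

1. mv .bashrc .bashrc-save
2. wget http://www.chpc.utah.edu/docs/manuals/getting_started/code/chpc.bashrc
3. mv chpc.bashrc .bashrc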

CHPC also recommends that you do not customize this file, aside from uncommenting the setup lines for packages that you need, or commenting out those for packages you do not need. For more details on this, see our featured article.

If you need any help in setting up your environment, please send a note to issues@chpc.utah.edu and we will be happy to assist you.


OpenACC GPU Programming Workshop

Posted: September 12, 2012

Duration: Tuesday October 16 and Wednesday October 17

CHPC will be one of ten national satellite sites for an OpenACC GPU programming workshop presented by the Pittsburgh Supercomputing Center, the National Institute for Computational Sciences and Georgia Tech.

OpenACC is a standard that uses compiler directives to allow quick development of GPU-capable codes with standard languages and compilers. It has been used with great success to accelerate real applications within very short development periods. The workshop assumes knowledge of either C or Fortran programming. It will have a hands-on component using the NSF large-scale GPU computing platform presently being deployed at the National Institute for Computational Sciences.

For details, agenda and registration, please, see
http://www.psc.edu/index.php/training/openacc-gpu-programming.

Note that the agenda times are in EDT; subtract two hours to obtain our local MDT.

For any local questions please contact CHPC help desk via issues at chpc.utah.edu.


CHPCFS Filesystem Update

Posted: August 14, 2012

CHPC systems staff have identified the issue with chpcfs that led to the problems earlier today, and believe that the problem has been resolved and performance has been restored. We thank you for your patience.


CHPCFS Filesystem Issues

Posted: August 14, 2012

The CHPCFS file system started experiencing problems about 2 hours ago (around 2 p.m.). Systems staff are working on resolving the issues, and we will provide more information when we know more.


Short GPU node outage on ember cluster (em513-em524) Monday, August 13, from 3 - 4 p.m.

Posted: August 6, 2012

Duration: Monday, August 13th, 2012 from 3 p.m. until 4 p.m.

Systems Affected/Downtime Timelines: GPU nodes on ember cluster: em513 through em524

We will be updating the CUDA version to 4.2 on all 12 GPU nodes.

Updates were completed at about 4 p.m.; users of the GPU nodes should consider recompiling their codes for CUDA 4.2.
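
If you built your code against the previous CUDA toolkit, a rebuild would typically look something like the following (a minimal sketch assuming the CUDA 4.2 nvcc is on your path; the file name is a placeholder for your own source, and the -arch flag should match the GPUs you target):

nvcc -arch=sm_20 -o my_gpu_program my_gpu_program.cu   # recompile against the CUDA 4.2 toolkit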


Proven Algorithmic Techniques for Many-core Processors Summer School next week

Posted: August 6, 2012

Duration: 8/13/2012 9am to 8/17/2012 4pm

We would like to remind users that CHPC will host the Proven Algorithmic Techniques for Many-core Processors Summer School all of next week. This is a unique opportunity for students and researchers to learn to program efficiently on current CPUs and GPUs. The course is taught virtually by world-class experts and includes a keynote lecture by one of the main developers of the VMD and NAMD molecular simulation programs. Friday afternoon will also be devoted to a special topic: computational fluid dynamics.

More information is at http://www.vscse.org/summerschool/2012/manycore.html

Detailed class schedule is at
http://home.chpc.utah.edu/~mcuma/vscse/VSCSESchedule-Manycore2012.pdf

To register, go to https://hub.vscse.org/. Registration is free.
The course will take place in the INSCC Auditorium, Rm. 110. Participants will be notified individually a few days before the course starts. Please plan to arrive before 9 a.m. on Monday, and expect the course to take the whole week if you want to follow it thoroughly.


Issues with Ember Batch Queue

Posted: July 20, 2012

Duration: Today - July 20, starting at approximately 2:30pm

At about 2:30 p.m. today there was a problem with the batch queue on ember that prevented users from submitting jobs. The problem was traced to corruption in the batch database. Recovery of the batch system required that the queue be flushed; therefore, at about 4 p.m. ALL jobs in the queue on ember, both running and idle, were lost. Users will need to resubmit all jobs. Reservations were NOT lost. Scheduling of jobs was restored at approximately 5:15 p.m.
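
To resubmit, jobs can simply be handed back to the batch system with the usual submission command (a sketch assuming a Torque/PBS-style job script; substitute your own script name):

qsub my_job_script.pbs   # resubmit one job; repeat for each lost job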

CHPC apologizes to all of the users who were impacted by this problem.


Job scheduling on the clusters is paused

Posted: July 5, 2012

Duration: several hours

The scheduler on the clusters was paused at about 1 p.m. due to an issue with our accounting system. Running jobs are not affected by this; however, no new jobs can start. We anticipate that the problem will be resolved in the next few hours; when it is, we will restart the scheduler and jobs waiting in the queue will be able to start.


EMERGENCY Power Outage affecting CHPC File Servers: starting immediately until repairs are effected

Posted: June 25, 2012

Systems Affected/Downtime Timelines:

All CHPC File Services will be down until repairs are completed.

The SSB machine room requires an emergency power outage to effect repairs to the flywheel generator. CHPC is taking all file systems offline during this power outage to protect data.

Scheduling on the HPC clusters will be paused, but the clusters will remain up.

The CHPC VM farms will be taken offline as well, since they are also located in the SSB machine room.

Instructions to User:

If you mount CHPC file systems on your Linux desktop, we recommend you log off. Windows and Mac desktops should be able to function, but you will not have access to CHPC file systems.

Once we have notified you that services are in place, please reboot your desktop.


Update of chpc standard .tcshrc and .bashrc

Posted: June 21, 2012

All,

There were a number of issues today that were due in part to the use of older .tcshrc files, which sourced

/uufs/chpc.utah.edu/sys/pkg/intel/ifort/std/bin/ifortvars.csh

as this file did not exist at the time (it has since been restored).

However, while troubleshooting the issue reports, we realized that this sets up the environment for a version of the Intel compilers that is approximately two years old. Several updates have been released since that version; the latest can be accessed with the following line:

source /uufs/chpc.utah.edu/sys/pkg/intel/composerxe/bin/compilervars.csh intel64
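
For bash users, the corresponding .bashrc line would presumably be the .sh variant of the same script (an assumption based on the .csh/.sh pairs provided for our other setups):

source /uufs/chpc.utah.edu/sys/pkg/intel/composerxe/bin/compilervars.sh intel64   # .sh variant assumed to sit alongside compilervars.csh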

We recommend that anyone who was receiving the error do one of two things: either get the newest chpc.tcshrc or chpc.bashrc from the CHPC website (the preferred method), or at least change the line in their existing .tcshrc or .bashrc that sets up the environment for the Intel compilers to the corresponding line above.

To update your .tcshrc or .bashrc to the latest version (shown for .tcshrc):
1. mv .tcshrc .tcshrc-save (a backup copy, just in case)
2. wget -O ~/.tcshrc http://www.chpc.utah.edu/docs/manuals/getting_started/code/chpc.tcshrc
3. comment/uncomment any of the package-specific lines as needed for your setup


CHPC Major Downtime: Tuesday June 19th, 2012 beginning at 7:00 AM - EXTENDED due to File Server hardware failure

Posted: June 5, 2012

Event date: June 19, 2012

Duration: From 7 a.m. June 19th until 8:30 p.m. June 20th

Systems Affected/Downtime Timelines:

During this downtime, maintenance will be performed in the datacenters, requiring many systems to be down most of the day. Tentative timeline:

  • HPC Clusters: 7:00 a.m. 6/19 - 8:30 p.m. 6/20
  • CHPC File Services: 8:00 a.m. 6/19 - 7:00 p.m. 6/20
  • Intermittent network outages: 8:00 - 9:00 a.m.

Instructions to User:

Intermittent outages of the CHPC supported networks until about 9:00 a.m.

All desktops mounting the CHPCFS file systems will be affected until approximately 10:00 a.m. Those with Windows and Mac desktops should be able to function, but may not have access to the CHPCFS file systems.

A hardware failure on the CHPCFS server caused an extended downtime on one tray. The full list of affected file systems is:
ASTRO_Data1
ASTRO_Data2
ASTRO_HOME
CHEM_Molinero
CHEM_Molinero_Data1
CHEM_Truong_Data1
CHPC_Admin
CHPC_Facelli
CHPC_GuestTransfer
CHPC_INSCC
CHPC_UUFS
CHPC_Vis
CHPC_Web
GEO_SmithRB_Data1
GEO_Thorne_Data1
GEO_Thorne_Data2
MET_ZPu_Data

CHPC recommends that you reboot your desktops if you run into issues after the downtime.

All HPC Clusters were down until the tray was restored and were booted by about 8:30 p.m. 6/20.


Proposals for Allocations Due June 6th, 2012

Posted: May 30, 2012

Proposals and allocation requests for computer time on the updraft and ember clusters are due by June 6th, 2012. We must have this information if you wish to be considered for an allocation of time for the Summer 2012 calendar quarter and/or subsequent three quarters. If you already have an award for Summer 2012, you do not need to re-apply unless you wish to request a different amount from what you were awarded.

Information on the allocation process and relevant forms are available: HPC Allocation Policy

Please note the following:

  • You may request computer time for up to four quarters.
  • Summer Quarter (Jul-Sept) allocations go into effect on July 1, 2012.
  • Only faculty members can request additional computer time for themselves and those working with them. Please consolidate all projects onto one proposal to be listed under the requesting faculty member.
If you have questions, please send email to: issues@chpc.utah.edu

New IDL version installed

Posted: May 25, 2012

We have installed IDL version 8.2 on the Linux machines. It is accessible by sourcing /uufs/chpc.utah.edu/sys/pkg/idl/8.2/etc/idl.csh or /uufs/chpc.utah.edu/sys/pkg/idl/8.2/etc/idl.sh. We encourage IDL users to try this version and let us know if there are any problems. If everything works out, we will make it the standard version in a month or so.
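
For example, a tcsh user could set up and launch IDL with the following (a minimal sketch; bash users would source the .sh file instead):

source /uufs/chpc.utah.edu/sys/pkg/idl/8.2/etc/idl.csh   # add IDL 8.2 to your environment
idl                                                      # start the interactive IDL interpreter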


CHPC Presentation: Using Gaussian09 and Gaussview

Posted: April 9, 2012

Duration: Thursday, April 12th, 2012, 1:00 - 2:00 p.m.

This presentation by Anita Orendt, held in the INSCC Auditorium (Room 110), will focus on the use of Gaussian09 and GaussView on the CHPC clusters. Batch scripts and input file formats will be discussed. Parallel scaling and timings with the different scratch options will also be presented, along with a discussion of the scratch needs of Gaussian09. Finally, there will be several demonstrations of using GaussView to build molecules and input structures, set up input files, and analyze output files.


Software updates on CHPC Linux systems

Posted: April 4, 2012

We have performed the following program updates on Linux systems, including the clusters:

  • updated Matlab to version R2012a
  • updated PGI compilers to version 12.3
  • updated Intel compilers to version 12.1.3

For the Intel compilers, in order not to remove the previous version, we had to change the installation location. To use the latest version, please edit your .tcshrc or .bashrc and replace the old Intel sourcing line with

source /uufs/chpc.utah.edu/sys/pkg/intel/std/bin/compilervars.csh intel64

or

source /uufs/chpc.utah.edu/sys/pkg/intel/std/bin/compilervars.sh intel64

If you experience any problems, let us know at issues at chpc.utah.edu.


Network outage in Komas data center - access to HPC clusters affected from 10:30 to 11:30 a.m. on 3/30/2012

Posted: March 30, 2012

The switch in the CHPC Komas data center started having issues at about 10:30 a.m. (3/30/12). CHPC staff investigated and found a resolution; the switch was stable again by 11:30 a.m. This affected access to all HPC clusters.


Update to CHPC Web server for public_html from RedHat5 to RedHat6

Posted: March 26, 2012

Event date: March 30, 2012

This Friday afternoon, March 30th, 2012, we will be updating the web server that serves public_html pages on the CHPC supported home directory file servers. We will be updating the operating system from RedHat5 to RedHat6. This upgrade will happen between 3 and 4 p.m.

If you have PHP or CGI scripts, it is possible that some things might break, so we recommend that you check your web pages after the update.
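
One quick way to spot-check a page from the command line (a sketch; substitute your own uNID and page path for the placeholders) is:

curl -sI http://home.chpc.utah.edu/~yourUNID/index.html   # the HTTP status line should report 200 OK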

Please let us know if you have questions or concerns by sending email to issues@chpc.utah.edu.


The /scratch/ibrix file systems are back online

Posted: March 19, 2012

After an extended downtime, the /scratch/ibrix file systems (/scratch/ibrix/chpc_gen and /scratch/ibrix/icse) are back online and available for use. We apologize for the length of the outage, but we were able to accomplish the upgrade.

Please let us know of issues or problems by sending email to issues@chpc.utah.edu


CHPC *MAJOR* Downtime scheduled for March 13, 2012 - /scratch/ibrix file systems to be PURGED!! Please plan ahead.

Posted: February 16, 2012

Event date: March 13, 2012

Duration: Varied by service

Systems Affected/Downtime Timelines:

  • Intermittent network outages between 7:45 a.m. and 9 a.m.
  • The HPC clusters will be down most of the day to effect data center maintenance.
  • All /scratch/ibrix filesystems will be purged. Users have been given early warning and ample time to clean their files from /scratch/ibrix/chpc_gen, /scratch/ibrix/icse_cap and /scratch/ibrix/icse_perf. Expect these filesystems to remain down until Friday March 16th.
  • Groups who have been contacted by CHPC about migration of their home directories will not have access to their home directories until the final rsync is completed. We will be in contact with each of these groups with expectations.

  • Desktops will see intermittent network outages in the early morning which should be completed by 9:00 a.m. at the latest.
  • Virtual Machines supported by CHPC will experience intermittent outages from 7:45 - 9:00 a.m.

Instructions to User:

CHPC desktop support users: There will be a network interruption in the early morning hours for desktop users mounting any CHPC file servers. We recommend you reboot your desktop if you see any issues after 9 a.m.

For users whose home directories will be migrated to the new Oquirrh file server: we will be performing the final rsync, and you will not have desktop access to your home directory until it is completed. This may take up to six hours for some groups. We will be contacting each group separately with details. Again, we recommend rebooting your desktop once the migration is completed.

The HPC clusters will be down for most of the day while machine room maintenance is in progress.

Virtual machines will experience intermittent outages from 7:45 - 9:00 a.m.

The IBRIX file server for /scratch/ibrix/chpc_gen, /scratch/ibrix/icse_cap and /scratch/ibrix/icse_perf needs an update which requires a reformat of the disks. All data will be lost from these spaces. Users of the HPC clusters need to move all important data off of the /scratch/ibrix file systems PRIOR to this downtime.

***Once the downtime begins, you will not be able to retrieve data stored in any of the /scratch/ibrix file systems. PLEASE look at your space well ahead of time to make sure there is time to migrate your files elsewhere.***

All /scratch/ibrix file systems will most likely remain down for several days after the rest of the systems come back online. We will be mounting /scratch/general and /scratch/uintah on ember to give users more options during this extended outage of the ibrix scratch spaces. We will return the /scratch/ibrix systems as soon as possible, and expect them to be back by the end of the day Friday, March 16th.
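
For example, data in /scratch/ibrix/chpc_gen could be copied to your home directory ahead of the downtime with something like the following (a sketch; the directory names are placeholders for your own paths, and the destination must have enough free space):

rsync -av /scratch/ibrix/chpc_gen/yourUNID/ ~/ibrix_backup/   # copy, preserving timestamps and permissions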

As always, please let us know if you have questions or concerns by sending email to issues@chpc.utah.edu.


Change in CHPC Allocation Form Spring 2012 Calendar Quarter

Posted: February 2, 2012

Beginning with the Spring 2012 calendar quarter, CHPC will accept allocation proposals through our new web application instead of the Jira system we have used for the past few years. The process and information required will remain exactly the same as before, but we will direct you to our web application, where you will be required to enter your uNID and campus password. Allocation proposals for Spring 2012 are due Wednesday, March 7th. As usual, we will be sending individual email reminders to those CHPC PIs who have proposals due this round. We will no longer be sending hardcopy letters as reminders.

The allocation proposals should now be submitted by filling out this web form: www.chpc.utah.edu/apps/profile/allocation_form.php

Quick allocation submissions will also move to the CHPC web application. Remember that quick allocations are a one-time option for those without a current allocation. Quick allocation proposals should now be submitted by filling out this web form: www.chpc.utah.edu/apps/profile/allocation_quick_form.php

Please let us know how this process goes for you and if you have any problems. Please also let us know if you have suggestions on how to improve this process. Please submit all feedback by sending email to issues@chpc.utah.edu.


CHPC Policy Manual - Updated and Improved

Posted: February 1, 2012

CHPC has recently completed a review and update of our User Policies. We have also moved these documents to our wiki. You may still navigate to them from our web site. We invite all our users to review and send us feedback. The new policies can be found at: https://wiki.chpc.utah.edu/display/policy/CHPC+Policy+Manual.

Please send comments and feedback to issues@chpc.utah.edu.


CHPC downtime over

Posted: January 10, 2012

We have brought the clusters back online, which concludes our downtime. One thing of note: we have finally changed the std links for MVAPICH2 on Updraft to point to the latest version, which has slightly different launch commands. See CHPC's wiki help page on MPI, https://wiki.chpc.utah.edu/display/DOCS/MPI, for details on how to run it. Ember and Sanddunearch already had this change. As always, if there is a problem, let us know via issues at chpc.utah.edu.
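
For illustration only, the two common MVAPICH2 launch styles look like the following (a sketch, not necessarily what the std link now uses; the wiki page above is authoritative, and the program name and process count are placeholders):

mpiexec -np 16 ./my_mpi_program                              # Hydra-style launcher in newer MVAPICH2 releases
mpirun_rsh -np 16 -hostfile $PBS_NODEFILE ./my_mpi_program   # older rsh-based launcher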


DOWNTIME Update

Posted: January 10, 2012

The networking portion of the downtime has been completed. People with desktops that mount CHPC file systems should now be able to access their files. Work is proceeding at the Komas Datacenter, and another message will be sent out when the clusters are available.


CHPC DOWNTIME - TOMORROW (Tuesday January 10, 2012) starting at 6AM

Posted: January 9, 2012

This is a reminder that CHPC's quarterly downtime will be tomorrow, Tuesday January 10, 2012, starting at 6 AM. This downtime is necessary for routine maintenance on the cooling system in the Komas Datacenter, which requires the clusters (Ember, Updraft, Telluride, Sanddunearch, Apexarch, meteo and atmos nodes) and associated scratch file systems to be powered down. We will not be taking the file servers down, meaning that once the networking updates are complete, most likely before noon, CHPC home directories will be accessible on desktops which mount this space. Notices will be sent out when the networking portion has been completed, and again when the clusters are back online and available for use.