2004 CHPC News Announcements

Delicatearch and Tunnelarch re-booted 12/10/04

Posted: December 11, 2004

problems on arches 12/10/04 - rebooted delicatearch and tunnelarch

After our systems group had stabilized /scratch/serial, they noticed a problem with one of the main administrative nodes that was causing serious issues on both Delicatearch and Landscapearch. It was determined that a reboot of those clusters was necessary. All running jobs were lost.

As of about 1:30 am this morning, things seem to be running through the queue again. We have a snapshot of the queue from right before the reboot, so if you need or want a priority boost, please let us know. We will be issuing allocation refunds for the jobs that were running at the time of the reboot. We apologize for the inconvenience.


Arches status update

Posted: December 10, 2004

Status update: arches /scratch/serial

We've had the queues on Arches suspended from time to time as we have been working to solve the "staleness" problem on /scratch/serial. The good news is that, hardware-wise, the filesystem has been stable since we replaced it. The better news is that we've found and (we hope) resolved the "staleness" problem. In our efforts to get the filesystem fixed, we updated the system software (kernel) to a newer release, as it contained a number of updates to NFS. We have since discovered that the newer kernel introduced the "staleness" problems, and a few hours ago we moved back to the kernel we were running before. We've had several users try some things, and it appears that things are healthy now.

There are certain jobs that are still having issues, but so far these appear unrelated to the /scratch/serial problems we've seen, and we are continuing to troubleshoot.

Please let us know if you are continuing to see any problems.


Arches problem with scratch

Posted: December 10, 2004

Lingering "staleness" of /scratch/serial on Arches

As previously reported, the /scratch/serial filesystem is back up and seemingly stable. We have had several reports of "Stale NFS file handle" problems resulting from the recent problems and we are aware of this issue and looking at how to repair it. Thanks for your patience - we'll let you know as soon as this is resolved.


Recent /scratch/serial problems

Posted: December 10, 2004

Lingering "staleness" of /scratch/serial on Arches

The /scratch/serial filesystem was down for several hours last evening, and our systems staff were able to isolate a hardware problem. The CPU has been replaced and things seem to be stable again (knock on wood). If you had failed jobs as a result of the /scratch/serial problems, please send me (julia@chpc.utah.edu) the job numbers and I'll refund your allocations. Also, if you feel more comfortable using the local /tmp filesystem on the nodes, CHPC will (temporarily) try to help you retrieve files for failed jobs over the next few weeks while we ensure that the /scratch/serial space is stable.

We are also hoping to have the PVFS space up soon, which should also help the situation. We apologize for the inconvenience and thank you for your patience.


Arches /scratch/serial down, queues suspended

Posted: December 9, 2004

Arches /scratch/serial down, queues suspended

Arches /scratch/serial was down and queues were suspended throughout the afternoon. Delicatearch and Landscapearch were rebooted (all running jobs on those clusters were lost). Queues resumed about 1:30 am on 12/11/04.


arches /scratch/serial down

Posted: December 9, 2004

arches /scratch/serial down

Unstable throughout the day.


Emergency Downtime: ALL CHPC systems - Security Breach (11/12/04)

Posted: December 9, 2004

Emergency Downtime: ALL CHPC systems - Security Breach (11/12/04)

Emergency Downtime: ALL HPC Systems, Friday, November 12th, 2004, beginning at noon, due to security break-ins.

All CHPC NIS passwords reset 11/16/2004

Arches: Available Wednesday 11/17/2004 approx 1:00 pm.

Sierra: Available Wednesday 11/17/2004 approx 5:00 pm.

Icebox: Will begin rebuild over next several weeks.


Network Downtime Results

Posted: December 6, 2004

Network Outage Results, December 9th, 2004

Networking staff was able to successfully swap out two faulty closet switches this evening. The outage lasted only a few minutes and affected just the few ports on the first and second floors that were connected to the respective switches. Networking staff has been monitoring the new switches for over 30 minutes and they appear to be behaving properly.

If there are any problems please contact the networking staff.


Network Outage Thursday, December 9th, 2004

Posted: December 6, 2004

Network Outage Thursday, December 9th, 2004

The INSCC networking staff would like to schedule a brief outage this Thursday from 6:00 to 6:30 PM. The outage will affect network ports 1025A through 1048B on the first floor, ports 2121A through 2122B on the second floor, as well as the #2 port on any splitters on the first and second floors. The outage should only affect hosts directly connected to the specified ports. The downtime will address two closet switches that need to be replaced due to hardware defects.

If there are any questions or concerns please contact networking staff.


Sierra Cluster Rebooted 12/3/04

Posted: December 3, 2004

Sierra Cluster Rebooted 12/3/04

The Sierra cluster was rebooted about 5 pm on 12/3/04. Problems persisted; it was rebooted again and became functional on 12/7/04.


CHPC Presentations

Posted: December 1, 2004

Hybrid MPI-OpenMP Programming
Thursday, December 2nd, 2004 at 1:30 p.m. in the INSCC Auditorium
(re-scheduled due to technical difficulties)

In this talk we will introduce the hybrid MPI-OpenMP programming model designed for distributed shared memory parallel (DSMP) computers. The new Arches metacluster is a representative of this family, having two shared-memory processors per node. OpenMP generally provides a better-performing alternative for parallelization inside a node, while MPI is used for communication between the distributed processors. We will discuss cases in which the hybrid programming model is beneficial and provide examples of simple MPI-OpenMP codes on the dual-processor nodes of the Icebox cluster.

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


/scratch/serial available

Posted: November 23, 2004

rebuilt /scratch/serial available

The new /scratch/serial space is available and the queues are open. The data from the old /scratch/serial will be available (on the Arches interactive nodes only) at /scratch/serial.old for a limited time. The reliability of /scratch/serial.old is questionable, so please move any data off it as soon as possible. We apologize for the inconvenience.


Arches /scratch/serial to be rebuilt

Posted: November 23, 2004

Arches /scratch/serial to be rebuilt - all data will be lost

We have found and confirmed a bug in the system which is causing the /scratch/serial space on the arches cluster to fail. We are currently reformatting /scratch/serial with a new filesystem which should be available later today (11/23/04). Any jobs currently running using /scratch/serial should be killed as they will need to be re-submitted anyway. We will put the queues on hold until the new filesystem is ready. All data on /scratch/serial will be lost. We apologize for the inconvenience and expect this action to make /scratch/serial stable. When the new filesystem is ready, we'll send another message to let you know.


Icebox, the IA-32 cluster, rebuilt. Down 11/22/04 thru 02/02/05.

Posted: November 22, 2004


updated: January 28th, 2005
updated: February 2nd, 2005

Icebox, the IA-32 cluster, rebuilt. Down 11/22/04 thru 02/02/05

The IA-32 cluster was rebuilt due to the security breach in November 2004. Icebox was made available February 2, 2005.



Arches cluster available

Posted: November 17, 2004

Arches cluster available

The Arches cluster has been available and accepting jobs for the past few hours, but NIS password changes were not getting updated until recently. That has now been fixed, so you should be able to log in. Please continue to let us know if you run into problems. We sincerely apologize for the inconvenience, and again thank you all for your continued patience.


All Passwords reset on CHPC NIS server

Posted: November 17, 2004

Due to the recent systems compromise, all passwords were reset on the CHPC NIS server

The CHPC NIS server administers accounts and passwords for all CHPC systems, so every user of our systems needs to obtain a new password in order to log in.

You can obtain your new password by:

  • calling the help desk at (801) 971-3442
  • stopping by 405 INSCC (the CHPC main office)
  • faxing a request to (801) 585-5366

Proper ID is required (e.g. UU ID)

Users can change the password they obtain to anything that is easy for them to remember, but please follow strong password guidelines: avoid common words or letter combinations, and try to mix upper- and lower-case letters, digits, and special characters. Also, please do not change the password back to the one you had before, as it was most likely compromised.
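As a rough illustration of these guidelines, a password could be screened with a short script like the following. This is only a hypothetical sketch, not a CHPC tool; the eight-character minimum is our assumption, and the other rules are just the ones listed above.

```python
import string

def is_strong(password: str) -> bool:
    """Rough screen following the guidelines above: a minimum length
    plus a mix of upper/lower-case letters, digits, and special
    characters. The length threshold of 8 is an assumption."""
    if len(password) < 8:
        return False
    has_upper = any(c in string.ascii_uppercase for c in password)
    has_lower = any(c in string.ascii_lowercase for c in password)
    has_digit = any(c in string.digits for c in password)
    has_punct = any(c in string.punctuation for c in password)
    # all four character classes must appear
    return has_upper and has_lower and has_digit and has_punct

# is_strong("password")   -> False (one character class, common word)
# is_strong("Tr4il#head") -> True  (mixed case, digit, special character)
```

A real checker would also reject dictionary words and previously used passwords, which this sketch does not attempt.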

We apologize for the inconvenience and appreciate your patience.


Emergency Downtime: All CHPC systems down due to security concerns

Posted: November 12, 2004

Emergency Downtime: All CHPC systems down due to security concerns

Due to security concerns, Icebox was shut down around noon on November 12th. Arches and Sierra were taken down around 4 pm the same day. They will remain down until CHPC can ascertain the security status of the clusters. Another known affected system is the utam.geophys.utah.edu website.

It is very possible that other core servers (file servers, mail, NIS, ...) were also compromised. We may shut them down as well without advance notice. If you experience any difficulties with any CHPC or INSCC systems, it is likely a result of this break-in.

As more information becomes available, we will inform you both via email and via messages on the CHPC web site. CHPC staff are working with the ISO (the University's Institutional Security Office) on this matter.

We apologize for the inconvenience and appreciate your patience.


Arches NFS Scratch (/scratch/serial) is available

Posted: November 12, 2004

Arches NFS Scratch (/scratch/serial) is available

You may begin using /scratch/serial on the Arches cluster again; a replacement server has been brought up serving the same space.


Software Upgrade

Posted: November 12, 2004

Totalview upgrade

The Totalview debugger was upgraded to version 6.6 on Icebox and Arches. For information on new features, see http://www.etnus.com/TotalView/Latest_Release.html

Let us know if you experience any problems.


Software Upgrade

Posted: November 11, 2004

MPICH-GM upgraded to 1.2.6..13b

MPICH-GM was upgraded to 1.2.6..13b, since we had a slight problem with F90 modules in the older build. I have not had a chance to test it since all the Myrinet nodes are busy, but I don't expect any problems other than a possible need to recompile your code.

If there are any problems, please, let me know ASAP.

Martin Cuma
mcuma@chpc.utah.edu


CHPC policy on passwordless SSH

Posted: November 10, 2004

CHPC policy on passwordless SSH

In the recent past, CHPC inadvertently allowed passwordless SSH due to an oversight in system configuration. Recently we have become aware that several HPC centers have experienced significant security incidents through exploitation of vulnerabilities in passwordless SSH. To avoid break-ins, which could potentially make our systems non-operational for days, we have decided to enforce the use of passwords when using SSH. We understand that this may create some difficulties for some users, and we will work on providing alternative mechanisms in the next few days.


CHPC Presentations

Posted: November 10, 2004

Fast Parallel I/O at CHPC talk: POSTPONED
(scheduled for 11/11/04)

In tomorrow's CHPC talk we were to discuss Fast Parallel I/O at CHPC. Since parallel I/O currently works only on Icebox, and we are going to roll it out on Arches (/scratch/parallel, a.k.a. PVFS2) soon, we will postpone this talk until the Arches file system is functional and include it in the talk.

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


CHPC Presentations

Posted: October 29, 2004

Usage of Gaussian03 and GaussView
Thursday, November 4th, 2004 at 1:30 p.m. in the INSCC Auditorium

This presentation will focus on the use of Gaussian03 and GaussView on the CHPC systems. The discussion will cover the functionality of Gaussian as well as the format and construction of input files and PBS scripts. Restrictions on memory usage and disk space will be discussed. Timings of several jobs will be presented to demonstrate the parallel scaling that Gaussian achieves on Icebox and Arches. Demonstrations of the use of GaussView will also be presented.

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


CHPC Presentations

Posted: October 25, 2004

Mathematical Libraries at CHPC
Thursday, October 28th, 2004 at 1:30 p.m. in the INSCC Auditorium

In this talk we introduce users to the mathematical libraries installed on the CHPC systems, which are designed to ease programming and speed up scientific applications. First, we will talk about BLAS, the standardized library of Basic Linear Algebra Subroutines, and present a few examples. Then we briefly cover other libraries that are in use, including the freely available LAPACK, ScaLAPACK, PETSc and FFTW, the commercial NAG libraries, and custom libraries from Compaq.
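To make concrete what the workhorse BLAS level-3 routine, DGEMM, computes (C <- alpha*A*B + beta*C), here is a naive pure-Python rendering of that operation. This is only a sketch of the semantics; a real code would link against an optimized BLAS and call dgemm (or sgemm) rather than loop by hand, and would also get the transpose options and cache blocking this sketch omits.

```python
def naive_dgemm(alpha, A, B, beta, C):
    """Compute C <- alpha*A*B + beta*C for dense row-major lists of
    lists, mirroring the semantics (not the performance) of the BLAS
    routine DGEMM. Modifies and returns C."""
    m, k, n = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "inner dimensions must match"
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]   # dot product of row i and column j
            C[i][j] = alpha * acc + beta * C[i][j]
    return C

# 2x2 example: alpha=1, beta=0 gives the plain matrix product
# naive_dgemm(1.0, [[1, 2], [3, 4]], [[5, 6], [7, 8]], 0.0, [[0, 0], [0, 0]])
# -> [[19.0, 22.0], [43.0, 50.0]]
```

The triple loop above is exactly the computation an optimized BLAS performs orders of magnitude faster, which is why the talk encourages calling the library instead of writing such loops.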

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


Change in Job Time Limits on Arches Clusters: In effect 6:00 pm Sunday October 24th, 2004

Posted: October 19, 2004

In our continuing effort to meet the needs of our users, CHPC will be modifying the job (wallclock) limit policies on the Arches clusters as follows:

  • delicatearch: jobs will have a limit of 24 hours wallclock time
  • marchingmen: jobs will have a limit of 72 hours wallclock time (no change)
  • tunnelarch: jobs will have a limit of 5 days (120 hours) wallclock time

(Note: the time limits on icebox and sierra are remaining at 72 hours wallclock time)

This will be implemented on our systems about 6:00 pm on the evening of Sunday, October 24th, 2004.

Users are encouraged to delete (qdel) and re-submit (qsub) any of their delicatearch jobs that exceed the new limit immediately, or at least well before the Sunday change. Jobs exceeding that limit (even those already in the queue now) will be deferred and will not run after the change takes effect.


Marching Men (interactive nodes only) not available until further notice

Posted: October 18, 2004

We have seen some strange behavior on the marchingmen interactive nodes. Our systems staff have taken them offline to evaluate and correct the problem. You may still query the batch system and submit jobs to the marchingmen compute nodes by using the full path to the binaries:

/uufs/marchingmen.arches/sys/bin/showq
/uufs/marchingmen.arches/sys/bin/qsub
.
.
.

from any of the other arches interactive nodes (tunnelarch, delicatearch).


CHPC Presentations

Posted: October 18, 2004

Chemistry Packages at CHPC
Thursday, October 21st, 2004, 1:30 p.m. INSCC Auditorium
Presenter: Anita Orendt

This talk will focus on the computational chemistry software packages - Gaussian, Amber, NWChem, Molpro, Amica, Babel, GaussView, ECCE - that are available on CHPC computer systems. The talk will be an overview of the packages and their capabilities, and will focus on details of how users can access the installations at CHPC. This talk is the precursor to a second talk scheduled for next month that will focus on the use of Gaussian 98/03 and GaussView.

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


CHPC Presentations

Posted: October 8, 2004

Profiling with Vampir/Guideview
Thursday, October 14th, 2004, 1:30 p.m. INSCC Auditorium
Presenter: Martin Cuma

In this talk, we introduce a new profiling package, Vampir/Guideview, capable of profiling serial, OpenMP parallel, and MPI parallel applications. We will explain how to set up a basic profiling session for serial and parallel codes, and touch on advanced features such as code instrumentation and object-oriented performance analysis. Within the talk, we will provide several hands-on examples of application profiling.

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


CHPC Presentations

Posted: September 24, 2004

Debugging with Totalview
Thursday, September 30th, 2004, 1:30 p.m. INSCC Auditorium
Presenter: Martin Cuma

This talk introduces Totalview, a debugger that has become a standard in the Unix code development community. After a short introduction to its major features, we will present three examples: serial, OpenMP parallel, and MPI parallel codes. Using these examples, we will show common and specific features for debugging such codes, as well as point out differences in using Totalview on different CHPC platforms.

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


CHPC Presentations

Posted: September 17, 2004

Introduction to Programming with OpenMP
Thursday, September 23rd, 2004, 1:30 p.m. INSCC Auditorium
Presenter: Martin Cuma

This talk introduces OpenMP, an increasingly popular and relatively simple shared memory parallel programming model. Two parallelizing schemes, parallel do loops and parallel sections, will be detailed using examples. Various clauses that allow the user to modify parallel execution will also be presented, including sharing and privatizing of variables, scheduling, synchronization, and mutual exclusion of parallel tasks. Finally, a few hints will be given on removing loop dependencies in order to obtain effective parallelization.
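The central idea of an OpenMP parallel do loop - splitting a loop's iterations across a team of threads and combining the results with a reduction - can be sketched outside OpenMP as well. The Python analogue below uses a thread pool purely to illustrate the work-sharing and reduction semantics; it is not OpenMP, and CPython threads will not actually speed up a CPU-bound loop like this one.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(n, num_threads=4):
    """Sum 0..n-1 by dividing the iteration space into one contiguous
    chunk per thread (like a static OpenMP schedule) and then reducing
    the partial sums, mirroring:
        #pragma omp parallel for reduction(+:total)
    """
    chunk = (n + num_threads - 1) // num_threads
    ranges = [(t * chunk, min((t + 1) * chunk, n)) for t in range(num_threads)]

    def partial(bounds):
        lo, hi = bounds
        return sum(range(lo, hi))    # each thread's private partial sum

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = pool.map(partial, ranges)
    return sum(partials)             # the reduction step

# parallel_sum(1000) == sum(range(1000)) == 499500
```

The private partial sums are what OpenMP's reduction clause manages automatically; without it, threads updating one shared accumulator would race, which is exactly the kind of loop dependency the talk discusses removing.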

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


CHPC Presentations

Posted: September 10, 2004

Introduction to Programming with MPI
Thursday, September 16th, 2004, 1:30 p.m. INSCC Auditorium
Presenter: Martin Cuma

This course discusses introductory and selected intermediate topics in MPI programming. We base the presentation on two simple examples and explain their parallel development with MPI. The first example encompasses MPI initialization and simple point-to-point communication (which takes place between two processes). The second example introduces collective communication calls (in which all active processes are involved) and options for effective data communication strategies, such as derived data types and packing of data. Some ideas on more advanced MPI programming options are discussed at the end of the talk.

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


CHPC Presentations

Posted: September 3, 2004

Introduction to Parallel Computing
Thursday, September 9th, 2004, 1:30 p.m. INSCC Auditorium
Presenter: Martin Cuma

In this talk, we first discuss various parallel architectures and note which ones are represented at CHPC, in particular shared and distributed memory parallel computers. A very short introduction to two programming solutions for these machines, MPI and OpenMP, will then be given, followed by instructions on how to compile, run, debug and profile parallel applications on the CHPC parallel computers. Although this talk is directed more towards those starting to explore parallel programming, more experienced users can gain from the second half, which will provide details on the software development tools available at CHPC.

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


CHPC Presentations

Posted: August 27, 2004

Overview of CHPC
Friday, September 3rd, 2004, 1:30 p.m. INSCC Auditorium
Presenter: Julia Harrison

This presentation gives users who are new to CHPC, or interested in High Performance Computing, an overview of the resources available at CHPC and the policies and procedures for accessing those resources. Topics covered will include:

  • The platforms available
  • Filesystems
  • Access
  • An overview of the batch system and policies
  • Service Unit Allocations

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series