This downtime is needed to put in place a fix for issues observed since the September 28 downtime, when we upgraded the OS on the clusters.
Since the last downtime, some jobs that use one of the Lustre file systems have been hanging at startup. We have traced this to the combination of the OS version and the Lustre client version. We have tested a fix on a small test cluster and need this downtime to apply the changes to the ember nodes so that additional testing can be completed.
Because this is a minor version upgrade to the OS, we do not anticipate that any applications will need to be recompiled.
Provided that this fix resolves the observed issues, we will then apply the same changes to all remaining clusters on December 21, again starting at 8am.
Reservations are in place to drain the ember batch queue of running jobs by 8am on December 14 and to drain the batch queues of all running jobs on the remaining clusters by 8am on December 21. Any job submitted to the batch queue with a requested wall time that would not allow it to complete before these times will not start.
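As a rough illustration of the wall time rule above, the following sketch checks whether a job's requested wall time fits before a drain time. It is only the arithmetic, not the scheduler's actual logic. The function name `can_start` and the year are hypothetical (the announcement gives only month, day, and hour), and treating a job that ends exactly at the drain time as allowed is an assumption.

```python
from datetime import datetime, timedelta


def can_start(now: datetime, walltime: timedelta, drain_time: datetime) -> bool:
    """Return True if a job started at `now` would finish by `drain_time`.

    Hypothetical helper: the real scheduler may apply additional rules.
    """
    return now + walltime <= drain_time


# Placeholder year, chosen only so the example runs.
YEAR = 2000
ember_drain = datetime(YEAR, 12, 14, 8, 0)  # 8am on December 14

now = datetime(YEAR, 12, 13, 20, 0)  # 12 hours before the ember drain time

# A 10-hour request ends at 6am on December 14, before the drain time.
print(can_start(now, timedelta(hours=10), ember_drain))

# A 24-hour request would run past 8am on December 14.
print(can_start(now, timedelta(hours=24), ember_drain))
```

In this sketch the first job would be eligible to start and the second would not. Requesting a shorter wall time, where the work allows it, is one way to let a job run before the downtime.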
If you have questions or concerns regarding this downtime, please email email@example.com.