
Tangent Unscheduled Downtime - Hardware Failure

Date Posted: July 11th, 2016

Tangent was restored to service on July 15th. Jobs that were idle in the batch queue before the hardware failure are now running, and users can submit new jobs.


Due to a hardware failure on Tangent, we have turned off the resource manager. As a result, any Slurm scheduler command will time out and give an “Unable to contact slurm controller (connect failure)” response. Currently running jobs are unaffected and will finish unless the nodes have to be rebooted to fix the problem.

Once the hardware issue has been resolved, we will restart the resource manager and restart Slurm.

 
Last Updated: 12/17/24