Potential trouble running programs after the clusters' OS update on September 28th

Posted: October 4th, 2017

We have had several reports of codes that previously ran giving wrong results or segmentation faults.

We have tracked this to a bug in older versions of the Intel compiler that surfaced with the update of the basic OS library (glibc) we performed during the downtime of September 28th.

This bug occurs only in codes built with Intel compiler 2017.2 or earlier, and only on Kingspeak (with AVX vectorization, e.g. when built with the -axAVX compilation flag or compiled on Kingspeak).

There are 2 solutions:

  1. Recompile the code with the Intel stack 2017.4, which we have had installed since the end of May and which is the default. It may be worth waiting for the announcement of intel/2018.0 availability, which we hope to have installed within a few days, just in case there are some incompatibilities from the previous 2017.4 version.
  2. If recompiling is not practical, or while waiting for the 2018 compiler, set the environment variable LD_BIND_NOW=1, which changes the loading of dynamic library routines.
    Setting this variable in your shell with "setenv LD_BIND_NOW 1" for tcsh or "export LD_BIND_NOW=1" for bash works only for serial or thread-parallel programs.
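
As an alternative to setting the variable for the whole session, it can be set for a single command. The following is a minimal bash sketch; the function name run_bind_now is hypothetical, and the "sh -c" command only stands in for your own serial or threaded executable.

```shell
# Hypothetical wrapper: run one command with LD_BIND_NOW=1 in its environment
# without changing the rest of the shell session.
run_bind_now() {
    env LD_BIND_NOW=1 "$@"
}

# Stand-in for a real program: print the value the child process sees.
run_bind_now sh -c 'echo "LD_BIND_NOW=$LD_BIND_NOW"'
```

From tcsh, the same effect can be had by calling "env LD_BIND_NOW=1 ./my_program" directly, where ./my_program is a placeholder for your executable.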

To run MPI parallel programs, pass this environment variable in the mpirun command, e.g., for MPICH variants (MPICH, MVAPICH2, Intel MPI) as "mpirun -np $SLURM_NTASKS -genv LD_BIND_NOW 1 ..." or for OpenMPI as "mpirun -np $SLURM_NTASKS -x LD_BIND_NOW=1 ...".
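
The two mpirun forms above could be wrapped in a small helper for job scripts. This is only a sketch under stated assumptions: the function name ld_bind_mpirun, the flavor labels (mpich, mvapich2, impi, openmpi) and ./my_program are hypothetical, and the helper only prints the command line so it can be checked before use.

```shell
# Hypothetical helper: print the mpirun command line that passes LD_BIND_NOW=1,
# using -genv for MPICH variants and -x for OpenMPI.
# Usage: ld_bind_mpirun <flavor> <program> [args...]
ld_bind_mpirun() {
    flavor="$1"
    shift
    # Fall back to 1 task when not running inside a Slurm job (assumption).
    ntasks="${SLURM_NTASKS:-1}"
    case "$flavor" in
        mpich|mvapich2|impi)
            echo mpirun -np "$ntasks" -genv LD_BIND_NOW 1 "$@" ;;
        openmpi)
            echo mpirun -np "$ntasks" -x LD_BIND_NOW=1 "$@" ;;
        *)
            echo "unknown MPI flavor: $flavor" >&2
            return 1 ;;
    esac
}
```

Inside a job script, "ld_bind_mpirun openmpi ./my_program" prints the command to review; once it looks right, the printed line can be pasted into the script or run with eval.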

More details on this issue are here:

https://www.schrodinger.com/kb/230624

and here:

https://sourceware.org/bugzilla/show_bug.cgi?id=21236#c6

If you have any questions, please let us know at issues@chpc.utah.edu.

Last Updated: 10/4/17