Hello all,
I just downloaded the new container salome_meca-lgpl-2024.1.0-1-20240327-scibian-11.sif. I could not get the previous version to work on my system; there is another thread somewhere about that.
I ran into an error in this version too, but I was able to resolve it right away, so I thought I'd share the fix before anyone else hits it. Some info about my system:
Ubuntu 24.04LTS,
Nvidia RTX 3090 with driver version 535.183.01,
singularity-ce version 4.1.4-noble (not the latest!).
I installed the container the usual way with:
singularity run --app install salome_meca-lgpl-2024.1.0-1-20240327-scibian-11.sif
Then I launched Salome-Meca 2024 with:
./salome_meca-lgpl-2024.1.0-1-20240327-scibian-11
The window pops up, I can do some stuff and switch between modules... BUT: when I switch to AsterStudy, SM2024 crashes with this (rather long, sorry) error message:
Open MPI's OFI driver detected multiple equidistant NICs from the current process,
but had insufficient information to ensure MPI processes fairly pick a NIC for use.
This may negatively impact performance. A more modern PMIx server is necessary to
resolve this issue.
AsterStudy: jeu.-09-18:12:41.638 AsterStudy is activating...
AsterStudy: jeu.-09-18:12:41.638 Creating workspace...
AsterStudy: jeu.-09-18:12:41.661 Refreshing configuration on localhost...
A process has executed an operation involving a call
to the fork() system call to create a child process.
As a result, the libfabric EFA provider is operating in
a condition that could result in memory corruption or
other system errors.
For the libfabric EFA provider to work safely when fork()
is called, you will need to set the following environment
variable:
RDMAV_FORK_SAFE
However, setting this environment variable can result in
signficant performance impact to your application due to
increased cost of memory registration.
You may want to check with your application vendor to see
if an application-level alternative (of not using fork)
exists.
Your job will now abort.
[HP-Z8-G4-Workstation:3765134] *** Process received signal ***
[HP-Z8-G4-Workstation:3765134] Signal: Abandon (6)
[HP-Z8-G4-Workstation:3765134] Signal code: (-6)
[HP-Z8-G4-Workstation:3765134] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x73b3fd2cf140]
[HP-Z8-G4-Workstation:3765134] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x73b3fa6cad51]
[HP-Z8-G4-Workstation:3765134] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x73b3fa6b4537]
[HP-Z8-G4-Workstation:3765134] [ 3] /usr/lib/x86_64-linux-gnu/libfabric.so.1(+0x7c14e)[0x73b3e0a2914e]
[HP-Z8-G4-Workstation:3765134] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x80f18)[0x73b3fa712f18]
[HP-Z8-G4-Workstation:3765134] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_fork+0x20)[0x73b3fa759a70]
[HP-Z8-G4-Workstation:3765134] [ 6] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(+0x245697)[0x73b404a35697]
[HP-Z8-G4-Workstation:3765134] [ 7] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(+0x10de74)[0x73b4048fde74]
[HP-Z8-G4-Workstation:3765134] [ 8] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(PyObject_MakeTpCall+0x90)[0x73b4048b7b00]
[HP-Z8-G4-Workstation:3765134] [ 9] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(PyEval_EvalFrameDefault+0x8632)[0x73b4048692f2]
[HP-Z8-G4-Workstation:3765134] [10] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(+0x1a5fdc)[0x73b404995fdc]
[HP-Z8-G4-Workstation:3765134] [11] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(PyFunction_Vectorcall+0x9d)[0x73b4048b2d2d]
[HP-Z8-G4-Workstation:3765134] [12] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(PyEval_EvalFrameDefault+0x77f7)[0x73b4048684b7]
[HP-Z8-G4-Workstation:3765134] [13] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(+0x1a5fdc)[0x73b404995fdc]
[HP-Z8-G4-Workstation:3765134] [14] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(PyFunction_Vectorcall+0x9d)[0x73b4048b2d2d]
[HP-Z8-G4-Workstation:3765134] [15] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(PyObject_FastCallDictTstate+0xce)[0x73b4048ba04e]
[HP-Z8-G4-Workstation:3765134] [16] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(PyObject_Call_Prepend+0xe0)[0x73b4048ba1f0]
[HP-Z8-G4-Workstation:3765134] [17] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(+0x1308f2)[0x73b4049208f2]
[HP-Z8-G4-Workstation:3765134] [18] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(+0x125b3a)[0x73b404915b3a]
[HP-Z8-G4-Workstation:3765134] [19] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(PyObject_MakeTpCall+0x90)[0x73b4048b7b00]
[HP-Z8-G4-Workstation:3765134] [20] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(PyEval_EvalFrameDefault+0x76a1)[0x73b404868361]
[HP-Z8-G4-Workstation:3765134] [21] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(+0x1a5fdc)[0x73b404995fdc]
[HP-Z8-G4-Workstation:3765134] [22] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(PyFunction_Vectorcall+0x9d)[0x73b4048b2d2d]
[HP-Z8-G4-Workstation:3765134] [23] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(PyEval_EvalFrameDefault+0x5f8d)[0x73b404866c4d]
[HP-Z8-G4-Workstation:3765134] [24] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(+0x70033)[0x73b404860033]
[HP-Z8-G4-Workstation:3765134] [25] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(PyEval_EvalFrameDefault+0x77f7)[0x73b4048684b7]
[HP-Z8-G4-Workstation:3765134] [26] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(+0x1a5fdc)[0x73b404995fdc]
[HP-Z8-G4-Workstation:3765134] [27] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(PyFunction_Vectorcall+0x9d)[0x73b4048b2d2d]
[HP-Z8-G4-Workstation:3765134] [28] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(PyEval_EvalFrameDefault+0x77f7)[0x73b4048684b7]
[HP-Z8-G4-Workstation:3765134] [29] /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0(+0x1a5fdc)[0x73b404995fdc]
[HP-Z8-G4-Workstation:3765134] *** End of error message ***
The message gives the solution away, very smart. The fix is to set the environment variable RDMAV_FORK_SAFE=1 before running ./salome_meca-lgpl-2024.1.0-1-20240327-scibian-11.
So, in essence, the workaround is to execute the following commands in succession:
singularity run --app install salome_meca-lgpl-2024.1.0-1-20240327-scibian-11.sif
export RDMAV_FORK_SAFE=1
./salome_meca-lgpl-2024.1.0-1-20240327-scibian-11
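To avoid retyping the export in every new terminal, the last two steps can be wrapped in a small launcher script. This is just my own convenience sketch (the script name launch_sm2024.sh is my invention, not part of the package); it assumes the install step has already generated the launcher in the current directory:

```shell
#!/bin/sh
# launch_sm2024.sh -- hypothetical wrapper for starting Salome-Meca 2024.
# Make libfabric's memory registration fork-safe before the GUI starts,
# which avoids the EFA-provider abort shown in the error message above.
export RDMAV_FORK_SAFE=1

LAUNCHER=./salome_meca-lgpl-2024.1.0-1-20240327-scibian-11
if [ -x "$LAUNCHER" ]; then
    # Hand over to the launcher, forwarding any extra arguments
    exec "$LAUNCHER" "$@"
else
    echo "launcher $LAUNCHER not found; run the 'singularity run --app install' step first" >&2
fi
```

Make it executable once with chmod +x launch_sm2024.sh and then just run ./launch_sm2024.sh.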
That's it! Hope this helps before someone else runs into the same problem,
Mario.