Hello,
I am using the code_aster Docker image from the simvia Docker page (17.4.0).
I launch the run normally with mpi_nbcpu = 4. To terminate the job cleanly, I used this command:
pids=$(pgrep -fl python3 | tail -n 4 | awk '{print $1}') && kill -SIGUSR1 $pids
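Unpacked, the PID selection in that one-liner looks like the sketch below. The helper name `last_pids` is made up for illustration, and it relies on the same assumption as my command: that the four MPI ranks are the last four python3 processes listed by pgrep, which is not guaranteed.

```shell
# last_pids N: read "PID command..." lines (the format printed by `pgrep -fl`)
# on stdin and print the PID column of the last N lines, one per line.
# Hypothetical helper, only meant to make the selection step explicit.
last_pids() {
    tail -n "$1" | awk '{print $1}'
}

# Equivalent to the one-liner above (left commented out so nothing is signalled):
#   pids=$(pgrep -fl python3 | last_pids 4) && kill -SIGUSR1 $pids
```

Running `pgrep -fl python3` on its own first shows which processes would actually receive the signal.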
I get the following error:
[1,0]<stdout>: ╔════════════════════════════════════════════════════════════════════════════════════════════════╗
[1,0]<stdout>: ║ <I> <SUPERVIS_96> ║
[1,0]<stdout>: ║ ║
[1,0]<stdout>: ║ Réception du signal USR1. Interruption du calcul demandée... ║
[1,0]<stdout>: ║ ║
[1,0]<stdout>: ╚════════════════════════════════════════════════════════════════════════════════════════════════╝
[1,0]<stdout>:
[1,2]<stdout>:
[1,2]<stdout>: ╔════════════════════════════════════════════════════════════════════════════════════════════════╗
[1,2]<stdout>: ║ <I> <SUPERVIS_96> ║
[1,2]<stdout>: ║ ║
[1,2]<stdout>: ║ Réception du signal USR1. Interruption du calcul demandée... ║
[1,2]<stdout>: ║ ║
[1,2]<stdout>: ╚════════════════════════════════════════════════════════════════════════════════════════════════╝
[1,2]<stdout>:
[1,2]<stdout>:
[1,2]<stdout>: EXECUTION_CODE_ASTER_EXIT_124=-10
[1,2]<stdout>:
[1,2]<stdout>:
[1,2]<stdout>: restoring result databases from 'BASE_PREC'...
[1,2]<stdout>: WARNING: execution failed (command file #1): <F>_ABNORMAL_ABORT
[1,2]<stdout>:
[1,2]<stdout>: # ------------------------------------------------------------------------------
[1,2]<stdout>: Content of /tmp/run_aster_dz75b5w8/proc.2 after execution:
[1,2]<stdout>: .:
[1,2]<stdout>: total 270780
[1,2]<stdout>: -rw-r--r-- 1 user user 407 Apr 14 01:16 124.export
[1,2]<stdout>: drwxr-xr-x 2 user user 4096 Apr 14 01:16 REPE_IN
[1,2]<stdout>: drwxr-xr-x 2 user user 4096 Apr 14 01:16 REPE_OUT
[1,2]<stdout>: -rwxrwxrwx 1 user user 4152220 Apr 14 00:59 fort.20
[1,2]<stdout>: -rw-r--r-- 1 user user 269220 Apr 14 04:47 fort.6
[1,2]<stdout>: -rw-r--r-- 1 user user 0 Apr 14 01:16 fort.8
[1,2]<stdout>: -rw-r--r-- 1 user user 0 Apr 14 01:16 fort.9
[1,2]<stdout>: -rw-r--r-- 1 user user 94208008 Apr 14 04:45 glob.1
[1,2]<stdout>: -rw-r--r-- 1 user user 28585 Apr 14 01:16 stage1_extruded.comm.changed.py
[1,2]<stdout>: -rw-r--r-- 1 user user 180224008 Apr 14 04:47 vola.1
[1,2]<stdout>:
[1,2]<stdout>: REPE_OUT:
[1,2]<stdout>: total 0
[1,2]<stdout>:
[1,2]<stdout>:
[1,2]<stdout>: # ------------------------------------------------------------------------------
[1,2]<stdout>: Execution summary
[1,2]<stdout>: cpu system cpu+sys elapsed
[1,2]<stdout>: --------------------------------------------------------------------------------
[1,2]<stdout>: Preparation of environment 0.00 0.00 0.00 0.00
[1,2]<stdout>: Execution of code_aster 0.01 0.01 0.02 12638.58
[1,2]<stdout>: Copying results 0.00 0.00 0.00 0.01
[1,2]<stdout>: --------------------------------------------------------------------------------
[1,2]<stdout>: Total 0.01 0.01 0.02 12638.59
[1,2]<stdout>: --------------------------------------------------------------------------------
prterun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:
Process name: [prterun-1c6f5133f2d2-103@1,2]
Exit code: 255
What could be the reason for the abnormal abort?
Thanks
Anirudh