Message boards : Number crunching : exited with zero status but no 'finished' file
Author | Message |
---|---|
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Like others, I'm seeing messages in the event log (Mac OS X 10.6.8/Boinc 7.0.31) reporting this error: exited with zero status but no 'finished' file. It happens on a machine that is on 24/7, so I don't think it's a hibernate/sleep issue. Sample output: Sat Nov 3 23:17:29 2012 | rosetta@home | Scheduler request completed: got 0 new tasks Sat Nov 3 23:19:04 2012 | rosetta@home | Finished download of input_hyb_al_02_bench_3slkB_yfsong.zip Sat Nov 3 23:19:47 2012 | | Suspending network activity - user request Sun Nov 4 02:53:03 2012 | rosetta@home | Computation for task Ploop4_2_abinitio_design_y465_009_60334_1680_0 finished Sun Nov 4 02:53:20 2012 | rosetta@home | Starting task rb_11_03_30323_64727_h001__sp1_IGNORE_THE_REST_08_05_62798_11_0 using minirosetta version 341 in slot 1 Sun Nov 4 02:55:19 2012 | rosetta@home | Task hyb_al_08_bench_3slkB_SAVE_ALL_OUT_IGNORE_THE_REST_60945_2133_0 exited with zero status but no 'finished' file Sun Nov 4 02:55:19 2012 | rosetta@home | If this happens repeatedly you may need to reset the project. Sun Nov 4 02:55:19 2012 | rosetta@home | Restarting task hyb_al_08_bench_3slkB_SAVE_ALL_OUT_IGNORE_THE_REST_60945_2133_0 using minirosetta version 341 in slot 0 Sun Nov 4 08:36:36 2012 | rosetta@home | Computation for task rb_11_03_30323_64727_h001__sp1_IGNORE_THE_REST_08_05_62798_11_0 finished Sun Nov 4 08:36:48 2012 | rosetta@home | Starting task rb_11_03_30323_64727_h001__sp1_IGNORE_THE_REST_10_03_62798_11_0 using minirosetta version 341 in slot 1 Sun Nov 4 08:40:53 2012 | rosetta@home | Task hyb_al_08_bench_3slkB_SAVE_ALL_OUT_IGNORE_THE_REST_60945_2133_0 exited with zero status but no 'finished' file Sun Nov 4 08:40:53 2012 | rosetta@home | If this happens repeatedly you may need to reset the project.
Sun Nov 4 08:40:53 2012 | rosetta@home | Restarting task hyb_al_08_bench_3slkB_SAVE_ALL_OUT_IGNORE_THE_REST_60945_2133_0 using minirosetta version 341 in slot 0 Sun Nov 4 08:42:18 2012 | rosetta@home | work fetch suspended by user Sun Nov 4 08:42:56 2012 | rosetta@home | task hyb_al_08_bench_3slkB_SAVE_ALL_OUT_IGNORE_THE_REST_60945_2133_0 aborted by user Sun Nov 4 08:42:57 2012 | rosetta@home | Starting task rb_11_03_30323_64727_h001__sp1_IGNORE_THE_REST_08_07_62798_7_0 using minirosetta version 341 in slot 2 Sun Nov 4 08:43:39 2012 | rosetta@home | Computation for task hyb_al_08_bench_3slkB_SAVE_ALL_OUT_IGNORE_THE_REST_60945_2133_0 finished Sun Nov 4 08:44:15 2012 | rosetta@home | Task rb_11_03_30323_64727_h001__sp1_IGNORE_THE_REST_08_07_62798_7_0 exited with zero status but no 'finished' file Sun Nov 4 08:44:15 2012 | rosetta@home | If this happens repeatedly you may need to reset the project. Sun Nov 4 08:44:15 2012 | rosetta@home | Restarting task rb_11_03_30323_64727_h001__sp1_IGNORE_THE_REST_08_07_62798_7_0 using minirosetta version 341 in slot 2 Sun Nov 4 08:44:17 2012 | rosetta@home | Task rb_11_03_30323_64727_h001__sp1_IGNORE_THE_REST_10_03_62798_11_0 exited with zero status but no 'finished' file Sun Nov 4 08:44:17 2012 | rosetta@home | If this happens repeatedly you may need to reset the project. 
Sun Nov 4 08:44:17 2012 | rosetta@home | Restarting task rb_11_03_30323_64727_h001__sp1_IGNORE_THE_REST_10_03_62798_11_0 using minirosetta version 341 in slot 1 Sun Nov 4 08:44:20 2012 | | Resuming network activity Sun Nov 4 08:44:20 2012 | rosetta@home | Started upload of Ploop4_2_abinitio_design_y465_009_60334_1680_0_0 Sun Nov 4 08:44:20 2012 | rosetta@home | Started upload of rb_11_03_30323_64727_h001__sp1_IGNORE_THE_REST_08_05_62798_11_0_0 Sun Nov 4 08:44:25 2012 | rosetta@home | Finished upload of Ploop4_2_abinitio_design_y465_009_60334_1680_0_0 Sun Nov 4 08:44:27 2012 | rosetta@home | Finished upload of rb_11_03_30323_64727_h001__sp1_IGNORE_THE_REST_08_05_62798_11_0_0 Sun Nov 4 08:44:31 2012 | rosetta@home | Sending scheduler request: To report completed tasks. Sun Nov 4 08:44:31 2012 | rosetta@home | Reporting 3 completed tasks Sun Nov 4 08:44:31 2012 | rosetta@home | Not requesting tasks: scheduler RPC backoff Sun Nov 4 08:44:35 2012 | rosetta@home | Scheduler request completed |
Snags Send message Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
|
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,135,082 RAC: 4,703 |
|
Snags Send message Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
Most recently discussed here? I assumed Mod.Sense suggested a new thread because svincent originally posted in the "Current issues with 7+ BOINC client" thread. I made a link to a different, slightly older thread (with the exact same title as this thread, "exited with zero status but no 'finished' file") simply because I didn't have time to summarize it. I will repost the link to the BOINC FAQ Service page, which describes this long-standing (since BOINC 5+) error message and its possible causes and solutions. If svincent or googloo (from the previous thread) still think it's related to the 7+ BOINC client, then it would be most helpful if they posted back detailing how they eliminated the other triggers. Best, Snags |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Sorry: missed that thread. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Interesting. As you say, you'd think a machine that's always active wouldn't have such problems. Can you think of any other activity on the machine that might cause all of the tasks to encounter the error at the same time like that? Or is that NOT all of the active tasks? Rosetta Moderator: Mod.Sense |
gazzawazza Send message Joined: 4 May 07 Posts: 28 Credit: 297,648 RAC: 0 |
Hi all, I've just realised that a task I'd been crunching for a while (I'd got probably 2-3 hours elapsed on it and I think about 30% completed) had just randomly restarted itself. However, I'm not sure which thread to post to on this matter. My platform is Vista 32-bit. Here's an extract from my event log from today. I've filtered just the Rosetta entries: 10/11/2012 11:59:15 | | No config file found - using defaults 10/11/2012 11:59:16 | | Starting BOINC client version 7.0.28 for windows_intelx86 10/11/2012 11:59:16 | | log flags: file_xfer, sched_ops, task 10/11/2012 11:59:16 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6 10/11/2012 11:59:16 | | Data directory: C:\ProgramData\BOINC 10/11/2012 11:59:16 | | Running under account Gary 10/11/2012 11:59:16 | | Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz [Family 6 Model 15 Stepping 11] 10/11/2012 11:59:16 | | Processor: 4.00 MB cache 10/11/2012 11:59:16 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 nx lm vmx tm2 pbe 10/11/2012 11:59:16 | | OS: Microsoft Windows Vista: Home Premium x86 Edition, Service Pack 2, (06.00.6002.00) 10/11/2012 11:59:16 | | Memory: 3.12 GB physical, 7.70 GB virtual 10/11/2012 11:59:16 | | Disk: 298.09 GB total, 7.31 GB free 10/11/2012 11:59:16 | | Local time is UTC +0 hours 10/11/2012 11:59:16 | | NVIDIA GPU 0: GeForce GTX 570 (driver version 306.97, CUDA version 5.0, compute capability 2.0, 1280MB, 1169MB available, 1405 GFLOPS peak) 10/11/2012 11:59:16 | | OpenCL: NVIDIA GPU 0: GeForce GTX 570 (driver version 306.97, device version OpenCL 1.1 CUDA, 1280MB, 1169MB available) 10/11/2012 11:59:16 | rosetta@home | URL https://boinc.bakerlab.org/rosetta/; Computer ID 1112807; resource share 100 10/11/2012 11:59:16 | | Reading preferences override file 10/11/2012 11:59:16 | | Preferences: 10/11/2012 11:59:16 | | max memory usage when active: 2398.19MB
10/11/2012 11:59:16 | | max memory usage when idle: 2398.19MB 10/11/2012 11:59:16 | | max disk usage: 5.00GB 10/11/2012 11:59:16 | | don't compute while active 10/11/2012 11:59:16 | | don't use GPU while active 10/11/2012 11:59:16 | | suspend work if non-BOINC CPU load exceeds 50 % 10/11/2012 11:59:16 | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager) 10/11/2012 11:59:16 | | Not using a proxy 10/11/2012 12:00:12 | rosetta@home | Restarting task hyb_ac_bench_3rdeD_20_SAVE_ALL_OUT_IGNORE_THE_REST_54744_117_0 using minirosetta version 341 in slot 2 10/11/2012 12:00:12 | rosetta@home | Sending scheduler request: To fetch work. 10/11/2012 12:00:12 | rosetta@home | Requesting new tasks for NVIDIA 10/11/2012 12:01:15 | rosetta@home | Scheduler request completed: got 0 new tasks 10/11/2012 12:06:49 | | Project communication failed: attempting access to reference site 10/11/2012 12:06:51 | | Internet access OK - project servers may be temporarily down. 10/11/2012 12:06:51 | rosetta@home | Sending scheduler request: To fetch work. 10/11/2012 12:06:51 | rosetta@home | Requesting new tasks for NVIDIA 10/11/2012 12:06:53 | rosetta@home | Scheduler request completed: got 0 new tasks 10/11/2012 12:25:54 | rosetta@home | Sending scheduler request: To fetch work. 10/11/2012 12:25:54 | rosetta@home | Requesting new tasks for NVIDIA 10/11/2012 12:25:56 | rosetta@home | Scheduler request completed: got 0 new tasks 10/11/2012 12:46:51 | rosetta@home | Sending scheduler request: To fetch work. 10/11/2012 12:46:51 | rosetta@home | Requesting new tasks for NVIDIA 10/11/2012 12:46:52 | rosetta@home | Scheduler request completed: got 0 new tasks 10/11/2012 13:56:02 | rosetta@home | Sending scheduler request: To fetch work. 
10/11/2012 13:56:02 | rosetta@home | Requesting new tasks for NVIDIA 10/11/2012 13:56:04 | rosetta@home | Scheduler request completed: got 0 new tasks 10/11/2012 15:14:50 | | Project communication failed: attempting access to reference site 10/11/2012 15:14:52 | | Internet access OK - project servers may be temporarily down. 10/11/2012 16:44:41 | rosetta@home | Task hyb_ac_bench_3rdeD_20_SAVE_ALL_OUT_IGNORE_THE_REST_54744_117_0 exited with zero status but no 'finished' file 10/11/2012 16:44:41 | rosetta@home | If this happens repeatedly you may need to reset the project. 10/11/2012 16:44:41 | rosetta@home | Restarting task hyb_ac_bench_3rdeD_20_SAVE_ALL_OUT_IGNORE_THE_REST_54744_117_0 using minirosetta version 341 in slot 2 Please note, though, that in the unfiltered log, the entry immediately prior to the "... exited with zero status" statement (at 16:44:41) was at 15:40:02, i.e. 55 minutes earlier. So, there's no recorded event to give us a clue as to why this task "exited with zero status but no 'finished' file". I also suspect, in hindsight, that this isn't the first time the task has restarted (see last entry in log), since I'd swear the task has been 'crunched' for some time and I feel, whenever I've glanced at its level of completion, it's typically been around 30%. Hope that makes sense. The log states that I might need to reset the project if this keeps happening. Do you think this is appropriate action, or would you like me to do any diagnostic work for you? Regards, Gary |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Interesting. As you say, you'd think a machine that's always active wouldn't have such problems. Can you think of any other activity on the machine that might cause all of the tasks to encounter the error at the same time like that? Or is that NOT all of the active tasks? These tasks are run on an oldish Mac Mini that I no longer use for other work. This computer runs 24/7 on R@h (6-hour time preferences) only and is only connected to the Internet once a week. So it's unlikely to be related to other tasks. However, it does seem to be related to this error message: No heartbeat from core client for 30 sec - exiting Here, from the error log, are messages related to a task that successfully completed but restarted 3 times. Fri Nov 9 23:13:14 2012 | rosetta@home | Starting task Rossmann3x3_abinitio_SAVE_ALL_OUT_design_k062_007_62403_1239_1 using minirosetta version 341 in slot 1 Sat Nov 10 00:16:46 2012 | rosetta@home | Task Rossmann3x3_abinitio_SAVE_ALL_OUT_design_k062_007_62403_1239_1 exited with zero status but no 'finished' file Sat Nov 10 00:16:46 2012 | rosetta@home | If this happens repeatedly you may need to reset the project. Sat Nov 10 00:16:46 2012 | rosetta@home | Restarting task Rossmann3x3_abinitio_SAVE_ALL_OUT_design_k062_007_62403_1239_1 using minirosetta version 341 in slot 1 Sat Nov 10 00:25:07 2012 | rosetta@home | Task Rossmann3x3_abinitio_SAVE_ALL_OUT_design_k062_007_62403_1239_1 exited with zero status but no 'finished' file Sat Nov 10 00:25:07 2012 | rosetta@home | If this happens repeatedly you may need to reset the project. Sat Nov 10 00:25:07 2012 | rosetta@home | Restarting task Rossmann3x3_abinitio_SAVE_ALL_OUT_design_k062_007_62403_1239_1 using minirosetta version 341 in slot 1 Sat Nov 10 00:30:49 2012 | rosetta@home | Task Rossmann3x3_abinitio_SAVE_ALL_OUT_design_k062_007_62403_1239_1 exited with zero status but no 'finished' file Sat Nov 10 00:30:49 2012 | rosetta@home | If this happens repeatedly you may need to reset the project.
Sat Nov 10 00:30:49 2012 | rosetta@home | Restarting task Rossmann3x3_abinitio_SAVE_ALL_OUT_design_k062_007_62403_1239_1 using minirosetta version 341 in slot 1 Sat Nov 10 05:28:49 2012 | rosetta@home | Computation for task Rossmann3x3_abinitio_SAVE_ALL_OUT_design_k062_007_62403_1239_1 finished Digging through the Rosetta results, it turned out that this was task 541753482. In the stderr out section, and with times corresponding to the messages as recorded in the error log above, are 3 messages: No heartbeat from core client for 30 sec - exiting In two out of the three cases this message is followed by another error message: FILE_LOCK::unlock(): close failed.: Bad file descriptor Hope this helps. |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
This task 544147042 behaved in a similar way to that described in the above post: i.e. continual restarts with the error message "exited with zero status but no 'finished' file" in the Event log coinciding with "No heartbeat from core client for 30 sec - exiting" in the Task details log. A couple of extra observations though: 1) Quitting and restarting Boinc had no effect. 2) Suspending all other tasks did do the trick: the task successfully ran to completion. It's one of those monster hyb_* tasks that required 11 hours to complete a single decoy, and although it returned a status of valid, the error messages at the end of the log do suggest a lingering problem. WARNING! cannot get file size for default.out.gz: could not open file. Output exists: default.out.gz Size: -1 InternalDecoyCount: 0 (GZ) ----- 0 ----- Stream information inconsistent. Writing W_0000001 ====================================================== DONE :: 1 starting structures 36398.7 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== called boinc_finish </stderr_txt> ]]> Validate state Valid (Mac OS X 10.6.8/Boinc 7.0.31) |
John Wood Send message Joined: 11 Aug 08 Posts: 1 Credit: 3,222,968 RAC: 0 |
I get this "exited with zero status......" all the time 09/12/2012 13:44:39 | rosetta@home | Task 3helix_2_22_abinitio_SAVE_ALL_OUT_67402_2169_0 exited with zero status but no 'finished' file 09/12/2012 13:44:39 | rosetta@home | If this happens repeatedly you may need to reset the project. 09/12/2012 14:17:44 | rosetta@home | Task 3helix_2_22_abinitio_SAVE_ALL_OUT_67402_2169_0 exited with zero status but no 'finished' file 09/12/2012 14:17:44 | rosetta@home | If this happens repeatedly you may need to reset the project. 09/12/2012 14:17:44 | rosetta@home | Restarting task 3helix_2_22_abinitio_SAVE_ALL_OUT_67402_2169_0 using minirosetta version 345 in slot 1 09/12/2012 14:31:11 | rosetta@home | Task 5srsmn_3399m2_abinitio_SAVE_ALL_OUT_66557_2003_0 exited with zero status but no 'finished' file 09/12/2012 14:31:11 | rosetta@home | If this happens repeatedly you may need to reset the project. 09/12/2012 14:31:11 | rosetta@home | Restarting task 5srsmn_3399m2_abinitio_SAVE_ALL_OUT_66557_2003_0 using minirosetta version 345 in slot 0 09/12/2012 14:40:36 | rosetta@home | Task 3helix_2_22_abinitio_SAVE_ALL_OUT_67402_2169_0 exited with zero status but no 'finished' file 09/12/2012 14:40:36 | rosetta@home | If this happens repeatedly you may need to reset the project. 09/12/2012 14:40:36 | rosetta@home | Restarting task 3helix_2_22_abinitio_SAVE_ALL_OUT_67402_2169_0 using minirosetta version 345 in slot 1 09/12/2012 15:16:41 | rosetta@home | Task 3helix_2_22_abinitio_SAVE_ALL_OUT_67402_2169_0 exited with zero status but no 'finished' file 09/12/2012 15:16:41 | rosetta@home | If this happens repeatedly you may need to reset the project. 
09/12/2012 15:16:41 | rosetta@home | Restarting task 3helix_2_22_abinitio_SAVE_ALL_OUT_67402_2169_0 using minirosetta version 345 in slot 1 09/12/2012 16:14:42 | rosetta@home | Task 3helix_2_22_abinitio_SAVE_ALL_OUT_67402_2169_0 exited with zero status but no 'finished' file 09/12/2012 16:14:42 | rosetta@home | If this happens repeatedly you may need to reset the project. There seems to be a problem, but the results posted back seem OK, as I get credit. My machine is not so busy, so why the need to be 'restarting' so often? I did previously reset my project several times until I realised it made no difference and just wasted useful work that was mostly completed. Can anyone explain what is happening, please? |
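Several posts above paste long event-log extracts and then count by eye how often a task was restarted. A minimal sketch of how that tally could be automated is shown here, assuming the log has been copied out of the BOINC Event Log as plain text. The function name `count_silent_exits` is hypothetical and is not part of BOINC; the only thing taken from the thread is the wording of the message itself.

```python
import re
from collections import Counter

# Wording of the event-log message quoted throughout this thread.
# The task name is the whitespace-free token that follows "Task".
SILENT_EXIT_RE = re.compile(
    r"Task (\S+) exited with zero status but no 'finished' file"
)


def count_silent_exits(log_text):
    """Return a Counter mapping task name -> number of silent exits.

    Works on one-entry-per-line logs and on logs flattened into a single
    line, because it searches the whole text rather than splitting lines.
    """
    return Counter(SILENT_EXIT_RE.findall(log_text))


if __name__ == "__main__":
    # Three entries copied from the last post in this thread.
    sample = (
        "09/12/2012 13:44:39 | rosetta@home | Task "
        "3helix_2_22_abinitio_SAVE_ALL_OUT_67402_2169_0 exited with zero "
        "status but no 'finished' file\n"
        "09/12/2012 14:17:44 | rosetta@home | Task "
        "3helix_2_22_abinitio_SAVE_ALL_OUT_67402_2169_0 exited with zero "
        "status but no 'finished' file\n"
        "09/12/2012 14:31:11 | rosetta@home | Task "
        "5srsmn_3399m2_abinitio_SAVE_ALL_OUT_66557_2003_0 exited with zero "
        "status but no 'finished' file\n"
    )
    # Print the most frequently restarted tasks first.
    for task, n in count_silent_exits(sample).most_common():
        print(f"{n:3d}  {task}")
```

A task that shows up many times in this tally is a candidate for the kind of follow-up svincent describes, namely checking that task's stderr output for the "No heartbeat from core client" message.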
©2024 University of Washington
https://www.bakerlab.org