Message boards : Number crunching : Two consistent and persistent errors
Author | Message |
---|---|
Erik Joined: 25 Jun 09 Posts: 11 Credit: 2,904,454 RAC: 0 |
1/26/2015 9:01:31 AM | rosetta@home | Starting task rb_01_23_53131_98688__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_242143_2568_0 1/26/2015 9:43:45 AM | rosetta@home | Aborting task rb_01_23_53132_98689__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_242144_2218_0: exceeded elapsed time limit 122964.12 (500000.00G/4.07G) 1/26/2015 3:04:53 PM | rosetta@home | Task rb_01_23_53132_98689__t000__0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_242144_2226_0 exited with zero status but no 'finished' file 1/26/2015 3:04:53 PM | rosetta@home | If this happens repeatedly you may need to reset the project. I see other computers have been completing these work units. I have reset the project a couple of times, and even reinstalled the client, with no effect. Very nearly every unit I've run on this machine in the past couple of weeks gets one of these errors. Does anyone have ideas or insights? |
Erik Joined: 25 Jun 09 Posts: 11 Credit: 2,904,454 RAC: 0 |
I just saw this; I'll see whether any of these steps make a difference. http://boincfaq.mundayweb.com/index.php?language=1&view=116 |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
It appears you have specified (in the Rosetta preferences, configured via the website) a runtime preference of 2 days. At present, there seem to be some issues with tasks running that long. Setting the preference to 1 day will avoid the problem until it is fixed. Rosetta Moderator: Mod.Sense |
Erik Joined: 25 Jun 09 Posts: 11 Credit: 2,904,454 RAC: 0 |
I changed that. The other changes per the link above don't appear to have made any difference. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi Erik. I had a quick look at your erred tasks, and they all look to be of the rb_SAVE_ALL_OUT type. I have been aborting those for many months because they were always erroring on my rigs as well, and no one has fixed the problem, although I did report it. And I only run tasks here for 4 hours. |
Erik Joined: 25 Jun 09 Posts: 11 Credit: 2,904,454 RAC: 0 |
The timeout errors have been resolved by adjusting the runtime preference. The "exited with zero status" errors are still occurring. Some of these do complete, but not many. 1/29/2015 1:26:18 PM | rosetta@home | Computation for task rb_01_23_53132_98689__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_242144_2228_0 finished |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
I've removed the 2 day option until we fix this issue on the next application update. Sorry for any inconvenience. Please lower your run time preference if you've set it to 2 days. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
David wrote: "I've removed the 2 day option until we fix this issue on the next application update." Uh, so you are working on a new app version? :-) |
Erik Joined: 25 Jun 09 Posts: 11 Credit: 2,904,454 RAC: 0 |
Thanks for responding, David. I currently have the target CPU run time preference set to twelve hours. I haven't received any time-out errors since, but the second error, "exited with zero status but no 'finished' file," is returned for nearly every unit I process. 12-Feb-15 20:37:30 | rosetta@home | Task rb_02_08_53120_99050_ab_stage0_h002___robetta_IGNORE_THE_REST_09_16_242762_18_0 exited with zero status but no 'finished' file 12-Feb-15 20:37:30 | rosetta@home | If this happens repeatedly you may need to reset the project. 12-Feb-15 20:49:19 | rosetta@home | Task rb_02_08_53120_99050_ab_stage0_h002___robetta_IGNORE_THE_REST_09_16_242762_18_0 exited with zero status but no 'finished' file 12-Feb-15 20:49:19 | rosetta@home | If this happens repeatedly you may need to reset the project. 12-Feb-15 20:56:31 | rosetta@home | Task rb_02_08_53120_99050_ab_stage0_h003___robetta_IGNORE_THE_REST_11_13_242763_13_0 exited with zero status but no 'finished' file 12-Feb-15 20:56:31 | rosetta@home | If this happens repeatedly you may need to reset the project. 12-Feb-15 21:01:56 | rosetta@home | Task rb_02_08_53120_99050_ab_stage0_h003___robetta_IGNORE_THE_REST_11_13_242763_13_0 exited with zero status but no 'finished' file 12-Feb-15 21:01:56 | rosetta@home | If this happens repeatedly you may need to reset the project. 12-Feb-15 21:08:22 | rosetta@home | Task Ross3X3_SAVE_ALL_OUT_t149_009_242754_249_0 exited with zero status but no 'finished' file 12-Feb-15 21:08:22 | rosetta@home | If this happens repeatedly you may need to reset the project. 12-Feb-15 21:08:26 | rosetta@home | Task rb_02_08_53120_99050_ab_stage0_h003___robetta_IGNORE_THE_REST_11_13_242763_13_0 exited with zero status but no 'finished' file |
Murasaki Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
Erik wrote: "...but the second error, "exited with zero status but no 'finished' file," is returned for nearly every unit I process." It is a common error that can affect any BOINC project, not just Rosetta. Unfortunately, no specific cause has been identified yet, so it may take a bit of effort to track down a solution that works for your system. You may want to try some of the solutions suggested on the BOINC FAQ website. I had the same problem until I increased my CPU usage to "Use at most 100.0 percent of CPU time" (to avoid heat problems in the summer, I reduced the number of cores BOINC can use instead). As soon as that setting was changed, all my exit zero errors disappeared instantly. Hopefully your problem will be just as easy to solve, but there are some alternative suggestions on that site as well. --- Edit: I see from the posts above that you have already tried those solutions. Unfortunately, if they didn't work, there is not much you can do other than play about with your BOINC and system settings in the hope of stumbling across a solution. |
Erik Joined: 25 Jun 09 Posts: 11 Credit: 2,904,454 RAC: 0 |
The odd thing is, Rosetta is the only project which returns those errors. I currently have the processor time set to the default of 50%. I'll let it run at 100% today and see if that makes a difference. |
Erik Joined: 25 Jun 09 Posts: 11 Credit: 2,904,454 RAC: 0 |
So I checked my log after getting home from work today, and it looks like everything is completing successfully now. The last unit to fail was just before I changed the preferences. So, to sum up: set the target CPU run time to twelve hours and the maximum CPU time usage to 100%. I hope the CPU usage requirement will be fixed soon; I don't want to have to run my box at 100% all day in a desert summer. |
Erik Joined: 25 Jun 09 Posts: 11 Credit: 2,904,454 RAC: 0 |
Is there a way to edit the title that I'm missing? I'd like to add [Fixed] to the title. |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Not sure about the title. Can anyone point me to workunits set for 48 hours that failed? |
Erik Joined: 25 Jun 09 Posts: 11 Credit: 2,904,454 RAC: 0 |
I'm pretty sure I would have had some a couple of months ago, but those have cycled out of my logs by now. If no one has any current ones, I can just set my client to grab 48-hour sets. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
David wrote: "Can anyone point me to workunits set for 48 hours that failed?" Can you use old WU numbers? Or have the details you seek already been purged on those? Rosetta Moderator: Mod.Sense |
Erik Joined: 25 Jun 09 Posts: 11 Credit: 2,904,454 RAC: 0 |
I don't know if this will be helpful, but yesterday when I rebooted my computer to install updates, several of the Rosetta units in progress at the time failed, even though I shut down BOINC gracefully. The tasks were all from either the SAVE_ALL_OUT or IGNORE_THE_REST group. Here are a couple of examples: 24-Feb-15 22:12:24 | rosetta@home | Task TL_test_2008_0165_0994_0960_2059_00350256_0157_0891_0009_0875_0001_fold_SAVE_ALL_OUT_244879_1833_0 exited with zero status but no 'finished' file 24-Feb-15 22:12:24 | rosetta@home | Task rb_02_23_53371_99352_ab_stage0_h004___robetta_IGNORE_THE_REST_07_15_244897_85_0 exited with zero status but no 'finished' file The next to finish was: 25-Feb-15 02:17:35 | rosetta@home | Computation for task TL_test_1478_0993_0262_0916_2046_0140_0741_0187_0164_0011_0153_0001_fold_SAVE_ALL_OUT_244870_3991_0 finished |
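Erik's first log reports the abort as "exceeded elapsed time limit 122964.12 (500000.00G/4.07G)". Assuming, as that message layout suggests, that the BOINC client caps a task's elapsed time at the task's floating-point-operation bound (in GFLOP) divided by its speed estimate for the host (in GFLOPS), the numbers can be checked with a short Python sketch. The regular expression and the function name `parse_limit` are illustrative assumptions, not part of BOINC. Under that assumption the cap works out to roughly 34 hours, which would be consistent with 2-day (48-hour) run-time targets being aborted while 12- and 24-hour targets were not, provided elapsed time is at least as long as the CPU run-time target.

```python
import re

# Abort message copied from the first post in this thread.
LOG_LINE = (
    "Aborting task rb_01_23_53132_98689__t000__0_C1_SAVE_ALL_OUT_"
    "IGNORE_THE_REST_242144_2218_0: exceeded elapsed time limit "
    "122964.12 (500000.00G/4.07G)"
)

# Assumed message layout: limit in seconds, then (bound GFLOP / speed GFLOPS).
PATTERN = re.compile(r"exceeded elapsed time limit ([\d.]+) \(([\d.]+)G/([\d.]+)G\)")


def parse_limit(line):
    """Return (limit_seconds, bound_gflop, speed_gflops) from an abort message."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("no elapsed time limit found in line")
    limit, bound, speed = (float(group) for group in match.groups())
    return limit, bound, speed


limit_s, bound_gflop, speed_gflops = parse_limit(LOG_LINE)

# The speed is printed rounded to two decimals, so bound / speed only
# approximates the logged limit (about 122,850 s versus 122,964 s).
print(f"bound / speed: {bound_gflop / speed_gflops:,.0f} s")
print(f"logged limit:  {limit_s:,.0f} s ({limit_s / 3600:.1f} h)")

# Compare a few run-time targets (in hours) against the logged limit.
for hours in (12, 24, 48):
    verdict = "under" if hours * 3600 < limit_s else "over"
    print(f"{hours:>2} h target: {verdict} the limit")
```

Only the 48-hour target comes out over the cap, which lines up with Mod.Sense's advice to drop the preference from 2 days to 1 day. This arithmetic says nothing about the separate "exited with zero status" errors.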
©2024 University of Washington
https://www.bakerlab.org