Message boards : Number crunching : Mini Rosetta Version 3.41.
Author | Message |
---|---|
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Another two, looks like the same error, you've got a bad batch. rb_10_19_34252_64405__t000__0_C3_SAVE_ALL_OUT_IGNORE_THE_REST_62305_29_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=490082071 rb_10_19_34252_64405__t000__0_C3_SAVE_ALL_OUT_IGNORE_THE_REST_62305_28_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=490082070 Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. ERROR: can't open file: minirosetta_database//sampling/filtered.vall.dat.2006-05-05 ERROR:: Exit from: src/core/fragment/picking_old/vall/vall_io.cc line: 63 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> ]]> |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Got a bucket full of the errors, same as others below. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=490081939 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=490081940 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=490081942 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=490081943 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=490081944 |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Looks like they all have the same prefix on the WU name, rb_10_19_34252_64405, and the wingmen are failing as well. I sent an email to DK to let him know there seems to be a bad path name in those. Rosetta Moderator: Mod.Sense |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This one failed quickly, 11sec. rb_11_02_34573_64567__t000__0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_62635_81_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=492457626 Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev50262.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_rb_11_02_34573_64567__t000__0_C2_robetta.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. ERROR: can't open file: minirosetta_database//sampling/filtered.vall.dat.2006-05-05 ERROR:: Exit from: src/core/fragment/picking_old/vall/vall_io.cc line: 63 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> ]]> |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Another quickie 14sec. rb_11_06_34628_64941__t000__0_D2_SAVE_ALL_OUT_IGNORE_THE_REST_63295_515_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=493214346 BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 14400 ERROR: can't open file: minirosetta_database//sampling/filtered.vall.dat.2006-05-05 ERROR:: Exit from: src/core/fragment/picking_old/vall/vall_io.cc line: 63 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> ]]> |
Polian Send message Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
All of the jobs I'm getting with zdock in the task name are failing quickly after start, for me and for wingmen as well. Looking at stderr_out, there seem to be a few different failure modes: one gives an incorrect function 0x1 error, another says maximum disk usage exceeded (I have 5GB set, BOINC/Rosetta usually uses ~1GB). Yet another has an unhandled exception. Examples: https://boinc.bakerlab.org/workunit.php?wuid=493514155 https://boinc.bakerlab.org/workunit.php?wuid=493512417 https://boinc.bakerlab.org/workunit.php?wuid=493474527 The one that did finish, finished far too quickly and didn't validate, heh: https://boinc.bakerlab.org/workunit.php?wuid=493461620 |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I've had these so far, I'm sure there will be more. I have the same amount of space (Use at most: 4 GB disk space) on both rigs and never had a problem with them before. Is that disc use limit set in the input files? Why didn't they see this on Ralph? ============================ https://boinc.bakerlab.org/rosetta/workunit.php?wuid=493404478 2HQS_zdock_2HQS_cluster_selectcst_c.3.23_SAVE_ALL_OUT_63681_1_0 Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev50262.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/2HQS_allinput2.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... ERROR: Cannot open PDB file "2HQS_ppk_b_start.pdb" ERROR:: Exit from: src/core/import_pose/import_pose.cc line: 198 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> ]]> ========================== https://boinc.bakerlab.org/rosetta/workunit.php?wuid=493431352 1T6B_zdock_1T6B_cluster_selectcst_c.13.0_SAVE_ALL_OUT_63610_1_0 Exit status -177 (0xffffffffffffff4f) <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> Maximum disk usage exceeded </message> <stderr_txt> =================================== https://boinc.bakerlab.org/rosetta/workunit.php?wuid=493429697 2OUL_zdock_2OUL_cluster_selectcst_c.0.41_SAVE_ALL_OUT_63657_1_0 Exit status -177 (0xffffffffffffff4f) <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> Maximum disk usage exceeded </message> <stderr_txt> =============================== https://boinc.bakerlab.org/rosetta/workunit.php?wuid=493366529 1XU1_zdock_1XU1_cluster_selectcst_c.0.50_SAVE_ALL_OUT_63619_1_0 # cpu_run_time_pref: 14400 ====================================================== DONE :: 20 starting structures 1201 cpu seconds This process generated 20 decoys from 20 attempts ====================================================== BOINC :: WS_max 8.81443e-280 BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish </stderr_txt> ]]> =========================== |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
And another, failed! https://boinc.bakerlab.org/rosetta/workunit.php?wuid=493428654 2O8V_zdock_2O8V_cluster_selectcst_c.3.2_SAVE_ALL_OUT_63653_1_0 Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 14400 ====================================================== DONE :: 20 starting structures 1201 cpu seconds This process generated 20 decoys from 20 attempts ====================================================== BOINC :: WS_max 0 BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish </stderr_txt> ]]> |
Polian Send message Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
|
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. I've increased the BOINC disc limit to 10GB on both my rigs to see if that helps, as I've got a few of the zdock tasks in line to run on both rigs. |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Well that didn't work, more failed tasks. These 2 got validate errors. 2B4J_zdock_2B4J_cluster_selectcst_c.1.5_SAVE_ALL_OUT_63634_2_1 1SYX_zdock_1SYX_cluster_selectcst_c.17.1_SAVE_ALL_OUT_63609_1_1 ========================================= This one has the same disc usage problem, after my changes. 1R0R_zdock_1R0R_cluster_selectcst_c.7.22_SAVE_ALL_OUT_63602_2_0 Fri 09 Nov 2012 13:38:16 EST rosetta@home Aborting task 1R0R_zdock_1R0R_cluster_selectcst_c.7.22_SAVE_ALL_OUT_63602_2_0: exceeded disk limit: 323.93MB > 286.10MB P.S. If you're going to put these tasks out in the wild, make sure they run! |
Snags Send message Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
Hi. I think Polian is right and this is a problem for the project to solve. According to the BOINC FAQ Service it happens when "the amount of disk space that the task uses exceeds the amount of space specified in the <rsc_disk_bound>n</rsc_disk_bound> amount given to the task." Nothing has been run on ralph in a few weeks but if this is a simple typing error then it could have been caught by running a handful on an in-house computer before adding them to the rosetta queue. Perhaps this type of error doesn't happen frequently enough to warrant adding that step to the existing protocols. The tasks appear to error out almost immediately so they don't waste much of our time and the bulk of them have probably already made their way through the system. There will be a few stragglers showing up over the next couple of weeks (dependent on users' settings) but not enough to justify trying to preemptively delete the bad workunits. Best, Snags |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
[quote]I think Polian is right and this is a problem for the project to solve. According to the BOINC FAQ Service it happens when "the amount of disk space that the task uses exceeds the amount of space specified in the <rsc_disk_bound>n</rsc_disk_bound> amount given to the task."[/quote] I think it is time for them to increase that limit: the rosetta db takes more than half of it when extracted, and the input file for those WUs needs ~180MB. Sure it can't work; no need for big tests to figure it out, just a pocket calculator. But they probably just didn't think about such a simple thing. Can happen. Milkyway WUs for example have a 15MB limit while the WUs need less than 10KB. So make 3GB out of it and it should be enough for a while. |
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
Looks like the `zdock` work units have got a problem, I am getting compute errors on them. 2HQS_zdock_2HQS_cluster_selectcst_c.16.12_SAVE_ALL_OUT_63682_4_0 1EWY_zdock_1EWY_cluster_selectcst_c.32.6_SAVE_ALL_OUT_63531_2_0 1GPW_zdock_1GPW_cluster_selectcst_c.5.6_SAVE_ALL_OUT_63548_2_0 |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
My first zdock crashed apparently before it reached the allowed maximum disc space: 2SIC_zdock_2SIC_cluster_selectcst_c.15.5_SAVE_ALL_OUT_63690_1 . |
Snags Send message Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
More zdock problems: Several ended quickly with client error/compute error: 2PCC_zdock_2PCC_cluster_selectcst_c.1.53_SAVE_ALL_OUT_63659_5 1YVB_zdock_1YVB_cluster_selectcst_c.0.77_SAVE_ALL_OUT_63621_6 1FLE_zdock_1FLE_cluster_selectcst_c.5.12_SAVE_ALL_OUT_63540_7 Ended with exit status -177, maximum disk usage exceeded, a long stderr out and "SIGPIPE: write on a pipe with no reader". My wingman on the second task received exit status 196 on a Windows machine. 1WEJ_zdock_1WEJ_cluster_selectcst_c.7.6_SAVE_ALL_OUT_63679_7: both copies "process exited with code 1" and ERROR: Cannot open PDB file "1WEJ_ppk_b_start.pdb" ERROR:: Exit from: src/core/import_pose/import_pose.cc line: 198 BOINC:: Error reading and gzipping output datafile: default.out Two more ended with validate errors and the odd, presumably tell-tale, 1201 cpu seconds: 2ABZ_zdock_2ABZ_cluster_selectcst_c.4.7_SAVE_ALL_OUT_63630_6 2H7V_zdock_2H7V_cluster_selectcst_c.16.0_SAVE_ALL_OUT_63641_5 Best, Snags |
shilei Volunteer moderator Project developer Project scientist Send message Joined: 25 Aug 11 Posts: 5 Credit: 1,014,314 RAC: 0 |
Sincere apology for all the Zdock errors. The jobs failed due to one missing file in some of the zip files. I have downsized/withdrawn all the WUs. Sorry for the irresponsible submissions. Thanks for your contribution of WUs. It won't happen again. |
FredJVerster Send message Joined: 25 Nov 11 Posts: 4 Credit: 132,655 RAC: 0 |
[quote]Sincere apology for all the Zdock errors. The jobs failed due to one missing file in some of the zip files. I have downsized/withdrawn all the WUs. Sorry for the irresponsible submissions. Thanks for your contribution of WUs. It won't happen again.[/quote] I have MiniRosettas 3.43 that show no progress after 4 hours??? I already aborted one with no change; should I abort all of them? They kind of fail with no CPU use? Knights Who Say Ni N! |
Umfriend Send message Joined: 22 Jun 11 Posts: 3 Credit: 12,052,815 RAC: 0 |
[quote]I have MiniRosettas 3.43 that show no progress after 4 hours???[/quote] Same here. The 3.41s are running happily but the 3.43s are not. The graphics window pops up at the start of a WU. No graphics are shown. On one line it says: Stage: unknown [TABs TO RIGHT]No shared mem. I'll reset my project once the 3.41s are done and report. Umf. Edit: Also, no CPU usage and just a 19MB memory footprint. It's not doing anything. |
JoeyJoJo Send message Joined: 20 Jan 11 Posts: 2 Credit: 823,144 RAC: 0 |
Same issue in the Q&A thread https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6124 |
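
The "pocket calculator" check suggested earlier in the thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 300,000,000-byte bound is not given anywhere in the thread, it is inferred from the fact that 300,000,000 bytes works out to 286.10 when expressed in units of 2**20 bytes, which matches the limit in the "exceeded disk limit: 323.93MB > 286.10MB" message. The function names are hypothetical, and the database and input sizes are the rough figures quoted by posters, not measured values.

```python
# Rough disk-budget check for one zdock task, using figures quoted in the thread.
# Assumption: the client message reports "MB" in units of 2**20 bytes.
MIB = 1024 * 1024


def bytes_to_mib(n_bytes: float) -> float:
    """Convert a byte count to MiB (units of 2**20 bytes)."""
    return n_bytes / MIB


def exceeds_bound(usage_mib: float, bound_mib: float) -> bool:
    """True when a task's disk usage is over its per-task bound."""
    return usage_mib > bound_mib


# Hypothetical value: inferred from the 286.10MB limit, not stated in the thread.
assumed_bound_bytes = 300_000_000
bound_mib = bytes_to_mib(assumed_bound_bytes)

# Figures from the posts: the extracted database takes "more than half" of the
# bound, and the zdock input needs about 180MB.
database_lower_bound_mib = bound_mib / 2
input_mib = 180.0
estimated_usage_mib = database_lower_bound_mib + input_mib

print(f"bound:            {bound_mib:.2f} MiB")
print(f"estimated usage: >{estimated_usage_mib:.2f} MiB")
print(f"over the bound:   {exceeds_bound(estimated_usage_mib, bound_mib)}")

# Usage reported by the client for the aborted 1R0R task.
reported_usage_mib = 323.93
print(f"reported task over the bound: {exceeds_bound(reported_usage_mib, bound_mib)}")
```

If the assumptions hold, the estimate lands above the bound before the task writes any output, which would also explain why raising the client-wide BOINC disk limit to 10GB made no difference: the bound that was exceeded is the one attached to the task, not the one in the user's preferences.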
©2024 University of Washington
https://www.bakerlab.org