Mini Rosetta Version 3.41.

Mad_Max

Joined: 31 Dec 09
Posts: 209
Credit: 25,845,434
RAC: 12,208
Message 73779 - Posted: 5 Sep 2012, 9:57:26 UTC - in response to Message 73775.  


If there is currently checkpointing only between decoys, then any very long decoy would have to run with no checkpointing inside.

No, the minirosetta app can checkpoint inside a single decoy, so checkpoints should work (and usually do) with long models too.
But in some cases (including the examples I listed above) it does not work.

I think the calculation of these jobs just "hangs" at an early stage (before the first checkpoint) and then just wastes CPU resources until the watchdog stops the work.
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 73780 - Posted: 5 Sep 2012, 11:21:40 UTC - in response to Message 73779.  
Last modified: 5 Sep 2012, 11:31:07 UTC

No, the minirosetta app can checkpoint inside a single decoy, so checkpoints should work (and usually do) with long models too.
But in some cases (including the examples I listed above) it does not work.

From my observations: some tasks have "stages", and those checkpoint inside one decoy. You can see that in the slot directory of such tasks (files with "stage" in the file name and "Starting work on structure: _00001" entries in stderr). Like this one. All others checkpoint at the end of each decoy, which might take a few hours to compute.



I think the calculation of these jobs just "hangs" at an early stage (before the first checkpoint) and then just wastes CPU resources until the watchdog stops the work.

As I have pointed out above, in some cases 7 hours might not be enough for 1 decoy. But such jobs should only be sent out to hosts that allow higher runtimes, not to those at default settings.
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 73781 - Posted: 5 Sep 2012, 16:19:27 UTC

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 73787 - Posted: 6 Sep 2012, 13:50:36 UTC - in response to Message 73778.  

and yet more silence from the team...as usual.


Also the ralph@home team has communication problems...



Because ralph is run by the same group as rosetta, and ralph is a testing platform, so whatever goes wrong there goes wrong and they will see it in the results files. I gave up on ralph long ago due to too many errors and no communication.
The only Baker labs stuff I am signed on for is Rosie.
ArcSedna
Joined: 23 Oct 11
Posts: 14
Credit: 69,190,403
RAC: 19,011
Message 73833 - Posted: 14 Sep 2012, 14:23:34 UTC - in response to Message 73708.  

Was there not a bug some time ago that was making work fail at 1201 seconds?
Is this it back again?


It might be. That was happening on MiniRosetta version 3.22.

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=5909&nowrap=true#72471

And I'm still having these validate errors...

https://boinc.bakerlab.org/rosetta/result.php?resultid=531638117

Task ID 531638117
Name lr_ab_bench_3solA_SAVE_ALL_OUT_IGNORE_THE_REST_58333_1091_2
Workunit 482120816
Created 13 Sep 2012 21:39:39 UTC
Sent 13 Sep 2012 21:40:02 UTC
Received 14 Sep 2012 0:09:47 UTC
Server state Over
Outcome Validate error
Client state Done
Exit status 0 (0x0)
Computer ID 1530475
Report deadline 23 Sep 2012 21:40:02 UTC
CPU time 239.3506
stderr out

<core_client_version>6.12.43</core_client_version>
<![CDATA[
<stderr_txt>
[2012- 9-14 8:50:32:] :: BOINC:: Initializing ... ok.
[2012- 9-14 8:50:32:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev50262.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_lr_ab_bench_3solA_yfsong.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
======================================================
DONE :: 1 starting structures 1201 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 1.04346308677662
Granted credit 0
application version 3.41
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 73835 - Posted: 15 Sep 2012, 5:35:46 UTC

Hi.

I've had a few tasks now that don't seem to want to stop when they should. My selected runtime is 4 hrs; this last one went for 7 hrs 42 mins, as you can see. Why didn't it stop at the 4 hr mark and do, say, ~100 models? These tasks are ballsing up the DCF (duration correction factor) on my rigs.

On a side note the credits are not great for the runtime.
==========================================================

Ebolanator3_1LOUA_ProteinInterfaceDesign_2Sep2012_58540_53_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=483693654

Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
======================================================
DONE :: 215 starting structures 27414.5 cpu seconds
This process generated 215 decoys from 215 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Valid
Claimed credit 282.012672541005
Granted credit 116.338517867505
application version 3.41
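
The overrun in this log can be read from two lines: `# cpu_run_time_pref` (the target runtime in seconds) and the `DONE ::` summary. Below is a minimal sketch, assuming the stderr text is available as a string. `runtime_overrun` is a hypothetical helper name for illustration, not part of BOINC or minirosetta.

```python
import re

# Target runtime line, e.g. "# cpu_run_time_pref: 14400"
PREF_RE = re.compile(r"#\s*cpu_run_time_pref:\s*(\d+)")
# Summary line, e.g. "DONE :: 215 starting structures 27414.5 cpu seconds"
DONE_RE = re.compile(
    r"DONE\s*::\s*(\d+)\s+starting structures\s+([\d.]+)\s+cpu seconds"
)

def runtime_overrun(stderr_text):
    """Compare CPU seconds used against the runtime preference.

    Returns None if either line is missing from the log.
    """
    pref = PREF_RE.search(stderr_text)
    done = DONE_RE.search(stderr_text)
    if pref is None or done is None:
        return None
    target = int(pref.group(1))
    used = float(done.group(2))
    return {
        "target_s": target,
        "used_s": used,
        "overrun_s": used - target,
        "ratio": used / target,
        "structures": int(done.group(1)),
    }

# Lines copied from the log quoted above.
log = """# cpu_run_time_pref: 14400
======================================================
DONE :: 215 starting structures 27414.5 cpu seconds
This process generated 215 decoys from 215 attempts
"""
info = runtime_overrun(log)
print(info)
```

For the log above, the task used about 1.9 times its 4-hour target, an overrun of 13014.5 CPU seconds.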

Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 25,845,434
RAC: 12,208
Message 73838 - Posted: 15 Sep 2012, 12:59:48 UTC - in response to Message 73835.  
Last modified: 15 Sep 2012, 13:03:38 UTC

Hi.
I've had a few tasks now that don't seem to want to stop when they should. My selected runtime is 4 hrs; this last one went for 7 hrs 42 mins, as you can see. Why didn't it stop at the 4 hr mark and do, say, ~100 models? These tasks are ballsing up the DCF (duration correction factor) on my rigs.
On a side note the credits are not great for the runtime.
.......

It's an old known issue. This type of task (ProteinInterfaceDesign) actually has 2 types of models: the 1st is simple and fast to calculate (preliminary selection), and the 2nd is a detailed model with a slow calculation, run if you are lucky and the calculation finds something promising. So you may see 2 variants:
1) If you get a WU with only "garbage" (1st-type models), the calculation goes fast, the WU produces a lot of models (= a lot of credit too, because credit is granted based on how many models are in the WU) and it finishes at exactly the right (target) time.
2) If you get a WU which finds 1 or more interesting models in the garbage (worth a detailed calculation), the WU cannot stop at the right moment, because 2nd-type models take a few hours of calculation each and the WU can't stop and report its result until it finishes the model. The model count is low, so the credit is low too (if you are very "lucky" and hit a good model at the very beginning, and/or get a few such models in one WU, it can be only 5-10 Cr instead of a few hundred).
But on average over a large number of WUs, Cr/hour (or per day) is nearly the same as for other types of tasks, because high-Cr WUs compensate for low-Cr WUs. So credit is not a problem.

For example, WUs from the same series of tasks (my computer).
Low-Cr WUs (which probably contain "interesting" slow models), runtimes far exceed the target time (= default 3 hours / 10800 sec):
Ebolanator3_2jhqa_ProteinInterfaceDesign_2Sep2012_58540_36_0
24k sec of CPU time = 113 models, Claimed credit 149, Granted credit 33
Ebolanator3_2r48a_ProteinInterfaceDesign_2Sep2012_58541_42_0
25k sec = 153 models, Claimed credit 157, Granted credit 73

High-Cr WUs (only preselection models), runtimes = target time:
Ebolanator3_3mw8a_ProteinInterfaceDesign_2Sep2012_58540_39_0
~11k sec = 272 models, Claimed credit 67, Granted credit 119
Ebolanator3_2d1va_ProteinInterfaceDesign_2Sep2012_58541_45_0
~11k sec = 410 models, Claimed credit 67, Granted credit 101
Ebolanator3_2NRRA_ProteinInterfaceDesign_2Sep2012_58540_50_0
~11k sec = 332 models, Claimed credit 67, Granted credit 108
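
To put the figures above on a common scale, the granted credit can be converted to credit per CPU hour. This is only a rough sketch: the CPU times are the rounded values quoted in the post ("24k", "~11k"), the helper names are made up for illustration, and five WUs are far too few to confirm or refute the claim about the long-run average.

```python
# (short name, approx. CPU seconds, models, granted credit), as quoted above.
wus = [
    ("Ebolanator3_2jhqa", 24000, 113, 33),
    ("Ebolanator3_2r48a", 25000, 153, 73),
    ("Ebolanator3_3mw8a", 11000, 272, 119),
    ("Ebolanator3_2d1va", 11000, 410, 101),
    ("Ebolanator3_2NRRA", 11000, 332, 108),
]

def credit_per_hour(cpu_seconds, granted):
    """Granted credit normalised to one hour of CPU time."""
    return granted * 3600.0 / cpu_seconds

def seconds_per_model(cpu_seconds, models):
    """Average CPU seconds spent per model in a WU."""
    return cpu_seconds / models

for name, secs, models, granted in wus:
    print(f"{name}: {credit_per_hour(secs, granted):5.1f} Cr/h, "
          f"{seconds_per_model(secs, models):6.1f} s/model")

# Pooled rate over all five WUs: total credit divided by total CPU time.
total_secs = sum(w[1] for w in wus)
total_granted = sum(w[3] for w in wus)
pooled = credit_per_hour(total_secs, total_granted)
print(f"pooled: {pooled:.1f} Cr/h")
```

The per-WU rates differ widely, which matches the description above. The pooled rate is the quantity that would have to be compared against other task types over a much larger sample.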
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 2,588
Message 73841 - Posted: 15 Sep 2012, 22:17:28 UTC

I suspect that users would like it more if the detailed ProteinInterfaceDesign models contributed much more to the number of credits than the preliminary ProteinInterfaceDesign models.
svincent
Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 73884 - Posted: 24 Sep 2012, 16:46:54 UTC

Task 532135642 (Ebolanator3_1hsla_ProteinInterfaceDesign_2Sep2012_58541_60_1) failed on Windows 7

ERROR: unknown atom_name: ILE CG
ERROR:: Exit from: ..\..\..\src\core\chemical\ResidueType.cc line: 1702
called boinc_finish

</stderr_txt>
]
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 73888 - Posted: 26 Sep 2012, 1:49:24 UTC

I got half a dozen of these garbage tasks today; all failed with the same problem.

lr_ab_bench


Watchdog active.
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 1201 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
BOINC :: WS_max 0


Validate error

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 73985 - Posted: 9 Oct 2012, 2:56:57 UTC

Had this one error out not long ago.

2w9pA_newfrag_abinitio_SAVE_ALL_OUT_61300_1309_1

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=488023761

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>

BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _00001

ERROR: Assertion failure: runtime_assert( ( begin + size - 1 ) <= pose.total_residue() );
ERROR:: Exit from: src/protocols/simple_moves/FragmentMover.cc line: 260
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>
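
Error reports like the one above share a two-line shape: an `ERROR:` message followed by an `ERROR:: Exit from: <file> line: <n>` line. A minimal sketch for pulling those fields out of a saved stderr text, so that failures can be grouped by source location; `parse_rosetta_error` is a hypothetical helper name, and the patterns are based only on the logs quoted in this thread.

```python
import re

# "ERROR:: Exit from: src/.../FragmentMover.cc line: 260"
EXIT_RE = re.compile(r"ERROR::\s*Exit from:\s*(\S+)\s+line:\s*(\d+)")
# A message line starts with "ERROR:" but not with "ERROR::".
MSG_RE = re.compile(r"^ERROR:(?!:)\s*(.*)$", re.MULTILINE)

def parse_rosetta_error(stderr_text):
    """Return message, source file and line number, or None if no exit line."""
    exit_m = EXIT_RE.search(stderr_text)
    if exit_m is None:
        return None
    msg_m = MSG_RE.search(stderr_text)
    return {
        "message": msg_m.group(1).strip() if msg_m else "",
        "source": exit_m.group(1),
        "line": int(exit_m.group(2)),
    }

# Lines copied from the report quoted above.
sample = (
    "ERROR: Assertion failure: runtime_assert( ( begin + size - 1 ) "
    "<= pose.total_residue() );\n"
    "ERROR:: Exit from: src/protocols/simple_moves/FragmentMover.cc line: 260\n"
    "BOINC:: Error reading and gzipping output datafile: default.out\n"
)
err = parse_rosetta_error(sample)
print(err["source"], err["line"])
```

Using the source file and line number as a key is one way to tell whether several failed tasks hit the same assertion or different ones.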

Viking69
Joined: 3 Oct 05
Posts: 20
Credit: 6,780,190
RAC: 2,146
Message 73996 - Posted: 10 Oct 2012, 14:24:30 UTC

Lots of computing errors on my WUs too. What's up?
Hi all you enthusiastic crunchers.....
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 73997 - Posted: 11 Oct 2012, 1:37:15 UTC

Another error.

cyst_d17_0000_abinitio_SAVE_ALL_OUT_61396_213_1

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=488307382


<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
----

Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
Setting up folding (abrelax) ...

ERROR: ERROR: FragmentIO: could not open file aa0000109_05.200_v1_3
ERROR:: Exit from: src/core/fragment/FragmentIO.cc line: 233
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>


P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 74009 - Posted: 13 Oct 2012, 4:21:07 UTC

Another error.

rb_10_12_34084_64234__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_61699_37_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=488755685


<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>

=======

# cpu_run_time_pref: 14400

ERROR: can't open file: minirosetta_database//sampling/filtered.vall.dat.2006-05-05
ERROR:: Exit from: src/core/fragment/picking_old/vall/vall_io.cc line: 63
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

Bjarke
Joined: 14 Feb 06
Posts: 5
Credit: 1,634,479
RAC: 0
Message 74010 - Posted: 13 Oct 2012, 13:37:13 UTC

Numerous errors for both of my computers:
Tasks for host 1569084
Tasks for host 1569102

All WUs for the past two days are failing...
Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 25,845,434
RAC: 12,208
Message 74012 - Posted: 13 Oct 2012, 19:26:26 UTC - in response to Message 74010.  

Numerous errors for both of my computers:
All WUs for the past two days are failing...

Congratulations! You are "lucky" and you got a special bug: now all your WUs will be counted as errors. This is an old known bug, but its cause and how to fix it are still not known. Details in this thread:

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6050

Think about what could have changed over those 2 days: new software (or drivers) installed, changed settings (hardware or software), etc.
Bjarke
Joined: 14 Feb 06
Posts: 5
Credit: 1,634,479
RAC: 0
Message 74021 - Posted: 15 Oct 2012, 9:32:27 UTC - in response to Message 74012.  

Numerous errors for both of my computers:
All WUs for the past two days are failing...

Congratulations! You are "lucky" and you got a special bug: now all your WUs will be counted as errors. This is an old known bug, but its cause and how to fix it are still not known. Details in this thread:

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6050

Think about what could have changed over those 2 days: new software (or drivers) installed, changed settings (hardware or software), etc.


F***ing crap. I will definitely switch to another project then.
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 2,588
Message 74027 - Posted: 16 Oct 2012, 2:53:15 UTC - in response to Message 74021.  

Numerous errors for both of my computers:
All WUs for the past two days are failing...

Congratulations! You are "lucky" and you got a special bug: now all your WUs will be counted as errors. This is an old known bug, but its cause and how to fix it are still not known. Details in this thread:

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6050

Think about what could have changed over those 2 days: new software (or drivers) installed, changed settings (hardware or software), etc.


F***ing crap. I will definitely switch to another project then.


The last time I had a similar problem (on a different BOINC project) I already had several projects enabled, so I set the one with the problem to no new tasks, let the computer finish all tasks for that project, then told BOINC Manager to reset that project, then allowed more workunits for that project. The new workunits started working correctly. I have no way to tell if this will work for your problem, though.
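
The sequence described above (no new tasks, let the running tasks finish, reset, allow new tasks) was done in BOINC Manager. A rough command-line equivalent can be sketched with `boinccmd` instead of the GUI. This swaps in a different tool from the one used in the post, so treat it as an assumption: the project URL below is a placeholder and must match the URL the client is actually attached to, and waiting for the tasks to finish between the first and second step is still manual. The sketch only prints the commands unless `dry_run` is turned off.

```python
import subprocess

def reset_sequence(project_url):
    """Commands for: no new tasks -> (wait for tasks) -> reset -> allow new tasks."""
    base = ["boinccmd", "--project", project_url]
    return [
        base + ["nomorework"],     # stop fetching new tasks for this project
        base + ["reset"],          # run only after the remaining tasks finish
        base + ["allowmorework"],  # let the client fetch work again
    ]

def run_step(cmd, dry_run=True):
    """Print the command, or run it when dry_run is False; returns the exit code."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode

# Placeholder URL: assumed here, check the attached project URL on your host.
steps = reset_sequence("https://boinc.bakerlab.org/rosetta/")
for step in steps:
    run_step(step)  # dry run: prints the three commands in order
```

As in the post, there is no guarantee that a reset fixes this particular problem.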
svincent
Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 74053 - Posted: 19 Oct 2012, 17:22:12 UTC

rb_10_13_34107_64244__t000__0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_61777_916_0

This task 537475495 failed on Mac OS X with this error message

# cpu_run_time_pref: 21600
dof_atom1 atomno= 3 rsd= 1
atom1 atomno= 1 rsd= 1
atom2 atomno= 2 rsd= 1
atom3 atomno= 5 rsd= 1
atom4 atomno= 6 rsd= 1
THETA1 nan
THETA3 nan
PHI2 0

ERROR: AtomTree::torsion_angle_dof_id: angle range error
ERROR:: Exit from: src/core/kinematics/AtomTree.cc line: 771
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 74063 - Posted: 20 Oct 2012, 23:51:39 UTC

An error.

rb_10_19_34246_64381__t000__0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_62288_83_1

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=490064035

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>

Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.

ERROR: can't open file: minirosetta_database//sampling/filtered.vall.dat.2006-05-05
ERROR:: Exit from: src/core/fragment/picking_old/vall/vall_io.cc line: 63
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>
