Mini Rosetta Version 3.41.

Message boards : Number crunching : Mini Rosetta Version 3.41.

P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 73700 - Posted: 27 Aug 2012, 2:01:22 UTC

Hi.

I received this error with the new app; the task finished really fast for some reason.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=480803868

lr_aa_bench_2lg5A_SAVE_ALL_OUT_IGNORE_THE_REST_53023_393_0


CPU time 439.1
stderr out

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
[2012- 8-27 11:29:54:] :: BOINC:: Initializing ... ok.
[2012- 8-27 11:29:54:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev50262.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_lr_aa_bench_2lg5A_yfsong.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
======================================================
DONE :: 13 starting structures 1201 cpu seconds
This process generated 13 decoys from 13 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 3.41337554548822
Granted credit 0
application version 3.41

ID: 73700 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 73701 - Posted: 27 Aug 2012, 3:38:33 UTC

Hi, another one: different rig, same problem.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=480888200

lr_aa_bench_T0551_SAVE_ALL_OUT_IGNORE_THE_REST_52933_1642_0

<core_client_version>7.0.27</core_client_version>
<![CDATA[
<stderr_txt>
[2012- 8-27 12:48:21:] :: BOINC:: Initializing ... ok.
[2012- 8-27 12:48:21:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev50262.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_lr_aa_bench_T0551_yfsong.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
======================================================
DONE :: 3 starting structures 1201 cpu seconds
This process generated 3 decoys from 3 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 2.90765952928257
Granted credit 0
application version 3.41

ID: 73701 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 73702 - Posted: 27 Aug 2012, 7:37:24 UTC

And another, same thing: it finished after only 5 min! Why?

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=480888542

lr_aa_bench_T0551_SAVE_ALL_OUT_IGNORE_THE_REST_52933_1646_0


# cpu_run_time_pref: 14400
======================================================
DONE :: 3 starting structures 1201 cpu seconds
This process generated 3 decoys from 3 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 3.4143034827933
Granted credit 0
application version 3.41

ID: 73702 · Rating: 0
.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 73708 - Posted: 27 Aug 2012, 19:02:26 UTC

Wasn't there a bug some time ago that made work fail at 1201 seconds?
Is it back again?
ID: 73708 · Rating: 0
Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 73712 - Posted: 28 Aug 2012, 13:56:08 UTC

I've been having occasional failures with some types of workunits, but I don't think it's limited to 3.41. For some of these failures the wingman finishes the task successfully; for others, both of us fail.
ID: 73712 · Rating: 0
.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 73714 - Posted: 28 Aug 2012, 21:30:28 UTC

I think there may be a problem with this bench.
It seems to be getting stuck at 1201 seconds.

Task ID 528459172
Name lr_aa_bench_T0548_SAVE_ALL_OUT_IGNORE_THE_REST_52930_1664_0
Outcome Validate error

======================================================
DONE :: 3 starting structures 1201 cpu seconds
This process generated 3 decoys from 3 attempts
======================================================
application version 3.41
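
For anyone who wants to check their own results for this pattern, here is a minimal sketch that pulls the run-time preference and the DONE summary out of a stderr excerpt. It only assumes the line formats shown in the logs posted in this thread; the function names and the 0.5 threshold are made up for illustration and are not part of BOINC or Rosetta.

```python
import re

# Line formats copied from the stderr excerpts posted in this thread.
PREF_RE = re.compile(r"#\s*cpu_run_time_pref:\s*(\d+)")
DONE_RE = re.compile(
    r"DONE\s*::\s*(\d+)\s+starting structures\s+([\d.]+)\s+cpu seconds"
)


def summarize(stderr_text):
    """Return (run_time_pref, structures, cpu_seconds), or None if a line is missing."""
    pref = PREF_RE.search(stderr_text)
    done = DONE_RE.search(stderr_text)
    if not pref or not done:
        return None
    return int(pref.group(1)), int(done.group(1)), float(done.group(2))


def ended_early(stderr_text, fraction=0.5):
    """True if DONE was reported after less than `fraction` of the preferred run time.

    The default of 0.5 is an arbitrary assumption, not a project setting.
    """
    summary = summarize(stderr_text)
    if summary is None:
        return False
    pref, _structures, cpu = summary
    return cpu < fraction * pref


sample = """# cpu_run_time_pref: 14400
======================================================
DONE :: 3 starting structures 1201 cpu seconds
This process generated 3 decoys from 3 attempts
======================================================
"""

print(summarize(sample))
print(ended_early(sample))
```

A task that reports DONE at 1201 cpu seconds against a 14400 second preference is flagged; a task that runs close to its preference is not.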


ID: 73714 · Rating: 0
Nightwish
Joined: 29 Mar 12
Posts: 10
Credit: 307,377
RAC: 0
Message 73718 - Posted: 29 Aug 2012, 4:11:42 UTC

rb_08_25_32982_63458_h001__4dtl_2012_IGNORE_THE_REST_07_15_56060_14_0
Workunit 481012768
Created 27 Aug 2012 18:03:33 UTC
Sent 27 Aug 2012 18:05:01 UTC
Received 28 Aug 2012 3:25:10 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 1530073
Report deadline 6 Sep 2012 18:05:01 UTC
CPU time 17666
stderr out
<core_client_version>7.0.25</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
[2012- 8-27 14: 7:30:] :: BOINC:: Initializing ... ok.
[2012- 8-27 14: 7:31:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev50262.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _00001
# cpu_run_time_pref: 28800
Starting work on structure: _00002
Starting work on structure: _00003
Starting work on structure: _00004
Starting work on structure: _00005
Starting work on structure: _00006
Starting work on structure: _00007
Starting work on structure: _00008
Starting work on structure: _00009
Starting work on structure: _00010
Starting work on structure: _00011
Starting work on structure: _00012
Starting work on structure: _00013
Starting work on structure: _00014
Starting work on structure: _00015
Starting work on structure: _00016
Starting work on structure: _00017
[2012- 8-27 21:27:55:] :: BOINC:: Initializing ... ok.
[2012- 8-27 21:27:55:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev50262.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 28800
Starting work on structure: _00017
Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_1 ... success!
Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_2 ... success!
Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_3_iter1_1 ... success!
Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_3_iter1_2 ... success!
Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_3_iter1_3 ... success!
Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_3_iter1_4 ... success!
Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_3_iter1_5 ... success!
Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_3_iter1_6 ... success!
Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_3_iter1_7 ... success!

</stderr_txt>
]]>
Validate state Invalid
Claimed credit 84.2454349387172
Granted credit 84.2454349387172
application version 3.41

ID: 73718 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 73740 - Posted: 3 Sep 2012, 8:55:04 UTC

Hi.

I've had a lot of these lr_ab_bench_ tasks fail today on both my rigs; all are finishing at the 1201 sec mark.

Methinks someone needs to have a good long look at them.

ID: 73740 · Rating: 0
Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 25,845,434
RAC: 12,208
Message 73741 - Posted: 3 Sep 2012, 10:01:42 UTC
Last modified: 3 Sep 2012, 10:03:17 UTC

ID: 73741 · Rating: 0
Daniele
Joined: 19 Aug 12
Posts: 2
Credit: 9,382
RAC: 0
Message 73750 - Posted: 3 Sep 2012, 18:02:50 UTC

I think I have the same problem as Mad_Max, in particular with the following WU: hyb_af_bench_3v1xA_SAVE_ALL_OUT_IGNORE_THE_REST_57036_284_0. In addition to the problems mentioned by Mad_Max, I have also noticed that no checkpoint was written during the 7 hours the WU ran. I have a notebook and I can't leave jobs running for such a long time without saving their work.

ID: 73750 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 73753 - Posted: 3 Sep 2012, 23:52:42 UTC

I am also having the same problem as Max and Daniele.
https://boinc.bakerlab.org/rosetta/result.php?resultid=529768017
hyb_ag_bench_3rwlA_SAVE_ALL_OUT_IGNORE_THE_REST_57314_926_1
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 21600
BOINC:: CPU time: 36165.2s, 14400s + 21600s[2012- 9- 3 15:36:19:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 36166.8 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish



You're wasting my CPU time with this nonsense.
ID: 73753 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 73754 - Posted: 3 Sep 2012, 23:56:27 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=529712076

lr_ab_bench_2l6fA_SAVE_ALL_OUT_IGNORE_THE_REST_58215_517_0

Validate error and you give me only 3 credits?
Come on guys?
I've been with you a long time on this project and have never gotten such a low credit for burning 885 seconds of cpu time.
And I don't get it: the run preference is 21600 seconds, and this task used only 885 and shut down. If it had used ALL the CPU time, then based on the time used for 1 decoy, another 24 could have been produced.
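
As a rough check of that estimate, here is a minimal sketch of the arithmetic. It assumes every decoy would take the same 885 seconds as the first one, which is a simplification, since decoy times vary.

```python
run_time_pref = 21600  # seconds, run preference quoted in the post
cpu_used = 885         # seconds, CPU time quoted for the single decoy

# Decoys that fit in the whole preference window at 885 s each.
total_decoys = run_time_pref // cpu_used

# Decoys that fit in the time left after the first one finished.
extra_decoys = (run_time_pref - cpu_used) // cpu_used

print(total_decoys, extra_decoys)
```

Under that assumption the window holds 24 decoys in total, so about 23 beyond the one that was produced, which is close to the figure in the post.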
ID: 73754 · Rating: 0
Eric Detheridge
Joined: 26 Aug 12
Posts: 2
Credit: 1,975,060
RAC: 0
Message 73756 - Posted: 4 Sep 2012, 1:07:41 UTC

Same problems as Mad Max, Daniele and Greg BE on these tasks:

529525203
529605370
529631899
529668166
529764820
529799498
529841088
529864653

cpu time ~25,000 s, claimed credit ~42, granted credit 20.00, and names start with something like hyb_ai_bench or hyb_ag_bench,

these tasks (all lr_ab_bench) failed with a validate error:

529758663
529846000
529881616

and these (also lr_ab_bench) ones ran for ~2000s and claimed (and were granted) less than 10 credits

529906748
529841036
529783345

Almost all of my tasks for Sept 03 are listed above.
ID: 73756 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 2,588
Message 73759 - Posted: 4 Sep 2012, 4:05:33 UTC - in response to Message 73756.  

these tasks (all lr_ab_bench) failed with a validate error:

529758663
529846000
529881616

and these (also lr_ab_bench) ones ran for ~2000s and claimed (and were granted) less than 10 credits

529906748
529841036
529783345

Almost all of my tasks for Sept 03 are listed above.


I've been seeing a lot of problems on lr_ab_bench workunits. First, several that ran about 2000 seconds (instead of the 12 hours I selected), then gave 20 credits or less. Then, several that ran even less time, gave a validate error and no credits. Are there problems with those workunits?

For the validate errors, the wingmen gave validate errors also.
ID: 73759 · Rating: 0
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 73761 - Posted: 4 Sep 2012, 7:18:00 UTC - in response to Message 73741.  
Last modified: 4 Sep 2012, 7:21:00 UTC

Mad_Max:
1. ~25k sec runtime (3 hour target time + 4 hour for watchdog activation)
2. Only 1 model(decoy)

I think they are sending out some very "complicated" WUs, which might not finish within the default 3+4 hours. This one for example needed almost 12 hours CPU time for a single decoy (and didn't checkpoint even once during that time).
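
A minimal sketch of the "target time + 4 hours" figure, assuming the watchdog ends a task once its CPU time passes the run-time preference plus 14400 seconds. That rule is inferred from the "14400s + 21600s" line in the stderr output posted earlier in this thread, not taken from the application source, and the names below are hypothetical.

```python
WATCHDOG_GRACE = 4 * 3600  # 14400 s, the "4 hour" figure quoted in this thread


def watchdog_limit(cpu_run_time_pref):
    """CPU seconds after which the watchdog is assumed to end the task."""
    return cpu_run_time_pref + WATCHDOG_GRACE


def watchdog_would_fire(cpu_time, cpu_run_time_pref):
    """True if `cpu_time` is past the assumed watchdog limit."""
    return cpu_time > watchdog_limit(cpu_run_time_pref)


print(watchdog_limit(3 * 3600))             # default 3 hour target
print(watchdog_limit(21600))                # 6 hour preference
print(watchdog_would_fire(36165.2, 21600))  # CPU time from the posted log
```

With a 3 hour target the limit comes out at 25200 seconds, which matches the ~25k second runtimes reported above; with a 21600 second preference it is 36000 seconds, just under the 36165.2 seconds in the posted log.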



Daniele:
I have a notebook and I can't let jobs running for such a long time without saving their work.

Hibernate it instead of shut down (and leave applications in memory while suspended).
ID: 73761 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 73762 - Posted: 4 Sep 2012, 10:17:09 UTC

and yet more silence from the team...as usual.
ID: 73762 · Rating: 0
Daniele
Joined: 19 Aug 12
Posts: 2
Credit: 9,382
RAC: 0
Message 73765 - Posted: 4 Sep 2012, 16:29:05 UTC - in response to Message 73761.  

Link:
Hibernate it instead of shut down (and leave applications in memory while suspended).


Thanks for the advice, Link. However, I wanted to point out that it is a little bit "strange" not to perform a checkpoint operation during 7 or 12 hours of work. There is certainly a reason for this, but in my opinion it would be safer to do a checkpoint operation once in a while.
ID: 73765 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 2,588
Message 73775 - Posted: 4 Sep 2012, 21:34:59 UTC - in response to Message 73765.  

Link:
Hibernate it instead of shut down (and leave applications in memory while suspended).


Thanks for the advice, Link. However, I wanted to point out that it is a little bit "strange" not to perform a checkpoint operation during 7 or 12 hours of work. There is certainly a reason for this, but in my opinion it would be safer to do a checkpoint operation once in a while.


If there is currently checkpointing only between decoys, then any very long decoy would have to run with no checkpointing inside.
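
To put a number on that, here is a minimal sketch comparing the CPU time lost on an interruption when a checkpoint is written only at the end of a decoy versus at a fixed interval. All the figures (a 7 hour decoy, a 10 minute interval, the interruption time) are hypothetical, and this does not describe how the application actually checkpoints.

```python
def lost_work(interrupt_at, checkpoints):
    """CPU seconds lost if the task stops at `interrupt_at`.

    `checkpoints` lists the CPU times (in seconds) at which state was saved.
    With no checkpoint before the interruption, everything is lost.
    """
    last_saved = max((t for t in checkpoints if t <= interrupt_at), default=0)
    return interrupt_at - last_saved


decoy_length = 7 * 3600  # hypothetical: one decoy that takes 7 hours

# Case 1: state is saved only when the decoy finishes.
decoy_end_only = [decoy_length]

# Case 2: hypothetical periodic checkpoint every 10 minutes.
periodic = list(range(600, decoy_length + 1, 600))

interrupt_at = 6 * 3600 + 300  # machine shut down after 6 h 5 min

print(lost_work(interrupt_at, decoy_end_only))
print(lost_work(interrupt_at, periodic))
```

In the first case all 21900 seconds are lost; in the second only the 300 seconds since the last checkpoint are.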
ID: 73775 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 73777 - Posted: 4 Sep 2012, 23:26:21 UTC

If there is currently checkpointing only between decoys, then any very long decoy would have to run with no checkpointing inside.


That's what I'm seeing on the hyb_ai_bench_ tasks; you can lose hours on some of them.

ID: 73777 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,524,889
RAC: 7,500
Message 73778 - Posted: 5 Sep 2012, 9:30:03 UTC - in response to Message 73762.  

and yet more silence from the team...as usual.


The ralph@home team also has communication problems...
ID: 73778 · Rating: 0