MiniRosetta 3.17 Problems.

Message boards : Number crunching : MiniRosetta 3.17 Problems.

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 71507 - Posted: 27 Oct 2011, 7:45:54 UTC
Last modified: 27 Oct 2011, 7:52:04 UTC

Hi.

I've had two different types of tasks error out. The same types have run on this rig before with the 3.14 app without errors.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=418800096

place_CE_20110919_EBOV_GP_2d1v_ProteinInterfaceDesign_31440_359_0

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>

ERROR: drSOP
ERROR:: Exit from: src/protocols/protein_interface_design/movers/PlaceStubMover.cc line: 1063
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
=================================================================================

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=418800129

3filtr5A_CYpa_2aak_ProteinInterfaceDesign_23Aug2011_30588_1098_0

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>

ERROR: drSOP
ERROR:: Exit from: src/protocols/protein_interface_design/movers/PlaceStubMover.cc line: 1063
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
ID: 71507 · Rating: 0
cmiles
Joined: 4 Jan 11
Posts: 9
Credit: 0
RAC: 0
Message 71510 - Posted: 27 Oct 2011, 15:02:31 UTC

Thank you for reporting this problem. We're taking a look at it now.
ID: 71510 · Rating: 0
Shawn
Volunteer moderator
Project developer
Project scientist
Joined: 22 Jan 10
Posts: 17
Credit: 53,741
RAC: 0
Message 71513 - Posted: 27 Oct 2011, 18:55:19 UTC

Thanks for letting us know.

As you are probably aware, we recently changed our version of Rosetta@home. These current jobs are associated with protocols written for an older version. I did not notice any compatibility problems at the time, but I will do some more testing on these jobs to find out why they didn't work.
ID: 71513 · Rating: 0
Shawn
Volunteer moderator
Project developer
Project scientist
Joined: 22 Jan 10
Posts: 17
Credit: 53,741
RAC: 0
Message 71516 - Posted: 27 Oct 2011, 22:45:33 UTC - in response to Message 71513.  

I think we've identified the problem, and the ProteinInterfaceDesign team is now aware of the issue. Thanks once again for your time, your computational resources, and your feedback!
ID: 71516 · Rating: 0
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 71517 - Posted: 28 Oct 2011, 4:07:32 UTC
Last modified: 28 Oct 2011, 4:08:57 UTC

See, if you guys would post a summary of this problem on the front page... it'd have a profound effect on users. They'd see that the Rosetta team is working... etc. Same goes when the server goes down. Say: "Hey, someone unplugged the servers during last night's party. We'll get that fixed as soon as possible." Or something along those lines would be great for people trying to know what's going on. Just my humble advice.
ID: 71517 · Rating: 0
pieface
Joined: 20 Sep 05
Posts: 17
Credit: 797,661
RAC: 0
Message 71522 - Posted: 28 Oct 2011, 15:16:00 UTC
Last modified: 28 Oct 2011, 15:21:02 UTC

I really don't mind the small things like the DrSOP problem; they tie up some resources for download and then upload, but I don't get charged extra for that. But during the same timeframe I also had something like a dozen ProteinInterfaceDesign and Ploop2x3 tasks run for their full allotted time (6 hrs or so, depending on how the watchdog was feeling), and when the validator finally caught up they were marked as invalid. I had some of these on both machines I had crunching Rosetta: one is a Win XP x64 system and the other a Win7 box, with no overclocking at all. Here are a couple of examples. Any ideas, or did anyone else get those kinds of results in this last batch?

Ploop2x3
Ploop2x3
PID

note: edited to take out 'over the weekend'.
ID: 71522 · Rating: 0
.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 71524 - Posted: 28 Oct 2011, 19:24:02 UTC

ID: 71524 · Rating: 0
cmiles
Joined: 4 Jan 11
Posts: 9
Credit: 0
RAC: 0
Message 71525 - Posted: 28 Oct 2011, 19:34:16 UTC

@pieface, @clive_G1FYE
The errors you're both experiencing are due to a name change in a protocol commonly used by protein designers. These same work units executed without error on the previous version of Rosetta@Home. However, the name change was not done in a backwards-compatible way. We're putting a system in place to prevent these kinds of errors from happening in the future. Thank you both for reporting these problems.
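
For readers wondering what a backwards-compatible rename could look like, here is a minimal sketch. It is not Rosetta's actual code: the names `new_protocol_name` and `old_protocol_name`, the `ALIASES` table, and the `resolve` function are all hypothetical, made up for illustration. The idea is simply that the old name stays registered as an alias, so job files written for the previous version still resolve.

```python
# Hypothetical set of protocol names known to the current application version.
KNOWN_PROTOCOLS = {"new_protocol_name"}

# Hypothetical alias table: deprecated names mapped to their replacements,
# kept so that job files written for an older version still load.
ALIASES = {"old_protocol_name": "new_protocol_name"}


def resolve(name: str) -> str:
    """Return the canonical protocol name, accepting deprecated aliases."""
    canonical = ALIASES.get(name, name)
    if canonical not in KNOWN_PROTOCOLS:
        raise KeyError(f"unknown protocol name: {name!r}")
    return canonical


# Both the old and the new spelling resolve to the same protocol.
current = resolve("new_protocol_name")
legacy = resolve("old_protocol_name")
```

Without the alias entry, the lookup for the old name would raise an error, which is roughly the kind of failure a non-compatible rename produces.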
ID: 71525 · Rating: 0
Trotador
Joined: 30 May 09
Posts: 108
Credit: 291,214,977
RAC: 2
Message 71533 - Posted: 29 Oct 2011, 21:35:06 UTC

lots of errors, stop downloading units

https://boinc.bakerlab.org/rosetta/result.php?resultid=459660390
https://boinc.bakerlab.org/rosetta/result.php?resultid=459660074
https://boinc.bakerlab.org/rosetta/result.php?resultid=459660070
https://boinc.bakerlab.org/rosetta/result.php?resultid=459635613
https://boinc.bakerlab.org/rosetta/result.php?resultid=459658860
ID: 71533 · Rating: 0
Trotador
Joined: 30 May 09
Posts: 108
Credit: 291,214,977
RAC: 2
Message 71534 - Posted: 29 Oct 2011, 21:45:03 UTC
Last modified: 29 Oct 2011, 21:57:27 UTC

More info

T0....units seem ok

ab_07_19... crashing all

2stubs... crash

place_CE_... crash

rlx_jsr... OK
ID: 71534 · Rating: 0
Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 25,844,541
RAC: 12,207
Message 71539 - Posted: 30 Oct 2011, 17:07:14 UTC
Last modified: 30 Oct 2011, 17:12:44 UTC

ID: 71539 · Rating: 0
TJ
Joined: 29 Mar 09
Posts: 127
Credit: 4,799,890
RAC: 0
Message 71540 - Posted: 30 Oct 2011, 20:13:49 UTC

All my WUs error out very soon. I get these error messages:

ERROR: [ERROR] invalid header input for kill_hairpins file.
ERROR:: Exit from: ..\..\..\src\core\scoring\SS_Killhairpins_Info.cc line: 370
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


Greetings,
TJ.
ID: 71540 · Rating: 0
.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 71543 - Posted: 30 Oct 2011, 22:55:15 UTC

Yup, I got some dead hairpin files as well in the ab_07_19_ series.
The things you have to do to a protein to make them behave :¬)

https://boinc.bakerlab.org/rosetta/result.php?resultid=459619880

https://boinc.bakerlab.org/rosetta/result.php?resultid=459639244
ID: 71543 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 71549 - Posted: 31 Oct 2011, 4:14:46 UTC

Some more errors, on a different type of task. The other tasks I've had have been running OK apart from these.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=401795240

ab_07_19_1fnaA_filtnr_IGNORE_THE_REST_06_08_28682_52_1

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>

Starting work on structure: _00001

ERROR: [ERROR] invalid header input for kill_hairpins file.
ERROR:: Exit from: src/core/scoring/SS_Killhairpins_Info.cc line: 370
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
Watchdog active.

</stderr_txt>

==================================================================================

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=401801710

ab_07_19_1acfA_control_IGNORE_THE_REST_03_07_28679_51_0

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>

Starting work on structure: _00001

ERROR: [ERROR] invalid header input for kill_hairpins file.
ERROR:: Exit from: src/core/scoring/SS_Killhairpins_Info.cc line: 370
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
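
For anyone collecting these reports, a small script can tally which source location each failed task exited from. This is only a sketch based on the stderr lines pasted in this thread: the regular expression assumes the `ERROR:: Exit from: <file> line: <n>` layout shown above, and the `exit_sites` helper is a made-up name, not part of BOINC or Rosetta.

```python
import re
from collections import Counter

# Matches lines such as:
#   ERROR:: Exit from: src/core/scoring/SS_Killhairpins_Info.cc line: 370
EXIT_RE = re.compile(r"ERROR:: Exit from: (?P<file>\S+) line: (?P<line>\d+)")


def exit_sites(stderr_text: str) -> Counter:
    """Count (source file, line number) pairs reported in the stderr text."""
    return Counter(
        (m["file"], int(m["line"])) for m in EXIT_RE.finditer(stderr_text)
    )


# Sample text copied from the error output quoted in this thread.
sample = """\
ERROR: [ERROR] invalid header input for kill_hairpins file.
ERROR:: Exit from: src/core/scoring/SS_Killhairpins_Info.cc line: 370
BOINC:: Error reading and gzipping output datafile: default.out
ERROR: drSOP
ERROR:: Exit from: src/protocols/protein_interface_design/movers/PlaceStubMover.cc line: 1063
BOINC:: Error reading and gzipping output datafile: default.out
"""

counts = exit_sites(sample)
for (path, line), n in counts.items():
    print(f"{n} x {path}:{line}")
```

Grouping by exit location makes it easier to see that the reports so far fall into two distinct failures rather than many unrelated ones.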




ID: 71549 · Rating: 0
cmiles
Joined: 4 Jan 11
Posts: 9
Credit: 0
RAC: 0
Message 71553 - Posted: 31 Oct 2011, 16:55:51 UTC

The offending jobs have been removed.

Rosetta is a large and diverse project. Unlike more focused efforts such as SETI@Home, the breadth of compute tasks being performed on Rosetta@Home is incredible. While offering enormous flexibility, this greatly complicates testing and validation. Unfortunately, some bad jobs slipped in this time. In many cases, Rosetta@Home users such as myself find out about failing jobs when you do, and we're just as frustrated when such jobs are distributed.

Thank you for your continued support.
ID: 71553 · Rating: 0
TPCBF
Joined: 29 Nov 10
Posts: 111
Credit: 5,070,625
RAC: 2,159
Message 71554 - Posted: 31 Oct 2011, 17:38:08 UTC - in response to Message 71553.  

But why the *snap* can no sysadmin post some proper info about this in a timely fashion?

It's just a matter of simple communication, doesn't even cost much time. :-(

Ralf

ID: 71554 · Rating: 0
Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 71555 - Posted: 31 Oct 2011, 19:19:01 UTC - in response to Message 71553.  

Why isn't RALPH being used to catch these errors? All workunits I've received from RALPH recently have been using app version 3.14.


Best,
Snags
ID: 71555 · Rating: 0
TPCBF
Joined: 29 Nov 10
Posts: 111
Credit: 5,070,625
RAC: 2,159
Message 71556 - Posted: 31 Oct 2011, 19:23:04 UTC - in response to Message 71555.  

Yeah, what RALPH@Home is doing has been a bit odd recently. Several times, I got swamped with sets of 20 WUs at a time, with a mix of applications labeled both as "Rosetta Mini Beta 3.17" (currently 2 awaiting their turn) and as "Rosetta Mini 3.14" (another 20 WUs piled up to eventually be processed).

Ralf
ID: 71556 · Rating: 0
cmiles
Joined: 4 Jan 11
Posts: 9
Credit: 0
RAC: 0
Message 71558 - Posted: 31 Oct 2011, 20:19:58 UTC

RALPH has separate executables for minirosetta (current version of Rosetta@Home) and minirosetta_beta (next version of Rosetta@Home). At the moment, the two applications are identical, despite their different version numbers.

minirosetta => 3.18
minirosetta_beta => 3.17

During the update process, the two versions will diverge. The idea behind this is to always have a running version of the software currently deployed on Rosetta@Home available for test.
ID: 71558 · Rating: 0
TPCBF
Joined: 29 Nov 10
Posts: 111
Credit: 5,070,625
RAC: 2,159
Message 71559 - Posted: 31 Oct 2011, 20:53:31 UTC - in response to Message 71558.  

And are you sure that everyone's on the same page here? :?

Ralf
ID: 71559 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org