Rosetta 4.1+ and 4.2+

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,132,983
RAC: 4,862
Message 99976 - Posted: 10 Dec 2020, 0:33:54 UTC - in response to Message 99975.  

And of course all the failed ones get resent…
I’ve just received a couple of dozen. Debating whether to abort them all


Mine are all completing just fine, so hopefully you can do them just fine too.
ID: 99976 · Rating: 0
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 99979 - Posted: 10 Dec 2020, 1:01:32 UTC - in response to Message 99976.  

The resends were starting to fail, so I killed all the others that were still running.
ID: 99979 · Rating: 0
Stevie G
Joined: 15 Dec 18
Posts: 107
Credit: 822,669
RAC: 1,094
Message 99981 - Posted: 10 Dec 2020, 6:57:24 UTC - in response to Message 99967.  

Same here; several have failed with an access violation after a little over an hour.
I’ve got some more that have been running for 5 hours so far; let’s see whether they manage to complete…


Me too.
Task | Work unit | Computer | Sent | Time reported | Status | Run time (sec) | CPU time (sec) | Credit | Application
1305932126 | 1169355749 | 3551508 | 9 Dec 2020, 17:22:46 UTC | 10 Dec 2020, 6:45:28 UTC | Error while computing | 3,981.64 | 3,606.70 | --- | Rosetta v4.20 windows_x86_64
1305932217 | 1169235239 | 3551508 | 9 Dec 2020, 17:22:46 UTC | 10 Dec 2020, 5:15:44 UTC | Error while computing | 6,414.87 | 6,125.75 | --- | Rosetta v4.20 windows_x86_64
1305585721 | 1169160644 | 3551508 | 9 Dec 2020, 3:32:50 UTC | 9 Dec 2020, 17:22:46 UTC | Error while computing | 6,736.60 | 6,458.57 | 58.00 | Rosetta v4.20 windows_x86_64

S. Gaber
ID: 99981 · Rating: 0
Stevie G
Joined: 15 Dec 18
Posts: 107
Credit: 822,669
RAC: 1,094
Message 99982 - Posted: 10 Dec 2020, 6:57:26 UTC - in response to Message 99967.  
Last modified: 10 Dec 2020, 6:58:55 UTC

Double post.

I double click out of habit.
S. Gaber
ID: 99982 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1673
Credit: 17,609,136
RAC: 22,401
Message 99983 - Posted: 10 Dec 2020, 8:58:39 UTC - in response to Message 99969.  

let’s see whether they manage to complete…
They did. (Example.) The failed ones might just have been certain input values exposing a bug in an algorithm.
Probably 80% of mine so far have resulted in errors, only 20% actually completing OK.
Grant
Darwin NT
ID: 99983 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,524,889
RAC: 7,500
Message 99992 - Posted: 11 Dec 2020, 9:33:42 UTC - in response to Message 99983.  

Probably 80% of mine so far have resulted in errors, only 20% actually completing OK.

Obviously these WUs were NOT tested on Ralph@Home.
As usual, unfortunately.
ID: 99992 · Rating: 0
Aravah
Joined: 12 Apr 20
Posts: 6
Credit: 1,101,172
RAC: 0
Message 99997 - Posted: 11 Dec 2020, 19:26:50 UTC - in response to Message 99992.  

Yes, I have likewise seen a lot of failures for 9 Dec work units, for example (computer ID 4466108): 1169292471, 1169319345, 1169030986, 1169486445, 1169487431, 1169480600, 1169337527, 1169148343

'Hallucinated' failed with
"could not open file 00001.200.9mers."

'MOF' failed with
"File: src/utility/options/OptionCollection.cc:1398
Option matching -beta_nov15 not found in command line top-level context
Did you mean:
-corrections:beta_nov16"

'miniprotein_relax7' failed with
"process got signal 11"
(SIGSEGV - segmentation fault)
There was no debug output or stack trace, so perhaps this is an unhandled exception when run under Fedora.
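
The three failure modes quoted in this post can be told apart mechanically. A minimal sketch in Python, assuming stderr excerpts that look like the ones above; the category labels and the `classify` helper are hypothetical and not part of BOINC or Rosetta:

```python
import re
import signal

# Patterns copied from the messages quoted in this post; the labels are made up.
PATTERNS = [
    ("missing input file", re.compile(r"could not open file (\S+)")),
    ("unknown option", re.compile(r"Option matching (\S+) not found")),
    ("killed by signal", re.compile(r"process got signal (\d+)")),
]

def classify(stderr_text):
    """Return (label, detail) for the first known pattern found, else None."""
    for label, pattern in PATTERNS:
        match = pattern.search(stderr_text)
        if match:
            detail = match.group(1)
            if label == "killed by signal":
                # Translate the signal number into its symbolic name.
                detail = signal.Signals(int(detail)).name
            return label, detail
    return None

print(classify("could not open file 00001.200.9mers."))
print(classify("Option matching -beta_nov15 not found in command line top-level context"))
print(classify("process got signal 11"))
```

Anything that matches none of the patterns returns `None`, so new failure types show up as unclassified rather than being mislabelled.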
ID: 99997 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,524,889
RAC: 7,500
Message 100000 - Posted: 12 Dec 2020, 16:56:34 UTC - in response to Message 99997.  

Yes, likewise seen a lot of failures for 9 Dec work units....

I cannot understand why they don't use Ralph.
They have a beta project with a dedicated server, and the queues are almost always empty.
Publishing bugged WUs on the production server (Rosetta) will drive volunteers away.
ID: 100000 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,132,983
RAC: 4,862
Message 100008 - Posted: 13 Dec 2020, 3:10:20 UTC - in response to Message 100000.  

Yes, likewise seen a lot of failures for 9 Dec work units....


I cannot understand why they don't use Ralph.
They have a beta project with a dedicated server, and the queues are almost always empty.
Publishing bugged WUs on the production server (Rosetta) will drive volunteers away.


I wonder if they hoped the tasks would be fine, and now that they aren't, they will try to fix them or send them to Ralph for further testing. One problem with testing, though, is the lack of people over there. Unless they have people always banging on the door asking for tasks, which they should be able to monitor, they may not get the diversity of PCs needed to troubleshoot the problem on Ralph. If that's the case, then we could be stuck with them for a while.
ID: 100008 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,524,889
RAC: 7,500
Message 100009 - Posted: 13 Dec 2020, 10:03:47 UTC - in response to Message 100008.  

One problem with testing, though, is the lack of people over there, so unless they have people always banging on the door asking for tasks, which they should be able to monitor, they may not get the diversity of PCs needed to troubleshoot the problem on Ralph.


It took years just to get a link to Ralph on the Rosetta home page...

I have participated in Ralph since 2008.
When they release work, it is finished within a few hours.
There are a lot of volunteers who want to help with testing, but it seems that the developers are not interested.
ID: 100009 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,132,983
RAC: 4,862
Message 100019 - Posted: 14 Dec 2020, 3:26:38 UTC - in response to Message 100009.  

One problem with testing, though, is the lack of people over there, so unless they have people always banging on the door asking for tasks, which they should be able to monitor, they may not get the diversity of PCs needed to troubleshoot the problem on Ralph.


It took years just to get a link to Ralph on the Rosetta home page...

I have participated in Ralph since 2008.
When they release work, it is finished within a few hours.
There are a lot of volunteers who want to help with testing, but it seems that the developers are not interested.


Well that answers that question then!!
ID: 100019 · Rating: 0
Bill F
Joined: 29 Jan 08
Posts: 44
Credit: 1,561,577
RAC: 1,172
Message 100031 - Posted: 14 Dec 2020, 23:06:38 UTC - in response to Message 100009.  
Last modified: 14 Dec 2020, 23:07:22 UTC

I am one of those Ralph users that patiently wait.... Most times I miss out as other members scoop up any work super fast.

I have been a Ralph member since Jan 2018

Bill F
In October 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.

ID: 100031 · Rating: 0
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100046 - Posted: 15 Dec 2020, 16:51:23 UTC - in response to Message 100000.  

I cannot understand why they don't use Ralph.
The work units in the current batch seem to be minor variations of known-good configurations, so there’s probably an assumption that they will “just work” and don’t need pre-release testing. As we have seen, though, some of the recent WUs have shown that such assumed-good inputs can still expose bugs in Rosetta.

By contrast, today’s Ralph tasks have very different command lines so are probably doing something quite new. That is the kind of change that does get tested on Ralph before being released on Rosetta.
ID: 100046 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,132,983
RAC: 4,862
Message 100048 - Posted: 16 Dec 2020, 0:52:00 UTC - in response to Message 100046.  

I cannot understand why they don't use Ralph.
The work units in the current batch seem to be minor variations of known-good configurations, so there’s probably an assumption that they will “just work” and don’t need pre-release testing. As we have seen, though, some of the recent WUs have shown that such assumed-good inputs can still expose bugs in Rosetta.

By contrast, today’s Ralph tasks have very different command lines so are probably doing something quite new. That is the kind of change that does get tested on Ralph before being released on Rosetta.


Thanks, I just got a stack of them!!
ID: 100048 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1673
Credit: 17,609,136
RAC: 22,401
Message 100086 - Posted: 21 Dec 2020, 7:12:14 UTC

The horns5's are back. Compute errors galore.
Grant
Darwin NT
ID: 100086 · Rating: 0
Kissagogo27
Joined: 31 Mar 20
Posts: 86
Credit: 2,890,556
RAC: 2,315
Message 100187 - Posted: 27 Dec 2020, 14:27:49 UTC
Last modified: 27 Dec 2020, 14:28:19 UTC

Hi, here is the latest one I've got:

horns5_63667_245_945_looped21_165aa_hbnet_0001_0010_testboinc_0000800008_0000001_0_noligpoc2Ala_000000357FOL201217_BOINC_SAVE_ALL_OUT_1053176_4_1
ID: 100187 · Rating: 0
Kissagogo27
Joined: 31 Mar 20
Posts: 86
Credit: 2,890,556
RAC: 2,315
Message 100203 - Posted: 27 Dec 2020, 19:07:59 UTC - in response to Message 100187.  
Last modified: 27 Dec 2020, 19:08:22 UTC

And then it finished with status Success!

Task 1314043584
Name	horns5_63667_245_945_looped21_165aa_hbnet_0001_0010_testboinc_0000800008_0000001_0_noligpoc2Ala_000000357FOL201217_BOINC_SAVE_ALL_OUT_1053176_4_1
Workunit	1176155110
Created	27 Dec 2020, 12:13:58 UTC
Sent	27 Dec 2020, 12:13:59 UTC
Report deadline	30 Dec 2020, 12:13:59 UTC
Received	27 Dec 2020, 18:52:43 UTC
Server state	Over
Outcome	Success
Client state	Done
Exit status	0 (0x00000000)
Computer ID	3984635
Run time	6 hours 30 min 29 sec
CPU time	6 hours 30 min 10 sec
Validate state	Valid
Credit	223.10
Device peak FLOPS	4.11 GFLOPS
Application version	Rosetta v4.20 windows_x86_64
Peak working set size	466.11 MB
Peak swap size	442.47 MB
Peak disk usage	9.96 MB
Stderr output
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol LA_MPM_design_boinc.xml -corrections::beta_nov16 -out:suffix _BoincSeq @flag_fastdesign_boinc -script_vars LIG_ID=159 MSAcst=horns5_63667_245_945_looped21_165aa_hbnet_0001_0010_testboinc_0000800008_0000001_0_noligpoc2Ala_000000357.MSAcst -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip horns165aaFOL2012174218.zip -in:file:s horns5_63667_245_945_looped21_165aa_hbnet_0001_0010_testboinc_0000800008_0000001_0_noligpoc2Ala_000000357.pdb -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3711909
Using database: database_357d5d93529_n_methylminirosetta_database

ERROR: Assertion `active( key )` failed.
ERROR:: Exit from: C:cygwin64homeboinc4.17Rosettamainsourcesrcutility/keys/SmallKeyVector.hh line: 548
19:44:33 (452): called boinc_finish(0)

</stderr_txt>
]]>


but with an error ...
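
A task can report exit status 0 and still carry error lines in its stderr, as this one does. A minimal Python sketch for spotting that combination, assuming the stderr text has been saved as a string; the `find_error_lines` helper is hypothetical and not part of BOINC:

```python
def find_error_lines(stderr_text):
    """Return the lines of a stderr dump that start with 'ERROR'."""
    return [line.strip() for line in stderr_text.splitlines()
            if line.strip().startswith("ERROR")]

# Excerpt copied verbatim from the log in this post.
sample = """Using database: database_357d5d93529_n_methylminirosetta_database

ERROR: Assertion `active( key )` failed.
ERROR:: Exit from: C:cygwin64homeboinc4.17Rosettamainsourcesrcutility/keys/SmallKeyVector.hh line: 548
19:44:33 (452): called boinc_finish(0)
"""

exit_status = 0  # the exit status reported for this task
errors = find_error_lines(sample)
if exit_status == 0 and errors:
    print(f"exit status 0, but {len(errors)} ERROR line(s) in stderr:")
    for line in errors:
        print("  " + line)
```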
ID: 100203 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,524,889
RAC: 7,500
Message 100310 - Posted: 5 Jan 2021, 7:40:40 UTC
Last modified: 5 Jan 2021, 7:49:30 UTC

All MOF_ wus:
1316797097

<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @MOF_P4132_12res_testasym_c.33.6_0001_P_41_3_2_hit_GLU_GLU_1_4_2634_cell036_ncontact09_score-10.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3306046
Using database: database_357d5d93529_n_methylminirosetta_database


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00000250198BEF70

ID: 100310 · Rating: 0
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100316 - Posted: 5 Jan 2021, 19:42:21 UTC - in response to Message 100310.  
Last modified: 5 Jan 2021, 20:31:02 UTC

Same here: lots of (but not all) MOF tasks failing with an access violation within a few seconds of starting. The ones that do run finish after about 3 hours (against a default run time of 8).
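
The 8-hour figure matches the `-cpu_run_time 28800` flag visible in the command lines posted in this thread, if that value is in seconds (an assumption; the logs do not state the unit). A quick Python check, with a hypothetical `flag_value` helper:

```python
import shlex

# Flags copied from the MOF command line posted in this thread (flags file omitted).
command = ("rosetta_4.20_windows_x86_64.exe -nstruct 10000 -cpu_run_time 28800 "
           "-boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all "
           "-boinc::watchdog -boinc::cpu_run_timeout 36000")

def flag_value(command_line, flag):
    """Return the token following `flag`, or None if the flag is absent."""
    tokens = shlex.split(command_line)
    if flag in tokens:
        index = tokens.index(flag)
        if index + 1 < len(tokens):
            return tokens[index + 1]
    return None

seconds = int(flag_value(command, "-cpu_run_time"))
print(seconds / 3600)  # hours, assuming the flag value is in seconds
```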
ID: 100316 · Rating: 0
Kissagogo27
Joined: 31 Mar 20
Posts: 86
Credit: 2,890,556
RAC: 2,315
Message 100324 - Posted: 6 Jan 2021, 13:37:54 UTC

Another one ...


Name MOF_I213_12res_testasym_c.28.4_0001_I_21_3_hit_DASP_DASP_3_4_36_cell037_ncontact09_score032_SAVE_ALL_OUT_1056011_165_1
Workunit 1179445419
Created 6 Jan 2021, 1:07:10 UTC
Sent 6 Jan 2021, 1:12:27 UTC
Report deadline 9 Jan 2021, 1:12:27 UTC
Received 6 Jan 2021, 12:19:39 UTC
Server state Over
Outcome Compute error
Client state Compute error
Exit status -1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION
Computer ID 3984635
Run time 22 sec
CPU time 1 sec
Validate state Invalid


<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @MOF_I213_12res_testasym_c.28.4_0001_I_21_3_hit_DASP_DASP_3_4_36_cell037_ncontact09_score032.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3214646
Using database: database_357d5d93529_n_methylminirosetta_database


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0000000000000001

Engaging BOINC Windows Runtime Debugger...
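
The two exit codes shown in this report, -1073741819 and 3221225477, are the same 32-bit value 0xC0000005 read as signed and as unsigned. A small Python sketch of the conversion (the helper names are made up for illustration):

```python
def to_unsigned32(value):
    """Reinterpret an integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF

def to_signed32(value):
    """Reinterpret an integer as a signed 32-bit (two's complement) value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

print(hex(to_unsigned32(-1073741819)))
print(to_signed32(3221225477))
```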

ID: 100324 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org