Message boards : Number crunching : Rosetta 4.1+ and 4.2+
Author | Message |
---|---|
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,615,377 RAC: 22,356 |
A whole bunch of ea00010f_nav1-7 Tasks crashed and burnt within seconds of starting. Have probably got only a 50% success rate for these Tasks at present. Grant Darwin NT |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,154,592 RAC: 16,098 |
"A whole bunch of ea00010f_nav1-7 Tasks crashed and burnt within seconds of starting. Have probably got only a 50% success rate for these Tasks at present."
Ditto.
Also the following crash in the same way, but after some hours running:
XW_JG_11222020_A_E5_BBEAA_H15_BAAB_E5_ABBB_E5_BAB_H14_BAAB_E5_1_20201115012036_combined_0001_0001_fold_SAVE_ALL_OUT_1039435_834_0
XW_JG_11222020_A_E5_BBEAA_H16_GB_E5_AEAB_E5_BAB_H14_BAAB_E5_1_20201115003503_combined_0001_0001_fold_SAVE_ALL_OUT_1036527_874_0
XW_JG_11222020_A_E5_BBEAA_H16_GB_E5_GAAA_E5_GBB_H14_BAAB_E5_1_20201115010844_combined_0001_0001_fold_SAVE_ALL_OUT_1039302_876_0
XW_JG_11222020_A_E5_BBEAA_H16_GB_E5_EAAA_E5_BAB_H14_BAAB_E5_1_20201115005359_combined_0001_0001_fold_SAVE_ALL_OUT_1041485_880_0
XW_JG_11222020_A_E5_BBEAA_H15_BAAB_E5_GGG_E5_BAB_H14_BAAB_E5_1_20201115010538_combined_0001_0001_fold_SAVE_ALL_OUT_1043178_888_0
XW_JG_11222020_A_E5_BBEAA_H15_BAAB_E5_BABE_E5_BAB_H14_BAAB_E5_1_20201115002019_combined_0001_0001_fold_SAVE_ALL_OUT_1038891_907_0
XW_JG_11222020_A_E5_BBEAA_H15_BAAB_E5_GEG_E5_BAB_H14_BAAB_E5_1_20201113180034_combined_0001_0001_fold_SAVE_ALL_OUT_1039676_912_0
XW_JG_11222020_A_E5_BBEAA_H16_GB_E5_GBAG_E5_BAB_H14_BAAB_E5_1_20201115005214_combined_0001_0001_fold_SAVE_ALL_OUT_1038560_915_0
XW_JG_11222020_A_E5_BBEAA_H16_GB_E5_GAA_E5_GBB_H14_BAAB_E5_1_20201115003127_combined_0001_0001_fold_SAVE_ALL_OUT_1036333_916_0
XW_JG_11222020_A_E5_BBEAA_H15_BAAB_E5_EEBG_E5_BAB_H14_BAAB_E5_1_20201115002608_combined_0001_0001_fold_SAVE_ALL_OUT_1041275_918_0
XW_JG_11222020_A_E5_BBEAA_H15_BAAB_E5_EBEG_E5_BAB_H14_BAAB_E5_1_20201115011731_combined_0001_0001_fold_SAVE_ALL_OUT_1040851_919_0
XW_JG_11222020_A_E5_BBEAA_H16_BAAB_E5_GBA_E5_BAB_H14_BAAB_E5_1_20201114234748_combined_0001_0001_fold_SAVE_ALL_OUT_1038996_921_0
rb_02_01_54553_53700_ab_t000__robetta_IGNORE_THE_REST_11_18_1060010_3_0
gmcsf.culled_gmcsf060_fragments_fold_SAVE_ALL_OUT_1059765_685_0
And the following come up with the error "[ ERROR ]: Caught exception:" after some hours running:
tslp_site2_5j13_graft_bcov_design_v1_SAVE_ALL_OUT_IGNORE_THE_REST_3rf2hv7u_1059456_1_0
tnfa_site1_5m2j_graft_bcov_design_v1_SAVE_ALL_OUT_IGNORE_THE_REST_3on2fo6d_1059217_1_0
il4r_alpha_site1_6wgl_graft_bcov_design_v1_SAVE_ALL_OUT_IGNORE_THE_REST_6dz2xf8c_1059371_1_0
il4r_alpha_site2_3bpl_graft_bcov_design_v1_SAVE_ALL_OUT_IGNORE_THE_REST_8fq4il7y_1059469_1_1
il4r_alpha_site2_6wgl_graft_bcov_design_v1_SAVE_ALL_OUT_IGNORE_THE_REST_6ar7db9w_1059455_1_0 |
Plomos Send message Joined: 4 Mar 11 Posts: 11 Credit: 439,043 RAC: 0 |
I also had at least one of these error out because of no suitable grafts https://boinc.bakerlab.org/rosetta/result.php?resultid=1332257408 |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
"And the following come up with the following error after some hours running"
Since August... MotifGraftMover.cc:537 |
Kissagogo27 Send message Joined: 31 Mar 20 Posts: 86 Credit: 2,891,405 RAC: 2,330 |
HHh00003_dummy_0001_281_abinitio_SAVE_ALL_OUT_1210531_234
Task (click for details) | Computer | Sent | Time reported or deadline (explain) | Status | Run time (sec) | CPU time (sec) | Credit | Application |
---|---|---|---|---|---|---|---|---|
1336791021 | 4183655 | 16 Feb 2021, 3:42:46 UTC | 16 Feb 2021, 11:40:27 UTC | Error while computing | 11.61 | 3.25 | --- | Rosetta v4.20 windows_x86_64 |
1337008354 | 3984635 | 16 Feb 2021, 11:50:43 UTC | 16 Feb 2021, 18:49:38 UTC | Error while computing | 9.42 | 2.64 | --- | Rosetta v4.20 windows_x86_64 |
HhH00003_dummy_0003_14_abinitio_SAVE_ALL_OUT_1210430_35
1336630206 | 6011180 | 15 Feb 2021, 19:48:50 UTC | 16 Feb 2021, 18:54:11 UTC | Error while computing | 14.71 | 3.29 | --- | Rosetta v4.20 windows_x86_64 |
1337178671 | 5879499 | 16 Feb 2021, 19:00:53 UTC | 17 Feb 2021, 3:02:30 UTC | Completed and validated | 28,751.59 | 28,705.30 | 277.88 | Rosetta v4.20 x86_64-pc-linux-gnu |
Errored out with Windows but not with Linux ^^ |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"Errored out with Windows but not with Linux ^^"
Yes, I see the same thing on Win10, but not on Ubuntu 18.04/20.04. |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 389 Credit: 12,073,013 RAC: 8,289 |
For the first time in a long time I’ve had two WUs error out on me :-
https://boinc.bakerlab.org/rosetta/result.php?resultid=1350127477
https://boinc.bakerlab.org/rosetta/result.php?resultid=1350498193
The common part of the task name is _clean_0001_E110_Y29_S98d2bmob1_clean_0001_*.rd1.silent_nornMar01__
Although this might mean nothing as I have 7 valid WUs with the same naming.
The error message is :-
terminate called after throwing an instance of 'std::out_of_range'
what(): map::at
One wingman errored out with out of memory and the other completed and validated. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,133,189 RAC: 4,733 |
"For the first time in a long time I’ve had two WUs error out on me :-"
I wonder if it could be the lack of onboard GPU memory? You have 972 MB of onboard GPU memory on one GPU and 978 MB on the GPU in your other PC... I wonder if it's borderline now? I know Einstein is sending out their GW tasks that require a GPU with at least 4 GB or they regularly fail. As each project makes new tasks I wonder if they are pushing the envelope a bit? |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 389 Credit: 12,073,013 RAC: 8,289 |
"For the first time in a long time I’ve had two WUs error out on me :-"
But surely Rosetta is CPU only? |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
"The error message is :-"
Those will be different manifestations of the same error. Linux is reporting the actual problem; Windows has a catch-all “(C++ Exception)” message. “Out Of Memory” is a red herring: any unhandled C++ exception will yield that message.
What will be happening is that the code is looking up a value in a data structure, expecting to find it – and because it doesn’t handle the case where the value is not present, it cannot continue. As Rosetta is massively data-driven it’s no surprise that this can happen on some workunits but not others (and not be caught in pre-release testing, which can only cover a small subset of the space explored after the batch of work is released to BOINC). |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 389 Credit: 12,073,013 RAC: 8,289 |
"The error message is :-"
Many thanks, nothing I can do about it so carry on as normal seems to be the message :-) |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Thank you for prompting me to investigate. I’ve had a handful of errors from those NornMar01 tasks myself, and had been mystified by the “Out of Memory” messages, as, with 2 GB per core, memory generally ought not to be a problem on my machines. I had just never happened upon a case where the other task had gone to a Linux host, so I didn’t see the message with the real cause until you pointed it out. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
Some of these (pre_helical_bundles_round1_attempt1): command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol pre_helix_boinc_v1.xml @helix_design.flags -in:file:silent pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_0vf8tj0z.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_0vf8tj0z.zip @pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_0vf8tj0z.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2089611 |
Tomcat雄猫 Send message Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
Hmm, I wonder what this means:
Stderr output
<core_client_version>7.16.16</core_client_version>
<![CDATA[
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_arm-android-linux-gnu -run:protocol jd2_scripting @flags_rb_07_21_95009_92489__t000__0_C3_robetta -silent_gz -mute all -out:file:silent default.out -in:file:boinc_wu_zip input_rb_07_21_95009_92489__t000__0_C3_robetta.zip -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1258034
Extracting in project directory: database_357d5d93529_n_methyl.zip
Using database: database_357d5d93529_n_methyl/minirosetta_database
ERROR: Assertion `res2.is_bonded(res1)` failed.
ERROR:: Exit from: src/core/scoring/etable/count_pair/CountPairFactory.cc line: 222
called boinc_finish(0)
</stderr_txt>
]]>
The task ran for 7 hours 31 min 15 sec on Android 11, was validated, and netted me 294.91 credits (seems normal, if not a bit high; I guess the Snapdragon 888 is stupid fast if you give it enough cooling). |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
All rb_ with errors ERROR: FuncFactory: unknown constraint function type: |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,154,592 RAC: 16,098 |
"All rb_ with errors"
I waited a little while because I had several coming up in my cache shortly. Exactly the same outcome, all within 25 seconds of starting. Hmm... |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,154,592 RAC: 16,098 |
"All rb_ with errors"
Reported - and it hadn't been spotted on the server-side yet, so good catch. All remaining tasks aborted by the server already, I assume for correction and re-issue. Thanks
Edit: I also mentioned that it wasn't the best time, because the queue of tasks is now down to about 500k so hopefully that pre-empts us running out by the end of the week. Let's see on that one |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,615,377 RAC: 22,356 |
Getting the occasional error from the latest batch of work.
degrader_site_5nvx_jhr_bcov3_SAVE_ALL_OUT_IGNORE_THE_REST_1pl5px4m_1729329_1_0
degrader_site_5nvx_jhr_bcov3_SAVE_ALL_OUT_IGNORE_THE_REST_1xf6yz5e_1729329_1_0
degrader_site_5nvx_jhr_bcov3_SAVE_ALL_OUT_IGNORE_THE_REST_7ja6bs8e_1729329_1_1
All gave this error in just under 2 minutes.
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message> Incorrect function. (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol pdblite_boinc_120_10_tfirst--fuse--predictor_v13_degrader_boinc--fuse--tslp_design_v2_degrader_boinc.xml @degrader_site_5nvx_jhr_bcov_flags2 -in:file:silent degrader_site_5nvx_jhr_bcov3_SAVE_ALL_OUT_IGNORE_THE_REST_1pl5px4m.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip degrader_site_5nvx_jhr_bcov3_SAVE_ALL_OUT_IGNORE_THE_REST_1pl5px4m.zip @degrader_site_5nvx_jhr_bcov3_SAVE_ALL_OUT_IGNORE_THE_REST_1pl5px4m.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3758214
Using database: database_357d5d93529_n_methylminirosetta_database
ERROR: Error in core::conformation::Conformation::residue(): The sequence position requested was greater than the number of residues in the pose.
ERROR:: Exit from: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/conformation/Conformation.hh line: 508
BOINC:: Error reading and gzipping output datafile: default.out
12:00:41 (7756): called boinc_finish(1)
</stderr_txt>
]]>
And in just under 30min this one errored out (according to Stderr output), but Validated
degrader_site_3mup_plait_-2.5_bcov_30.hbnet_3_SAVE_ALL_OUT_IGNORE_THE_REST_8yc7vq7n_1729332_1_0
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol pdblite_boinc_120_10_tfirst--fuse--predictor_v13_degrader_plait_boinc--fuse--tslp_design_v2_degrader_plait_boinc.xml @degrader_site_3mup_plait_-2.5_bcov_flags2 -in:file:silent degrader_site_3mup_plait_-2.5_bcov_30.hbnet_3_SAVE_ALL_OUT_IGNORE_THE_REST_8yc7vq7n.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip degrader_site_3mup_plait_-2.5_bcov_30.hbnet_3_SAVE_ALL_OUT_IGNORE_THE_REST_8yc7vq7n.zip @degrader_site_3mup_plait_-2.5_bcov_30.hbnet_3_SAVE_ALL_OUT_IGNORE_THE_REST_8yc7vq7n.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3745924
Using database: database_357d5d93529_n_methylminirosetta_database
ERROR: Called Constraint::remap_resid method from derived class UNKNOWN_TYPE,ended up in Constraint::remap_resid
ERROR:: Exit from: ......srccorescoringconstraintsConstraint.cc line: 188
12:27:28 (7040): called boinc_finish(0)
</stderr_txt>
]]>
But most seem to be running to the Target CPU time & Validating OK.
Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,615,377 RAC: 22,356 |
Those _5nvx_ degrader Tasks continue to produce nothing but errors. At least they're quick to die. Grant Darwin NT |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
"Those _5nvx_ degrader Tasks continue to produce nothing but errors."
This error was reported during testing on Ralph. I cannot understand why these WUs were put into production. |
©2024 University of Washington
https://www.bakerlab.org