Message boards : Number crunching : Mini Rosetta Version 3.41.
Author | Message |
---|---|
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. I received this error with the new app, task finished really fast for some reason. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=480803868 lr_aa_bench_2lg5A_SAVE_ALL_OUT_IGNORE_THE_REST_53023_393_0 CPU time 439.1 stderr out <core_client_version>6.10.58</core_client_version> <![CDATA[ <stderr_txt> [2012- 8-27 11:29:54:] :: BOINC:: Initializing ... ok. [2012- 8-27 11:29:54:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev50262.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_lr_aa_bench_2lg5A_yfsong.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 14400 ====================================================== DONE :: 13 starting structures 1201 cpu seconds This process generated 13 decoys from 13 attempts ====================================================== BOINC :: WS_max 0 BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish </stderr_txt> ]]> Validate state Invalid Claimed credit 3.41337554548822 Granted credit 0 application version 3.41 |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi, Another one different rig same problem. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=480888200 lr_aa_bench_T0551_SAVE_ALL_OUT_IGNORE_THE_REST_52933_1642_0 <core_client_version>7.0.27</core_client_version> <![CDATA[ <stderr_txt> [2012- 8-27 12:48:21:] :: BOINC:: Initializing ... ok. [2012- 8-27 12:48:21:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev50262.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_lr_aa_bench_T0551_yfsong.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 14400 ====================================================== DONE :: 3 starting structures 1201 cpu seconds This process generated 3 decoys from 3 attempts ====================================================== BOINC :: WS_max 0 BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish </stderr_txt> ]]> Validate state Invalid Claimed credit 2.90765952928257 Granted credit 0 application version 3.41 |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
And another, same thing: it finished after only 5 min! Why? https://boinc.bakerlab.org/rosetta/workunit.php?wuid=480888542 lr_aa_bench_T0551_SAVE_ALL_OUT_IGNORE_THE_REST_52933_1646_0 # cpu_run_time_pref: 14400 ====================================================== DONE :: 3 starting structures 1201 cpu seconds This process generated 3 decoys from 3 attempts ====================================================== BOINC :: WS_max 0 BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish </stderr_txt> ]]> Validate state Invalid Claimed credit 3.4143034827933 Granted credit 0 application version 3.41 |
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
Was there not a bug some time ago that was making work fail at 1201 seconds? Is it back again? |
Polian Send message Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
|
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
I think there may be a problem with this bench. It seems to be getting stuck at 1201 seconds. Task ID 528459172 Name lr_aa_bench_T0548_SAVE_ALL_OUT_IGNORE_THE_REST_52930_1664_0 Outcome Validate error ====================================================== DONE :: 3 starting structures 1201 cpu seconds This process generated 3 decoys from 3 attempts ====================================================== application version 3.41 |
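The pattern reported in the posts above (a run-time preference of 14400 seconds, yet a summary reporting 1201 cpu seconds) can be checked mechanically. The following is only a rough sketch that reads the two summary lines exactly as they are quoted in this thread; the function names and the 0.5 threshold are made up for illustration and are not part of BOINC or Rosetta.

```python
import re

# Patterns follow the stderr lines quoted in this thread.
PREF_RE = re.compile(r"#\s*cpu_run_time_pref:\s*(\d+)")
DONE_RE = re.compile(
    r"DONE\s*::\s*(\d+)\s+starting structures\s+([\d.]+)\s+cpu seconds"
)
DECOY_RE = re.compile(r"generated\s+(\d+)\s+decoys from\s+(\d+)\s+attempts")


def summarize(stderr_text):
    """Return preference, reported CPU seconds and decoy count, or None."""
    pref = PREF_RE.search(stderr_text)
    done = DONE_RE.search(stderr_text)
    decoys = DECOY_RE.search(stderr_text)
    if not (pref and done and decoys):
        return None  # the summary block is missing or incomplete
    pref_s = int(pref.group(1))
    cpu_s = float(done.group(2))
    return {
        "pref_seconds": pref_s,
        "cpu_seconds": cpu_s,
        "decoys": int(decoys.group(1)),
        "fraction_of_pref": cpu_s / pref_s,
    }


def ended_early(summary, threshold=0.5):
    # threshold is an arbitrary illustrative cut-off, not a project rule
    return summary["fraction_of_pref"] < threshold


# Sample text copied from the stderr output posted above.
sample = (
    "# cpu_run_time_pref: 14400 "
    "DONE :: 3 starting structures 1201 cpu seconds "
    "This process generated 3 decoys from 3 attempts"
)
info = summarize(sample)
print(info, ended_early(info))
```

Pasting the stderr text of a task into `summarize` gives the fraction of the preferred run time that was actually used, which makes tasks like the ones above easy to spot in a batch.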
Nightwish Send message Joined: 29 Mar 12 Posts: 10 Credit: 307,377 RAC: 0 |
rb_08_25_32982_63458_h001__4dtl_2012_IGNORE_THE_REST_07_15_56060_14_0 Workunit 481012768 Created 27 Aug 2012 18:03:33 UTC Sent 27 Aug 2012 18:05:01 UTC Received 28 Aug 2012 3:25:10 UTC Server state Over Outcome Client error Client state Compute error Exit status 1 (0x1) Computer ID 1530073 Report deadline 6 Sep 2012 18:05:01 UTC CPU time 17666 stderr out <core_client_version>7.0.25</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> [2012- 8-27 14: 7:30:] :: BOINC:: Initializing ... ok. [2012- 8-27 14: 7:31:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev50262.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. 
Starting work on structure: _00001 # cpu_run_time_pref: 28800 Starting work on structure: _00002 Starting work on structure: _00003 Starting work on structure: _00004 Starting work on structure: _00005 Starting work on structure: _00006 Starting work on structure: _00007 Starting work on structure: _00008 Starting work on structure: _00009 Starting work on structure: _00010 Starting work on structure: _00011 Starting work on structure: _00012 Starting work on structure: _00013 Starting work on structure: _00014 Starting work on structure: _00015 Starting work on structure: _00016 Starting work on structure: _00017 [2012- 8-27 21:27:55:] :: BOINC:: Initializing ... ok. [2012- 8-27 21:27:55:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev50262.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. 
# cpu_run_time_pref: 28800 Starting work on structure: _00017 Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_1 ... success! Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_2 ... success! Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_3_iter1_1 ... success! Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_3_iter1_2 ... success! Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_3_iter1_3 ... success! Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_3_iter1_4 ... success! Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_3_iter1_5 ... success! Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_3_iter1_6 ... success! Continuing computation from checkpoint: chk_S_00000017_ClassicAbinitio__stage_3_iter1_7 ... success! </stderr_txt> ]]> Validate state Invalid Claimed credit 84.2454349387172 Granted credit 84.2454349387172 application version 3.41 |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. I've had a lot of these lr_ab_bench_ tasks fail today on both my rigs; all are finishing at the 1201 sec mark. Methinks someone needs to have a good long look at them. |
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 25,844,167 RAC: 12,218 |
With version 3.41 I got a lot of buggy WUs. Main "features": 1. ~25k sec runtime (3 hour target time + 4 hours for watchdog activation) 2. Only 1 model (decoy) 3. Only 20 credits (fixed value, not calculated as usual) 4. Error messages in the WU log: WARNING! cannot get file size for default.out.gz: could not open file. Links to WU examples: hyb_ag_bench_4ascA_SAVE_ALL_OUT_IGNORE_THE_REST_57379_100_0 hyb_ag_bench_3u24A_SAVE_ALL_OUT_IGNORE_THE_REST_57346_146_0 hyb_ag_bench_4adyB_SAVE_ALL_OUT_IGNORE_THE_REST_57371_117_0 hyb_ag_bench_T0543_SAVE_ALL_OUT_IGNORE_THE_REST_57433_60_0 hyb_ag_bench_3sluB_SAVE_ALL_OUT_IGNORE_THE_REST_57324_95_0 hyb_af_bench_3ur7B_SAVE_ALL_OUT_IGNORE_THE_REST_57034_201_0 hyb_af_bench_3rr5A_SAVE_ALL_OUT_IGNORE_THE_REST_57002_222_0 hyb_af_bench_T0604_SAVE_ALL_OUT_IGNORE_THE_REST_57162_37_0 hyb_ad_bench_T0591_SAVE_ALL_OUT_IGNORE_THE_REST_56600_9_0 hyb_ad_bench_3rdeD_SAVE_ALL_OUT_IGNORE_THE_REST_56445_87_0 hyb_ab_bench_3zs7A_SAVE_ALL_OUT_IGNORE_THE_REST_53952_581_0 hyb_ab_bench_3ur7B_SAVE_ALL_OUT_IGNORE_THE_REST_53942_732_0 hyb_ab_bench_2yeqB_SAVE_ALL_OUT_IGNORE_THE_REST_53890_1247_0 hyb_ab_bench_3qd9D_SAVE_ALL_OUT_IGNORE_THE_REST_53896_1104_0 |
Daniele Send message Joined: 19 Aug 12 Posts: 2 Credit: 9,382 RAC: 0 |
I think I have the same problem as Mad_Max, in particular with the following WU: hyb_af_bench_3v1xA_SAVE_ALL_OUT_IGNORE_THE_REST_57036_284_0. In addition to the problems mentioned by Mad_Max, I have also noticed that, during the 7 hours in which the WU ran, no checkpoint operation was performed. I have a notebook and I can't leave jobs running for such a long time without saving their work. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I am also having the same problem as Max and Daniele. https://boinc.bakerlab.org/rosetta/result.php?resultid=529768017 hyb_ag_bench_3rwlA_SAVE_ALL_OUT_IGNORE_THE_REST_57314_926_1 Starting watchdog... Watchdog active. # cpu_run_time_pref: 21600 BOINC:: CPU time: 36165.2s, 14400s + 21600s[2012- 9- 3 15:36:19:] :: BOINC WARNING! cannot get file size for default.out.gz: could not open file. Output exists: default.out.gz Size: -1 InternalDecoyCount: 0 (GZ) ----- 0 ----- Stream information inconsistent. Writing W_0000001 ====================================================== DONE :: 1 starting structures 36166.8 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== called boinc_finish You're wasting my CPU time with this nonsense. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=529712076 lr_ab_bench_2l6fA_SAVE_ALL_OUT_IGNORE_THE_REST_58215_517_0 Validate error and you give me only 3 credits? Come on, guys! I've been with you a long time on this project and have never gotten such low credit for burning 885 seconds of CPU time. And I don't get it: my run preference is 21600 seconds and this task used only 885 and shut down. If it had used ALL the CPU time, based on the amount of time used for 1 decoy, another 24 could have been produced. |
Eric Detheridge Send message Joined: 26 Aug 12 Posts: 2 Credit: 1,975,060 RAC: 0 |
Same problems as Mad Max, Daniele and Greg BE on these tasks: 529525203 529605370 529631899 529668166 529764820 529799498 529841088 529864653 (CPU time ~25,000 s, claimed credit ~42, granted credit 20.00; names start with something like hyb_ai_bench or hyb_ag_bench). These tasks (all ir_ab_bench) failed with a validate error: 529758663 529846000 529881616. These (also ir_ab_bench) ran for ~2000 s and claimed (and were granted) fewer than 10 credits: 529906748 529841036 529783345. Almost all of my tasks for Sept 03 are listed above. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
Eric Detheridge: these tasks (all ir_ab_bench) failed with a validate error: I've been seeing a lot of problems on lr_ab_bench workunits. First, several that ran about 2000 seconds (instead of the 12 hours I selected), then gave 20 credits or less. Then, several that ran even less time, gave a validate error and no credits. Are there problems with those workunits? For the validate errors, the wingmen gave validate errors also. |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
Mad_Max: 1. ~25k sec runtime (3 hour target time + 4 hour for watchdog activation) I think they are sending out some very "complicated" WUs, which might not finish within the default 3+4 hours. This one for example needed almost 12 hours CPU time for a single decoy (and didn't checkpoint even once during that time). Daniele: I have a notebook and I can't let jobs running for such a long time without saving their work. Hibernate it instead of shutting down (and leave applications in memory while suspended). |
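The "3+4 hours" figure can be checked against the numbers posted in this thread. A small sketch of the arithmetic follows, assuming the watchdog cut-off is simply the run-time preference plus the 4 hours (14400 s) mentioned above; the constant and function names are invented for this example and are not taken from the Rosetta source.

```python
# 4 hours, the watchdog allowance quoted in this thread (assumed fixed here)
WATCHDOG_GRACE_SECONDS = 4 * 3600


def watchdog_cutoff(cpu_run_time_pref):
    """CPU seconds after which the watchdog would end the task (assumed model)."""
    return cpu_run_time_pref + WATCHDOG_GRACE_SECONDS


# 3 hour default target, and the 21600 s preference from Greg_BE's log
for pref in (3 * 3600, 21600):
    print(pref, "->", watchdog_cutoff(pref))
```

Under that assumption a 3 hour target gives 25200 s, in line with the ~25k sec runtimes reported by Mad_Max, and a 21600 s preference gives 36000 s, close to the 36165.2 s shown in Greg_BE's log.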
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
and yet more silence from the team...as usual. |
Daniele Send message Joined: 19 Aug 12 Posts: 2 Credit: 9,382 RAC: 0 |
Link: Hibernate it instead of shut down (and leave applications in memory while suspended). Thanks for the advice, Link. However, I wanted to point out that it is a little bit "strange" not to perform a checkpoint operation during 7 or 12 hours of work. Certainly there will be a reason for this, but in my opinion it would be safer to do a checkpoint operation once in a while. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
Link: If there is currently checkpointing only between decoys, then any very long decoy would have to run with no checkpointing inside. |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
robertmiles: If there is currently checkpointing only between decoys, then any very long decoy would have to run with no checkpointing inside. That's what I'm seeing on the hyb_ai_bench_ tasks, you can lose hours on some of them. |
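To put a number on "you can lose hours", here is a toy model of the situation described above. It assumes, as the post does only conditionally, that a checkpoint is written only when a decoy completes, and compares that with a hypothetical fixed-interval checkpoint. The function names, the 600 s interval and the interruption time are illustrative assumptions, not how minirosetta is known to behave.

```python
from itertools import accumulate


def lost_between_decoys(decoy_durations, interrupt_at):
    """CPU seconds lost if checkpoints are written only when a decoy completes."""
    last_checkpoint = 0
    for boundary in accumulate(decoy_durations):
        if boundary <= interrupt_at:
            last_checkpoint = boundary  # this decoy finished before the interruption
        else:
            break
    return interrupt_at - last_checkpoint


def lost_periodic(interval, interrupt_at):
    """CPU seconds lost with a hypothetical checkpoint every `interval` seconds."""
    return interrupt_at % interval


# One decoy needing about 12 hours (43200 s), interrupted after 25000 s
print(lost_between_decoys([43200], 25000))
print(lost_periodic(600, 25000))
```

In this model the single long decoy loses all 25000 s of work on interruption, while the hypothetical 600 s interval would lose 400 s. With short decoys the two schemes lose similar amounts, which matches the observation that the problem shows up on the long hyb_ tasks.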
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
Greg_BE: and yet more silence from the team...as usual. The ralph@home team also has communication problems... |
©2024 University of Washington
https://www.bakerlab.org