Message boards : Number crunching : minirosetta 2.15
Author | Message |
---|---|
Yifan Song Volunteer moderator Project developer Project scientist Joined: 26 May 09 Posts: 62 Credit: 7,322 RAC: 0 |
After David Kim and TJ looked into this, we did find a problem with large memory usage with the 2.15 version. I'll do a revert first thing tomorrow. (Too tired to get it started now :p) |
deesy58 Joined: 20 Apr 10 Posts: 75 Credit: 193,831 RAC: 0 |
Greetings: You might have "nearly a hundred gigabytes" of hard disk capacity, but I seriously doubt that you have that much RAM (Random Access Memory). Virtual memory combines physical RAM with a page file on the hard disk, so a shortage of virtual memory is often a sign that your hard disk has become nearly full. How much disk space is available on your machine? To find out, open your "Computer" icon, right-click "Local Disk (C:)", and select "Properties." This shows how much free space remains on your hard disk. If too little space is left, you can use the "Disk Cleanup" utility (carefully) to remove files that are no longer needed. All of this assumes, of course, that you are running Microsoft Windows, and it appears that you are. BTW, the "reverse slashes" (backslashes) you mentioned in your original post are, indeed, normal in the Microsoft environment. deesy |
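The free-space check deesy describes through the "Properties" dialog can also be scripted. Below is a minimal Python sketch using the standard library's `shutil.disk_usage`; the function name `free_space_report`, the `C:\` default on Windows, and the 10% "low space" threshold are illustrative assumptions, not values taken from BOINC or Windows.

```python
import os
import shutil


def free_space_report(path: str, min_free_fraction: float = 0.10) -> tuple[float, float, bool]:
    """Return (total GB, free GB, low_space flag) for the disk holding `path`.

    min_free_fraction is an arbitrary, illustrative threshold (assumption).
    """
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    gb = 1024 ** 3
    total_gb = usage.total / gb
    free_gb = usage.free / gb
    low = usage.free < min_free_fraction * usage.total
    return total_gb, free_gb, low


if __name__ == "__main__":
    # On Windows the system drive is usually C:\; elsewhere, use the filesystem root.
    root = "C:\\" if os.name == "nt" else "/"
    total, free, low = free_space_report(root)
    print(f"{free:.2f} GB free of {total:.2f} GB" + (" (low!)" if low else ""))
```

The flag only says whether free space is below the chosen fraction; whether that actually limits virtual memory depends on how the page file is configured.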
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Quoting Yifan Song: "After David Kim and TJ looked into this, we did find a problem with large memory usage with the 2.15 version. I'll do a revert first thing tomorrow. (Too tired to get it started now :p)" Take a look at this thread when looking at the RAM issues. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This one failed after 15 seconds. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=337685670 rb_10_04_377_958_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_22344_973_1 Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/rb_10_04_377_958_rs_stg0_lrlxjcst_t000__casp9.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... ERROR: Error in traceback: pointer doesn't go anywhere! ERROR:: Exit from: src/core/sequence/Aligner.cc line: 79 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Here is a whole batch of errors. I'm pretty sure one of these errors caused a Windows BSOD. The wingman also died on these tasks. No fancy URL links, just raw data: T0605_t2_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_22177_298_1 https://boinc.bakerlab.org/rosetta/result.php?resultid=367677827 Incorrect function. (0x1) - exit code 1 (0x1) ERROR: Error in traceback: pointer doesn't go anywhere! ERROR:: Exit from: ..\..\src\core\sequence\Aligner.cc line: 79 BOINC:: Error reading and gzipping output datafile: default.out fix_disulf_v4_NMR_1eig_CONTROL__BOINC_abrelax.score12.fastrelax.v2_SAVE_ALL_OUT_22291_788_0 https://boinc.bakerlab.org/rosetta/result.php?resultid=367990336 Incorrect function. (0x1) - exit code 1 (0x1) ERROR: rsd_type_list.size() ERROR:: Exit from: ..\..\src\core\fragment\Frame.cc line: 62 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish fix_disulf_v4_NMR_1m12_CONTROL__BOINC_abrelax.score12.fastrelax.v2_SAVE_ALL_OUT_22291_788_0 https://boinc.bakerlab.org/rosetta/result.php?resultid=367990356 Incorrect function. (0x1) - exit code 1 (0x1) ERROR: rsd_type_list.size() ERROR:: Exit from: ..\..\src\core\fragment\Frame.cc line: 62 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish mem_abinitio_bench_run01_A_BRD7_SAVE_ALL_OUT_IGNORE_THE_REST_22294_290_1 https://boinc.bakerlab.org/rosetta/result.php?resultid=367990547 Incorrect function. (0x1) - exit code 1 (0x1) ERROR: Cannot open PDB file "input_BRD7BRD4.pdb" ERROR:: Exit from: ..\..\src\core\io\pdb\pose_io.cc line: 182 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish fix_disulf_v4_NMR_1xu6_DISULF__BOINC_abrelax.score12.fastrelax.v2_SAVE_ALL_OUT_22292_695_1 https://boinc.bakerlab.org/rosetta/result.php?resultid=367990548 Incorrect function. (0x1) - exit code 1 (0x1) ERROR: rsd_type_list.size() ERROR:: Exit from: ..\..\src\core\fragment\Frame.cc line: 62 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish Ross2X3_SAVE_ALL_OUT_r006_010_22296_138_0 https://boinc.bakerlab.org/rosetta/result.php?resultid=368110199 - exit code -1073741819 (0xc0000005) Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x004CF209 read attempt to address 0x20EEE6E5 Engaging BOINC Windows Runtime Debugger.. T0591_t3_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_22223_1177_1 https://boinc.bakerlab.org/rosetta/result.php?resultid=368248774 The system cannot find the path specified. (0x3) - exit code 3 (0x3) Plus a couple of "No heartbeat from client" error messages. That's a pretty big laundry list for 1 or 2 days. |
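The Ross2X3 failure reports the same status two ways: as a signed decimal exit code (-1073741819) and as the hex value 0xc0000005, which the log itself labels an Access Violation. The two are the same 32-bit value read as signed versus unsigned. A minimal Python sketch of the conversion; the helper names are hypothetical and not part of BOINC.

```python
def exit_code_to_hex(code: int) -> str:
    """Reinterpret a signed 32-bit exit code as the unsigned hex status Windows shows."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def hex_to_exit_code(hex_str: str) -> int:
    """Inverse: turn an unsigned 32-bit hex status back into a signed exit code."""
    value = int(hex_str, 16)
    # If the top bit is set, the signed (two's-complement) value is negative.
    return value - (1 << 32) if value & 0x80000000 else value


print(exit_code_to_hex(-1073741819))   # 0xc0000005
print(exit_code_to_hex(3))             # 0x00000003
print(hex_to_exit_code("0xc0000005"))  # -1073741819
```

Small positive codes such as 1 and 3 look the same either way; only statuses with the top bit set, like 0xc0000005, show up as negative decimals.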
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 25,845,968 RAC: 12,089 |
I have had a few runtime errors and crashes to desktop from minirosetta_2.15_windows_intelx86.exe over the last few days (I never saw them with previous versions of minirosetta, only the "standard" errors), and even one BSOD (I had forgotten about BSODs since moving from Windows 98 to XP). On this computer: https://boinc.bakerlab.org/rosetta/results.php?hostid=1252064 I'm not sure which job caused the BSOD; there is a whole bundle of bad ones. Some of them ended with a runtime error, one with the BSOD, and a few were killed by BOINC for exceeding the memory limit (their memory use grew past ~1 GB shortly after starting), like these: 08/10/2010 23:58:52 rosetta@home Aborting task rs_stg0_lrlx_t363__run1_SAVE_ALL_OUT_19372_805_0: exceeded memory limit 1353.20MB > 1223.80MB 09/10/2010 05:51:57 rosetta@home Aborting task lr5_combined_torsion_it01_run01_A_rlbd_256b_SAVE_ALL_OUT_IGNORE_THE_REST_DECOY_18669_2770_1: exceeded memory limit 1291.75MB > 1223.80MB 09/10/2010 05:53:25 rosetta@home Aborting task rs_stg0_lrlx_t311__run1_SAVE_ALL_OUT_19356_6624_1: exceeded memory limit 1292.80MB > 1223.80MB 09/10/2010 05:55:04 rosetta@home Aborting task lr5_combined_torsion_it01_run01_A_rlbd_1eyv_SAVE_ALL_OUT_IGNORE_THE_REST_DECOY_18669_2636_1: exceeded memory limit 1269.07MB > 1223.80MB P.S. 2.15 is the most problematic and buggy version of all that I've seen (since I joined the project at the start of this year, including 5.98, 2.03, 2.05, 2.10, 2.11 and 2.14). On my team's forum there are also lots of complaints about this version from other members. |
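The abort lines above follow a fixed pattern, so the task name and the amount by which each task overshot the limit can be pulled out with a regular expression. A minimal Python sketch; the regex is written against the four lines quoted in this post and is an assumption about the message format, not BOINC's documented output, and `parse_abort` is a hypothetical helper.

```python
import re

# Pattern matched against the abort lines quoted above (assumed format).
ABORT_RE = re.compile(
    r"Aborting task (?P<task>\S+): exceeded memory limit "
    r"(?P<used>[\d.]+)MB > (?P<limit>[\d.]+)MB"
)


def parse_abort(line: str):
    """Return (task name, MB used, MB limit, MB over limit), or None if no match."""
    m = ABORT_RE.search(line)
    if m is None:
        return None
    used, limit = float(m["used"]), float(m["limit"])
    return m["task"], used, limit, round(used - limit, 2)


log = (
    "08/10/2010 23:58:52 rosetta@home Aborting task "
    "rs_stg0_lrlx_t363__run1_SAVE_ALL_OUT_19372_805_0: "
    "exceeded memory limit 1353.20MB > 1223.80MB"
)
print(parse_abort(log))
```

Applied to the four quoted lines, this gives overruns between roughly 45 MB and 130 MB above the 1223.80 MB limit.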
Brian Priebe Joined: 27 Nov 09 Posts: 16 Credit: 33,020,247 RAC: 0 |
Quoting Mad_Max: "2.15 is the most problematic and buggy version of all that I've seen." Of my most recent 100 WUs, 22 blew up with one or another of the errors already posted here. The "Unusual Termination" dialog box from MSVC seems to be appearing more frequently. |
Jim Martin Joined: 9 Oct 05 Posts: 23 Credit: 1,416,797 RAC: 925 |
Two enclosures: 1) system boot info. 2) error report. * * * 10/8/2010 11:29:55 PM Starting BOINC client version 6.10.58 for windows_intelx86 10/8/2010 11:29:55 PM log flags: file_xfer, sched_ops, task 10/8/2010 11:29:55 PM Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3 10/8/2010 11:29:55 PM Data directory: C:\ProgramData\BOINC 10/8/2010 11:29:55 PM Running under account James 10/8/2010 11:29:56 PM Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz [Family 6 Model 15 Stepping 6] 10/8/2010 11:29:56 PM Processor: 4.00 MB cache 10/8/2010 11:29:56 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 nx lm vmx tm2 pbe 10/8/2010 11:29:56 PM OS: Microsoft Windows Vista: Business x86 Edition, Service Pack 2, (06.00.6002.00) 10/8/2010 11:29:56 PM Memory: 2.00 GB physical, 4.23 GB virtual 10/8/2010 11:29:56 PM Disk: 142.71 GB total, 89.40 GB free 10/8/2010 11:29:56 PM Local time is UTC -4 hours 10/8/2010 11:29:56 PM No usable GPUs found 10/8/2010 11:29:57 PM rosetta@home URL https://boinc.bakerlab.org/rosetta/; Computer ID 1324493; resource share 100 10/8/2010 11:29:57 PM climateprediction.net URL http://climateprediction.net/; Computer ID 819110; resource share 100 10/8/2010 11:29:57 PM Einstein@Home URL http://einstein.phys.uwm.edu/; Computer ID 1616831; resource share 50 10/8/2010 11:29:57 PM lhcathome URL http://lhcathome.cern.ch/lhcathome/; Computer ID 9825728; resource share 100 10/8/2010 11:29:57 PM Quake-Catcher Network URL http://qcn.stanford.edu/sensor/; Computer ID 9909; resource share 100 10/8/2010 11:29:57 PM SETI@home URL http://setiathome.berkeley.edu/; Computer ID 5490317; resource share 50 10/8/2010 11:29:57 PM Einstein@Home General prefs: from Einstein@Home (last modified 08-Jul-2010 11:24:21) 10/8/2010 11:29:57 PM Einstein@Home Computer location: home 10/8/2010 11:29:57 PM General prefs: using separate prefs for home 10/8/2010 11:29:57 PM 
Preferences: 10/8/2010 11:29:57 PM max memory usage when active: 1022.66MB 10/8/2010 11:29:57 PM max memory usage when idle: 1840.78MB 10/8/2010 11:30:10 PM max disk usage: 50.00GB 10/8/2010 11:30:10 PM don't use GPU while active 10/8/2010 11:30:10 PM (to change preferences, visit the web site of an attached project, or select Preferences in the Manager) 10/8/2010 11:30:10 PM Not using a proxy 10/8/2010 11:30:10 PM Quake-Catcher Network Restarting task qcnk_sc300_sta200_087854_0 using qcnsensor version 562 10/8/2010 11:30:40 PM rosetta@home Restarting task rs_stg0_lrlx_t363__run1_SAVE_ALL_OUT_19372_6575_0 using minirosetta version 215 10/8/2010 11:30:40 PM climateprediction.net Sending scheduler request: To send trickle-up message. 10/8/2010 11:30:40 PM climateprediction.net Not reporting or requesting tasks 10/8/2010 11:30:44 PM climateprediction.net Scheduler request completed 10/8/2010 11:30:44 PM climateprediction.net Message from server: Project is temporarily shut down for maintenance 10/8/2010 11:32:27 PM rosetta@home Restarting task mem_widd_run02_Mevn_A_2ksy_SAVE_ALL_OUT_IGNORE_THE_REST_22157_49662_0 using minirosetta version 215 * * * BoincLogX - History nr / date error_txt project_name domain_name user_total_credit user_expavg_credit CPU error 1596 2010.10.08 13:10:49 rosetta@home james-pc 183398.626579 169.348558 00:00:59 true 1st Entry: result_name rs_stg0_lrlx_t311_run1_SAVE_ALL_OUT_19356_2623_0_0 error_txt The system cannot find the path specified. (0x3) - exit code 3 (0x3);[2010-10-8 12:32:52]::BOINC::Initializing ... ok. [2010-10-8 12:32-52] * * * * * User note: The above error/WU failure appeared to be generated immediately after a "Microsoft Visual C++ Runtime Library" window popped up and was closed. Before I closed it, the system ran slower than normal; after closing it, normal system speed resumed. The failures occurred on three occasions, although only one is listed above. I entered the BoincLogX information by hand. 
Also, memory use peaked at approx. 1.29 GB. * * * * * 2nd entry: Rosetta Mini 2.15 mem_widd_run02_Mevn_A_2ksy_SAVE_ALL_OUT_IGNORE_THE_REST_22157_49662_0. Used physical memory was in the approx. 51%-67% range. No other WUs were running except QCN@home. At just over 23% run time, with memory usage varying from an average of 465 MB to a max of 1.29 GB, a second Rosetta Mini 2.15 WU started, even though it had previously been placed in a "suspended" state: rs_stg0_lrlx_t363__run1_SAVE_ALL_OUT_19372_6575_0. As a result, used physical memory rose to the 99% range, with ...Mem_widd... using about 1.934 GB and ...rs_stg0... approx. 393.7 MB. The system temporarily locked up (the cursor "froze") until BOINC halted ...Mem_widd... and placed it in "Waiting for memory" mode. * * * * * Summary: 1) Rosetta Mini 2.15 WU memory requirements appear to require keeping all other programs from running, to avoid maxing out memory. 2) A successful WU run was not possible because a second Rosetta WU started (reason unknown). Conclusion: 1) If possible, give the user the option of allocating memory for Rosetta (cf. Garli, Lattice Project). This might free up memory for other projects on machines with multiple CPUs (my system has two). Rosetta@home might lose some users if this issue cannot be resolved; I had to drop Garli for this reason. 2) Again, if possible, enable the program to function within the allocated memory. 3) Determine why another Rosetta WU was activated from the "suspended" state (if it had not been, the original WU would probably have completed successfully). * * * Pardon the verbosity; hopefully, it will prove enlightening. JM |
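Jim's startup log lets the memory limits be checked by hand. The two preference values are consistent with 50% (in use) and 90% (idle) of about 2045.3 MB of physical RAM, which the log rounds to "2.00 GB". Those percentages are inferred from the numbers, not read from the log. The minimal Python sketch below recovers them and shows that the ~1.934 GB task exceeds both limits. Treating 1 GB as 1024 MB is an assumption; with 1000 MB it still exceeds both.

```python
# Figures copied from Jim Martin's BOINC startup log above.
ACTIVE_LIMIT_MB = 1022.66  # "max memory usage when active"
IDLE_LIMIT_MB = 1840.78    # "max memory usage when idle"

# Assumption: both limits are percentages of the same physical-RAM figure,
# and the "active" limit is 50% of it.
physical_mb = ACTIVE_LIMIT_MB / 0.50


def pct_of_ram(limit_mb: float, ram_mb: float) -> float:
    """Express a memory limit as a percentage of physical RAM."""
    return 100.0 * limit_mb / ram_mb


def fits(task_mb: float, limit_mb: float) -> bool:
    """True if a task's memory use is within the given limit."""
    return task_mb <= limit_mb


mem_widd_mb = 1.934 * 1024  # reported "about 1.934GB"; 1 GB = 1024 MB assumed

print(f"physical RAM ~ {physical_mb:.2f} MB")  # ~2045.32 MB, i.e. the log's "2.00 GB"
print(f"idle limit ~ {pct_of_ram(IDLE_LIMIT_MB, physical_mb):.1f}% of RAM")
print("fits active limit:", fits(mem_widd_mb, ACTIVE_LIMIT_MB))
print("fits idle limit:", fits(mem_widd_mb, IDLE_LIMIT_MB))
```

The idle limit coming out at 90.0% is what makes the 50%/90% reading plausible; the 393.7 MB task, by contrast, fits comfortably under either limit.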
cleaner Joined: 22 Aug 10 Posts: 6 Credit: 26,245 RAC: 0 |
I have to agree with Mad_Max about the bugginess of Rosetta 2.15. Before, it was mainly out-of-memory issues, but now I am starting to get runtime errors. Hopefully they will soon release an updated version or else revert to an earlier, more stable one, because this is getting a little ridiculous to me. |
bparker Joined: 9 May 07 Posts: 1 Credit: 1,215,599 RAC: 4,403 |
Quoting cleaner: "I have to agree with Mad_Max about the bugginess of Rosetta 2.15. Before, it was mainly out-of-memory issues, but now I am starting to get runtime errors. Hopefully they will soon release an updated version or else revert to an earlier, more stable one, because this is getting a little ridiculous to me." I have abandoned Rosetta 2.15 until the bugs are fixed and the version changes. All my other BOINC apps work fine; this one eventually locks up the computer if I leave it running long enough without rebooting. My Windows 7 machine runs it without problems, but my XP machine chokes on it. |
Aidan & Liz Hopkins Joined: 8 Jan 07 Posts: 1 Credit: 525,167 RAC: 0 |
I keep getting the following: 09/10/2010 20:43:39|rosetta@home|Task lr5_combined_torsion_it01_run01_A_rlbd_2hkv_SAVE_ALL_OUT_IGNORE_THE_REST_DECOY_18669_2172_1 exited with zero status but no 'finished' file 09/10/2010 20:43:39|rosetta@home|If this happens repeatedly you may need to reset the project. It appears to be linked to a C++ Runtime Library error message, which has inconveniently vanished. After it kept happening all afternoon, I did a 'reset', but it is still happening. The original FAQ instructions were to ignore any "exited with zero status but no 'finished' file" situation, but that was a long time ago, and this may be a different problem. The client appears to be contacting the server as usual, but nothing is being sent or received at present. Should I just ignore it and assume the automated updates will resolve it? |
Sid Celery Joined: 11 Feb 08 Posts: 2117 Credit: 41,159,202 RAC: 15,498 |
With all the other reports of problems, I thought I'd check my own tasks: W7-64bit Intel Core2Duo laptop, 4 GB RAM - 1 error out of 36 tasks CURATED_NMR_1k7b_CONTROL__BOINC_abrelax.score12.fastrelax.v4_SAVE_ALL_OUT_22308_425_0 <core_client_version>6.10.58</core_client_version> Vista64 AMD Phenom 9850 Quad desktop, 8 GB RAM - 2 errors out of 86 tasks CURATED_NMR_1k7b_disulf__BOINC_abrelax.score12.fastrelax.v4_SAVE_ALL_OUT_22309_663_0 Errors exactly as above T0591_t3_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_22223_1049_1 <core_client_version>6.10.58</core_client_version> Not too terrible, but I do have a decent amount of RAM to play with on each machine, which possibly makes the difference. |
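Put as percentages, Sid's first counts (1 of 36 and 2 of 86) come to roughly 2.8% and 2.3%, against Brian's 22 failures out of his last 100 WUs. A trivial Python sketch of that arithmetic; `failure_rate` is a hypothetical helper, and the counts are only those stated in the two posts.

```python
def failure_rate(errors: int, total: int) -> float:
    """Percentage of tasks that errored out."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * errors / total


# (errors, total tasks) as reported in the posts above.
reports = {
    "Brian, last 100 WUs": (22, 100),
    "Sid, W7-64 laptop, 4 GB": (1, 36),
    "Sid, Vista64 desktop, 8 GB": (2, 86),
}

for name, (errors, total) in reports.items():
    print(f"{name}: {failure_rate(errors, total):.1f}%")
```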
Sid Celery Joined: 11 Feb 08 Posts: 2117 Credit: 41,159,202 RAC: 15,498 |
Spoke too soon: lr5_combined_torsion_it01_run01_A_rlbd_1unp_SAVE_ALL_OUT_IGNORE_THE_REST_DECOY_18669_2260_0 lr5_combined_torsion_it01_run01_A_rlbd_1e6i_SAVE_ALL_OUT_IGNORE_THE_REST_DECOY_18669_2721_0 rs_stg0_lrlx_t363__run1_SAVE_ALL_OUT_19372_6878_0 All report the same: <core_client_version>6.10.58</core_client_version> Out of memory on a machine with 8 GB of RAM? I doubt it. One other: rb_10_04_377_958_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_22344_1087_1 <core_client_version>6.10.58</core_client_version> |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Ehhh... nuts to 2.15. I'm aborting them and moving to 2.16. Every few tasks come up with ERROR: Error in traceback: pointer doesn't go anywhere! ERROR:: Exit from: ..\..\src\core\sequence\Aligner.cc line: 79 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish or some other rubbish. |
wolfpat Joined: 1 May 10 Posts: 4 Credit: 2,367,160 RAC: 1,474 |
I also had to abort all the 2.15 tasks. They would not run at all on my XP machine. The 2.16 tasks are running fine. |
©2024 University of Washington
https://www.bakerlab.org