Message boards : Number crunching : Problems with Rosetta version 5.64
Author | Message |
---|---|
Rhiju Volunteer moderator Send message Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Please post any issues here. In particular, let us know if you think there's a big problem with "checkpointing" or with PowerPC Macs. Thanks! |
dev Send message Joined: 1 Dec 05 Posts: 3 Credit: 6,590 RAC: 0 |
I have been unable to start any work on any PPC machine running OS 10.3 or 10.3.9 with Rosetta 5.62 or 5.64. Each task fails with a computation error as soon as it finishes downloading, and the machines keep downloading more work even though every task errors out. Intel OS 10.4.x is stable and x86 Linux is stable. I am aware this is an ongoing issue and am furnishing this information in case it helps. I have set up a PPC machine with 10.3 and 5.64 on Ralph and am awaiting new work. |
dev Send message Joined: 1 Dec 05 Posts: 3 Credit: 6,590 RAC: 0 |
Further info: Following fresh download after joining project with a G3 running OS 10.3 Log info Boinc 5.2.13 Mon May 7 15:00:02 2007|rosetta@home|Finished download of rosetta_5.64_powerpc-apple-darwin Mon May 7 15:00:02 2007|rosetta@home|Throughput 67112 bytes/sec Mon May 7 15:00:15 2007||request_reschedule_cpus: files downloaded Mon May 7 15:00:16 2007|rosetta@home|Starting result 2j03_FOLD_AND_DOCK_SYMM_RELAX_1701_1936_0 using rosetta version 564 Mon May 7 15:00:19 2007|rosetta@home|Unrecoverable error for result 2j03_FOLD_AND_DOCK_SYMM_RELAX_1701_1936_0 (process got signal 5) Mon May 7 15:00:19 2007||request_reschedule_cpus: process exited Mon May 7 15:00:19 2007|rosetta@home|Computation for result 2j03_FOLD_AND_DOCK_SYMM_RELAX_1701_1936_0 finished Mon May 7 15:00:32 2007||request_reschedule_cpus: project op |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
dev, We support OSX 10.3.9 or later versions, thus the errors on OSX10.3 are expected. I would detach your computers running 10.3 from the project or upgrade the OS. The 10.3.9 errors are from our 5.62 rosetta version which had a bug. It should be fixed now with our recent application update. |
dev Send message Joined: 1 Dec 05 Posts: 3 Credit: 6,590 RAC: 0 |
Thank you sir, I will make a note of that! |
Fivestar Crashtest Send message Joined: 12 Jul 06 Posts: 2 Credit: 141,777 RAC: 0 |
Result ID 78012509 Name 1utg__BOINC_ABRELAX_SAVE_ALL_OUT-1utg_-frags83__1705_1344_1 Workunit 70077736 Created 8 May 2007 1:14:56 UTC Sent 8 May 2007 1:15:19 UTC Received 8 May 2007 6:01:15 UTC Server state Over Outcome Client error Client state Compute error Exit status 139 (0x8b) Computer ID 487966 Report deadline 18 May 2007 1:15:19 UTC CPU time 8212.397242 stderr out <core_client_version>5.8.17</core_client_version> <![CDATA[ <message> process got signal 11 </message> <stderr_txt> Graphics are disabled due to configuration... # cpu_run_time_pref: 10800 # random seed: 3367377 No heartbeat from core client for 31 sec - exiting SIGSEGV: segmentation violation Stack trace (13 frames): [0x8cbf0fb] [0x8cb9f2c] [0xffffe500] [0x8c2957f] [0x8b30e02] [0x8c1106f] [0x849608c] [0x80dad29] [0x85b4d1b] [0x86d8113] [0x86d81be] [0x8d22ff4] [0x8048111] Exiting... SIGSEGV: segmentation violation SIGABRT: abort called repeat sigabrt about a million times and then: SIGABRT: abort called SIGABRT: abort called </stderr_txt> ]]> Validate state Invalid Claimed credit 34.5143106775097 Granted credit 0 application version 5.64 |
ramostol Send message Joined: 6 Feb 07 Posts: 64 Credit: 584,052 RAC: 0 |
Rosetta is working, so I can't complain too loudly. But 5.64 has somehow turned my Mac (10.3.9) into a megalomaniac. Of course its claimed credits have never agreed too closely with the granted credits, but this typical example of current results is surely ridiculous?: Claimed credit 497.631627972514 Granted credit 3.98991572197545 -- R. A. Mostol |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
ramostol, your BOINC benchmarks got inflated somehow. Rerun your benchmarks and your claimed credit should come back to normal. Advanced view, then advanced pulldown menu bar, then run CPU benchmarks. Once it completes, then update to the project. If that doesn't correct it, then you should report it as a problem with your 5.8.17 beta release of BOINC. CPU reported at: Measured floating point speed 14102.56 million ops/sec Measured integer speed 95755.51 million ops/sec Rosetta Moderator: Mod.Sense |
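To illustrate why inflated benchmarks inflate the claim, here is a minimal sketch of a benchmark-based credit calculation. It assumes the formula commonly described for BOINC clients of this era (100 credits per CPU-day on a reference machine scoring 1000 on both benchmarks, using the average of the two scores); the function name and the 2-hour run length are illustrative assumptions, not project code. The inflated figures are the ones quoted above, and the rerun figures are the ones reported later in this thread.

```python
SECONDS_PER_DAY = 86_400


def claimed_credit(cpu_seconds, whetstone_mflops, dhrystone_mips):
    """Benchmark-based claim (assumed formula, see lead-in).

    A reference machine scoring 1000 MFLOPS / 1000 MIPS would claim
    100 credits for one full day of CPU time.
    """
    avg_benchmark = (whetstone_mflops + dhrystone_mips) / 2.0
    return cpu_seconds * (avg_benchmark / 1000.0) * 100.0 / SECONDS_PER_DAY


two_hours = 2 * 3600  # assumed run length, matching a 2-hour CPU time preference

# Inflated benchmarks reported for the host in the post above.
inflated = claimed_credit(two_hours, 14102.56, 95755.51)

# Benchmarks after rerunning, as reported later in the thread.
rerun = claimed_credit(two_hours, 582.32, 1816.64)

print(f"inflated claim: {inflated:.1f} credits")
print(f"rerun claim:    {rerun:.1f} credits")
```

Under these assumptions the inflated benchmarks give a claim in the hundreds of credits for two hours of work, while the rerun benchmarks give a claim of about ten, which is the order of magnitude the other posters expected.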
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
I agree, Mod.Sense. Below is data from all the PowerBook6,5s running on a small project. He could use this as a comparison. However, it does appear he should be getting more than 3 credits for two hours of work. Claiming roughly 250/hour is a bit high. [image] |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Looks like you're right. His benchmarks are now: Measured floating point speed 582.32 million ops/sec Measured integer speed 1816.64 million ops/sec Could the granted credit be lower than the claimed credit (using peer data or his new data) because the Mac app isn't as optimized as the others? If so, it just does less work per hour. I seem to remember this being an issue months ago, but I am too lazy to look up very old threads. |
ramostol Send message Joined: 6 Feb 07 Posts: 64 Credit: 584,052 RAC: 0 |
Thanks, both of you. As you have seen, I have run the benchmarks but not compared results yet (I am offline most of the time). Of course I should have been more precise in stating that I was not surprised at the granted credits, merely at the claimed credits. (This is not a thread for discussing credits, but some results from Rosetta 5.59 might illustrate: CPU time: 14131.78 -- Granted credit: 3.84102681959318; CPU time: 8999.71 -- Granted credit: 2.41438254677557; CPU time: 10,540.56 -- Granted credit: 2.70; CPU time: 28,207.87 [1 model 7h 50 min] -- Granted credit: 7.46) For anyone wondering what may have caused these inflated benchmarks: they developed after I temporarily changed the default CPU time from 2 hours to 8 hours. By the way, if the task 1bm8__BOINC_ABRELAX_SAVE_ALL_OUT-1bm8_-frags83__1705_174_0 is supposed to be checkpointing, I have seen no sign of it. My computer was shut down after 41 min of crunching (34% cpl.), and now the wu has just restarted from scratch with model 1 step 1. -- R. A. Mostol |
ramostol Send message Joined: 6 Feb 07 Posts: 64 Credit: 584,052 RAC: 0 |
Hm, the second running of 1bm8__BOINC_ABRELAX_SAVE_ALL_OUT-1bm8_-frags83__1705_174_0 lasted merely 1 h 25 min., generating this message: CPU time 5140.18 But since Rosetta is satisfied... -- R. A. Mostol |
McSummation Send message Joined: 30 May 06 Posts: 2 Credit: 62,822 RAC: 0 |
Please post any issues here. In particular, let us know if you think there's a big problem with "checkpointing" or with PowerPC Macs. Thanks! I'm running 5.64 on 2 machines, both with BOINC 5.8.16. On the machine running XP Home, the checkpointing is working properly. However, on the one running Win98SE, the checkpointing does not appear to work properly. When I turned the machine off last night, it was over 5 hours into a WU. This morning, it restarted that WU. |
mdettweiler Send message Joined: 15 Oct 06 Posts: 33 Credit: 2,509 RAC: 0 |
By the way, if the task 1bm8__BOINC_ABRELAX_SAVE_ALL_OUT-1bm8_-frags83__1705_174_0 is supposed to be checkpointing I have seen no sign of it. My computer was shut down after 41 min crunching (34% cpl.), and now the wu has just restarted from scratch with model 1 step 1. Ditto for me. XP Pro SP2, Intel P4 3.2Ghz HT, BOINC v5.4.11, if that helps. I've had a few similar workunits that have had problems exactly like what he's describing, with them always starting over from the beginning after I shut down and turn back on my computer. (This isn't a problem when the task is preempted, though, because I have my preferences set to "leave apps in memory while preempted"). I've set my CPU Run Time preference to 1 hour as a semi-workaround, to make the workunits easier to handle even without checkpoints (it was previously set to 10 hours), but the models in the recent workunits seem to be taking a lot longer than an hour--upwards of 2-3 hours, I've noticed. |
Andy Lee Robinson Send message Joined: 30 May 06 Posts: 1 Credit: 74,015 RAC: 0 |
I've had a couple of errors now on Linux FC6 (P4 and AMD) where the 5.64 app would just stop and appear to sleep without aborting and moving on... The WU result indicates "SIGSEGV: segmentation violation" and tries to exit. Unfortunately this means that a core on each affected host is effectively idle for a few hours until I notice and abort manually :( I think the new app still needs a little more scrutiny. Andy. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
5/10/2007 1:40:29 AM|rosetta@home|Restarting task 1cg5B_BOINC_ABRELAX_SAVE_ALL_OUT_BARCODE-1cg5B-frags83__1706_1219_1 using rosetta version 564 Percentage complete went from 9% to .1%, CPU time went from x minutes completed to 0:00 completed, and the model reverted back to model 1, step 1. I thought 5.64 was supposed to stop this from happening. Did it not benchmark at 9%? Using BOINC 5.8.16. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
what is the minimum completion before a work unit benchmarks? |
mdettweiler Send message Joined: 15 Oct 06 Posts: 33 Credit: 2,509 RAC: 0 |
what is the minimum completion before a work unit benchmarks? I think you mean checkpoint rather than benchmark. Checkpointing is when a workunit saves its state so it can resume later; benchmarking is what your BOINC client does every week or so to see how fast your CPU is, and thus claim amounts of credit based on that (some projects will grant credit based on claimed credit, whereas some--I believe Rosetta is one--will grant credit based on less variable methods). As for your question, no, unfortunately, I don't know the answer to that. :-( That actually would be something I would like to know myself. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
yes you are correct, checkpoint is what i was meaning to ask. well if you get the answer let me know... anyone know why at 9% it reset to .2% after a reboot? what is the minimum completion before a work unit benchmarks? |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
There are three types of checkpointing, listed here from the longest to the shortest interval between checkpoints. 1. After each model is produced. This checkpointing is done for every type of job, and its frequency depends on the rate of model production, which in turn depends on factors such as the size of the protein, the type of experiment, and the computer. 2. During the standard relax protocol (the stage where the protein jiggles around a little in the graphics and full sidechains are used). There are a number of spots in the control flow where a checkpoint can be made while a model is being computed. This also depends on the factors described above, but it is only available for the specific types of jobs that use the relax protocol. 3. A more recent addition: checkpointing for pose and jumping jobs. These types of jobs should checkpoint at intervals that depend on your disk write interval preference. |
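The difference between a checkpoint that is always taken (after each model) and one that is gated by a disk write interval can be sketched as follows. This is only an illustration under stated assumptions: the `Checkpointer` class, the JSON state file, and the 60-second interval are hypothetical and are not taken from Rosetta or BOINC code.

```python
import json
import os
import tempfile
import time


class Checkpointer:
    """Hypothetical helper: writes state at most once per min_interval_s,
    unless the caller forces a write (for example, when a model finishes)."""

    def __init__(self, path, min_interval_s, clock=time.monotonic):
        self.path = path
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.last_write = clock()

    def maybe_save(self, state, force=False):
        now = self.clock()
        if not force and now - self.last_write < self.min_interval_s:
            return False  # too soon since the last disk write; skip
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(state, fh)
        os.replace(tmp_path, self.path)  # swap in the complete file in one step
        self.last_write = now
        return True

    def load(self, default):
        try:
            with open(self.path) as fh:
                return json.load(fh)
        except FileNotFoundError:
            return default  # no checkpoint yet: start from scratch


# Demonstration with a fake clock so the timing is deterministic.
fake_now = [0.0]
ckpt_path = os.path.join(tempfile.mkdtemp(), "state.json")
ckpt = Checkpointer(ckpt_path, min_interval_s=60, clock=lambda: fake_now[0])

fake_now[0] = 30
skipped = ckpt.maybe_save({"model": 1, "step": 5})    # inside the interval
fake_now[0] = 90
written = ckpt.maybe_save({"model": 1, "step": 12})   # interval has elapsed
finished = ckpt.maybe_save({"model": 2, "step": 0}, force=True)  # model boundary
resumed = ckpt.load(default={"model": 1, "step": 0})
```

In this sketch, a job that is shut down before its first successful write has nothing to load and restarts at model 1, step 1, which matches the behaviour several posters describe for tasks that did not appear to checkpoint.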
©2024 University of Washington
https://www.bakerlab.org