Message boards : Number crunching : Minirosetta 3.73-3.78
Author | Message |
---|---|
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
As I understand it, the issue is that the project briefly exhausted its supply of work available for non-Android devices. When only Android tasks are left in the scheduler and you come in with a non-Android host, it reports the message "Rosetta Mini for Android is not available for your type of computer." This problem ebbs and flows as large numbers of new hosts come into the system. The processes that make new work available run continuously, but they seem to hit periods where they are barely keeping up with the incoming requests for work. Rosetta Moderator: Mod.Sense |
AMDave Send message Joined: 16 Dec 05 Posts: 35 Credit: 12,576,896 RAC: 0 |
Ok. I was concerned that it was a software or hardware malfunction somewhere in the pipeline (Rosetta's end or crunchers' end). How frequently is the Server Status page updated? Presently, there are 434,384 results listed as "Ready to send," and according to here, there are 147,921 Active Users. What is the default back-off time for communicating with Rosetta's servers in such cases? It appears to be 24 hours. Going forward, is it possible to have some notice indicating when such an occurrence takes place (e.g. on Rosetta's homepage or the BOINC Notices tab)? When this happens with other projects, the following lines appear in the BOINC Event Log: "Sending scheduler request", "Requesting new tasks for CPU", "Scheduler request completed: no new tasks available" |
googloo Send message Joined: 15 Sep 06 Posts: 133 Credit: 22,677,186 RAC: 4,532 |
Indeed, it is the 24-hour back off time that is the problem, then. Can this be adjusted? |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
Could the number of tasks ready to send be broken down by the host types they are suitable for, so that users can easily tell when only Android tasks are left in the scheduler? Also, could things be adjusted so that when no tasks are available for the type of computer requesting them, but many are available for other types of computers, the delay is set much lower than 24 hours? |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 22,989,598 RAC: 10,565 |
I think a task can be sent to any machine, and the "Android" message is a bogus message from the Rosetta servers. I think the Android message really means that there are temporarily no tasks ready to be sent to any client. The researcher submits some text files that contain parameters, along with the run-time COMMAND LINE parameters. This information is wrapped up with the current database and passed to ANY Rosetta cruncher.

Sample list of Rosetta files containing the personality data:

051207_1a19A.fasta: ASCII text
051207_1a19A.psipred_ss2: ASCII text
051207_1a19.pdb: ASCII text
051207_cc1a19A03_05.200_v1_3: ASCII text
051207_cc1a19A09_05.200_v1_3: ASCII text

First couple of lines of each file (head 051207_*):

==> 051207_1a19A.fasta <==
>1a19A
KKAVINGEQIRSISDLHQTLKKELALPEYYGENLDALWDCLTGWVEYPLVLEWRQFEQSKQLTENGAESVLQVFREAKAEGADITIILS

==> 051207_1a19A.psipred_ss2 <==
# PSIPRED VFORMAT (PSIPRED V2.5)
1 K C 0.997 0.000 0.026
2 K E 0.032 0.004 0.928
3 A E 0.010 0.007 0.979
4 V E 0.005 0.009 0.950
5 I E 0.012 0.008 0.957
6 N E 0.053 0.006 0.941
7 G C 0.563 0.347 0.097
8 E H 0.346 0.641 0.060

==> 051207_1a19.pdb <==
ATOM 1 N LYS A 1 99.864 52.581 -5.099 1.00 52.69 N
ATOM 2 CA LYS A 1 98.880 51.736 -5.841 1.00 51.62 C
ATOM 3 C LYS A 1 97.862 51.097 -4.890 1.00 49.92 C
ATOM 4 O LYS A 1 96.658 51.274 -5.048 1.00 49.38 O
ATOM 5 CB LYS A 1 99.614 50.652 -6.636 1.00 52.27 C
ATOM 6 CG LYS A 1 99.215 50.600 -8.104 1.00 53.15 C
ATOM 7 CD LYS A 1 98.997 49.163 -8.582 1.00 52.28 C
ATOM 8 CE LYS A 1 97.824 48.483 -7.860 1.00 53.06 C
ATOM 9 NZ LYS A 1 96.666 48.171 -8.765 1.00 49.66 N
ATOM 10 N LYS A 2 98.344 50.359 -3.898 1.00 50.17 N

==> 051207_cc1a19A03_05.200_v1_3 <==
position: 1 neighbors: 200
1j8r A 68 K L -88.240 -13.689 178.802
1j8r A 69 K E -147.474 144.764 177.366
1j8r A 70 V E -141.217 139.561 179.273
1u2c A 190 K L -112.104 -21.212 176.747
1u2c A 191 K L -135.377 132.197 177.515
1u2c A 192 V L -92.662 112.751 -175.484

==> 051207_cc1a19A09_05.200_v1_3 <==
position: 1 neighbors: 200
1ikp A 13 K L -107.435 -59.471 178.170
1ikp A 14 A E -163.795 139.246 178.105
1ikp A 15 C E -147.647 156.673 175.685
1ikp A 16 V E -121.410 108.831 -175.851
1ikp A 17 L E -92.091 125.443 175.803
1ikp A 18 D E -79.170 121.221 -177.139
1ikp A 19 L L -109.923 1.413 -175.419
1ikp A 20 K L -72.749 -23.385 -175.155 |
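For illustration, here is a minimal Python sketch (not part of Rosetta itself; the file names assume the sample 051207_* set shown above is in the current directory) that inspects two of these plain-text inputs: it counts the residues in the FASTA sequence and the ATOM records in the PDB file.

```python
# Minimal sketch: inspect the plain-text Rosetta inputs shown above.
# Assumes the sample 051207_* files sit in the current directory (hypothetical paths).

def fasta_length(path):
    """Return the number of residues in a single-record FASTA file."""
    seq = []
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):      # skip the ">1a19A" header line
                seq.append(line.strip())
    return len("".join(seq))

def pdb_atom_count(path):
    """Return the number of ATOM records in a PDB file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith("ATOM"))

if __name__ == "__main__":
    print("residues in 051207_1a19A.fasta:", fasta_length("051207_1a19A.fasta"))
    print("ATOM records in 051207_1a19.pdb:", pdb_atom_count("051207_1a19.pdb"))
```

Nothing here is specific to BOINC; the point is simply that the "personality" of a work unit is ordinary text that any host type can consume.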
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,139,251 RAC: 16,277 |
If only I were concerned about Rosetta's stability. After claiming six months ago that I was going to stop fiddling with my PC's overclock, I've been at it again, adding a further 165MHz. I /think/ I'm stable enough to keep crunching throughout my half-week absences, but I never quite know for sure until I get back home. I run AMD rather than Intel - no idea if that makes a difference. When I ran only a mild overclock I would run for months at a time without a reboot. It's only since I've gone to the most extreme levels that I sometimes get lockups. |
sow-8 Send message Joined: 23 Dec 14 Posts: 2 Credit: 591,945 RAC: 0 |
I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log. I just discovered that ANDROID is a 32-bit operating system. Is it possible that Rosetta@Home cannot provide work for a 32-bit operating system? My computer runs 32-bit Linux -- I could easily load the 64-bit version and try again. What do you think? :^) |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log. I've been seeing this problem under 64 bit Windows, so don't expect 64 bits alone to be a cure. |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
Indeed, it is the 24-hour back off time that is the problem, then. Can this be adjusted? Well, you can adjust your cache size to something like 4-6 days; then you should not run out of work during those 24 hours. |
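As a rough illustration of that advice (back-of-envelope arithmetic only, not BOINC's actual scheduler logic), the work buffer just needs to outlast the longest back-off you expect, with some headroom for repeated "no tasks available" replies:

```python
# Back-of-envelope sketch: how many days of cached work are needed to ride out
# a back-off without the host going idle. Not BOINC's actual scheduling code.

def min_buffer_days(backoff_hours, headroom=2.0):
    """Days of work to keep on hand for a given back-off, with a safety factor."""
    return backoff_hours / 24.0 * headroom

print(min_buffer_days(24))   # 2.0 days bare minimum; 4-6 days is more comfortable
```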
googloo Send message Joined: 15 Sep 06 Posts: 133 Credit: 22,677,186 RAC: 4,532 |
Indeed, it is the 24-hour back off time that is the problem, then. Can this be adjusted? I don't run out of work since I support two other BOINC projects. I run out of Rosetta tasks unless I manually update Rosetta. It's not disastrous, as Rosetta will eventually catch up; it's just inefficient and annoying. That is, once I start getting Rosetta tasks again, the other two projects' tasks get suspended while Rosetta catches up, using memory and storage for those suspended tasks. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Is my CPU not strong enough for the current tasks that have been running since the middle of April? My granted credit is running 50 or so points under the claimed credit. My average credit has dropped something like 500 points since the middle of April. Can someone have a look at my CPU info and my stats and tell me what's going on? Rosie used to be nice to me, but now she is being mean. |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 22,989,598 RAC: 10,565 |
Is my CPU not strong enough for the current tasks that have been running since the middle of April? My granted credit is running 50 or so points under the claimed credit. My average credit has dropped something like 500 points since the middle of April. Your CPU is fine. If you are running other projects too, it might be related to the interaction between Rosetta and those jobs. Rosetta is compiled with aggressive inlining and has a big code footprint. If other work has a big code footprint too, they fight each other for CODE cache, and cache misses make the CPU run LESS EFFICIENTLY than it did on the Whetstone benchmark executed at startup. I have noticed that Rosetta is EXTREMELY sensitive to what is executing while Rosetta is crunching jobs. I was running my test binaries 5 times sequentially while the machine was running standard Rosetta and WorldGrid jobs. My Rosetta binaries generated consistent runtimes. I added PrimeGrid tasks and my Rosetta test binary execution times tripled. When I turned off PrimeGrid, my Rosetta run times returned to the expected values. I am seeing similar results. Rosetta grades on a "curve" and is a "tough teacher". 8-)

One of YOUR recent jobs shows that it got 63% of the requested credit:
Validate state: Valid
Claimed credit: 168.277828702931
Granted credit: 106.485820803471
Application version: 3.73

A recent result from MY Broadwell 8C/16T microserver (Xeon(R) CPU D-1540 @ 2.00GHz) got 60% of the requested credit:
Validate state: Valid
Claimed credit: 635.933767006863
Granted credit: 385.737768644956
Application version: 3.73

A recent result from MY IvyBridge i7 (i7-3770K CPU @ 3.50GHz) got about 46% of the requested credit:
Validate state: Valid
Claimed credit: 773.073241462645
Granted credit: 355.890423420893
Application version: 3.73

icc.o3.mtune.axcoreavx2.m32.50.12345/nohup.out:user 2725.38
icc.o3.mtune.axcoreavx2.m32.50.12345/nohup.out:user 2685.47
icc.o3.mtune.axcoreavx2.m32.50.12345/nohup.out:user 2747.27
icc.o3.mtune.axcoreavx2.m32.50.12345/nohup.out:user 2654.61
icc.o3.mtune.axcoreavx2.m32.50.12345/nohup.out:user 2828.59
icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2809.78
icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2563.33
icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2718.20
icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2725.10
icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2709.36
icc.o3.mtune.axcoreavxi.m32.50.12345/nohup.out:user 3434.77
icc.o3.mtune.axcoreavxi.m32.50.12345/nohup.out:user 3477.16
icc.o3.mtune.axcoreavxi.m32.50.12345/nohup.out:user 3519.45
icc.o3.mtune.axcoreavxi.m32.50.12345/nohup.out:user 3369.62
icc.o3.mtune.axcoreavxi.m32.50.12345/nohup.out:user 3458.30
icc.o3.mtune.axcoreavxi.m64.50.12345/nohup.out:user 3354.62
icc.o3.mtune.axcoreavxi.m64.50.12345/nohup.out:user 4174.92
icc.o3.mtune.axcoreavxi.m64.50.12345/nohup.out:user 5313.84 <<< I STARTED running PrimeGrid
icc.o3.mtune.axcoreavxi.m64.50.12345/nohup.out:user 10563.18 <<< Rosetta runtime triples!
icc.o3.mtune.axcoreavxi.m64.50.12345/nohup.out:user 11973.17
icc.o3.mtune.axsse42.m32.50.12345/nohup.out:user 9171.31
icc.o3.mtune.axsse42.m32.50.12345/nohup.out:user 8060.61 <<< I STOPPED running PrimeGrid
icc.o3.mtune.axsse42.m32.50.12345/nohup.out:user 3013.09 <<< NORMAL runtimes
icc.o3.mtune.axsse42.m32.50.12345/nohup.out:user 2881.54
icc.o3.mtune.axsse42.m32.50.12345/nohup.out:user 2883.69
icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2832.93
icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2707.10
icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2800.13
icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2970.67
icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 3067.31 |
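To make the comparison in that listing concrete, here is a tiny sketch using only the numbers above (plain arithmetic on the reported user times for the axcoreavxi.m64 binary; the runs after the "STARTED running PrimeGrid" marker are treated as the contended case):

```python
# Rough sketch: average the user times reported above for the same test binary
# before and during PrimeGrid, then compute the slowdown factor.

baseline  = [3354.62, 4174.92]        # axcoreavxi.m64 runs before PrimeGrid started
contended = [10563.18, 11973.17]      # the same binary while PrimeGrid was running

def mean(values):
    return sum(values) / len(values)

slowdown = mean(contended) / mean(baseline)
print(f"runtime increased roughly {slowdown:.1f}x under cache contention")  # ~3.0x
```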
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
You think that VHC could be interfering? They both seem stuck on low average credit, and VHC runs on 24 time slots. You cannot alter the run time on that project. Since I have been on Rosetta longer than VHC, I may have to drop VHC. I was trying it because I wanted to see how VirtualBox worked. Is my CPU not strong enough for the current tasks that have been running since the middle of April? My granted credit is running 50 or so points under the claimed credit. My average credit has dropped something like 500 points since the middle of April. |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 22,989,598 RAC: 10,565 |
I am not sure about VHC or its control knobs. I looked at the SixTrack source code many years ago and had a SixTrack (LHC@Home) account, but they could not generate work to crunch, so I gave up. I have also never run a VirtualBox version of any project, so I have no experience there either. If the app runs under BOINC control, you can set PROJECT -> NO NEW TASKS and let the tasks drain out, or simply suspend the VLHC project application for a period and see if it makes a difference in the Rosetta results. A quick examination of the Windows 10 Task Manager should also tell you a lot: the TASK MANAGER -> MORE DETAILS -> PROCESSES screen. The CPU column should total close to 100% if you allow all CPUs to be busy. SORT BY CPU by clicking on the CPU column. Each Rosetta job should be consuming 1/6 (one of your 6 CPUs), or about 16.6%, of the machine. If they are consuming noticeably less than 16.6%, that means the Rosetta job is not running 100% of the time and the Rosetta code and data are being evicted from the L1/.../Lx CPU caches. It takes a few cycles for the CPU to get data from those near caches; if the CPU has to go to main memory for evicted code/data, it takes 10x as long, and Rosetta will run, but VERY inefficiently, while it waits for code/data to warm the caches again. Rosetta works hard but is waiting on code/data from memory. It is worth your time to run a couple of experiments on your machine to see if anything is affecting progress. You think that VHC could be interfering? They both seem stuck on low average credit, and VHC runs on 24 time slots. You cannot alter the run time on that project. |
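If you prefer a script to eyeballing Task Manager, here is a minimal sketch of the same check (it assumes the third-party psutil package and that the worker processes have "rosetta" in their name): it samples how much of the whole machine each Rosetta process is using. On a 6-CPU box a healthy task should sit near 100/6 ≈ 16.6%.

```python
# Minimal sketch: sample each Rosetta process's share of the whole machine.
# Requires the third-party psutil package (pip install psutil).
import time
import psutil

ncpu = psutil.cpu_count()
rosetta = [p for p in psutil.process_iter(['name'])
           if 'rosetta' in (p.info['name'] or '').lower()]

for p in rosetta:
    p.cpu_percent(None)          # prime the counter; the first call always returns 0.0

time.sleep(5)                    # measure over a 5-second window

for p in rosetta:
    share = p.cpu_percent(None) / ncpu   # per-core percent -> percent of the machine
    print(f"pid {p.pid}: {share:.1f}% of the machine (expect ~{100 / ncpu:.1f}%)")
```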
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
It might be POEM. Even though it is mainly a GPU app and is only supposed to grab 0.263 of a CPU, when looking at processes it takes 17% of the CPU, and Rosetta jumps around between 16% and 8%. I am not sure about VHC or its control knobs. I looked at the SixTrack source code many years ago and had a SixTrack (LHC@Home) account, but they could not generate work to crunch, so I gave up. I have also never run a VirtualBox version of any project, so I have no experience there either. Your CPU is fine. If you are running other projects too, it might be related to the interaction between Rosetta and those jobs. Rosetta is compiled with aggressive inlining and has a big code footprint. If other work has a big code footprint too, they fight each other for CODE cache, and cache misses make the CPU run LESS EFFICIENTLY than it did on the Whetstone benchmark executed at startup. I have noticed that Rosetta is EXTREMELY sensitive to what is executing while Rosetta is crunching jobs. I was running my test binaries 5 times sequentially while the machine was running standard Rosetta and WorldGrid jobs. My Rosetta binaries generated consistent runtimes. I added PrimeGrid tasks and my Rosetta test binary execution times tripled. When I turned off PrimeGrid, my Rosetta run times returned to the expected values. |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 22,989,598 RAC: 10,565 |
If the Rosetta job is bouncing between 16% and 8%, the CPU caches are getting cleared out during the 8% periods when Rosetta is being idled by other programs executing on your system. You cannot tell how many times Rosetta is getting/losing control during that 1-second sample, but it is probably a large number of times. This is a very good indication that CPU cache thrashing (two or more jobs wanting to have their code/data in the CPU caches) is a problem. The BOINC Whetstone benchmark ran at full speed on your machine and on other users' machines, but when Rosetta bounces between 16% and 8% and cache contents are evicted, your machine is not making as much Rosetta compute progress because it is waiting for code/data to be retrieved again from slower main memory. Compared to other machines, their Rosetta/Whetstone ratio appears higher than yours, and they are getting a higher percentage of their claimed credit. It is hard to estimate the exact impact from these high-level numbers, but if you saw 8% on Rosetta, that is not good and is likely part of the problem. I have seen the GPU job's load on the CPU vary as a function of the SYSTEM and as a function of the GPU, CPU and memory bandwidth. POEM is taking 100% of a CPU on my i7-3770k/Nvidia 970 GPU. The newer OpenCL GPU apps do seem to take a good chunk of a CPU; they take more CPU than their CUDA counterparts. On machines where I run POEM or similar OpenCL GPU projects, I set BOINC -> COMPUTING PREFERENCES -> USAGE LIMITS -> % of CPUs = 99% to keep 1 CPU available for the GPU jobs AND for reasonable response on the system. It might be POEM. Even though it is mainly a GPU app and is only supposed to grab 0.263 of a CPU, when looking at processes it takes 17% of the CPU, and Rosetta jumps around between 16% and 8%. |
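As a back-of-envelope illustration (this is not how credit is actually computed, and it ignores the extra cost of refilling the caches), bouncing between 16% and 8% already caps the progress a task can make:

```python
# Back-of-envelope: a task that should hold ~1/6 of a 6-CPU machine but spends
# half its time at 8% gets noticeably less done, even before cache refills are counted.
expected = 100 / 6                     # ~16.7% of the machine
observed = 0.5 * 16 + 0.5 * 8          # e.g. half the time at 16%, half at 8%
print(f"roughly {observed / expected:.0%} of the expected progress")   # ~72%
```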
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Never mind. I read your email too fast, Mod. Thanks for clearing out the double post. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Ok, I will lower my overall BOINC CPU load to 98% and see if that helps. And what you see on POEM is the same for me: 100% GPU and grabbing a significant percentage of the CPU. So it could be like you said, Rosetta getting bounced. - Lowered both levels of processor usage to 96%. Will let things run and see if that helps Rosie catch back up. Thanks for the help. I'll let you know later if that solves the issue. If the Rosetta job is bouncing between 16% and 8%, the CPU caches are getting cleared out during the 8% periods when Rosetta is being idled by other programs executing on your system. You cannot tell how many times Rosetta is getting/losing control during that 1-second sample, but it is probably a large number of times. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
96% seems to be a sweet spot for the machine. Percentages are holding around a 16% average now. No drop-outs. Thanks for the help. |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 22,989,598 RAC: 10,565 |
There is no difference between 99% and 96% of CPUs in the computing configuration of your machine. Any minor change was likely due to background churning of other jobs ... either normal system tasks or other BOINC compute jobs. There are two BOINC COMPUTING PREFERENCES -> COMPUTING controls for the CPU. One is "% of CPUs", which controls the number of CPUs that are active. The second is "% of CPU time", which intentionally inserts idle time into the compute time. Use "% of CPUs" and AVOID "% of CPU time" like the plague. Inserting non-BOINC time into the project execution is like what you saw with Rosetta running at 8%; your 8% was like setting "% of CPU time" to 50%. "% of CPUs" deals in whole CPUs. "% of CPUs" set to 99% will allow 5 of your 6 CPUs to run CPU-only jobs. You can drop "% of CPUs" down to about 83.4% (just above 100% - 1/6 = 83.33%) and it should still allow 5 of your CPUs to run. If you set "% of CPUs" to 83%, BOINC will idle a second CPU and only 4 will run.

EXAMPLE: On my i7 with 8 CPUs, setting "% of CPUs" to 99% disables 1 CPU and displays the following message in the EVENT LOG:
5/10/2016 6:00:32 AM | | Number of usable CPUs has changed from 8 to 7.
5/10/2016 6:00:32 AM | | max CPUs used: 7
Setting "% of CPUs" to 88% yields the same message. Setting "% of CPUs" to 87% drops another CPU, with the EVENT LOG message:
5/10/2016 6:02:32 AM | | Number of usable CPUs has changed from 7 to 6.
5/10/2016 6:02:32 AM | | max CPUs used: 6

Ok, I will lower my overall BOINC CPU load to 98% and see if that helps. |
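A small sketch of the rounding described above (assuming the client simply floors ncpus x percent and never goes below one CPU; the exact client code may differ), which reproduces the EVENT LOG examples:

```python
# Sketch of how "% of CPUs" appears to map to the number of usable CPUs,
# assuming a simple floor with a minimum of one CPU.
import math

def usable_cpus(ncpus, pct_of_cpus):
    return max(1, math.floor(ncpus * pct_of_cpus / 100))

for pct in (99, 88, 87):
    print(f"8 CPUs at {pct}% -> {usable_cpus(8, pct)} usable")   # 7, 7, 6
for pct in (99, 84, 83):
    print(f"6 CPUs at {pct}% -> {usable_cpus(6, pct)} usable")   # 5, 5, 4
```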