Message boards : Number crunching : Discussion of the new credit system (2)
Ananas · Joined: 1 Jan 06 · Posts: 232 · Credit: 752,471 · RAC: 0
Dr. = Developer?
dcdc · Joined: 3 Nov 05 · Posts: 1831 · Credit: 119,526,853 · RAC: 9,592
[quote]I know what type of benchmarks Power Macs can have, because I reviewed Power Mac credits under the old system. And I know what type of credits they were getting. I know the applications do not use their potential efficiently.[/quote]

The problem is, as David Kim mentioned, that it's much easier for someone to tweak or change a compiler to optimise the code in a benchmark, which is a very small, simple application, than it is in a complex program such as Rosetta. It might simply not be possible with the Rosetta code. For example, a benchmark could be made to get an incredible FPU score on the Cell (co)processor used in the PS3, but getting the Rosetta code to run efficiently on it is another matter, partly due to its tiny cache.

If there is a compiler the lab can use to optimise Rosetta to make better use of the PPC architecture, I'm sure they will use it. It's certainly not as straightforward as optimising a benchmark that just counts integer/FPU performance on a very limited scale (and quite possibly not very accurately). Because of this, you can't assume that because PPC-based Macs were getting benchmarks similar to x86 (Intel/AMD) chips with some optimised BOINC clients, the same is possible with the Rosetta code. If anyone is willing to try, I believe the lab have said they'll make a version of Rosetta available for testing. The other problem with PPC is that it has been discontinued (at least in Macs...), so there is much less incentive to put resources into optimising the code for it.

[quote]I think I do not need to recommend a compiler for the Power Mac. The fact is the developers know what needs to be used. They have talked about the "optimizer" that could solve the problem. They know.[/quote]

Where have you read this?

cheers
Danny
David E K · Volunteer moderator · Project administrator · Project developer · Project scientist · Joined: 1 Jul 05 · Posts: 1018 · Credit: 4,334,829 · RAC: 0
Jose, I agree. It would be nice to optimize for Mac PPC, but it is not trivial and there are no AltiVec people in the lab. In an ideal world, we'd have Rosetta optimized for all platforms we support, at both the code and compiler level. We do our best with the resources we have. For example, at our recent annual Rosetta meeting we were lucky to have a breakout session where Ross Walker from the San Diego Supercomputer Center (SDSC) talked about code optimization. It was difficult enough to transition over to Windows before the start of the project last year (VS2005 helped, because optimization with the previous version was buggy). If Apple hadn't decided to go with Intel, I would have pressed harder for PPC optimization.
dcdc · Joined: 3 Nov 05 · Posts: 1831 · Credit: 119,526,853 · RAC: 9,592
[quote]To be honest with you, I don't think the Linux issue has been resolved. As long as the perception persists that the current credit system still undervalues performance under Linux, the issue is there. Perception is many times more powerful than reality, and that is why I would like to see reality and perception be one and the same.[/quote]

I posted here with what I think is the info we need to be able to see what the optimal configurations are with regard to CPU and OS. It'd be useful to have an accurate list showing the performance of different configs (the main factor being the CPU, I expect). It'd be a big help to those buying new crunchers, as you could then make an informed decision, for example whether to go for a Core or an X2, and how worthwhile things like cache and RAM are. However, as I posted above, finding out that an OS or hardware config isn't running the code as quickly as we'd like, and making it run faster, are two very different things!
Ananas · Joined: 1 Jan 06 · Posts: 232 · Credit: 752,471 · RAC: 0
Define a reference workunit that doesn't run too long, post the parameters required to run that WU on Rosetta for maybe 2 hours, and ask people to send the results to you (for validation) together with the BIOS, hardware and software information you need, plus the real runtime. We had that in other DC projects and many people sent results. If it is possible to force Rosetta to use a specific start value instead of the random seed, this option should of course be used.
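Purely as an illustration of how such submissions could be collated once results come back: a minimal sketch, assuming each volunteer's report is reduced to one CSV row with hypothetical column names (cpu_model, os, clock_mhz, runtime_sec); nothing like this exists in Rosetta or BOINC today.

```python
import csv
from collections import defaultdict
from statistics import mean

def summarize(path):
    """Group reference-WU runtimes by (CPU model, OS) and report averages."""
    groups = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # runtime x clock = total cycles spent on the reference WU,
            # so hosts running the same core at different frequencies
            # become directly comparable.
            gigacycles = float(row["runtime_sec"]) * float(row["clock_mhz"]) / 1000.0
            groups[(row["cpu_model"], row["os"])].append(gigacycles)

    for (cpu, os_name), samples in sorted(groups.items()):
        print(f"{cpu} / {os_name}: {mean(samples):,.0f} Gcycles over {len(samples)} runs")
```

Lower numbers would mean the reference WU needs fewer cycles on that combination, which is exactly the per-config ranking dcdc asked for above.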
Whl. · Joined: 29 Dec 05 · Posts: 203 · Credit: 275,802 · RAC: 0
One thing that really annoys me about your posts is the size of those GIF files. It is a real pain in the arse scrolling all over the place to read everybody else's posts.
dgnuff · Joined: 1 Nov 05 · Posts: 350 · Credit: 24,773,605 · RAC: 0
-- Deleted -- Mats already addressed the issue far better than I could.
Ingleside · Joined: 25 Sep 05 · Posts: 107 · Credit: 1,514,472 · RAC: 0
[quote]I believe the optimization that is required and is known to solve the Mac issue should be implemented.[/quote]

Well, I'm by no means a good statistician, but let's still play a little with numbers...

Now, as I've already posted, if 10% are trying to cheat with artificially inflated claims, you can set up a table like this:

Overclaim - increase in average granted credit per model:
5x - 40%
4x - 30%
3x - 20%
2x - 10%
1.5x - 5%
1.1x - 1%

So, since Linux only underclaims, let's just expand this table a little. Going by BoincSynergy, there are 22,632 Linux/Mac computers in Rosetta, which is 12.7% of the total. (Note: no idea how many of the computers are actually active, but let's still use 12.7%.)

Underclaim - decrease in average granted credit per model:
10% - 1.27%
20% - 2.54%
30% - 3.81%
40% - 5.08%
50% - 6.35%
60% - 7.62%
70% - 8.89%
80% - 10.16%
90% - 11.43%
100% - 12.7%

Meaning, even if all Linux/Mac users claim zero credit for all their work, they'll only influence the average granted credit by 12.7%. Now, I'm not sure how much more Windows claims than Linux/Mac, but I would guess less than 2x, meaning the influence is less than 6.35%.

With some crunchers running "optimized" clients trying to increase the average granted credit, and unoptimized Linux/Mac hosts pulling it down, do they cancel each other out? Possibly, but I can't guarantee it.

Anyway, since the new credit system uses the average of all results returned for a specific WU type, the only real chance for someone to get a significant boost from a high claim is to be one of the first to return. In practice this would mean running with a 0.001-day cache size and a 1-hour run preference. A Linux/Mac user can of course also try this, but if they're unlucky and are #1, they'll get much less credit than if they're #2 to return... In practice, apart from being the lucky/unlucky #1 to return, the granted credit will quickly average away. So, in practice, there shouldn't be any significant (yes, still unspecific) difference between platforms. That Macs are really slow at crunching is a different problem, and isn't due to the BOINC benchmark.

But, to be a little more specific at the end: remember, if all Windows users have returned all their WUs, and by some unlucky stroke of fate all Linux/Mac users return their results afterwards, the first Linux/Mac result will get the same granted credit as the average for all the Windows users, while the last Linux/Mac result returned will, at the absolute worst, get 12.7% less than the Windows average. But, remembering the table, that is if all Linux/Mac users claimed zero credit; more realistically, Windows benchmarks are less than 2x higher, meaning the absolute worst-off result is 6.35% lower. The other way around, with all Linux/Mac results returned before any Windows results, would be much worse, since the last Windows user would get roughly 2x (again, I'm not sure how much higher the Windows benchmark is), but I wouldn't expect that to happen, given the users trying to get their credit boost at the start...

In any case, delaying crediting until 1000 results or so are in should remove any large startup spikes...

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
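For concreteness, Ingleside's two tables are just the weighted-average identity. A tiny sketch (the 12.7% share and the claim ratios come from the post above; the function itself is standard arithmetic):

```python
def average_shift(share, claim_ratio):
    """Relative change in the population-average claim when a fraction
    `share` of hosts claims `claim_ratio` times the fair value."""
    return share * (claim_ratio - 1.0)

print(f"{average_shift(0.127, 0.0):+.2%}")  # all Linux/Mac claim zero:  -12.70%
print(f"{average_shift(0.127, 0.5):+.2%}")  # Linux/Mac claim half:       -6.35%
print(f"{average_shift(0.100, 5.0):+.2%}")  # 10% of hosts overclaim 5x: +40.00%
```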
SekeRob · Joined: 7 Sep 06 · Posts: 35 · Credit: 19,984 · RAC: 0
Status report. I came over specially to crunch a few and see for myself how the new credit system works... well, first impressions are lasting impressions... you must have nailed it right on the head... I'm getting credit for my stock machine, on a stock Windows OS, on my stock BOINC 5.6.0, and the claim worked out 0.8% lower than what you computed the work was worth... totally aligned with the BOINC credit principles. Love it. ciao

Coelum Non Animum Mutant, Qui Trans Mare Currunt
Mod.DE · Volunteer moderator · Joined: 23 Aug 06 · Posts: 78 · Credit: 0 · RAC: 0
Hi SekeRob, thanks for your nice words and encouragement. I have moved your post to the discussion thread, since the sticky thread is not to be used for discussions. I hope you don't mind.

I am a forum moderator! Am I?
Mats Petersson · Joined: 29 Sep 05 · Posts: 225 · Credit: 951,788 · RAC: 0
I suppose I should explain that "noticeable" in my post some ten or so posts ago is equivalent to "not greatly different", or "+/- 10%". In a post in the "How much credit per hour is possible?" thread I showed my measurements of credit per hour per GHz as around 6.0 - 6.7 or some such. There is about a 10-12% difference between these, but that's based on a relatively small set of samples, so statistically they aren't the best of numbers. I haven't got my statistics spreadsheet available here (I'm in California, not in England where my other machine happens to be), so I can't give you more detailed information at this point. But the overall general result I have seen is that (with the new credit system) the performance per core per clock frequency is similar enough that I can't say Windows or Linux is significantly different. As tralala (and I, in another post) pointed out, Linux benchmarks are quite different from the Windows ones, but the Rosetta code is pretty similar between Linux and Windows, so the performance difference will be small.

-- Mats
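As a quick check on those figures (both endpoints are from the post above): taking the low end as baseline,

\[
\frac{6.7 - 6.0}{6.0} \approx 11.7\%,
\]

while measured against the midpoint of 6.35 the spread is about \( \pm 5.5\% \), which matches both the "+/- 10%" hedge here and the "+/- 5%" figure Mats gives two posts below.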
Jose · Joined: 28 Mar 06 · Posts: 820 · Credit: 48,297 · RAC: 0
[quote]I suppose I should explain that "noticeable" in my post some ten or so posts ago is equivalent to "not greatly different", or "+/- 10%". [...] There is about a 10-12% difference between these, but that's based on a relatively small set of samples, so statistically they aren't the best of numbers.[/quote]

10% is twice what is considered the level for statistical significance. An undercount of 12% basically means one in eight doesn't get counted. If those are your results, they are in no way minimal. That is a level of undercounting, of under-representation, that is not acceptable.
casio7131 · Joined: 10 Oct 05 · Posts: 35 · Credit: 149,748 · RAC: 0
[quote]10% is twice what is considered the level for statistical significance. An undercount of 12% basically means one in eight doesn't get counted. If those are your results, they are in no way minimal.[/quote]

I don't think the 10% that Mats is talking about is a significance level (in the sense of a statistical test); it is the difference in credit achieved between Windows and Linux.
Mats Petersson · Joined: 29 Sep 05 · Posts: 225 · Credit: 951,788 · RAC: 0
[quote]10% is twice what is considered the level for statistical significance. An undercount of 12% basically means one in eight doesn't get counted. If those are your results, they are in no way minimal. That is a level of undercounting, of under-representation, that is not acceptable.[/quote]

There's a ten percent (or so) difference between the highest average and the lowest average of my machines. Measured against the midpoint of those numbers, the spread is +/- 5% (or so). I'm currently working from memory (as described in the previous post). I have four Linux machines and two Windows machines, one of which is a laptop. None of my machines have exactly the same configuration when it comes to processor type and socket. My fastest machine (per clock speed) is a Linux machine, so Windows certainly doesn't get a HIGHER result. In fact, I think the Windows machine is actually the slowest (but it's also a socket 754 processor, which none of the others are; I can't say whether that's part of the reason its credit is lower, whether the Windows version is simply slower, or whether that machine just isn't working as fast for some other reason...)

-- Mats
Bad_Wolf · Joined: 31 Jul 06 · Posts: 4 · Credit: 191,553 · RAC: 0
Just my 2 cents: if real speed is the problem, why not add a little 10-second benchmark before initialization? That way, together with the WU's results and times, you'd have a real basis for calculating the maths done and the points to give.

[edit] Another way could be an average speed for every single class of CPU. For each host you have the CPU used and the BOINC benchmark result... it shouldn't be difficult to calculate such an average... [/edit]
Mats Petersson · Joined: 29 Sep 05 · Posts: 225 · Credit: 951,788 · RAC: 0
[quote]Just my 2 cents: if real speed is the problem, why not add a little 10-second benchmark before initialization?[/quote]

Except that it's hard to determine, from the information available to the application, all the necessary parameters. For example, an Athlon 3800+ may be a single- or dual-core model, running at 2.4 or 2.0 GHz; the dual core would therefore be around 20% slower per core. It's possible to find out what cache size the processor has, but finding out how fast the memory is, and how much effect the speed of the memory has, is much harder [as that partly depends on what else is going on in the machine at the same time]. Running Rosetta for 10 seconds, without majorly changing how Rosetta works, would not achieve anything useful, because it wouldn't finish working out a single model (decoy) of a protein in that time; not even enough to figure out how long it would take, I would think.

-- Mats
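To put Mats's example in concrete form (the two configurations and their clock speeds are from his post; the listing itself is only an illustration of why a per-model average blurs real differences):

```python
# Two configurations sold under the same "Athlon 3800+" model name.
athlon_3800_variants = [
    {"cores": 1, "ghz": 2.4},
    {"cores": 2, "ghz": 2.0},
]

baseline = athlon_3800_variants[0]["ghz"]
for cfg in athlon_3800_variants:
    # The dual-core variant runs each core noticeably slower,
    # yet the model name reported by the host is identical.
    print(f"{cfg['cores']} core(s) @ {cfg['ghz']} GHz: "
          f"per-core speed {cfg['ghz'] / baseline:.0%} of the single-core variant")
```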
Bad_Wolf · Joined: 31 Jul 06 · Posts: 4 · Credit: 191,553 · RAC: 0
Hosts' data include the number of CPUs installed, and with a big (because it's BIG) number of hosts in the database, the average probably wouldn't be so far from reality.

Maybe I didn't explain myself, sorry; English is my second language. I meant to ADD a benchmark (maybe a simple loop increasing a variable for 10 secs or less) before starting to crunch the data.

BadWolf
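The loop BadWolf describes would look something like the sketch below (illustrative only; in an interpreted language it mostly measures interpreter overhead, and BOINC's real benchmarks are Whetstone and Dhrystone). Note that its entire state fits in registers and L1 cache, which is exactly the limitation Mats raises in his reply below.

```python
import time

def ten_second_benchmark(seconds=10.0):
    """Count increments completed in a fixed wall-clock slice."""
    count = 0
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        count += 1
    # The whole loop touches a few machine words at most, so the score
    # says nothing about memory or cache behaviour under a real workload.
    return count / seconds

print(f"{ten_second_benchmark(1.0):,.0f} iterations/sec (approx.)")
```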
Mats Petersson · Joined: 29 Sep 05 · Posts: 225 · Credit: 951,788 · RAC: 0
[quote]Hosts' data include the number of CPUs installed, and with a big (because it's BIG) number of hosts in the database, the average probably wouldn't be so far from reality.[/quote]

Yes, but each machine will have a different setup for memory, and how well that memory provides data to the CPU is hard to measure. The CPU performance on its own is already being measured, and that is the basis of the current score system. There are also other factors: if the system is getting hot, or low on power (in a laptop), it may reduce the speed of the processor, which means it takes longer to do the calculation...

[quote]I meant to ADD a benchmark (maybe a simple loop increasing a variable for 10 secs or less) before starting to crunch the data.[/quote]

And that's how it works today: there is a benchmark to measure integer and floating point performance, and then the machine is left to do the real task of calculating Rosetta. This however has two potential problems:

1. There are different "clients" that calculate the benchmark results differently, including people who use an "optimized" client, which gives results that aren't quite comparable to the actual calculation capacity of the processor.

2. There's no measurement of the overall system performance, just a tiny benchmark (Dhrystone for integers, Whetstone for floating point) which fits nicely in the cache of just about any processor available today (anything more than about 16KB of L1 cache and it will fit in the L1 cache), so processors with small caches get exactly the same result as those with large ones; but in reality, a large cache will be better than a small one.

The current system, I think, although it may not be ideal, is a close approximation of "pay for the amount of work done".

-- Mats
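Mats's second point can be put in numbers with a toy average-memory-access-time model; every latency and hit rate below is an invented round figure, chosen only to show the shape of the effect, not a measurement of any real CPU.

```python
def amat(l1_hit, l2_hit, l1_ns=1.0, l2_ns=10.0, ram_ns=100.0):
    """Average time per memory access (ns) for a two-level cache."""
    l1_miss = 1.0 - l1_hit
    return (l1_hit * l1_ns
            + l1_miss * l2_hit * l2_ns
            + l1_miss * (1.0 - l2_hit) * ram_ns)

# A benchmark whose whole working set sits in L1 runs the same on any CPU:
print(f"{amat(l1_hit=1.0, l2_hit=0.0):.1f} ns")  # benchmark fits in L1: 1.0 ns
# A real workload that spills out of a small cache pays for every miss:
print(f"{amat(l1_hit=0.9, l2_hit=0.8):.1f} ns")  # spilling workload: 3.7 ns
```

So two chips with identical benchmark scores can differ several-fold on memory-heavy work, which is why cache size never shows up in the claimed credit.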
River~~ · Joined: 15 Dec 05 · Posts: 761 · Credit: 285,578 · RAC: 0
[quote]... so processors with small caches get exactly the same result as those with large ones; but in reality, a large cache will be better than a small one ...[/quote]

And don't overlook the length of the floating point pipeline. Two CPUs may score the same float speed on the benchmark because its data is predictable, and therefore the pipeline runs efficiently.

Suppose both processors come to 1 GHz float speed (makes the sums nice), and one is a three-stage pipe and the other a five-stage pipe. The first CPU actually takes 3 ns to do a float, and gets the throughput by having three on the go at once. The second takes 5 ns to do a float, but has 5 on the go at once.

The snag comes when which number to calculate next depends on the result of the last crunch. The first CPU's pipeline stalls for 2 ns, the second for 4 ns. This can also happen if the data are needed in a weird order (e.g. FFT tends to do better the shorter the pipe, an important point if you want to crunch on Einstein and perhaps on SETI).

If I remember rightly, a Pentium M has a shorter pipe than a Pentium 4. If so, then an M will do better than a 4 at the same benchmarked float speed, and this advantage will increase the more often the floating point results are used to make decisions in the code.

So on two critical aspects of floating point performance, benchmarks measure what the chip can do at its best (no cache stalls, no pipe stalls). That is further than you'd hope from being a measure of what the same chip does under real conditions, and on a project like Rosetta those real conditions may be very different between different kinds of WU, seeing as the project experiments with different strategies.

It is worse still. We have issues of different pipes and caches. But then, if it is a dual-core chip, do the cores share the cache, have their own separate caches, or what? If separate caches, how do the cache controllers deal with the case where both caches are trying to access the off-chip memory at once? All these variables, and we are not even starting to ask about different motherboards yet...

For all these reasons benchmarks are very crude. It does seem to me that running a selection of similar tasks on a random selection of boxes taken from the real user pool is less crude, especially with a large enough sample.

River~~
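River~~'s stall arithmetic, wrapped in a small model for anyone who wants to play with the numbers (the 1 GHz throughput and the 3- and 5-stage depths are from the post; the fraction of dependent operations is an invented knob):

```python
def avg_ns_per_float(pipe_depth, dependent_fraction):
    """Both chips issue one float per ns when the pipe is full; an operation
    that must wait for the previous result stalls (depth - 1) extra ns."""
    return 1.0 + dependent_fraction * (pipe_depth - 1)

for frac in (0.0, 0.1, 0.3):
    print(f"{frac:.0%} dependent: "
          f"3-stage {avg_ns_per_float(3, frac):.1f} ns/float, "
          f"5-stage {avg_ns_per_float(5, frac):.1f} ns/float")
```

At 0% dependence the two chips tie, exactly as the benchmark sees them; the more often the code branches on fresh results, the further the deeper pipe falls behind.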
Seventh Serenity · Joined: 30 Nov 05 · Posts: 18 · Credit: 87,811 · RAC: 0
I've just switched back to Rosetta@Home from WCG because of the unfairness with credit on Linux systems. I'm more for the science of course, but since Rosetta@Home is still partly based around the HIV/AIDS virus, I'll be running R@H until WCG get their fixed credit system in place.

"In the beginning the universe was created. This made a lot of people very angry and is widely considered as a bad move." - The Hitchhiker's Guide to the Galaxy