Message boards : Rosetta@home Science : Suggestions for improvement thread
Admin Send message Joined: 13 Apr 07 Posts: 42 Credit: 260,782 RAC: 0 |
I've seen many areas where people have suggestions for R@H's improvement, so I thought it would be best if we had a central area to offer suggestions about anything related to Rosetta. Users could offer suggestions here, and we could get updates from staff as to feasibility and, where suggestions are granted, when we will see them in action :) Some of my suggestions: 1. Put the Rosetta@home wiki on the front page, and add simple explanations about each WU. 2. Have a 5% graphics option for people who would like to put more of the WU toward work than toward the graphics. 3. Could we simplify the WU explanation in the log and on the screens? Most times I read the explanations on the screen, I'm not quite sure what my WU is a part of. That's what drew me to Find-a-drug: the simplicity, and knowing what I was working on and how it helped out. 4. Is it possible to add an area on the graphics that shows or calculates our lowest RMSD and energy for that WU? Here are just a few things I'd like to see put into place, if we could :) Anyone have any other ideas they would like to add? |
soriak Send message Joined: 25 Oct 05 Posts: 102 Credit: 137,632 RAC: 0 |
There's a "how to bring more people to Rosetta" suggestion thread, but maybe a sticky where only mods can post would be a good idea. Put it all in one place without the lengthy discussion. As for suggestions regarding the application: How much processing power does calculating the RMSD for every possible 'structure' use? Wouldn't it be more efficient to just calculate the RMSD for the lowest energy structure of each model? (the only one that actually gets submitted) I doubt RMSD is part of the algorithm, as it'd be useless on unknown structures. |
Christoph Send message Joined: 10 Dec 05 Posts: 57 Credit: 1,512,386 RAC: 0 |
How much processing power does calculating the RMSD for every possible 'structure' use? Wouldn't it be more efficient to just calculate the RMSD for the lowest energy structure of each model? (the only one that actually gets submitted) Yes, it would be more efficient, but then you wouldn't have an RMSD graph in the screensaver and you couldn't compare the energy with the RMSD. Here are my suggestions: - It would be great if you could extend the current WU explanation in the screensaver to show the RCSB protein ID, the sequence length and the full protein name (the one shown on each RCSB protein page) - As Admin already suggested, perhaps you could show the lowest RMSD and energy in the screensaver (perhaps in brackets after the current value) - As far as I know, the best structure of each model (shown in the plot as red dots) isn't shown in the screensaver when the app restarts from a checkpoint. Perhaps you could save this information at a checkpoint. |
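[Editor's note: for readers unfamiliar with the metric under discussion, RMSD is just the root-mean-square distance between corresponding atoms of two structures. A minimal C++ sketch of the idea (a simplified illustration, not Rosetta's actual code; real tools first superimpose the two structures, e.g. with the Kabsch algorithm, before measuring):]

```cpp
#include <cmath>
#include <vector>

struct Atom { double x, y, z; };

// Simplified RMSD between two conformations with the same atom count,
// assuming the structures are already optimally superimposed.
double rmsd(const std::vector<Atom>& a, const std::vector<Atom>& b) {
    double sum = 0.0;
    for (size_t i = 0; i < a.size(); ++i) {
        double dx = a[i].x - b[i].x;
        double dy = a[i].y - b[i].y;
        double dz = a[i].z - b[i].z;
        sum += dx * dx + dy * dy + dz * dz;   // squared distance per atom
    }
    return std::sqrt(sum / a.size());         // root of the mean
}
```

This is cheap per structure, which is why the per-model cost soriak asks about is small compared to the energy calculation; the catch is only that it requires the known native structure to compare against.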
Admin Send message Joined: 13 Apr 07 Posts: 42 Credit: 260,782 RAC: 0 |
Looking to start this thread up again. Anyone with ideas that have a possibility of being implemented? |
Michael G.R. Send message Joined: 11 Nov 05 Posts: 264 Credit: 11,247,510 RAC: 0 |
Looking to start this thread up again. Anyone with ideas that have a possibility of being implemented? I've said it often and don't want to seem redundant, but I think SSE/SSE2/etc optimizations would be an obvious one. |
Tiago Send message Joined: 11 Jul 06 Posts: 55 Credit: 2,538,721 RAC: 0 |
Indeed... optimization of the software would give a considerable boost to the project. |
Tribaal Send message Joined: 6 Feb 06 Posts: 80 Credit: 2,754,607 RAC: 0 |
- I agree with the optimisation... - Maybe porting to other architectures? Like the Cell processor? ;) GPUs? On that note, maybe the Baker lab should consider something like what the Linux Driver Project has done [1], letting some open source developers access the source code to port to new architectures under strong NDAs? - I'd like to see a Linux version of the graphics (I don't mean to flamebait, I would really use it). Trib' [1]: http://www.linuxdriverproject.org/twiki/bin/view |
dcdc Send message Joined: 3 Nov 05 Posts: 1831 Credit: 119,598,006 RAC: 10,719 |
- I agree with the optimisation... It is already available - Who? and Mats Peterson were looking at this a while back (I believe Who? was looking at porting it to use SSE4), but I don't think anything came of it. There's a lot of code - 1M+ lines apparently. It's not a simple little program like SETI ;D |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
...to clarify, what is already available is... "access the source code... under strong NDAs" (Non-disclosure Agreements) Rosetta Moderator: Mod.Sense |
David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0 |
- I agree with the optimisation... Yes, the size of the code is a problem for getting up and running. We have a new, much cleaner and for now smaller version which we will be sending out on Rosetta@home soon; it might be a better starting point for code optimization experts interested in helping out Rosetta@home! |
The_Bad_Penguin Send message Joined: 5 Jun 06 Posts: 2751 Credit: 4,271,025 RAC: 0 |
Would someone from the project offer the professional courtesy of providing a definitive answer, with respect to Rosetta's memory footprint and adaptability for parallel processing on gaming consoles, a la this thread? Thanking you in advance. |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Would someone from the project offer the professional courtesy of providing a definitive answer, with respect to Rosetta's memory footprint and adaptability for parallel processing on gaming consoles, a la this thread? My 2 cents on this topic, which seems to crop up in one form or another quite frequently. I don't know the specifics of why Rosetta hasn't done this yet, but I have worked on optimizing large projects in the past and can suggest some of the issues they might be faced with. If it were easy to use SIMD instructions like SSE3, etc. in Rosetta, I imagine it would have already been done, but the fact is many algorithms just don't lend themselves to easy data-level parallelization. Some do, like those in Photoshop filters or digital signal processing, but if the next step in the process is always dependent on the results of the previous step, SIMD doesn't help, and from what little I know of this type of molecular simulation software this will be true for Rosetta. I'm sure people have looked hard at the innermost loops of Rosetta and concluded either that they couldn't be vectorized or that the effort needed to do so would be better spent elsewhere. Even if the above is incorrect, there are other issues to consider. Maintaining a code base containing SIMD code (writing it in the first place is quite a specialized skill) has its painful aspects. It's necessary to write multiple versions of the same routine, one for each kind of instruction set that's out there, and if anything in the code needs changing it's necessary to change all the routines. For this reason you'd probably only want to implement code using SIMD when the code is mature and almost certain not to change. Not a show stopper, but something that needs to be taken into account. These problems are compounded when you try to convert software written for a general purpose CPU to run on a GPU or something like the Cell processor in the PS3.
The specs of these processors may look impressive, but they are somewhat restricted in what they can do relative to a CPU, and programming them requires an entirely different way of thinking about the problem: they have the reputation of being very difficult to program effectively, and it would probably involve a major rewrite of Rosetta to get it working. F@H seems to have overcome these difficulties, but even there, the programs that can be run on the PS3 or on ATI graphics cards are only a subset of those that can be run on a general purpose CPU. Ultimately I imagine this comes down to a question of what is the most effective use of Rosetta's programming resources. Is it better to fix bugs and add refinements that will improve the accuracy of the predicted structures, or to invest the resources needed to make it run on faster hardware? Right now probably the former: Rosetta is after all a research project. Perhaps in the future when the project is complete (if it ever is) and passed off to the World Community Grid this will change. |
The_Bad_Penguin Send message Joined: 5 Jun 06 Posts: 2751 Credit: 4,271,025 RAC: 0 |
Thank you for a well thought out response. For example, how was this possible? "When David Baker, who also serves as a principal investigator for Howard Hughes Medical Institute, originally developed the code, it had to be run in serial - broken into manageable amounts of data, with each portion calculated in series, one after another. Through a research collaboration, SDSC's expertise and supercomputing resources helped modify the Rosetta code to run in parallel on SDSC's massive supercomputers, dramatically speeding processing, and providing a testing ground for running the code on the world's fastest non-classified computer. The groundbreaking demonstration, part of the biennial Critical Assessment of Structure Prediction (CASP) competition, used UW professor David Baker's Rosetta code and ran on more than 40,000 central processing units (CPUs) of IBM's Blue Gene Watson supercomputer, using the experience gained on the Blue Gene Data system installed at SDSC." but the fact is many algorithms just don't lend themselves to easy data-level parallelization. Some do, like those in Photoshop filters or digital signal processing, but if the next step in the process is always dependent on the results of the previous step SIMD doesn't help, and from what little I know of this type of molecular simulation software this will be true for Rosetta. I'm sure people have looked hard at the innermost loops of Rosetta and concluded that either it couldn't be vectorized or that the effort needed to do so would be better spent elsewhere. Agreed. These problems are compounded when you try to convert software written for a general purpose CPU to run on a GPU or something like the Cell processor in the PS3.
The specs of these processors may look impressive but they are somewhat restricted in what they can do relative to a CPU and programming them requires an entirely different way of thinking about the problem: they have the reputation of being very difficult to program effectively, and it would probably involve a major rewrite of Rosetta to get it working. F@H seems to have overcome these difficulties but even there, the programs that can be run on the PS3 or on ATI graphics cards are only a subset of those that can be run on a general purpose CPU. Also agreed. But if this is true, why did they bother to make the effort to convert Rosetta to parallelized code to run on supercomputers / IBM Blue Gene? It would seem that the PS/3 (as opposed to PCs, with their multiple different CPUs, OSs, and amounts of RAM and HDD) is both standardized and an open platform. F@H is at the petaflop level. I doubt Baker Labs would turn down petaflop-level potential. Arguably, it would be resources well spent. I'd really be curious to hear what DB himself has to say, now that the PS/3 is on the 65nm chip, with double precision and reduced watts. Ultimately I imagine this comes down to a question of what is the most effective use of Rosetta's programming resources. Is it better to fix bugs and add refinements that will improve the accuracy of the predicted structures or to invest the resources needed to make it run on faster hardware? Right now probably the former: Rosetta is after all a research project. Perhaps in the future when the project is complete (if it ever is) and passed off to the World Community Grid this will change. |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Just a guess, but perhaps the parallelism there was at the same level as what we're seeing in the BOINC version of Rosetta, with one decoy being assigned to each CPU. Memory constraints etc. permitting, that might be a relatively easy thing to implement.
True, but in addition to the issues I mentioned, F@H comes preinstalled on PS3's and I imagine users would be more likely to run that than R@H, even if the latter were available.
Other things being equal, I doubt he would either!
Perhaps as stream computing becomes more mainstream porting to these platforms will become a more attractive option. Still I don't see it happening soon.
|
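[Editor's note: the decoy-level parallelism svincent guesses at is "embarrassingly parallel": each trajectory runs independently from its own random seed, and the results are only gathered and compared at the end. A toy C++ sketch of the pattern (the "decoy" here is a hypothetical stand-in, a random search over fake energies, not Rosetta's actual model code):]

```cpp
#include <algorithm>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for one Rosetta trajectory: a random search
// that returns the lowest "energy" found from a given seed.
double run_decoy(unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> energy(-100.0, 0.0);
    double best = 0.0;
    for (int i = 0; i < 1000; ++i)
        best = std::min(best, energy(rng));
    return best;
}

// Decoy-level parallelism: one independent trajectory per worker,
// no communication until the results are collected at the end.
double best_of(int n_decoys) {
    std::vector<double> results(n_decoys);
    std::vector<std::thread> workers;
    for (int d = 0; d < n_decoys; ++d)
        workers.emplace_back([&results, d] { results[d] = run_decoy(d); });
    for (auto& t : workers) t.join();
    return *std::min_element(results.begin(), results.end());
}
```

Because the workers share nothing while running, the same pattern scales from a dual-core PC to 40,000 Blue Gene CPUs; the open question raised in this thread is only whether each worker's memory footprint fits within a PS3 SPE's local store.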
Viking69 Send message Joined: 3 Oct 05 Posts: 20 Credit: 6,804,326 RAC: 2,971 |
Rosetta is after all a research project. Perhaps in the future when the project is complete (if it ever is) and passed off to the World Community Grid this will change. I can't see a time when this project would ever be complete, though it could be replaced with a more powerful project as the technology advances. I don't understand your saying that if this project did 'complete' it would be passed on to WCG. Why? All WCG does is process work in a similar vein to all the other BOINC projects, although it started up as another system to join people's PCs together to process data. Why would Rosetta pass its work off to that system? Hi all you enthusiastic crunchers..... |
The_Bad_Penguin Send message Joined: 5 Jun 06 Posts: 2751 Credit: 4,271,025 RAC: 0 |
Unable to do the same with the 6 SPEs in the PS/3's Cell BE? Memory restrictions? Just a guess, but perhaps the parallelism there was at the same level as that we're seeing in the BOINC version of Rosetta, with one decoy being assigned to each CPU. Memory constraints, etc, permitting; that might be a relatively easy thing to implement. I think Sony is desperate for sales; they lost what, $500 million on the PS/3 hardware so far? I'd imagine they'd have to help with the porting, and putting an R@H icon on the PS/3 is likely a small price to boost sales to crazy people like me. As time goes on, I do intend to purchase multiple PS/3s. $399 is a bargain! And it'll all go to F@H until something else comes along that will use all 6 of the SPEs. True, but in addition to the issues I mentioned, F@H comes preinstalled on PS3's and I imagine users would be more likely to run that than R@H, even if the latter were available. No pain, no gain. F@H chanced it, and they're at a petaflop, and I have no doubt they'll hit 2 pflops within a year. Can this potential really be ignored? If Rosetta requires a larger memory footprint than the PS/3 offers, you can't get blood from a (Rosetta) stone. So let someone from the project come out and make the definitive statement that this is the case. And if they can't make such a definitive statement, let's at least have a discussion of other potential concerns. Perhaps as stream computing becomes more mainstream porting to these platforms will become a more attractive option. Still I don't see it happening soon. I'd still like the good Doc himself to weigh in. |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
I can't see a time when this project would ever be complete, but possibly replaced with a more powerful project as the technology advances. I agree that my original sentence could have been phrased better, but the WCG already makes use of Rosetta, although I don't know what version they use. See http://www.worldcommunitygrid.org/projects_showcase/viewHpf2Research.do |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
I think Sony is desperate for sales, they lost what, $500 million on the PS/3 hardware so far? I'd imagine they'd have to help with the porting, and putting a R@H icon on the PS/3 is likely a small price to boost sales to crazy people like me. As time goes on, I do intend to purchase multiple PS/3's. $399 is a bargin! And it'll all go to F@H until something else comes along that will use all 6 of the SPE's. My understanding was that the PS3 is actually sold right now at a loss as Sony are hoping to push Blu-Ray and sell games along with the console. The following paper discusses scientific programming on the PS3. I skipped the gruesome technical bits in the middle and read just the introduction and summary, where the weaknesses of the processor are discussed. http://www.netlib.org/utk/people/JackDongarra/PAPERS/scop3.pdf |
The_Bad_Penguin Send message Joined: 5 Jun 06 Posts: 2751 Credit: 4,271,025 RAC: 0 |
Thanks, I think I had previously read (and downloaded) a copy of this, but I am currently re-reading it. Yes, IIRC Sony was taking about a $250 loss on each PS/3. This may be less now, with the increased yields of the new 65nm CBEAs. That's why I say the new 40GB PS/3 with the 65nm CBEA is a bargain at $399, and I would not hesitate to purchase multiple units over time, as finances permit. Sorry that Sony will lose money on me, as I am not a gamer, and regular DVDs are fine for me. I would be purchasing it strictly as a (super)computer. Right now, today, not some unknown time in the future, F@H is able to use it, and that's good enough for me. Hope one day that Rosie will as well. My understanding was that the PS3 is actually sold right now at a loss as Sony are hoping to push Blu-Ray and sell games along with the console. |
Admin Send message Joined: 13 Apr 07 Posts: 42 Credit: 260,782 RAC: 0 |
So we don't get too off topic, I have compiled a list of what has been said so far to make additions easier. What has been suggested: 1. Put the Rosetta@home wiki on the front page, and add simple explanations about each WU. 2. Have a 5% graphics option for people who would like to put more of the WU toward work than toward the graphics. 3. Simplify the WU explanation in the log and on the screens, so crunchers know what their WU is a part of. 4. Add an area on the graphics that shows or calculates our lowest RMSD and energy for that WU. 5. Extend the current WU explanation in the screensaver to show the RCSB protein ID, the sequence length and the full protein name (the one shown on each RCSB protein page). 6. The best structure of each model (shown in the plot as red dots) isn't shown in the screensaver when the app restarts from a checkpoint; save this information at a checkpoint. 7. Port to other architectures, like the Cell processor or GPUs. 8. Consider something like what the Linux Driver Project has done, letting some open source developers access the source code to port to new architectures under strong NDAs. 9. SSE optimizations. 10. Explore the possibility of using gaming consoles. What we know: 1. We are able to access the source code... under strong NDAs (Non-Disclosure Agreements). 2. A cleaner version is coming out soon to help coders optimize the program. Suggestions / news on suggestions? |
©2024 University of Washington
https://www.bakerlab.org