Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
Mikey -- yup -- I am considering something similar. Perhaps focusing on Malaria and SETI a bit more for me -- I'm sort of unhappy with POEM -- they have had a few outages, and their inability to feed the GPU side, given the low numbers they generate from the CPU side, sort of pushes me away a bit. I figure I should 'reward' SETI a bit for the changes they have made to improve their performance. Rosetta, which runs solidly as a project, has (as you noted) something of a disinclination to acknowledge issues and even more of a disinclination to act to resolve the relatively rare issues they encounter. In this case, I would think it relatively simple to 1) acknowledge the problems with the cryo units and 2) stop generating them. So I don't believe it is a case of a technical problem here, but rather one of those keyboard/cerebellum issues... Indeed -- the problem of course is that while I periodically ferret out ALL Cryo units I have, not only does that not stop me from getting new ones, but also the ones I abort simply go back into the queue for future downloads. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,123 |
BarryAZ, are you aware of the bandwidth problems SETI is currently having? For example, it takes several days to download the input files for one of their Astropulse workunits, but only about an hour to run it once all the input files are downloaded. I've given them a few suggestions on how to make more efficient use of their bandwidth; they don't seem to be using them. As for malaria, there are two sources of workunits: malariacontrol.net (http://www.malariacontrol.net/) and World Community Grid (http://www.worldcommunitygrid.org/), GO Fight Against Malaria project only. |
BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
Robert -- for SETI and Astropulse -- I understand -- I don't do GPU work units for SETI, a constraint that is easy enough to configure. As to CPU work units for SETI, since the relocation, downloads (and uploads) have been much improved for me. Regarding Malaria -- actually I run both the Malaria project (and have for years) and the World Community Grid project (and have for years). For World Community Grid, I've even shifted three GPUs to support it (AMD 6x and 7x GPUs). One thing I've not seen much of with their projects though (albeit some with Malaria on a single workstation, which leads me to believe it's my problem) is work units that run for more than a few minutes and yield computational errors. I have seen that problem occasionally with Einstein though. |
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
"There are no doubt plenty of folks running Rosetta in a 'no attention mode' - and they are really wasting CPU cycles here." That's correct, BarryAZ. I aborted all the cryos, but I have to work, sleep and study from time to time, so a lot slips through my hands... The only one responding is Mod.Sense, but he actually has no influence on the project team. I wonder if they update him on everything that is going on. The problem is that the cause of this project is GOOD; it can really help in the future. I have seen brain diseases and cancer from close by, so I will contribute and stick to the project. But the team could learn a lot from other projects, like Einstein@home (they are the best). And one more off-topic note: Fightmalaria@home is another project for malaria. I use that as a back-up project. It is a good cause as well, as it is a nasty disease. Greetings, TJ. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. It doesn't look like the validator is running, as all the tasks that I have returned this morning are still pending after hours of waiting (the server status is showing green). They usually go valid after a few minutes. |
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
Hi. I just sent 6 tasks home manually, 5 have credit almost immediately, one is still pending. Greetings, TJ. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Well, my returned tasks from earlier are still stuck! I'll have another look later on today. |
Polian Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
|
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I believe there is a problem: the TeraFLOPS estimate has dropped from 120+ to 89.449, which is what it is now, and it is still falling. |
BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
Indeed -- looks like the validator is in trouble. Maybe the validator has been dipped in the Cryo tank <rueful smile> |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,137,231 RAC: 4,613 |
"Indeed -- looks like the validator is in trouble. Maybe the validator has been dipped in the Cryo tank <rueful smile>" That would make sense: the units that get aborted and the ones that error out all have to be handled by the validator, each and every one. Maybe NOW they will finally do something about them!! I set all my PCs to NNT yesterday and 99% are now out of work. I have re-enabled two and no cryo tasks came thru. |
BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
I haven't seen any project notice of issues -- then again, the form here is that the volunteers handling the message boards collect feedback and pass it to the project folks, and after careful rumination, the folks actually on the project staff consider the possibility that something needs to be done. Then, often, they do act. After they have done whatever it is they have determined needs to be done, they pass that information on to the volunteers here. So, assuming a normal process, we should see manifestations of the project folks doing something (unannounced) about the reported problems (and may be seeing them now) over the coming day or two or three, and then by the weekend we will get a message or two regarding the project's understanding of the problem and what they have done. Rosetta is typically a more reliable BOINC project than most, but perhaps no more and maybe less communicative (from the project side) than average. |
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
The waiting begins.... there are more problems. Not uploading, a lot more pending, and new errors while computing (and they did run out of cryo ;-) (I borrowed that from BarryAZ)); it's a CASPx_ task this time. It seems, though, that I get new work when requesting manually. Greetings, TJ. |
Cutchet Salvador Joined: 1 Feb 10 Posts: 17 Credit: 10,690,439 RAC: 0 |
No news, good news?? Servers all green! Congratulations to the department of communication and public relations. In the XXIst century they keep on trusting in drums and pigeons to communicate, thank you. Greetings, Salvador |
BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
Thanks for the follow up message indicating closer monitoring of the boards. To facilitate matters, here is a summary of the current issues I've seen and seen reported by others. 1) Cryo work units -- spinning out computation errors, not all the time but quite often and a fair proportion of the time these errors are after hours of processing. 2) Validation issues -- normally Rosetta has a very low proportion of pendings. Currently that number (since Monday morning it seems) has been rising. 3) Some of the CASP work units -- spinning out computation errors -- these seem to happen early on (after a few minutes). 4) Uploading issues -- this started late last night. All in all, things are rather unwell in Rosetta-land at the moment. For me, my approach is to temporarily suspend processing of Rosetta pending resolution and ideally reports of resolution. BarryAZ |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
I wonder if the Rosetta staff even knows there's a problem. Also, I have these HUGE WUs that, after about 6 hours or so, are still on Model 1, Step 0, and the graphics show just one big "sinusoidal" line in the "Searching..." graph while the rest of the graphs are blank. EDIT: After 6-7 hours of running, they DO NOT checkpoint. I'm aborting all tasks and running SETI in the meanwhile. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. There seems to be a problem with downloads now, on top of the other problems; they are slow and are timing out like before: Wed 24 Apr 2013 11:53:10 EST Project communication failed: attempting access to reference site Wed 24 Apr 2013 11:53:10 EST rosetta@home Temporarily failed download of rb_04_23_37754_72868_h003__sirtd_h003_.psipred_ss2: connect() failed Wed 24 Apr 2013 11:53:10 EST rosetta@home Temporarily failed download of rb_04_23_37754_72868_h003__sirtd_h003_.fasta: connect() failed Wed 24 Apr 2013 11:53:10 EST rosetta@home Started download of rb_04_23_37754_72868_h003__sirtd_h003_.nobuformat.psipred_ss2 Wed 24 Apr 2013 11:53:10 EST rosetta@home Started download of rb_04_23_37754_72868_h003__sirtd_aah003_03_05.200_v1_3.gz Wed 24 Apr 2013 11:53:12 EST Internet access OK - project servers may be temporarily down. Wed 24 Apr 2013 11:53:33 EST Project communication failed: attempting access to reference site Wed 24 Apr 2013 11:53:33 EST rosetta@home Temporarily failed download of rb_04_23_37754_72868_h003__sirtd_h003_.nobuformat.psipred_ss2: connect() failed Wed 24 Apr 2013 11:53:33 EST rosetta@home Temporarily failed download of rb_04_23_37754_72868_h003__sirtd_aah003_03_05.200_v1_3.gz: connect() failed Wed 24 Apr 2013 11:53:33 EST rosetta@home Started download of rb_04_23_37754_72868_h003__sirtd_aah003_17_05.200_v1_3.gz Wed 24 Apr 2013 11:53:34 EST Internet access OK - project servers may be temporarily down. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,123 |
"Thanks for the follow up message indicating closer monitoring of the boards." In another forum, I've seen a statement that the cryo work units run properly on Macs, but not on whatever other type of computer that poster used. The last few cryo workunits on my Windows 7 computer failed for me and for all wingmates using Windows 7. One succeeded for a wingmate using Windows 8, though. |
BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
But since, as has been noted before, this project doesn't support application-specific choices at either the project level or the workstation level, that places the onus on the project not to generate work units that are OS-specific. At this point though, the project has a batch of problems, so it makes sense for all of us to 'help out' by suspending processing until the project addresses the problems and confirms that the various fixes are in place and tested out. |
morgan Joined: 30 Jun 06 Posts: 3 Credit: 387,964 RAC: 0 |
"Thanks for the follow up message indicating closer monitoring of the boards. BarryAZ" You took the words right out of my mouth, yes! Hihi. In other words: I have the SAME PROBLEMS HERE |
©2024 University of Washington
https://www.bakerlab.org