Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,137,017 RAC: 4,594 |
One thing to remember is almost ALL webpages are cached and only refreshed on a periodic basis, some by the minute, some by the hour and some by the day or week. There is no way to tell unless you check and notice a change, or they come out and actually tell us. Those WORDS were in the way, THANKS now I can SEE!!! |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This download problem is still happening, this one file is just over 4MB & keeps getting stuck & retries after a couple of KB over & over. Some larger files download o.k. others don't. I'm not having any problems with my other projects JUST ROSETTA. Sun 10 Mar 2013 09:20:09 EST Project communication failed: attempting access to reference site Sun 10 Mar 2013 09:20:09 EST rosetta@home Temporarily failed download of rb_03_09_36646_70000_h001__hsapoe4_aah001_19_05.200_v1_3.gz: HTTP error Sun 10 Mar 2013 09:20:10 EST rosetta@home Started download of rb_03_09_36646_70000_h001__hsapoe4_aah001_19_05.200_v1_3.gz Sun 10 Mar 2013 09:20:11 EST Internet access OK - project servers may be temporarily down. Sun 10 Mar 2013 09:25:19 EST Project communication failed: attempting access to reference site Sun 10 Mar 2013 09:25:19 EST rosetta@home Temporarily failed download of rb_03_09_36646_70000_h001__hsapoe4_aah001_19_05.200_v1_3.gz: HTTP error Sun 10 Mar 2013 09:25:20 EST Internet access OK - project servers may be temporarily down. Sun 10 Mar 2013 09:25:20 EST rosetta@home Started download of rb_03_09_36646_70000_h001__hsapoe4_aah001_19_05.200_v1_3.gz Sun 10 Mar 2013 09:30:32 EST Project communication failed: attempting access to reference site Sun 10 Mar 2013 09:30:32 EST rosetta@home Temporarily failed download of rb_03_09_36646_70000_h001__hsapoe4_aah001_19_05.200_v1_3.gz: HTTP error Sun 10 Mar 2013 09:30:34 EST rosetta@home Started download of rb_03_09_36646_70000_h001__hsapoe4_aah001_19_05.200_v1_3.gz Sun 10 Mar 2013 09:30:35 EST Internet access OK - project servers may be temporarily down. |
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
This download problem is still happening, this one file is just over 4MB & keeps getting stuck & retries after a couple of KB over & over. I posted a possible solution in the Slow to download thread. It has helped a lot of people with SETI downloads, so it might also help here. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This download problem is still happening, this one file is just over 4MB & keeps getting stuck & retries after a couple of KB over & over. I'm using Ubuntu on both rigs, not windows! |
PanicMan Joined: 31 Jan 10 Posts: 7 Credit: 276,651 RAC: 0 |
i have been having issues with rosetta for about a week or so now...i reset project last week sometime due to errors..now i noticed this when i got home today..there was about 200 of them i copy/pasted a small section...along with 2 that had computational errors 3/29/2013 8:25:05 AM | rosetta@home | Task binding_helix_0472_disulf1_1_disulf2_2_disulf3_4_disulf4_2_disulf5_1_0004_abinitio_SAVE_ALL_OUT_73462_2878_0 exited with zero status but no 'finished' file 3/29/2013 8:25:05 AM | rosetta@home | If this happens repeatedly you may need to reset the project. 3/29/2013 8:25:05 AM | rosetta@home | Restarting task binding_helix_0472_disulf1_1_disulf2_2_disulf3_4_disulf4_2_disulf5_1_0004_abinitio_SAVE_ALL_OUT_73462_2878_0 using minirosetta version 345 in slot 1 3/29/2013 8:25:07 AM | rosetta@home | Task binding_helix_0339_disulf1_1_disulf2_1_disulf3_4_disulf4_2_disulf5_1_0005_abinitio_SAVE_ALL_OUT_73421_2970_1 exited with zero status but no 'finished' file 3/29/2013 8:25:07 AM | rosetta@home | If this happens repeatedly you may need to reset the project. 3/29/2013 8:25:07 AM | rosetta@home | Restarting task binding_helix_0339_disulf1_1_disulf2_1_disulf3_4_disulf4_2_disulf5_1_0005_abinitio_SAVE_ALL_OUT_73421_2970_1 using minirosetta version 345 in slot 2 3/29/2013 8:25:47 AM | rosetta@home | Task binding_helix_0472_disulf1_1_disulf2_2_disulf3_4_disulf4_2_disulf5_1_0004_abinitio_SAVE_ALL_OUT_73462_2878_0 exited with zero status but no 'finished' file 3/29/2013 8:25:47 AM | rosetta@home | If this happens repeatedly you may need to reset the project. 3/29/2013 8:25:49 AM | rosetta@home | Task binding_helix_0339_disulf1_1_disulf2_1_disulf3_4_disulf4_2_disulf5_1_0005_abinitio_SAVE_ALL_OUT_73421_2970_1 exited with zero status but no 'finished' file 3/29/2013 8:25:49 AM | rosetta@home | If this happens repeatedly you may need to reset the project. 3/29/2013 8:25:49 AM | rosetta@home | Restarting task binding_helix_0339_disulf1_1_disulf2_1_disulf3_4_disulf4_2_disulf5_1_0005_abinitio_SAVE_ALL_OUT_73421_2970_1 using minirosetta version 345 in slot 2 3/29/2013 8:26:28 AM | rosetta@home | Task binding_helix_0472_disulf1_1_disulf2_2_disulf3_4_disulf4_2_disulf5_1_0004_abinitio_SAVE_ALL_OUT_73462_2878_0 exited with zero status but no 'finished' file 3/29/2013 8:26:28 AM | rosetta@home | If this happens repeatedly you may need to reset the project. 3/29/2013 8:26:31 AM | rosetta@home | Task binding_helix_0339_disulf1_1_disulf2_1_disulf3_4_disulf4_2_disulf5_1_0005_abinitio_SAVE_ALL_OUT_73421_2970_1 exited with zero status but no 'finished' file 3/29/2013 8:26:31 AM | rosetta@home | If this happens repeatedly you may need to reset the project. 3/29/2013 8:27:10 AM | rosetta@home | Task binding_helix_0472_disulf1_1_disulf2_2_disulf3_4_disulf4_2_disulf5_1_0004_abinitio_SAVE_ALL_OUT_73462_2878_0 exited with zero status but no 'finished' file the messages last week were similar but tasks were obviously different..is this a known issue or has something somehow changed that is causing this? thanks in advance for any help. |
Rabinovitch Joined: 28 Apr 07 Posts: 28 Credit: 5,439,728 RAC: 0 |
Hi all! Please consider my problem described in the adjacent thread. From Siberia with love! |
lugal Joined: 16 Jul 08 Posts: 2 Credit: 175,028 RAC: 0 |
Hi all, I have had this problem for several weeks now and have reset the project several times. This is most annoying and I hope someone will start looking into it soon. regards to all Lugal |
chillwater Joined: 18 Dec 09 Posts: 1 Credit: 3,483,338 RAC: 0 |
Greetings, I have received 13 client error/compute error in the last three days. Had only 2 or 3 in the previous 5 days. At least one wingman errored the same wu. Have no other errors in other projects (6). Just so you know. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,137,017 RAC: 4,594 |
Greetings, Apparently so have A LOT of other people too!! It could be as simple as a bad batch or a server-side problem. HOPEFULLY Rosetta is working on it and NOT just watching!! |
Drag'n Smoke Joined: 20 Jan 13 Posts: 1 Credit: 3,690 RAC: 0 |
Work units are not being completed. They are resetting themselves with some impossible deadlines. Just what is the problem??? |
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
Greetings, I have 36 errors out of 176 WUs; it started on April 10. However, it seems that today the error rate is very high. Don't expect anything, we have to go through this batch. Greetings, TJ. |
BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
One thing seems apparent -- the Cryo work units have a nasty tendency to yield computation errors. I wish I could configure the client to avoid them. Better yet, I wish the project folks would stop making them generally available until the problem is dealt with at the project level. Those units go computation errors with numerous different computers -- where no other project is kicking up the errors -- so it seems reasonably clear that the problem is a project specific and work unit class specific issue. Greetings, |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,137,017 RAC: 4,594 |
One thing seems apparent -- the Cryo work units have a nasty tendency to yield computation errors. I wish I could configure the client to avoid them. Better yet, I wish the project folks would stop making them generally available until the problem is dealt with at the project level. Those units go computation errors with numerous different computers -- where no other project is kicking up the errors -- so it seems reasonably clear that the problem is a project specific and work unit class specific issue. AMEN TO THAT!! What I WISH Rosetta had, like MOST other Boinc Projects DO HAVE, is a way to select which tasks I want and which I don't want!! I abort EVERY cryo unit I see but they are insidious!! I abort 5 and upon the update they send me 3 more, I abort those and then send me even more!! It is frustrating and causing me to rethink my contribution to Rosie right now!!! |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
One thing seems apparent -- the Cryo work units have a nasty tendency to yield computation errors. I wish I could configure the client to avoid them. Better yet, I wish the project folks would stop making them generally available until the problem is dealt with at the project level. Those units go computation errors with numerous different computers -- where no other project is kicking up the errors -- so it seems reasonably clear that the problem is a project specific and work unit class specific issue. And you will notice that NONE of the project guys read these threads and I do mean NONE!!! Saw on the download tab that YF Song was attached to the cryo units; surprised he has not shown up here. Certainly the results they are getting back should be telling them something is wrong with the tasks, but that does not seem to be the case. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
These must work for some people, the POTD got one done: Apr 20, 2013 I have not been seeing any cryo's, but finally got one, so suspended other tasks to go run it, it failed for me too. Had already failed for wingman too. I'll just offer the comment on the question of why they do not let you select the work you wish to do, they are working a constant mix of various types of work, and so if such a choice existed, there would quickly be 100 choices. To enable such a thing would tax the scheduler to no end. In the end, I believe I'm correct in saying that so long as the tasks complete normally, you don't so much need such a choice. Rosetta Moderator: Mod.Sense |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I found I had two other cryo tasks, all failed. One had a successful wingman. It was a Mac. Looks like the POTD only has a Mac as well. Rosetta Moderator: Mod.Sense |
BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
OK folks thanks for the confirmation -- what I find I do is manually go in and 'pre-abort' the cryo's before they start -- but it is a fair amount of intervention and some bad ones slip by. It wouldn't be so bad if they failed quickly, but some go computation error after a couple of hours of wasted processing time. I would very much like it if the project folks got on this and stopped generating them until they figured out what the problem is on the project side of things. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,123 |
I had a cryo error also, when it had nearly finished the 12 hours I allow for such workunits. I noticed earlier that, after running for 5 hours, it had not written any checkpoints at all. |
BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
Indeed -- the problem of course is that while I periodically ferret out ALL Cryo units I have, not only does that not stop me from getting new ones, but also the ones I abort simply go back into the queue for future downloads. I realize that some of the cryo workunits are OK -- but it seems to me that it is incumbent on the project folks to simply stop these from going out at the project level and debug them there. There are no doubt plenty of folks running Rosetta in a 'no attention mode' - and they are really wasting CPU cycles here. I had a cryo error also, when it had nearly finished the 12 hours I allow for such workunits. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,137,017 RAC: 4,594 |
Indeed -- the problem of course is that while I periodically ferret out ALL Cryo units I have, not only does that not stop me from getting new ones, but also the ones I abort simply go back into the queue for future downloads. And I think THAT is a major part of what could be Rosetta's downfall, they simply don't manage their project well and just keep on keeping on despite the problems they are having. I found TEN cryo units this morning on ONE machine, one had errored out and I aborted the rest!! I have already turned a different machine off, as in NO NEW TASKS, and will move on to Poem with it. Two other machines are now on Eon with Malaria as backups! I am getting VERY tired of aborting cryo units or being asleep and having them error out!!! I only have Windows machines, so it seems they will NEVER work for me!! |
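
For anyone who wants to put a number on the restart loop PanicMan pasted earlier in the thread, here is a minimal sketch in Python that tallies the "exited with zero status but no 'finished' file" messages per task. It assumes the log text has been copied out of the BOINC event log into a string; the message wording is taken from the excerpt in this thread, while the function name `count_no_finished_exits` and the shortened task names in the sample are hypothetical placeholders, not anything BOINC provides.

```python
import re
from collections import Counter

# Matches "<timestamp> | <project> | Task <name> exited with zero status
# but no 'finished' file". The wording comes from the log pasted in this
# thread; other BOINC client versions may phrase it differently.
NO_FINISHED = re.compile(
    r"\|\s*(?P<project>[^|\s]+)\s*\|\s*Task (?P<task>\S+) "
    r"exited with zero status but no 'finished' file"
)


def count_no_finished_exits(log_text: str) -> Counter:
    """Return how many times each task hit the 'no finished file' exit."""
    # finditer scans the whole string, so it also copes with log lines
    # that were flattened into one long line when pasted into a post.
    return Counter(m.group("task") for m in NO_FINISHED.finditer(log_text))


# Hypothetical sample with shortened task names, for illustration only.
sample = (
    "3/29/2013 8:25:05 AM | rosetta@home | Task task_a_0 exited with zero status but no 'finished' file\n"
    "3/29/2013 8:25:05 AM | rosetta@home | If this happens repeatedly you may need to reset the project.\n"
    "3/29/2013 8:25:07 AM | rosetta@home | Task task_b_1 exited with zero status but no 'finished' file\n"
    "3/29/2013 8:25:47 AM | rosetta@home | Task task_a_0 exited with zero status but no 'finished' file\n"
)

counts = count_no_finished_exits(sample)
for task, n in counts.most_common():
    print(f"{task}: {n}")
```

A task that shows up many times within a few minutes is stuck in the restart loop rather than failing once, which may be worth mentioning when reporting the problem.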
©2024 University of Washington
https://www.bakerlab.org