Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
loftwyr Send message Joined: 23 Dec 07 Posts: 1 Credit: 775,197 RAC: 0 |
So it turns out the servers are working... but extremely slow in response (for some reason, we are still trying to figure out why). Didn't work for me. I'm still failing. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The suggested cc_config didn't work for me either. I forced BOINC Manager to reread the config files and retried the upload. Interestingly, it seems to fail in 8 seconds, which is less than the timeout specified. I'm on BOINC 7.2.42 on Windows. Rosetta Moderator: Mod.Sense |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Added the config file. No apparent effect here. |
krypton Volunteer moderator Project developer Project scientist Send message Joined: 16 Nov 11 Posts: 108 Credit: 2,164,309 RAC: 0 |
=[ Thanks for checking! |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Finally started downloading some fresh units to work on, but the servers definitely seem to be thrashing most horribly. LOTS of retries and no pattern, except for the obvious feature that smaller files seem to succeed more often. Uploads still seem to be blocked almost completely. It seems possible that one of my completed tasks may have been reported, but most of them are not even trying anymore. How about a DoS attack? The quasi-pattern reminds me of socket failures, but they can be induced. |
TJ Send message Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
Uploads for me have still not been working for about 70 hours now. This may be a problem from the university where the servers are maintained, but the server code is still from the 90's. It has been outdated for years already; that is also due to the low credits. Greetings, TJ. |
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
I don't believe I am the only person who noticed all those 80-Meg "Computation Error" tasks that the system was broadcasting for many months before (and at least a couple of months after) I commented on them in these discussions. I think you are exaggerating there just a little. Your post about that issue was 30 days ago so a "couple of months after" is a little bit wide of the mark. If I recall correctly the problem started a couple of weeks before your post and finished a week or two afterwards. It would have been best if the scientist in charge of that experiment had responded earlier but it was definitely not as long lasting as you suggest. How about a DoS attack? The quasi-pattern reminds me of socket failures, but they can be induced. As there are 501k tasks currently in progress (and most probably finished by now) the recovery phase after this outage will probably share some of the characteristics of a DoS attack as every system tries to upload at once. |
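To illustrate the point about the recovery phase: if every client retries at the same moment, the servers see a spike much like a DoS attack, whereas randomized, growing delays spread the load out. The following is a minimal sketch of exponential backoff with full jitter. It is not taken from BOINC's own retry logic, and the function names and the `base`/`cap` values are assumptions chosen only for illustration.

```python
import random


def backoff_delay(attempt, base=60.0, cap=4 * 3600.0, rng=random):
    """Return a random delay in seconds for the given retry attempt.

    The upper bound doubles with each attempt (base * 2**attempt) but
    never exceeds `cap`; the actual delay is drawn uniformly from
    [0, upper bound] so that clients do not retry in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def retry_schedule(attempts, base=60.0, cap=4 * 3600.0, seed=None):
    """Return the list of delays one hypothetical client would wait."""
    rng = random.Random(seed)  # a seed makes the schedule reproducible
    return [backoff_delay(n, base, cap, rng) for n in range(attempts)]
```

Calling `retry_schedule(8)` on two different machines would normally give two different schedules, which is the whole point: the uploads arrive spread over time instead of all at once.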
jareeq Send message Joined: 28 Apr 12 Posts: 2 Credit: 4,149,828 RAC: 0 |
Uploads for me have still not been working for about 70 hours now. ok - now I see why all of my results are frozen in upload. Do you have any information about when it will be fixed? |
JohnH Send message Joined: 25 Mar 13 Posts: 43 Credit: 2,319,355 RAC: 0 |
I just finished one of my 9 queued uploads. At about the same time I got a bunch of downloads started. They're all stuck at 0% complete. I ran Wireshark on my Win 8.1 on net 128.95.160.0/24, which includes all the Rosetta servers I think. It shows loadsa TCP exceptions like 3-way handshake timeouts, duplicate ACKs, TCP retransmits etc. Now I'm a long way away from UW both geographically and networkly but it smells to me like some layer 3 switch or router specific to that subnet is in trouble. Dunno if that helps or not but hey... |
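For those without Wireshark, the handshake symptom described above can be approximated by timing plain TCP connects. This is only a rough sketch under stated assumptions: it measures how long `socket.create_connection` takes, which covers the 3-way handshake but shows nothing about duplicate ACKs or retransmits. The helper names `time_tcp_connect`, `summarize` and `probe` are made up for this example.

```python
import socket
import time


def time_tcp_connect(host, port=80, timeout=10.0):
    """Return the seconds needed to open a TCP connection, or None.

    None covers every failure: DNS errors, refused connections and
    handshake timeouts all raise OSError (or a subclass).
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def summarize(samples):
    """Condense a list of connect times (None = failed) into a small report."""
    succeeded = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failures": len(samples) - len(succeeded),
        "worst_seconds": max(succeeded) if succeeded else None,
    }


def probe(host, attempts=5, connect=time_tcp_connect):
    """Try several connects to one host and summarize the results."""
    return summarize([connect(host) for _ in range(attempts)])
```

Running `probe(...)` against a host inside the affected subnet and against one outside it, then comparing the `failures` and `worst_seconds` values, gives a crude version of the same comparison.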
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Uploads for me have still not been working for about 70 hours now. Whenever KEL (IT for Rosetta) can get the UW computing guys to help him figure out what went wrong. It is typical in these situations for the project to be out of service anywhere from a few days to a week. Just depends. Find another project to keep your system busy in the meantime. |
Daedalus Send message Joined: 1 Aug 08 Posts: 39 Credit: 10,103,850 RAC: 611 |
One of my results partially uploaded before getting stuck. So -as mentioned- the server is probably throttled but not dead. I have 54 entries waiting to be uploaded... |
The_Saint_(LDS) Send message Joined: 12 Aug 10 Posts: 6 Credit: 10,076,132 RAC: 0 |
One of my results partially uploaded before getting stuck. So -as mentioned- the server is probably throttled but not dead. I'm glad I had a queue on most of my little farm...only one is out of work right now but all of them have almost 3 days worth to upload (some with partial uploads)...somewhere over 300 results to upload now. Oh well...here's hoping they get to the bottom of this soon. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,123 |
Did you read the news message on the home page, which says that the problem is now known to be that the connection between the university's network and the rest of the internet is running much slower than usual? Nothing on why, or when it will be fixed, yet though. That should mean that anyone on their campus should get faster than usual response from their server, due to less competition from the rest of the world. Many of you currently participating only in Rosetta@Home might want to add World Community Grid to your list of BOINC projects, but with a 0% resource share so that workunits will only be downloaded from WCG when none are available from Rosetta@Home. http://www.worldcommunitygrid.org/ |
premier Send message Joined: 30 Dec 05 Posts: 14 Credit: 23,872,868 RAC: 0 |
Guys, You suck. I have never seen a network failure that can't be repaired within 12 hours or less (I manage large networks). It's the 4th day without the ability to upload/download anything. Come on guys, I have been supporting you since 2005, and I always thought of R@H as the best of the best projects. But for some time I have been considering leaving it because: 1. You do not want to share source code - how the hell could I be sure I am not part of a Bitcoin botnet or other strange project? 2. There is a large number of errors in WUs 3. Current project status for me is DOWN. Guys do something or you lose a lot of compute power. |
krypton Volunteer moderator Project developer Project scientist Send message Joined: 16 Nov 11 Posts: 108 Credit: 2,164,309 RAC: 0 |
Hi premier, Thank you for your support these last 5 years! I'm sorry about the current events... 1) the source code is free and available online for download, all you have to do is agree that you would not use it for profit (it's really easy to get): https://c4c.uwc4c.com/express_license_technologies/rosetta 2) Working on it... 3) Most of the Rosetta Community is out of town for a conference... Won't be back at the university till this weekend. I was not able to repair it myself, and have to wait till the experts are back (or at least till they have access to the internet). =[ Guys, You suck. I have never seen a network failure that can't be repaired within 12 hours or less (I manage large networks). It's the 4th day without the ability to upload/download anything. Come on guys, I have been supporting you since 2005, and I always thought of R@H as the best of the best projects. But for some time I have been considering leaving it because: |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 6,628 |
Strange. Admins say it's a university network problem. But Ralph@home runs ok (upload/download) and, I think, it is on the same network.... |
krypton Volunteer moderator Project developer Project scientist Send message Joined: 16 Nov 11 Posts: 108 Credit: 2,164,309 RAC: 0 |
Yeah, some servers were unaffected... for example try curl -v --connect-timeout 0 boinc.bakerlab.org vs curl -v --connect-timeout 0 srv2.bakerlab.org (both should show the same content). The latter takes 2 mins to load. The BOINC Manager uses the curl library, and by default it times out if the connection is idle for a certain number of seconds (hence why it fails...). Each job is assigned to one of the 5 servers srv[1..5].bakerlab.org (each has its own IP address). Our working hypothesis is that UW imposed some kind of bandwidth throttling on the high-traffic IP addresses... Ralph@home has its own server... which has not yet been "flagged" as high traffic. One temporary fix I was considering would be to modify the hosts file to redirect srv[1..5] to the boinc.bakerlab.org IP address... but this could cause boinc.bakerlab.org to get flagged and killed. =[ Strange. Admins say it's a university network problem. But Ralph@home runs ok (upload/download) and, I think, it is on the same network.... |
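The curl comparison above can also be scripted. Below is a minimal sketch using only the Python standard library; it is not what BOINC does internally, and the helper names `timed_fetch` and `slower_url` are assumptions for this example. Note that the `timeout` passed to `urllib.request.urlopen` applies to each blocking socket operation rather than to the whole transfer, so a slow but steadily trickling download can run longer than the value given.

```python
import time
import urllib.request


def timed_fetch(url, timeout=300.0, opener=urllib.request.urlopen):
    """Fetch a URL and report how long the whole download took.

    `opener` defaults to urllib.request.urlopen and exists only so a
    stand-in can be supplied for offline testing.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read()
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return {"url": url, "ok": False, "bytes": 0,
                "seconds": time.monotonic() - start, "error": str(exc)}
    return {"url": url, "ok": True, "bytes": len(body),
            "seconds": time.monotonic() - start, "error": None}


def slower_url(result_a, result_b):
    """Return the URL that fared worse: a failure counts as slower than any success."""
    def rank(result):
        return (result["ok"], -result["seconds"])
    return min(result_a, result_b, key=rank)["url"]
```

For example, `slower_url(timed_fetch("http://boinc.bakerlab.org/"), timed_fetch("http://srv2.bakerlab.org/"))` would name whichever of the two hosts from the post responded more slowly or not at all.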
Gallstone Send message Joined: 31 May 12 Posts: 3 Credit: 443,740 RAC: 0 |
Upload not possible for me too. Therefore question: How about deadlines? I have 4 completed tasks ready for upload, but deadline is Aug 2, 16:09 UTC. If the problem isn't solved by then, will deadlines be extended? Or will those tasks receive scores even if uploaded beyond deadline? |
biodoc Send message Joined: 19 Feb 06 Posts: 14 Credit: 30,717,792 RAC: 0 |
Perhaps the Baker lab's use of University Network bandwidth is affecting student access to Facebook, YouTube and Netflix? Sad, science no longer rules these days. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,161,072 RAC: 15,284 |
3) Most of the Rosetta Community is out of town for a conference... Won't be back at the university till this weekend. I was not able to repair it myself, and have to wait till the experts are back (or at least till they have access to the internet). =[ Obviously this isn't the news we wanted, but it's important you've said it because we can adjust our expectations (and processing) accordingly. It's disappointing you've been put in this position and the IT staff haven't supported you by calling over expert help from elsewhere in the faculty. Thanks for trying. |
©2024 University of Washington
https://www.bakerlab.org