Message boards : Number crunching : Not polling for 22+ hours?
Author | Message |
---|---|
Sid Joined: 12 Jun 07 Posts: 9 Credit: 3,576,593 RAC: 0 |
. . . caused one of my i7s to run dry for about 6 hours. Has anyone else had this issue? |
Timo Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
This happens from time to time when the servers get really loaded (usually due to huge spikes of new hosts coming online all at once). It has happened several times lately as CE brings more and more hosts over to R@H. That's sort of a good thing, though a bit of an annoyance. Hopefully a server upgrade will eventually fix this. |
Sid Joined: 12 Jun 07 Posts: 9 Credit: 3,576,593 RAC: 0 |
"This happens from time to time when the servers get really loaded (usually due to huge spikes of new hosts coming online all at once)." Thanks for the response. I'm glad that the Pentathlon hasn't smoked Rosetta's servers. . . . |
Repaxan Joined: 28 Jun 08 Posts: 1 Credit: 532,897 RAC: 0 |
Scheduler updates have been failing for the past ~48 hours for me. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
"Hopefully a server upgrade will eventually fix this." While the grass grows... |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
|
Timo Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
"Increase the minimum reserve. That's what I did." Same here. I now run with a minimum of 0.7 days and a maximum of 0.8 days, and this seems to be enough to keep the beasts fed while still keeping my average turnaround time nice and low. That supports CASP efforts, which need quick turnaround to meet their deadlines and still give the scientists time to analyze the results. 38 cores crunching for R@H on behalf of cancercomputer.org, a non-profit supporting High Performance Computing in Cancer Research |
Sid Celery Joined: 11 Feb 08 Posts: 2117 Credit: 41,156,645 RAC: 15,906 |
"Increase the minimum reserve. That's what I did." Great if it works for you. With a potential unexpected 24hr delay, I'm working on the basis that a buffer of 1 day plus run time will cover all eventualities while keeping within deadlines. I use 1.5 days, down from my previous 2.0 days, with an 8hr runtime, to allow for variations. |
Sid Celery Joined: 11 Feb 08 Posts: 2117 Credit: 41,156,645 RAC: 15,906 |
Ok, I think I've worked out what's happening with something I've seen a few times now. This one's typical: rb_06_17_66429_110474__t000__ab_robetta_IGNORE_THE_REST_381779_795_1 The task was returned well within its deadline, but validation reports "Task was reported too late to validate". Checking the workunit details, I see it had been issued before but missed its deadline, so it got re-issued to me. A few hours later, the original task gets returned after its deadline but before I get mine back. It's credited to them, quite rightly, and by the time I get mine back the task has already been shut down. If a user suffers one of those 24hr delays after a failure to pick up tasks on polling, they miss their deadline and the task gets reissued; then, even if the reissue comes back within its own deadline, it will fail in the way described above. I'm not yet sure whether the overnight jobs find these instances and retrospectively issue credit, or whether it's lost altogether. In any case, it doesn't seem to do anyone any good, user or project. Is there some way to escalate a resolution? There seem to be problematic knock-on effects all round. |
shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Not certain it's the same problem, but Rosetta@home is definitely broken again, and not just the Mac client this week. According to the service status page, everything is working fine, just fine and dandy. The same spokesman will probably pop up again and say they really do care, but their actions and inactions speak much more loudly than any words. They don't know what's going on, they don't know when it's working and when it isn't, and they can't even provide accurate status information. Perhaps most importantly, there's very little evidence (that I can detect) that the problems are being tackled in anything resembling a systematic way. I've already noted that such carelessness is quite likely to cast a shadow over any results they publish. Even that didn't seem to motivate them towards improvements. Good thing they've cured me of caring too much. Bad thing that I can't stop myself from wasting the keystrokes on another suggestion. Maybe the Rosetta@home people need to figure out how many clients they can actually handle and route extra clients to other projects. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
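Sid Celery's sizing rule in the thread (a buffer of one day plus run time, rounded up to 1.5 days for an 8hr runtime) is simple arithmetic. A minimal sketch of it in Python follows; the helper name `min_buffer_days` is hypothetical, and the 24hr outage figure is the one quoted in the thread, not a measured value.

```python
# Minimal sketch of the cache-sizing arithmetic quoted in the thread.
# min_buffer_days is a hypothetical helper, not part of BOINC or Rosetta@home.
HOURS_PER_DAY = 24.0


def min_buffer_days(outage_hours: float, runtime_hours: float) -> float:
    """Days of queued work covering a scheduler outage plus one task's run time."""
    return (outage_hours + runtime_hours) / HOURS_PER_DAY


needed = min_buffer_days(outage_hours=24, runtime_hours=8)  # 32 h, about 1.33 days
chosen = 1.5  # buffer Sid Celery uses, in days
print(f"needed {needed:.2f} days, chosen {chosen} days, margin {chosen - needed:.2f} days")
```

The roughly 0.17-day margin is the allowance for variations the post mentions; whether it is enough in practice depends on real task run times and deadlines, which this sketch does not model.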
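The "reported too late to validate" failure Sid Celery describes is a race between the original (late) copy of a workunit and its reissued copy. The sketch below models that race under stated assumptions: a quorum of one, the workunit closing as soon as the first result arrives, and a 72-hour deadline chosen purely for illustration. `Replica` and `outcomes` are hypothetical names, and this is not BOINC's actual validator logic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Replica:
    """One copy of a workunit sent to a host; times are hours since first issue."""
    host: str
    issued: float
    deadline: float
    returned: Optional[float]  # None if the host never reports back


def outcomes(replicas: List[Replica]) -> Dict[str, str]:
    """Assumed model: quorum of one; the workunit closes when the first result arrives."""
    status: Dict[str, str] = {}
    closed = False
    reported = [r for r in replicas if r.returned is not None]
    # Process reports in the order the server receives them.
    for r in sorted(reported, key=lambda rep: rep.returned):
        if not closed:
            late = r.returned > r.deadline
            status[r.host] = "credited (late)" if late else "credited"
            closed = True
        else:
            # Arrives after the workunit closed, even if within its own deadline.
            status[r.host] = "too late to validate"
    for r in replicas:
        if r.returned is None:
            status[r.host] = "no reply"
    return status


# Hypothetical timeline: the original host misses its deadline after a polling
# outage and reports 4 h late; the reissue goes out at that deadline and comes
# back well within its own deadline, but after the original.
timeline = [
    Replica("original", issued=0, deadline=72, returned=76),
    Replica("reissue", issued=72, deadline=144, returned=90),
]
print(outcomes(timeline))
```

Under this model the reissued host loses out whenever the original report lands first, regardless of its own deadline. Whether the project later grants credit for such results is, as the post says, unknown.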
©2024 University of Washington
https://www.bakerlab.org