Message boards : Number crunching : DNS Problems and Late Work Units
Author | Message |
---|---|
SuperSluether Send message Joined: 7 Jul 14 Posts: 10 Credit: 1,357,990 RAC: 0 |
I saw that Rosetta has been having DNS problems lately. While I can connect directly via IP in my browser, I don't know how to do this in BOINC, and now I have some work units that will be late. 1 unit was due yesterday, and 5 more are due on March 10th. Should I abort the late work unit? Or, another way of asking, what happens to Rosetta work units when they are late? |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Quote: "I saw that Rosetta has been having DNS problems lately. While I can connect directly via IP in my browser, I don't know how to do this in BOINC, and now I have some work units that will be late." You can use your hosts file, but you need to know each server name and possibly its aliases, as well as the IP addresses. I just used the hosts file method to get here with the line "128.95.160.140 boinc.bakerlab.org". However, that alone is not sufficient to get the BOINC client working again. Fortunately, I don't care that much about this rather amateurish project, so I'm not planning to spend more time on it. Already spent too much time being frustrated and annoyed, and even wasted the time sending email to the professor, who didn't bother to reply. Maybe he's sick or something. Anyway, this website should at least be modified to include a mention of the status on the front page. It just has some old news about a media article. (Obviously didn't make much of an impact on me.) #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
BlisteringSheep Send message Joined: 15 Sep 06 Posts: 5 Credit: 26,471,188 RAC: 3,891 |
A static list: 128.95.160.140 boinc.bakerlab.org 128.95.160.141 ralph.bakerlab.org 128.95.160.142 srv1.bakerlab.org 128.95.160.143 srv2.bakerlab.org 128.95.160.144 srv3.bakerlab.org 128.95.160.145 srv4.bakerlab.org 128.95.160.146 srv5.bakerlab.org This covers all the names and IPs I needed at least, for both Rosetta & Ralph. |
SuperSluether Send message Joined: 7 Jul 14 Posts: 10 Credit: 1,357,990 RAC: 0 |
Quote: "Fortunately, I don't care that much about this rather amateurish project, so I'm not planning to spend more time on it. Already spent too much time being frustrated and annoyed" We have an impatient one here... It's not like Rosetta wanted this to happen; their registrar (Dotster) stretched a 10-minute process into 2+ days. Quote: "A static list:" Thanks! Just out of curiosity, how did you find these IPs? |
BlisteringSheep Send message Joined: 15 Sep 06 Posts: 5 Credit: 26,471,188 RAC: 3,891 |
Quote: "Fortunately, I don't care that much about this rather amateurish project, so I'm not planning to spend more time on it. Already spent too much time being frustrated and annoyed" I looked at the BOINC files to find out which hosts they reference, got whois information from InterNIC, then used the project's authoritative nameservers (ns5.bakerlab.org) to resolve them. Their nameservers aren't part of the same IP block, and global DNS still knows their addresses. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
Quote: "A static list:" It works!!! |
LarryMajor Send message Joined: 1 Apr 16 Posts: 22 Credit: 31,533,212 RAC: 0 |
Quote: "A static list:" Thank you! I've been poking at this for a couple of days, and your list had the one server I missed. Just reported a couple dozen completed WUs just under deadline. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
We can help the "return of DNS" by writing to the ICANN President on Twitter: @Icaan, @Icaan_presindent, @IcannOmbudsman. Please use #RescueRosettaathome and #RescueBakerlabdotorg. |
krypton Volunteer moderator Project developer Project scientist Send message Joined: 16 Nov 11 Posts: 108 Credit: 2,164,309 RAC: 0 |
It looks like we are back!! Quote: "We can help the 'return of DNS' by writing to the ICANN President on Twitter" |
dcdc Send message Joined: 3 Nov 05 Posts: 1831 Credit: 119,519,159 RAC: 9,154 |
What happened? Some DNS issue, I know, but how come they took so long to fix it? |
SuperSluether Send message Joined: 7 Jul 14 Posts: 10 Credit: 1,357,990 RAC: 0 |
Quote: "What happened? Some DNS issue, I know, but how come they took so long to fix it?" Somebody was late on verifying the registration, and Dotster took much longer than they should have to re-verify it. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Quote: "What happened? Some DNS issue, I know, but how come they took so long to fix it?" There is now a link on the homepage to details on the DNS issues. Rosetta Moderator: Mod.Sense |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Quote: "I saw that Rosetta has been having DNS problems lately. While I can connect directly via IP in my browser, I don't know how to do this in BOINC, and now I have some work units that will be late." There have been various messages about receiving credit for late work units, but my observations suggest it happens sometimes. I'm also pretty sure that downloaded units will not start if they are past their deadline (so those data downloads were obviously completely wasted bandwidth). I mostly blame the DNS problems on the black-hat hackers and Al Gore, sort of. However, I think it's more of a topical issue for the "Cafe Rosetta" than "Number crunching", so I'll comment there. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
I added a 3-day grace period to the server configuration (`<grace_period_hours>`). Hopefully this will help; otherwise, I'm open to suggestions and feedback. |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Quote: "I added a 3 day grace period to the server configuration (`<grace_period_hours>`). Hopefully this will help otherwise I'm open to suggestions and feedback." My suggestion would be simply to avoid downloading data to each client unless the normal usage pattern of that specific client computer makes it reasonably likely the data will be processed quickly enough to satisfy your schedule concerns. Sometimes that might require downloading small units of work for computers that are not running so much. The obvious problem is that the BOINC client may not be capable of providing the projects with the information they need to do that sort of intelligent scheduling. Obviously the client software is positioned to track the usage patterns of each client computer it is running on, but I've seen no evidence that it does so. Also, the API would need calls for the projects to query that history-based information, preferably each time the server is contacted. For intelligent scheduling you basically need to know how this computer is used and how much work it has queued now. Only then can you make a sound decision about what additional work to send and what the deadlines should be for that work. Haven't we been over all of this several times? I feel like you [an administrator or possibly even the director of the project] should be well positioned to see exactly how many of your downloads are not returned with results before their deadlines have elapsed. All I can do is try to purge (abort) old units that I am reasonably sure will not meet their deadlines, but obviously I do have privileged information about how I use my computers and I don't need to track their usage histories to make those predictions. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
Darrell Send message Joined: 28 Sep 06 Posts: 25 Credit: 51,934,631 RAC: 0 |
Quote: "[. snip .] The obvious problem is that the BOINC client may not be capable of providing the projects with the information they need to do that sort of intelligent scheduling. Obviously the client software is positioned to track the usage patterns of each client computer it is running on, but I've seen no evidence that it does so." IIRC, the BOINC Manager makes the decisions on how much work/how many WUs to download, and it is based on 1) how much total work is already in process, 2) how much total work is downloaded but not yet in process, 3) how large the WUs are for the project that wants work, 4) the resource share for the project in relation to the other project(s) in the recent past (history), and 5) how large the user-requested work queue is. On Rosetta, the user -may- control item 3 by: BOINC Manager -> Rosetta@home -> Your preferences -> [login if needed here] edit preferences for the venue(s) your computer(s) use -> Target CPU run time = {x hour} -> Update preferences; and item 5 by: BOINC Manager -> Options -> Computing preferences -> Computing -> Store at least {m} days of work -> Store up to an additional {n} days of work -> OK. If any of these parameters are changed more often than a few days apart, the history data won't fit, and too much or too little may be downloaded. If the computer usage varies widely over a few days, the same thing may happen (e.g., run 24/24 hours for 5 days, then off for 3 days). Using a smaller queue and smaller WU size with a consistent daily use pattern on the computer(s) reduces the risk of lost bandwidth. Assigning backup project(s) reduces the risk of idle computers. These are under user control and choice. Quote: "I feel like you [an administrator or possibly even the director of the project] should be well positioned to see exactly how many of your downloads are not returned with results before their deadlines have elapsed. All I can do is try to purge (abort) old units that I am reasonably sure will not meet their deadlines--but obviously I do have privileged information about how I use my computers and I don't need to track their usage histories to make those predictions." I agree the project could or possibly does track such data, but I am guessing the payback is too small to be worth the effort. After all, how many non-advanced users (those who never touch tuning parameters) are there in relation to those of us who do? The project (and David E K) do address some of the things over which we have no control, and the other things that we can control, we should adjust as best we can. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Yes, the BOINC Manager decides when to request work, how much work to request, and from which projects. It also tries to avoid over-committing a given machine by not pulling down more work than can reasonably be expected to be completed. For the Rosetta server to make any further refinement on that existing system would require additional disk IO and additional CPU for every client scheduler request. Other projects have built more complex scheduling systems, but many have discovered, the hard way, that they do not scale well. In a nutshell, the bandwidth is cheaper than the additional database, disk and CPU load. And the BOINC Manager does a fairly good job of minimizing the potential problem of requesting too much work for the machine to process. Rosetta Moderator: Mod.Sense |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Thank you [Darrell] for the informative parts of your post, even if you had to IIRC disclaim them. The short summary appears to be that the BOINC client (AKA BOINC Manager on each client computer) does, in theory, collect most of the appropriate information to do better scheduling. My short response (though I feel like I am basically repeating myself from a slightly different perspective) is that the work-scheduling results are not very good. In addition, I can report that I spent a lot of time tweaking the various settings that are subject to my control, and either the client ignores my suggestions or I have been unable to figure out how to set them "properly". I definitely feel that I wasted too much time and effort, but this was not limited to the Rosetta@home project (though I was already running R@h when I noticed the pattern of discarding downloaded units). There seems to be a deep assumption in there somewhere that most of the clients are supposed to be running continuously for many hours at a time. (Some of mine do, and others don't.) IIRC the Rosetta@home people have raised or at least mentioned their bandwidth concerns on several occasions. Perhaps I am the only participant who has noticed, but I frequently notice wasted bandwidth, notwithstanding my efforts to avoid such waste (while still 'earning' the points). As I have stated a couple of times, everything keeps coming back to deadlines that are difficult or impossible to satisfy. I don't see any solution to the fundamental problem of bandwidth. They have a lot of data to analyze, and I'm sure they have already explored the obvious efficiencies such as arranging for the same data to receive multiple analyses on a single client computer. (Not sure if I've actually seen some evidence of such patterns.) What is clear (at least to me) is that downloading data that never gets processed at all is not useful. That bandwidth could have been conserved. 
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Quote: "Yes, the BOINC Manager decides when to request work, how much work to request, and from which projects. It also tries to avoid over-committing a given machine by pulling down more work than can reasonably be expected to be completed." Now that's a deep and insightful reply, though surprising. If I understand you [Mod.Sense] correctly, and if I am not oversimplifying, then you are saying that your own CPU resources are more limited than your bandwidth, and there is no easy way to transfer the CPU load to the clients where there is an abundance of cycles. If this is an accurate assessment, then it seems you should ask the BOINC-side people if they can improve the client's capabilities. It may also explain some of the capabilities attributed to the multi-project management extensions that I had researched a while back. (When I was studying them, I was sometimes left with the question of "Now why would anyone want to do that?") #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
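
For anyone retracing the hosts-file workaround discussed in this thread, here is a minimal sketch that turns the static list posted by BlisteringSheep into hosts-file lines. The names and addresses are copied from that post and may no longer be current; the helper name `render_hosts_lines` is made up for illustration.

```python
# Name-to-address pairs copied from the static list posted in this thread.
# They reflect the servers at the time of the outage and may have changed since.
STATIC_HOSTS = {
    "boinc.bakerlab.org": "128.95.160.140",
    "ralph.bakerlab.org": "128.95.160.141",
    "srv1.bakerlab.org": "128.95.160.142",
    "srv2.bakerlab.org": "128.95.160.143",
    "srv3.bakerlab.org": "128.95.160.144",
    "srv4.bakerlab.org": "128.95.160.145",
    "srv5.bakerlab.org": "128.95.160.146",
}


def render_hosts_lines(mapping):
    """Return hosts-file lines in 'address hostname' order (hypothetical helper)."""
    return [f"{address} {name}" for name, address in mapping.items()]


if __name__ == "__main__":
    # Print the lines so they can be reviewed before being added by hand.
    print("\n".join(render_hosts_lines(STATIC_HOSTS)))
```

The printed lines would go into the system hosts file (`/etc/hosts` on Linux and macOS, `C:\Windows\System32\drivers\etc\hosts` on Windows), which normally requires administrator rights to edit. Removing the entries once DNS works again avoids pointing at stale addresses later.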
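
The 3-day grace period David E K mentions comes down to simple arithmetic: 3 days is 72 hours added to the deadline. The sketch below only illustrates that arithmetic. It assumes the setting is expressed in hours (as the `<grace_period_hours>` name suggests), it does not reproduce how the BOINC server checks this internally, and the function name and dates are hypothetical.

```python
from datetime import datetime, timedelta

# 3 days expressed in hours, matching the grace period described in the thread.
GRACE_PERIOD_HOURS = 3 * 24


def is_within_grace(deadline, reported_at, grace_hours=GRACE_PERIOD_HOURS):
    """Return True if a result is reported no later than deadline + grace period.

    Hypothetical helper for illustration only; not BOINC server code.
    """
    return reported_at <= deadline + timedelta(hours=grace_hours)


if __name__ == "__main__":
    # Hypothetical deadline; the thread does not give exact timestamps.
    deadline = datetime(2017, 3, 10, 12, 0)
    print(is_within_grace(deadline, deadline + timedelta(days=2)))  # inside the window
    print(is_within_grace(deadline, deadline + timedelta(days=4)))  # outside the window
```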
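
Darrell's point about the two queue settings ("Store at least {m} days of work" and "Store up to an additional {n} days of work") can be pictured with a toy model. This is a deliberate simplification and an assumption, not the BOINC client's actual work-fetch algorithm: it ignores resource share, work unit size, and usage history, and the function name is invented.

```python
def days_of_work_to_request(queued_days, min_days, extra_days):
    """Toy model of a work buffer with a low-water and a high-water mark.

    Request nothing while the queue holds at least `min_days` of work;
    otherwise top up to `min_days + extra_days`. Illustrative only.
    """
    if queued_days >= min_days:
        return 0.0
    return (min_days + extra_days) - queued_days


if __name__ == "__main__":
    # Smaller settings mean less work is downloaded ahead of time.
    print(days_of_work_to_request(queued_days=0.5, min_days=1.0, extra_days=0.5))
    print(days_of_work_to_request(queued_days=0.5, min_days=3.0, extra_days=2.0))
```

Under this model, the smaller the two settings, the less work sits on disk ahead of its deadline, which is the reasoning behind the suggestion to use a smaller queue on machines that run irregularly.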
©2024 University of Washington
https://www.bakerlab.org