Message boards : Number crunching : Client errors on linux
B.Rothbaecher Joined: 2 Jun 12 Posts: 7 Credit: 1,125,281 RAC: 0
After I installed Linux on my PC (Suse 12.2), I got a lot of client errors: https://boinc.bakerlab.org/rosetta/result.php?resultid=585131437 https://boinc.bakerlab.org/rosetta/result.php?resultid=585127003 https://boinc.bakerlab.org/rosetta/result.php?resultid=585112922 https://boinc.bakerlab.org/rosetta/result.php?resultid=585112920 What can I do? Thank you, Bruno
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0
Wow! 32 CPUs and 64GB of memory! It looks like those tasks were taking longer than expected to complete and were ended by the watchdog. There is not much you can do on your side, except perhaps bump up your target runtime. This would let the task run longer before it is cut off. Be very careful when changing the target runtime. All of your currently downloaded WUs will get bumped up to the longer runtime, and with 32 CPUs' worth of work cached, that can easily leave you with too much work on hand. I always suggest making such changes with a very small cache of work on hand, and moving up only one notch per day, so that BOINC Manager's estimated runtimes for new tasks come into line with your current runtime preference. Rosetta Moderator: Mod.Sense
B.Rothbaecher Joined: 2 Jun 12 Posts: 7 Credit: 1,125,281 RAC: 0
Increasing the target runtime will not be a problem. I have only a small cache of half a day (.25 + .25 in the settings). What target runtime do you recommend?
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0
I always go for the 24hr runtime preference. It uses the least overall bandwidth on your side and puts the least overall load on the Rosetta servers to keep your machine busy crunching. You can also save some bandwidth if you have a caching proxy server; just ask if you'd like to know more. Do you have other machines crunching on the same local network? So, if you currently run with the default 3hr preference and have a .25 + .25 day cache (one half day), then bumping that to 12hrs would turn your currently downloaded cache from half a day of work into 2 days of work (4 times the runtime per task, so 4 times the work). While BOINC Manager runs the first few tasks under the new preference, it will misestimate the completion times, and the % complete may seem a bit confusing. You can tell BOINC Manager is adjusting to the new runtime when the estimated runtime of a task that has not yet begun approaches your preference setting. So I'd suggest bumping to 12hrs, crunching for a day or two, then bumping it to 24hrs. Then wait another 2 days before you question too closely how much work you are downloading and have on hand. Rosetta Moderator: Mod.Sense
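The cache arithmetic in the reply above can be checked with a few lines of Python. This is a rough sketch, not anything BOINC itself runs: the helper names are hypothetical, and it assumes every cached task runs for exactly the target runtime and that the cache is the sum of the two buffer settings (.25 + .25 days).

```python
# Hypothetical helpers for estimating the effect of a target runtime change
# on work already in the BOINC cache. Simplifying assumption: each task runs
# for exactly the target runtime, and all CPUs crunch Rosetta full time.


def cached_tasks(cpus: int, cache_days: float, runtime_hours: float) -> float:
    """Approximate number of tasks fetched to fill the cache."""
    return cpus * cache_days * 24 / runtime_hours


def cache_days_after_change(cache_days: float, old_hours: float, new_hours: float) -> float:
    """Days of work on hand once the same tasks run at the new target runtime."""
    return cache_days * new_hours / old_hours


if __name__ == "__main__":
    cache = 0.25 + 0.25  # the two buffer settings from the thread, in days
    print(cached_tasks(32, cache, 3))               # tasks on hand at the 3hr default
    print(cache_days_after_change(cache, 3, 12))    # days of work after moving to 12hrs
    print(cache_days_after_change(cache, 3, 24))    # days of work after jumping straight to 24hrs
```

For the 32-CPU machine in this thread that works out to roughly 128 tasks on hand, 2 days of work after a move to 12hrs, and 4 days after a direct jump to 24hrs, which is why stepping up gradually with a small cache is the safer route.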
B.Rothbaecher Joined: 2 Jun 12 Posts: 7 Credit: 1,125,281 RAC: 0
I changed the runtime to 24hrs. That was not the solution. Some WUs still ended with an error: https://boinc.bakerlab.org/rosetta/result.php?resultid=586146582 https://boinc.bakerlab.org/rosetta/result.php?resultid=586128913
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0
Well, if a given type of task has long-running models, there are still cases where the target runtime is exceeded by more than 4hrs, and so the watchdog kicks in. Whether the last model run is a long-running one is essentially random. But when the watchdog ends a task, it preserves all of the prior work. So I didn't intend the change of runtime preference as a solution, but it should reduce the frequency of errors and will keep your machine supplied with work using less bandwidth. Rosetta Moderator: Mod.Sense
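To make the watchdog behaviour described above concrete, here is a simplified, hypothetical model of how a task ends. It is based only on what is said in this thread (models run back to back, the task stops starting new models once the target runtime is reached, and the watchdog ends the task more than 4hrs past the target while keeping completed work). The function name and structure are illustrative assumptions, not Rosetta's actual code.

```python
# Hypothetical model of a Rosetta task's end-of-run behaviour, built only from
# the forum description. Not the real Rosetta implementation.

WATCHDOG_GRACE_HOURS = 4.0  # "exceeded by more than 4hrs", per the thread


def run_task(model_hours: list[float], target_hours: float) -> tuple[int, bool, float]:
    """Simulate a task; return (models_kept, ended_by_watchdog, elapsed_hours)."""
    elapsed = 0.0
    kept = 0
    for hours in model_hours:
        if elapsed >= target_hours:
            break  # target reached: finish normally, start no new model
        if elapsed + hours > target_hours + WATCHDOG_GRACE_HOURS:
            # The model in progress overruns the grace period: the watchdog
            # ends the task, but the models completed so far are preserved.
            return kept, True, target_hours + WATCHDOG_GRACE_HOURS
        elapsed += hours
        kept += 1
    return kept, False, elapsed


if __name__ == "__main__":
    print(run_task([2, 2, 2], 3))  # short models: ends normally after 2 models
    print(run_task([2, 6], 3))     # long last model: watchdog fires, 1 model kept
```

In this sketch, whether the watchdog fires depends on the duration of the model that is in progress when the target runtime is reached, which matches the "random event" described in the post.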
©2024 University of Washington
https://www.bakerlab.org