Client errors on linux

Message boards : Number crunching : Client errors on linux

B.Rothbaecher

Joined: 2 Jun 12
Posts: 7
Credit: 1,125,281
RAC: 0
Message 75693 - Posted: 3 Jun 2013, 19:28:27 UTC

After I installed Linux on my PC (Suse 12.2), I got a lot of client errors:
https://boinc.bakerlab.org/rosetta/result.php?resultid=585131437
https://boinc.bakerlab.org/rosetta/result.php?resultid=585127003
https://boinc.bakerlab.org/rosetta/result.php?resultid=585112922
https://boinc.bakerlab.org/rosetta/result.php?resultid=585112920

What can I do?

Thank you

Bruno


Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 75698 - Posted: 4 Jun 2013, 16:00:08 UTC

Wow! 32 CPUs and 64GB of memory! It looks like those tasks were taking longer than expected to complete and were ended by the watchdog. There is not much you can do on your side, except perhaps bump up your target run time. This would let a task try running longer before it is cut off.

Be very careful when changing target runtime. All of your currently downloaded WUs will get bumped up to the longer runtime, and with 32 CPUs' worth of work cached up, that can easily lead to having too much work on hand. I always suggest making such changes with a very small cache of work on hand, and only moving up one notch per day, so that BOINC Manager begins to have estimated runtimes for new tasks that are in line with your current runtime preference.
Rosetta Moderator: Mod.Sense
B.Rothbaecher

Joined: 2 Jun 12
Posts: 7
Credit: 1,125,281
RAC: 0
Message 75700 - Posted: 5 Jun 2013, 2:14:45 UTC

Increasing the target run time will not be a problem. I have only a small cache of a half day (.25 + .25 in settings).

What target run time do you recommend?
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 75716 - Posted: 6 Jun 2013, 13:38:46 UTC

I always go for the 24hr runtime preference. This will use the least overall bandwidth for you, and put the least overall load on the Rosetta servers to keep your machine busy crunching. Also, you can save some bandwidth if you have a caching proxy server. Just ask if you'd like to know more. Do you have other machines crunching in the same local network?

So, if you currently run with the default 3hr preference and have a .25 + .25 day cache (one half day), then bumping that to 12hrs would turn your currently downloaded half-day cache into 2 days of work, since each task now runs 4 times as long. While BOINC Manager is running the first few tasks under the new preference, it will misestimate the completion times, and the % complete may seem a bit confusing. But you can tell BOINC Manager is getting used to the new runtime when you see the estimated runtime of a task that has not yet begun approach your preference setting.
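As a rough illustration of that arithmetic, here is a minimal Python sketch. It assumes that the work already on hand scales linearly with the ratio of new to old target runtime, which is a simplification; the function name `cached_work_days` is hypothetical and is not part of BOINC or Rosetta.

```python
def cached_work_days(cache_days: float,
                     old_target_hours: float,
                     new_target_hours: float) -> float:
    """Estimate how many days of work the already-downloaded tasks
    represent after the target runtime preference is changed.

    Assumption: every cached task stretches from the old target
    runtime to the new one, so the cache grows by the same ratio.
    """
    if old_target_hours <= 0 or new_target_hours <= 0:
        raise ValueError("target runtimes must be positive")
    return cache_days * (new_target_hours / old_target_hours)


if __name__ == "__main__":
    # Half-day cache (.25 + .25), default 3hr preference bumped to 12hrs.
    print(cached_work_days(0.5, 3, 12))  # 2.0
    # The same cache if the preference went straight to 24hrs.
    print(cached_work_days(0.5, 3, 24))  # 4.0
```

Going straight from 3hrs to 24hrs would turn the same half-day cache into about 4 days of work under this assumption, which is one reason to change the preference in steps.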

So, I'd suggest bumping to 12hrs, crunch for a day or two, then bump it to 24hrs. Then wait another 2 days before you question too closely how much work you are downloading and have on hand.
Rosetta Moderator: Mod.Sense
B.Rothbaecher

Joined: 2 Jun 12
Posts: 7
Credit: 1,125,281
RAC: 0
Message 75734 - Posted: 10 Jun 2013, 4:51:46 UTC
Last modified: 10 Jun 2013, 4:53:43 UTC

I changed the runtime to 24hrs.
That is not the solution.
Some WUs ended with an error:
https://boinc.bakerlab.org/rosetta/result.php?resultid=586146582
https://boinc.bakerlab.org/rosetta/result.php?resultid=586128913
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 75741 - Posted: 10 Jun 2013, 20:03:41 UTC

Well, if a given type of task has long-running models, there are still cases where the target runtime is exceeded by more than 4hrs, and so the watchdog kicks in. Whether the last model run happens to be a long-running one is essentially random. But when the watchdog ends a task, it preserves all of the prior work.
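The rule described above can be sketched roughly in Python. This is not the actual Rosetta watchdog code; it only restates the condition from this post (elapsed time more than 4hrs past the target runtime), and the names `WATCHDOG_GRACE_HOURS` and `watchdog_would_end` are hypothetical.

```python
# Margin past the target runtime, as quoted in the post above.
# Treated here as an assumption, not a value read from Rosetta itself.
WATCHDOG_GRACE_HOURS = 4.0


def watchdog_would_end(elapsed_hours: float,
                       target_hours: float,
                       grace_hours: float = WATCHDOG_GRACE_HOURS) -> bool:
    """Return True if a task has run more than `grace_hours`
    beyond its target runtime."""
    return elapsed_hours > target_hours + grace_hours


if __name__ == "__main__":
    # 3hr target: a task still in its last model at 7.5hrs is past 3 + 4.
    print(watchdog_would_end(7.5, 3))    # True
    # 24hr target: a task at 27hrs is still inside the 4hr margin.
    print(watchdog_would_end(27, 24))    # False
    # 24hr target: a very long last model can still exceed 24 + 4.
    print(watchdog_would_end(28.5, 24))  # True
```

Raising the target runtime does not change the margin in this sketch; it means fewer tasks are started per day, so fewer of them get the chance to end on an overlong last model.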

So, I didn't mean that changing the runtime preference would be a complete solution, but it should reduce the frequency of errors, and it will help keep your machines busy with less bandwidth.
Rosetta Moderator: Mod.Sense

©2024 University of Washington
https://www.bakerlab.org