Statistics for computation errors?

Message boards : Number crunching : Statistics for computation errors?


Markus Elfring
Joined: 10 Jun 06
Posts: 17
Credit: 3,610,273
RAC: 0
Message 71636 - Posted: 24 Nov 2011, 13:30:19 UTC

In the task list for my PC I can see that a couple of computations have the client state "Compute error". I can inspect each of them through the results web display, but I find this interface less convenient for finding the corresponding error reasons than it could be.

The BOINC software has infrastructure for generating some statistics. I am looking for tools that visualise the error distribution more clearly, to improve the chances of fixing the underlying open issues.

Is any automatic analysis performed on the exit codes returned with work units?
Are computation failures categorised automatically, so that an efficient drill-down into interesting issues is supported?

How often do you notice error reasons like "Out Of Memory (C++ Exception)" and "Access Violation" at the moment?
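
A minimal sketch of the kind of tally asked about here, written in Python. The record layout (`task_id`, `reason`) and the sample values are hypothetical and only illustrate the idea; they are not taken from BOINC or from any Rosetta export.

```python
from collections import Counter

# Hypothetical task records; the field names are assumptions for illustration.
tasks = [
    {"task_id": 1, "reason": "Out Of Memory (C++ Exception)"},
    {"task_id": 2, "reason": "Access Violation"},
    {"task_id": 3, "reason": "Out Of Memory (C++ Exception)"},
    {"task_id": 4, "reason": None},  # None marks a task that finished normally
]


def error_distribution(records):
    """Count how often each error reason occurs, skipping successful tasks."""
    return Counter(r["reason"] for r in records if r["reason"] is not None)


def text_histogram(counts):
    """Render the counts as one line per reason, most frequent first."""
    return [f"{reason:<32} {'#' * n} ({n})" for reason, n in counts.most_common()]


distribution = error_distribution(tasks)
for line in text_histogram(distribution):
    print(line)
```

A plain-text histogram like this would already show whether one error reason dominates before any drill-down into individual results.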
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 71639 - Posted: 25 Nov 2011, 17:17:05 UTC

Markus, it sounds like you are mixing the needs of the project team (which has to gather information across the whole project) with those of an individual user (who needs information about their own machines and any problems they may be having).

Every returned work unit is processed by a validator, which is an application written by each BOINC project specifically to analyze its own returning results. Each BOINC project also tends to have a variety of types of work, each of which might warrant its own bucket of statistics.

On the other hand, some symptoms, messages, and outcomes point to general causes, such as memory exceptions, or are common to specific operating systems. If some logic to analyze the output in general terms were incorporated into BOINC, all of the projects could benefit from it. The BOINC developers or projects email lists would be the best place to discuss such requirements.
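
As a rough illustration of what such general-purpose logic might look like, here is a small Python sketch that maps raw error messages to broad categories and counts them per operating system. The keyword rules, category names, and the `(os_name, message)` input shape are all assumptions made for this example; nothing here is part of BOINC.

```python
from collections import defaultdict

# Hypothetical keyword rules; a real project would need its own curated list.
RULES = [
    ("memory", ("out of memory", "bad_alloc")),
    ("access violation", ("access violation", "sigsegv", "segmentation fault")),
]


def categorise(message):
    """Return the first category whose keywords appear in the message."""
    text = message.lower()
    for category, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return category
    return "uncategorised"


def bucket_by_platform(results):
    """Count results per (operating system, category) pair."""
    buckets = defaultdict(int)
    for os_name, message in results:
        buckets[(os_name, categorise(message))] += 1
    return dict(buckets)
```

Keeping the rules in a plain list means each project could extend them for its own types of work without touching the counting logic.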
Rosetta Moderator: Mod.Sense
Markus Elfring
Joined: 10 Jun 06
Posts: 17
Credit: 3,610,273
RAC: 0
Message 71759 - Posted: 6 Dec 2011, 21:32:47 UTC - in response to Message 71639.  

Markus, it sounds like you are mixing the needs of the project team (which has to gather information across the whole project) with those of an individual user (who needs information about their own machines and any problems they may be having).

That is partly the case. It seems that processing on my computer was affected by some bad implementations for a selection of tasks. It is hard to find a useful pattern in the variation of error reasons.
A few other users also gave feedback (in this forum) on unexpected behaviour of Rosetta's applications. I guess that a couple of users would like to see further explanations of the observed failure rates.

Each BOINC project also tends to have a variety of types of work, each of which might warrant its own bucket of statistics.

I guess that Rosetta researchers and software developers can become overwhelmed by the sheer number of computation errors. Various work results are sent back by a potentially growing number of users and hosts.

On the other hand, some symptoms, messages, and outcomes point to general causes, such as memory exceptions, or are common to specific operating systems.

Would you like to publish such details in a public report or issue tracker?

If some logic to analyze the output in general terms were incorporated into BOINC, all of the projects could benefit from it.

Are you aware of any approaches to improve the software infrastructure for automatic analysis of computation errors?

Is anybody besides me interested in some kind of public computation health indicator in addition to the existing statistics views?
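
One simple form such an indicator could take is a failure rate per application. The Python sketch below assumes, purely for illustration, that outcomes are available as lists of booleans keyed by a hypothetical application name; it does not reflect any existing BOINC statistics export.

```python
def failure_rate(outcomes):
    """Return the fraction of tasks that failed (True = compute error).

    Returns None when there are no tasks, so an empty application is not
    reported as perfectly healthy.
    """
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


def health_report(outcomes_by_app):
    """Map each application name to its failure rate."""
    return {app: failure_rate(outcomes) for app, outcomes in outcomes_by_app.items()}
```

For example, `health_report({"app_a": [True, False, False, False]})` gives a rate of 0.25 for the made-up application `app_a`.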

©2024 University of Washington
https://www.bakerlab.org