Message boards : Number crunching : Statistics for computation errors?
Author | Message |
---|---|
Markus Elfring Joined: 10 Jun 06 Posts: 17 Credit: 3,610,273 RAC: 0 |
I can see in the task list for my PC system that a couple of computations have the client state "Compute error". I can look into each of them through the results web display, but I find this user interface less convenient for finding the corresponding error reasons than I imagine it could be. The BOINC software has an infrastructure for generating some statistics. Now I am looking for tools that can visualise the error distribution in a better way, to increase the chances of fixing the open issues involved. Is any automatic analysis performed on the exit codes returned with work units? Is an automatic categorisation performed for computation failures, so that an efficient drill-down into interesting issues would be supported? How often do you notice error reasons like "Out Of Memory (C++ Exception)" and "Access Violation" at the moment? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Markus, it sounds like you are mixing the ideas of the project team (which needs to gather information across the whole of the project) with an individual user (who needs information about their own machines and potential problems they may be having). Every returned work unit is processed by a validator, which is an application specifically written by each BOINC project to analyze their own returning results. Each BOINC project also tends to have a variety of types of work, each of which might warrant its own bucket of statistics. On the other hand, some symptoms, messages and outcomes are common to general causes, such as memory exceptions, or common to specific operating systems. If some logic to analyze the output in general terms were incorporated into BOINC, all of the projects could benefit from such a thing. The BOINC developers or projects email lists would be the best place to discuss such requirements. Rosetta Moderator: Mod.Sense |
Markus Elfring Joined: 10 Jun 06 Posts: 17 Credit: 3,610,273 RAC: 0 |
"Markus, it sounds like you are mixing the ideas of the project team (which needs to gather information across the whole of the project) with an individual user (who needs information about their own machines and potential problems they may be having)." That is partly the case. It seems that the processing on my computer was affected by some bad implementations of a task selection. It is hard to find a useful pattern in the variation of error reasons. A few other users have also given feedback (in the forum here) on unexpected program behaviour in Rosetta's applications. I guess that a couple of users would like to see further explanations of the observed failure rates. "Each BOINC project also tends to have a variety of types of work, each of which might warrant its own bucket of statistics." I guess that Rosetta researchers and software developers can become overwhelmed by the sheer number of computation errors. Various work results are sent back by a potentially growing number of users and hosts. "On the other hand, some symptoms, messages and outcomes are common to general causes, such as memory exceptions, or common to specific operating systems." Would you like to publish such details in a public report or issue tracker? "If some logic to analyze the output in general terms were incorporated into BOINC, all of the projects could benefit from such a thing." Are you aware of any approaches to improve the software infrastructure for automatic analysis of computation errors? Is anybody besides me interested in a kind of public computation health indicator in addition to the existing statistics views? |
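
For readers wondering what the automatic categorisation asked about in this thread might look like, here is a minimal sketch in Python. It is not part of BOINC or Rosetta: the sample records, the keyword mapping and the function names (`categorise`, `error_distribution`, `format_report`) are all assumptions made for illustration, and real error reasons would have to be collected from the results pages by some other means.

```python
from collections import Counter

# Hypothetical (task name, error reason) pairs; NOT real BOINC output.
SAMPLE_RESULTS = [
    ("task_a", "Out Of Memory (C++ Exception)"),
    ("task_b", "Access Violation"),
    ("task_c", "Out Of Memory (C++ Exception)"),
    ("task_d", "Unknown exit code"),
]

# Assumed keyword-to-category mapping, chosen only for illustration.
CATEGORY_KEYWORDS = {
    "memory": ("out of memory",),
    "access violation": ("access violation",),
}


def categorise(reason):
    """Map a free-text error reason to a coarse category."""
    lowered = reason.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


def error_distribution(results):
    """Count how many failed results fall into each category."""
    return Counter(categorise(reason) for _, reason in results)


def format_report(results):
    """Render the distribution as text, most frequent category first."""
    counts = error_distribution(results)
    total = sum(counts.values())
    lines = []
    for category, count in counts.most_common():
        share = 100.0 * count / total
        lines.append(f"{category}: {count} ({share:.0f}%)")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(SAMPLE_RESULTS))
```

Keyword matching is only one possible design; a project-side tool would more likely group by numeric exit code, application version and operating system, as suggested in the moderator's reply.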
©2024 University of Washington
https://www.bakerlab.org