"Unrecoverable Error" message

Message boards : Rosetta@home Science : "Unrecoverable Error" message

Jim Allen

Joined: 10 Dec 05
Posts: 3
Credit: 285
RAC: 0
Message 6353 - Posted: 15 Dec 2005, 20:38:54 UTC

I have been getting the following message fairly frequently: "Unrecoverable error for result 'whatever' exit code 'exit code'". The exit code ends in 000005. Is this fairly common? I don't have a manual, and as a newbie to BOINC and this project I haven't found my way around yet, so I can't tell.
JChojnacki

Joined: 17 Sep 05
Posts: 71
Credit: 10,610,702
RAC: 4,450
Message 6387 - Posted: 16 Dec 2005, 1:04:37 UTC - in response to Message 6353.  



Jim, check out this thread:
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=669

It should give you the answer you seek, plus the answers to many other questions you may not even have thought of yet.


Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 6428 - Posted: 16 Dec 2005, 12:48:13 UTC

Jim,

Though unofficial and falling behind, the BOINC Wiki is still a good source.
Jim Allen

Joined: 10 Dec 05
Posts: 3
Credit: 285
RAC: 0
Message 6456 - Posted: 16 Dec 2005, 18:40:43 UTC - in response to Message 6387.  
Last modified: 16 Dec 2005, 18:41:28 UTC



Thanks, I found it. I have been having some trouble with that error at shutdown. Jim


Jim Allen

Joined: 10 Dec 05
Posts: 3
Credit: 285
RAC: 0
Message 6458 - Posted: 16 Dec 2005, 18:44:09 UTC - in response to Message 6428.  



Paul,

Thanks for the info. I will check it out.

Jim

kevinsh

Joined: 8 Dec 05
Posts: 3
Credit: 7,038
RAC: 0
Message 7093 - Posted: 21 Dec 2005, 21:27:46 UTC

Yeah, I'm getting this error all the time. I time-share between this project and Predictor@home, and it looks like Rosetta just isn't saving its state information correctly on a task switch or when I suspend processing. Since almost all my Rosetta work units seem to be erroring out, and I'm likely not getting credit for any of them, I've decided to stop running Rosetta altogether for the time being. This is a real shame, since this project comes with a very nice graphics package. Below is a sample of the errors I get:

12/19/2005 11:50:59 AM|Predictor @ Home|Pausing result bprion_4_198525_1 (removed from memory)
12/19/2005 11:51:00 AM|rosetta@home|Starting result 1hz6A_topology_sample_190366_0 using rosetta version 480
12/19/2005 11:51:00 AM||request_reschedule_cpus: process exited
12/19/2005 12:17:58 PM||Suspending computation and network activity - user is active
12/19/2005 12:17:58 PM|rosetta@home|Pausing result 1hz6A_topology_sample_190366_0 (removed from memory)
12/19/2005 12:18:01 PM|rosetta@home|Unrecoverable error for result 1hz6A_topology_sample_190366_0 ( - exit code -1073741819 (0xc0000005))
12/19/2005 12:18:01 PM||request_reschedule_cpus: process exited
12/19/2005 12:18:01 PM|rosetta@home|Computation for result 1hz6A_topology_sample_190366_0 finished
12/19/2005 3:28:33 PM||Resuming computation and network activity
12/19/2005 3:28:33 PM||request_reschedule_cpus: Resuming activities
12/19/2005 3:28:33 PM|Predictor @ Home|Restarting result bprion_4_198525_1 using mfoldB125 version 428
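
The log above shows the same exit code in two forms: the signed decimal -1073741819 and the hexadecimal 0xc0000005. A minimal Python sketch of how the two relate is below. The helper names and the one-entry lookup table are illustrative assumptions, not part of BOINC; 0xC0000005 is the Windows status code for an access violation.

```python
# Hypothetical helpers (not part of BOINC) relating the two forms of the
# exit code that appear in the log.

# Assumption: a tiny lookup table of well-known Windows NTSTATUS values.
KNOWN_STATUS = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def describe_exit_code(code: int) -> str:
    """Return the hex form of the exit code plus its name, if known."""
    unsigned = to_unsigned32(code)
    name = KNOWN_STATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08x} ({name})"


if __name__ == "__main__":
    print(describe_exit_code(-1073741819))  # 0xc0000005 (STATUS_ACCESS_VIOLATION)
```

This is why the message can be described as an exit code that "ends in 000005": the decimal and hex values are the same 32-bit number.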


kevinsh

Joined: 8 Dec 05
Posts: 3
Credit: 7,038
RAC: 0
Message 7095 - Posted: 21 Dec 2005, 21:41:39 UTC - in response to Message 7093.  


Mmm, after looking around a bit on these forums I realized something I could try.
My general settings had "Leave applications in memory while preempted?" set to NO. I'll try setting this to YES and see if that stops the errors. I've got plenty of swap space anyway. I'll post later how this worked out.


Jack Schonbrun

Joined: 1 Nov 05
Posts: 115
Credit: 5,954
RAC: 0
Message 7096 - Posted: 21 Dec 2005, 21:47:04 UTC - in response to Message 7095.  





Yes, please set it to YES.

There is another problem with a set of Work Units that we sent out. It's causing many of them to crash after about 30 seconds. Details are on this thread.

We are working on this problem, and hope that it will be fixed soon.
kevinsh

Joined: 8 Dec 05
Posts: 3
Credit: 7,038
RAC: 0
Message 7113 - Posted: 22 Dec 2005, 0:01:31 UTC

Another example where Rosetta seems to be failing at task switching or in saving its state information (as can be seen, projects are now kept in memory):

12/21/2005 2:31:14 PM|Predictor @ Home|Pausing result bprion_5_87210_0 (left in memory)
12/21/2005 2:31:14 PM|rosetta@home|Starting result 1hz6A_topology_sample_207_12057_4 using rosetta version 481
12/21/2005 2:31:34 PM|Predictor @ Home|Sending scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi
12/21/2005 2:31:34 PM|Predictor @ Home|Reason: Requested by user
12/21/2005 2:31:34 PM|Predictor @ Home|Reporting 1 results
12/21/2005 2:31:52 PM|rosetta@home|Unrecoverable error for result 1hz6A_topology_sample_207_12057_4 ( - exit code -1073741819 (0xc0000005))
12/21/2005 2:31:52 PM||request_reschedule_cpus: process exited
12/21/2005 2:31:52 PM|rosetta@home|Computation for result 1hz6A_topology_sample_207_12057_4 finished
12/21/2005 2:31:52 PM|Predictor @ Home|Resuming result bprion_5_87210_0 using mfoldB125 version 428

It looks like Rosetta has downloaded more work and is just beginning to process the WU when Predictor steps in and tries to report its completed WU. (After adjusting my prefs, I'd set a user request for both projects to update.) Rosetta seems to fail almost instantly because of this interruption. Or perhaps the WU was just bad to begin with?

Hope this info helps someone on the project to pin down the problem. I didn't see a bug reporting system on the Home webpage.
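
One way to tell a task-switch failure from a work unit that simply crashes early is to measure how long each result ran before the error. The sketch below does that for log lines in the `timestamp|project|message` layout pasted above. The function name and regular expressions are assumptions made for illustration, not part of any BOINC tool.

```python
import re
from datetime import datetime

# Assumed layout, taken from the pasted log: "timestamp|project|message".
LINE_RE = re.compile(r"^(?P<ts>[^|]+)\|(?P<project>[^|]*)\|(?P<msg>.*)$")
START_RE = re.compile(r"^Starting result (\S+)")
ERROR_RE = re.compile(r"^Unrecoverable error for result (\S+) \( - exit code (-?\d+)")


def seconds_to_failure(log_text: str) -> dict:
    """Map each failed result name to the seconds between start and error."""
    started = {}
    failures = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or malformed lines
        ts = datetime.strptime(match.group("ts").strip(), "%m/%d/%Y %I:%M:%S %p")
        msg = match.group("msg")
        start = START_RE.match(msg)
        if start:
            started[start.group(1)] = ts
            continue
        error = ERROR_RE.match(msg)
        if error and error.group(1) in started:
            name = error.group(1)
            failures[name] = (ts - started[name]).total_seconds()
    return failures


SAMPLE = """\
12/21/2005 2:31:14 PM|rosetta@home|Starting result 1hz6A_topology_sample_207_12057_4 using rosetta version 481
12/21/2005 2:31:52 PM|rosetta@home|Unrecoverable error for result 1hz6A_topology_sample_207_12057_4 ( - exit code -1073741819 (0xc0000005))
"""

if __name__ == "__main__":
    print(seconds_to_failure(SAMPLE))
```

For the two sample lines, the result failed 38 seconds after it started. A gap that short says nothing about why the result failed, but it does show the failure came early in the run rather than after a long period of computation.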
River~~

Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 7116 - Posted: 22 Dec 2005, 0:22:45 UTC - in response to Message 7113.  

Another example where Rosetta seems to be failing at task switching or in saving its state information: .... Or perhaps the WU was just bad to begin with?


I think this is an example of the "fails after 30 sec" problem, a known issue with some Rosetta WUs at present. If I am right, it is pure coincidence that Predictor is uploading at the same time. There are other threads describing that issue.
Stéphane

Joined: 22 Oct 05
Posts: 1
Credit: 4,739
RAC: 0
Message 7350 - Posted: 23 Dec 2005, 8:15:10 UTC - in response to Message 7116.  



I had the same problem with "normal" WUs like 1hz6A_topology_sample_207_7276_6, and also with WUs named DEFAULT_xxxxx_218, for example DEFAULT_2reb_218_7118_0. I see in the news that the problem was corrected for batches generated after 206. Is there another naming problem, or is the DEFAULT name correct?
Jack Schonbrun

Joined: 1 Nov 05
Posts: 115
Credit: 5,954
RAC: 0
Message 7387 - Posted: 23 Dec 2005, 18:05:50 UTC

Unfortunately, I think that all current Work Units can crash after 30 seconds. However, they do not need to be aborted.



