Message boards : Number crunching : Cant Upload
Author | Message |
---|---|
shimp Joined: 4 May 06 Posts: 7 Credit: 329,810 RAC: 0 |
I receive this message: 1/9/2011 7:52:18 AM rosetta@home [error] Error reported by file upload server: [mem_prub_run05_centroid_round03_A_subrun_007542_SAVE_ALL_OUT_IGNORE_THE_REST_22824_29_0_0] locked by file_upload_handler PID=-1 shimp |
Murasaki Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
There are several threads about this already with other users experiencing similar problems. The most likely reason is that it is an ongoing server problem caused by the recent crash and that it will fix itself in time. Normally your results are valuable to the project team even if they get reported late, but keep an eye on the other threads in this forum to see if different advice is given in this case. |
Chris Holvenstot Joined: 2 May 10 Posts: 220 Credit: 9,106,918 RAC: 0 |
Murasaki previously said: "... problem caused by the recent crash and that it will fix itself in time" Damn, I wonder if they are going to market this new-fangled "fix itself" computer of which you speak ... |
Dave Mickey Joined: 29 Dec 07 Posts: 33 Credit: 4,136,957 RAC: 0 |
I'm not very hopeful for the "fix itself" plan either. With 50 WUs retrying on average every 2 hours, that's 25 attempts per hour. At even a 1% success rate (if the problem is just capacity overload), random chance alone would give 1 upload success every 4 hours on average. I've seen 0 successes in total for a couple of days now. That's not even 0.1% success. It just doesn't seem likely; it seems like a brick wall. The failure comes right back in 3 seconds, so it's not like somebody is too busy to respond or timing out. It seems like we're really contacting the scheduler and he says "I don't know how to do upload, go away..." Does anybody *really* know what the file upload handler is? I assume it's a process on the scheduler server that takes in upload files, but I'm just guessing. Any chance it is some sort of reference contained in the upload files (the WU) itself? I'm not yet at the point of deleting all this work to find out, so it's wait and see (whether it is self-fixing or otherwise). Oh well... Dave |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Yes Mickey, the upload handler is a separate server process from the scheduler. And yes, there is a reference in the tasks which tells them where to report the completed results. Since the failed "file server" is the set of disk drives used to store the uploads, the Project Team "may" (I can only speculate as well) have temporarily stopped the upload handler until it has a reliable place to store the data it receives. Since the data in your tasks referencing the upload server is accurate, there is no sense aborting the uploads or tasks. The BOINC Manager will retry on its own, and once the servers are fully operational again the problems will clear up. I believe the prior reference here to "fix itself" was intended to refer to the retries and recoveries built into the BOINC Manager. As in "once the server issues are resolved, the normal processes on the client side will clear up the backlogs, so there is nothing you must do on the client end to resolve the problems". Rosetta Moderator: Mod.Sense |
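
For anyone who wants to check Dave Mickey's back-of-the-envelope numbers from earlier in the thread, here is a minimal Python sketch. It takes the figures from his post (50 pending uploads, each retried about every 2 hours) and assumes, purely for illustration, that every attempt succeeds independently with the same probability. The function names are made up for this example and have nothing to do with BOINC itself.

```python
# Figures quoted in the thread: 50 work units, each retried roughly every 2 hours.
PENDING_UPLOADS = 50
RETRY_INTERVAL_HOURS = 2.0
ATTEMPTS_PER_HOUR = PENDING_UPLOADS / RETRY_INTERVAL_HOURS  # 25 attempts per hour


def hours_per_success(success_rate):
    """Average hours between successful uploads at a given per-attempt success rate."""
    return 1.0 / (ATTEMPTS_PER_HOUR * success_rate)


def prob_no_success(success_rate, hours):
    """Chance of zero successes over `hours`, assuming independent attempts (an assumption)."""
    attempts = ATTEMPTS_PER_HOUR * hours
    return (1.0 - success_rate) ** attempts


if __name__ == "__main__":
    # At a 1% success rate: one success every 4 hours on average, as in the post.
    print("hours per success at 1%:", hours_per_success(0.01))
    # Probability of seeing nothing at all for two days (48 hours) at that rate.
    print("P(no success in 48 h at 1%):", prob_no_success(0.01, 48))
```

Under these assumptions, two days without a single success at a 1% rate would be extremely unlikely (well under one in 100,000), which fits the "brick wall" impression and the explanation that the upload handler is refusing uploads rather than being overloaded.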
©2024 University of Washington
https://www.bakerlab.org