Cant Upload

Message boards : Number crunching : Cant Upload

shimp
Joined: 4 May 06
Posts: 7
Credit: 329,810
RAC: 0
Message 69066 - Posted: 9 Jan 2011, 14:11:42 UTC

I receive this message:
1/9/2011 7:52:18 AM rosetta@home [error] Error reported by file upload server: [mem_prub_run05_centroid_round03_A_subrun_007542_SAVE_ALL_OUT_IGNORE_THE_REST_22824_29_0_0] locked by file_upload_handler PID=-1


shimp
Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 69072 - Posted: 9 Jan 2011, 15:14:31 UTC

There are several threads about this already with other users experiencing similar problems. The most likely reason is that it is an ongoing server problem caused by the recent crash and that it will fix itself in time.

Normally your results are valuable to the project team even if they get reported late, but keep an eye on the other threads in this forum to see if different advice is given in this case.
Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 69076 - Posted: 9 Jan 2011, 15:44:09 UTC - in response to Message 69072.  

Murasaki previously said:

... problem caused by the recent crash and that it will fix itself in time


Damn, I wonder if they are going to market this new-fangled "fix itself" computer of which you speak ...
Dave Mickey

Joined: 29 Dec 07
Posts: 33
Credit: 4,136,957
RAC: 0
Message 69084 - Posted: 9 Jan 2011, 17:12:44 UTC

I'm not very hopeful for the "fix itself" plan either. With 50 WUs each retrying every 2 hours on average, that's 25 attempts per hour. At even a 1% success rate (if the problem is just capacity overload), random chance alone would give 1 upload success every 4 hours on average. I've seen 0 successes in total for a couple of days now. That's not even 0.1% success. It just doesn't seem likely; it seems like a brick wall. The failure comes right back in 3 seconds, so it's not like somebody is too busy to respond or timing out. It seems like we're really contacting the scheduler and he says "I don't know how to do upload, go away..."

Anybody *really* know what the file upload handler is? I assume it's a process on the scheduler server that takes in upload files, but I'm just guessing. Any chance it is some sort of reference contained in the upload files (the wu) itself? I'm not yet at the point of deleting all this work to find out, so it's wait and see (even if it is self fixing, or otherwise).

oh well.....

Dave
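
The retry arithmetic in the post above can be checked with a few lines of Python. This is only a sketch of that back-of-the-envelope estimate: the 48-hour window stands in for "a couple of days" and is an assumption, the function names are made up for illustration, and attempts are treated as independent, which a real outage would not guarantee.

```python
def attempts_per_hour(pending_uploads: int, retry_interval_hours: float) -> float:
    """Average upload attempts per hour across all pending work units."""
    return pending_uploads / retry_interval_hours


def hours_per_success(rate_per_hour: float, success_probability: float) -> float:
    """Average hours between successful uploads at a given per-attempt success rate."""
    return 1.0 / (rate_per_hour * success_probability)


def probability_of_no_success(rate_per_hour: float, hours: float,
                              success_probability: float) -> float:
    """Chance that every attempt in the window fails, assuming independent attempts."""
    total_attempts = rate_per_hour * hours
    return (1.0 - success_probability) ** total_attempts


rate = attempts_per_hour(50, 2.0)            # 50 WUs, one retry every 2 hours each
wait = hours_per_success(rate, 0.01)         # 1% success rate from the post
# ASSUMPTION: "a couple of days" is taken as 48 hours.
p_none = probability_of_no_success(rate, 48.0, 0.01)

print(f"{rate:.0f} attempts/hour, one success every {wait:.0f} hours on average")
print(f"chance of zero successes in 48 hours at 1%: {p_none:.2e}")
```

Under these assumptions, two days with no successful upload would be extremely unlikely if the server were merely overloaded, which supports the "brick wall" reading.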
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 69089 - Posted: 9 Jan 2011, 18:56:42 UTC

Yes Mickey, the upload handler is a separate server process from the scheduler. And yes, there is a reference in the tasks which tells them where to report the completed results. Since the failed "file server" is the set of disk drives used to store the uploads, the Project Team "may" (I can only speculate as well) have temporarily stopped the upload handler until it has a reliable place to store the data it receives. Since the data in your tasks referencing the upload server is accurate, there is no sense aborting the uploads or tasks.

The BOINC Manager will retry on its own, and once the servers are fully operational again the problems will clear up. I believe the prior reference here to "fix itself" was intended to refer to the retries and recoveries built in to the BOINC Manager. As in "once the server issues are resolved, the normal processes on the client side will clear up the backlogs, so there is nothing you must do on the client end to resolve the problems".
Rosetta Moderator: Mod.Sense
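
The client-side behavior described in the post above (keep retrying until the server comes back, with no user action needed) can be illustrated with a generic retry loop. This is a minimal sketch, not BOINC's actual code: the function name, the doubling delay, and all default values are assumptions chosen for illustration, since the thread does not say how the client schedules its retries.

```python
import time
from typing import Callable


def retry_until_uploaded(try_upload: Callable[[], bool],
                         base_delay: float = 60.0,
                         max_delay: float = 4 * 3600.0,
                         max_attempts: int = 10,
                         sleep: Callable[[float], None] = time.sleep) -> int:
    """Call try_upload until it returns True; return the number of attempts used.

    The wait between attempts doubles after each failure, capped at max_delay.
    Raises RuntimeError if every attempt fails.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if try_upload():
            return attempt
        if attempt < max_attempts:
            sleep(delay)                       # wait before the next attempt
            delay = min(delay * 2, max_delay)  # back off, up to the cap
    raise RuntimeError(f"upload still failing after {max_attempts} attempts")


if __name__ == "__main__":
    # Hypothetical server that rejects three uploads, then recovers.
    outcomes = iter([False, False, False, True])
    waits = []
    used = retry_until_uploaded(lambda: next(outcomes), sleep=waits.append)
    print(f"succeeded on attempt {used} after waiting {waits} seconds")
```

Passing `sleep` as a parameter lets the loop be exercised without real waiting; a real client would leave it at `time.sleep`.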


©2024 University of Washington
https://www.bakerlab.org