Message boards : Number crunching : These 7 files will not upload.
Author | Message |
---|---|
D.A. Pinniger Joined: 26 Jan 11 Posts: 2 Credit: 1,027,601 RAC: 0 |
They are due 1-17-2012. Any help resolving this issue would be greatly appreciated. Thank you.
rosetta@home ab_11_29__optpps_T5781_optpps_03_09_35686_149833_0_0 0.000 45.11 K 00:00:34 - 65:51:13 0.00 Kbps Upload pending (Project backoff: 00:04:36) Dave
rosetta@home ab_11_29__optpps_T6211_optpps_03_09_35686_150999_0_0 0.000 40.13 K 00:00:34 - 70:45:04 0.00 Kbps Upload pending (Project backoff: 00:04:36) Dave
rosetta@home ab_11_29__optpps_T5781_optpps_03_09_35686_149852_0_0 0.000 44.99 K 00:00:29 - 66:32:40 0.00 Kbps Upload pending (Project backoff: 00:04:36) Dave
rosetta@home ab_11_29__optpps_T6211_optpps_03_09_35686_150993_0_0 0.000 40.20 K 00:00:27 - 67:37:49 0.00 Kbps Upload pending (Project backoff: 00:04:36) Dave
rosetta@home ab_11_29__optpps_T6211_optpps_03_09_35686_151002_0_0 0.000 45.72 K 00:00:21 - 69:54:28 0.00 Kbps Upload pending (Project backoff: 00:04:36) Dave
rosetta@home ab_11_29__optpps_T5781_optpps_03_09_35686_149834_0_0 0.000 45.47 K 00:00:23 - 64:57:58 0.00 Kbps Upload pending (Project backoff: 00:04:36) Dave
rosetta@home ab_11_29__optpps_T6211_optpps_03_09_35686_150998_0_0 0.000 45.72 K 00:00:20 - 69:21:54 0.00 Kbps Upload pending (Project backoff: 00:04:36) Dave |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,526,853 RAC: 9,592 |
The servers will be swamped with uploads - it'll just take some time. |
Jesse Viviano Joined: 14 Jan 10 Posts: 42 Credit: 2,700,472 RAC: 0 |
I am having a similar problem with one file that will not upload, but when I check the log, it states that my machine cannot resolve the hostname of the upload server. Is there a DNS misconfiguration? I was able to upload and report two other files successfully, so I am guessing that there might be multiple upload servers, one of which has a bad entry on the DNS server. |
Dave Mickey Joined: 29 Dec 07 Posts: 33 Credit: 4,136,957 RAC: 0 |
I've come to the same conclusion as the OP and Jesse: there is a set of 13 results waiting to upload, and they always fail with "can't resolve". Subsequent WUs process, upload, and report, while these 13 are stuck in the past, unable to move on. Could a rah WU upload file contain something that makes it try an obsolete host name after the reconfiguration at UW? For example, are they coded with a host name that no longer exists, or that is not in any DNS server? Does anyone know if there is a CC debug switch we could turn on to see exactly which host name the failed units are trying to use? Dave |
Holmis Joined: 15 Nov 07 Posts: 6 Credit: 975,490 RAC: 0 |
Dave Mickey wrote: "I think I conclude what the OP and Jesse did - there is a set of 13 results waiting to upload, and they always fail with can't resolve. Subsequent WU's process, upload, and report, and these 13 are stuck in the past, unable to move on."

Hi, I've also got one file that fails to upload, and it's trying this URL:
http://srv6.bakerlab.org/rosetta_cgi/file_upload_handler

My cc_config.xml looks like this:

<cc_config>
  <log_flags>
    <cpu_sched>1</cpu_sched>
    <cpu_sched_debug>0</cpu_sched_debug>
    <dcf_debug>1</dcf_debug>
    <sched_op_debug>1</sched_op_debug>
    <file_xfer_debug>1</file_xfer_debug>
  </log_flags>
  <options>
    <zero_debts>0</zero_debts>
  </options>
</cc_config>

I think "file_xfer_debug" is the one you want. This is what I see in the event log (or Messages tab):

07/01/2012 23:47:50 | rosetta@home | [fxd] starting upload, upload_offset -1
07/01/2012 23:47:50 | rosetta@home | Started upload of ab_11_29__optpps_T5781_optpps_03_09_35686_134357_0_0
07/01/2012 23:47:50 | rosetta@home | [file_xfer] URL: http://srv6.bakerlab.org/rosetta_cgi/file_upload_handler
07/01/2012 23:47:51 | | Project communication failed: attempting access to reference site
07/01/2012 23:47:51 | rosetta@home | [file_xfer] http op done; retval -113 (can't resolve hostname)
07/01/2012 23:47:51 | rosetta@home | [file_xfer] file transfer status -113 (can't resolve hostname)
07/01/2012 23:47:51 | rosetta@home | Temporarily failed upload of ab_11_29__optpps_T5781_optpps_03_09_35686_134357_0_0: can't resolve hostname
07/01/2012 23:47:51 | rosetta@home | Backing off 11 hr 38 min 16 sec on upload of ab_11_29__optpps_T5781_optpps_03_09_35686_134357_0_0

When a task is assigned, the server tells the client where to upload the result, and that information is stored in the file client_state.xml. Different tasks seem to have different upload addresses; at least, that's what I see in my state file.
If you open your client_state.xml, take care not to save any changes, as that may very well cause you to lose all your work, or worse... /Johan |
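Following up on Johan's point that each task's upload URL is recorded in client_state.xml, here is a minimal, read-only Python sketch for seeing which upload host each pending result points at. It assumes (this is not confirmed in the thread) that file entries appear as `<file_info>` elements, or `<file>` in newer clients, with `<name>` and `<url>` children, and it treats any URL containing `file_upload_handler` as an upload URL, a heuristic based only on the URLs quoted above. The helper names are hypothetical. It only parses text you pass in and never writes client_state.xml back.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlparse

# Assumption: output files are <file_info> (older clients) or <file> (newer
# clients) elements, each with a <name> and one or more <url> children.
FILE_TAGS = {"file_info", "file"}
# Heuristic taken from the upload URLs quoted in this thread.
UPLOAD_MARKER = "file_upload_handler"


def upload_urls(xml_text: str) -> dict[str, list[str]]:
    """Map each file name to its upload URL(s). Parses text only; never writes."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed XML
    result: dict[str, list[str]] = {}
    for elem in root.iter():
        if elem.tag not in FILE_TAGS:
            continue
        name = elem.findtext("name")
        urls = [u.text.strip() for u in elem.findall("url") if u.text]
        uploads = [u for u in urls if UPLOAD_MARKER in u]
        if name and uploads:
            result[name.strip()] = uploads
    return result


def group_by_host(mapping: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group file names by the host name of their first upload URL."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, urls in mapping.items():
        groups[urlparse(urls[0]).hostname or "?"].append(name)
    return dict(groups)
```

For example, `group_by_host(upload_urls(open(state_path, encoding="utf-8").read()))` would list results bound for `srv6.bakerlab.org` separately from those bound for `srv4.bakerlab.org`, where `state_path` is wherever your BOINC data directory keeps the file.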
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
I have several that won't upload, on different machines. One gives this message:
1/8/2012 1:20:56 AM rosetta@home Temporarily failed upload of ab_11_29__optpps_T5781_optpps_03_09_35686_181530_0_0: can't resolve hostname
I guess it's something the admins need to solve. Greetings, TJ. |
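TJ's message, like Holmis's event-log excerpt earlier in the thread, has the form "Temporarily failed upload of <file>: <reason>". With several machines involved, a small Python sketch like the one below can pull the stuck file names and their reasons out of saved event-log text. It matches only that message text as quoted in this thread; the timestamp and project prefix differ between client versions, so the sketch deliberately ignores them. The function names are hypothetical.

```python
import re
from collections import Counter

# Message text as it appears in the logs quoted in this thread. The prefix
# (timestamp, project name, separators) varies, so only the tail is matched.
FAILED_UPLOAD = re.compile(r"Temporarily failed upload of ([^\s:]+): (.+?)\s*$")


def failed_uploads(log_text: str) -> dict[str, str]:
    """Return {file name: most recent failure reason} from event-log text."""
    failures: dict[str, str] = {}
    for line in log_text.splitlines():
        match = FAILED_UPLOAD.search(line)
        if match:
            failures[match.group(1)] = match.group(2)  # later lines overwrite earlier ones
    return failures


def reasons_summary(failures: dict[str, str]) -> Counter:
    """Count how many stuck files share each failure reason."""
    return Counter(failures.values())
```

If every stuck file reports "can't resolve hostname", that points at DNS rather than at the files themselves.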
Dave Mickey Joined: 29 Dec 07 Posts: 33 Credit: 4,136,957 RAC: 0 |
Very interesting. When I turn on that debug flag, I see it using the same URL as Holmis. And when I pinged it, it returned:

C:\Users\dwmickey>ping srv6.bakerlab.org

Pinging srv6.bakerlab.org [67.215.65.132] with 32 bytes of data:
Reply from 67.215.65.132: bytes=32 time=36ms TTL=53
Reply from 67.215.65.132: bytes=32 time=34ms TTL=53
Reply from 67.215.65.132: bytes=32 time=34ms TTL=53
Reply from 67.215.65.132: bytes=32 time=33ms TTL=53

Ping statistics for 67.215.65.132:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 33ms, Maximum = 36ms, Average = 34ms

BUT THEN I flushed the DNS cache with ipconfig /flushdns, and it no longer returned. It turns out that the IP returned above is really:

C:\Users\dwmickey>tracert 67.215.65.132

Tracing route to hit-nxdomain.opendns.com [67.215.65.132]
over a maximum of 30 hops:

It looks like srv6 no longer exists in DNS land, and if you look at the rah server status page, it looks like everything should be going through srv4. And indeed, the file transfer debug output for uploads that work shows them going to:

07-Jan-2012 16:50:59 [rosetta@home] Started upload of _11_29__optpps_T6161_optpps_03_09_35686_140788_0_0
07-Jan-2012 16:50:59 [rosetta@home] [file_xfer_debug] URL: http://srv4.bakerlab.org/rosetta_cgi/file_upload_handler

and that works. So what's to become of these units, apparently cast aside by the server reconfiguration? They want to phone home to srv6, but alas, there is none! Dave |
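Dave's diagnosis amounts to a two-step check: does the name resolve at all, and if it does, is the answer a real server or a resolver's "no such domain" redirect? A minimal Python sketch of that check follows. The only redirect address it knows is 67.215.65.132, the one Dave's tracert identified as hit-nxdomain.opendns.com; other resolvers may use other addresses, so treat that set as an assumption to extend. The function names are hypothetical, and classification is kept separate from the network lookup so it can be checked offline.

```python
import socket

# From Dave's tracert: OpenDNS answers non-existent names with this address.
# Other resolvers may use different redirect addresses (assumption).
KNOWN_NXDOMAIN_REDIRECTS = {"67.215.65.132"}


def lookup(host: str) -> list[str]:
    """Return the IPv4 addresses `host` resolves to, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_INET)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})  # info[4] is (address, port)


def classify(addresses: list[str]) -> str:
    """Label a lookup result: 'unresolved', 'nxdomain-redirect', or 'ok'."""
    if not addresses:
        return "unresolved"
    if any(addr in KNOWN_NXDOMAIN_REDIRECTS for addr in addresses):
        return "nxdomain-redirect"
    return "ok"
```

Usage would be along the lines of `classify(lookup("srv6.bakerlab.org"))`; a result of "nxdomain-redirect" or "unresolved" for the upload host named in the debug log matches the symptom described here.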
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
Dave Mickey wrote: "and that works......."

Hi Dave, thanks for your explanation. Is there a way we can solve the problem ourselves? Greetings, TJ. |
Dave Mickey Joined: 29 Dec 07 Posts: 33 Credit: 4,136,957 RAC: 0 |
Apparently, yes, there is. Put this line, by itself, in your "hosts" file (it's somewhere under Windows, named just "hosts", with no extension):

128.95.160.145 srv6.bakerlab.org

Then requests to srv6 will go to srv4. All mine are gone now. Dave |
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
Dave Mickey wrote: "Apparently, yes, there is. put this line in your "hosts" file - somewhere under windows, just "hosts", with no extension:"

Hi Dave, thanks, but I am not good with software. Where can I find this "hosts" file? I searched the hard disk with no usable result. Could you help a little more? Thanks, TJ |
Dave Mickey Joined: 29 Dec 07 Posts: 33 Credit: 4,136,957 RAC: 0 |
Where it is might vary with your Windows version, but I have Windows 7, and mine is in the directory:

C:\Windows\System32\drivers\etc

The file name is hosts, with no extension like .txt or .bin or anything. It is plain text, so you can open it with Notepad or any simple text editor. In a search or find, look for the name hosts.

Put the line I quoted before by itself on the last line of the file, without disturbing any other lines. If you want, you can put a comment line above your new line as a reminder for later, like:

# this entry is to fix a problem with rosetta
128.95.160.145 srv6.bakerlab.org

There must be one or more spaces or tabs between the .145 and srv6, and the line must end with a RETURN. Save the file. Now restart the upload of the stuck file(s). If your problem is the same as mine, they will work now. Dave

If you're nervous about fooling with the file, copy it before editing, and then you can easily put it back the way you found it. |
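Dave's steps (copy the file first, add the mapping by itself as the last line, separate the fields with whitespace, end the line with a newline) can also be scripted. Below is a minimal Python sketch under those assumptions; `add_entry` and `has_entry` are hypothetical helper names, the IP/hostname pair and comment are the ones Dave posted, and the backup goes to `hosts.bak` next to the original. It appends instead of rewriting the file, and on Windows it still has to run with administrator rights.

```python
import shutil
from pathlib import Path

# Values from Dave's post; adjust if your situation differs.
ENTRY_IP = "128.95.160.145"
ENTRY_HOST = "srv6.bakerlab.org"
COMMENT = "# this entry is to fix a problem with rosetta"


def has_entry(text: str, host: str) -> bool:
    """True if a non-comment line in hosts-file text already maps `host`."""
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on spaces/tabs
        if len(fields) >= 2 and host in fields[1:]:
            return True
    return False


def add_entry(hosts_path: Path, ip: str = ENTRY_IP, host: str = ENTRY_HOST) -> bool:
    """Back up the hosts file, then append a comment and 'ip<TAB>host' at the end.

    Returns False without touching anything if `host` is already mapped.
    """
    data = hosts_path.read_bytes()
    if has_entry(data.decode("utf-8", errors="replace"), host):
        return False
    shutil.copy2(hosts_path, hosts_path.with_name(hosts_path.name + ".bak"))
    # Don't glue the new lines onto an existing last line without a newline.
    prefix = "\n" if data and not data.endswith(b"\n") else ""
    with hosts_path.open("a", encoding="ascii") as f:  # text mode: "\n" becomes CRLF on Windows
        f.write(f"{prefix}{COMMENT}\n{ip}\t{host}\n")
    return True
```

Appending (rather than reading, editing, and rewriting) means a failure partway through cannot garble the existing entries, and the `.bak` copy gives the easy way back that Dave recommends.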
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
Dave Mickey wrote: "Where it is might be variable based on your Windows rev, but"

Thank you, Dave. These are clear instructions. Great work! I have Win7 as well, and now my WUs are gone. Greetings, TJ. |
edikl Joined: 16 Jun 10 Posts: 10 Credit: 186,187 RAC: 0 |
Hi! I can confirm that your advice works perfectly under Windows Vista as well. Thanks a lot :) |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I would suggest that this one upload server has most likely not completed its upgrade yet and will come back online soon, so no action is required. When the server comes back online, one of BOINC's retries will finally succeed. Doing anything else risks corrupting files, which could affect your whole boatload of tasks. Aborting the transfer throws away the work you've done and the credit you've earned for that work.

...Having said that, uploads redirected to an alternate upload server, as suggested below, should be processed normally, if you are comfortable achieving the redirection via the hosts file, etc. Rosetta Moderator: Mod.Sense |
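Mod.Sense's "wait for a retry that works" advice relies on the client retrying on its own, with growing waits between attempts (Holmis's log shows a back-off of more than 11 hours). The Python sketch below illustrates the general technique with capped exponential backoff plus random jitter. The base delay, the cap, and the jitter scheme are assumptions chosen for illustration, not BOINC's actual settings, and the function names are hypothetical. It also shows the trade-off: once the server is back, the next automatic attempt can still be up to one full capped delay away, which is why a manual "retry now" can save time.

```python
import random
from typing import Optional

# Illustrative values only; not BOINC's real settings.
BASE_DELAY = 60.0          # seconds before the first retry
MAX_DELAY = 12 * 3600.0    # cap on any single wait


def next_delay(attempt: int, base: float = BASE_DELAY, cap: float = MAX_DELAY,
               rng: Optional[random.Random] = None) -> float:
    """Wait before retry `attempt` (0-based): doubles each time, capped, with jitter."""
    if rng is None:
        rng = random.Random()
    ceiling = min(cap, base * 2 ** min(attempt, 60))  # limit exponent to avoid float overflow
    return rng.uniform(ceiling / 2, ceiling)  # "equal jitter" spreads clients apart


def first_attempt_after(server_back_at: float, seed: int = 0) -> float:
    """Simulate retries from t=0; return the time of the first attempt after recovery."""
    rng = random.Random(seed)
    t, attempt = 0.0, 0
    while t < server_back_at:  # any attempt made before recovery fails
        t += next_delay(attempt, rng=rng)
        attempt += 1
    return t
```

For example, `first_attempt_after(3 * 3600)` always lands at or after the three-hour mark but less than one cap (12 hours here) beyond it, since the last wait before success can never exceed the cap.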
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
Mod.Sense wrote: "I would suggest that it is likely that this one upload server simply has not completed it's upgrade yet, and that it will come back online soon. And therefore no action is required. When the server comes back online, BOINC will finally have a retry that works."

Thanks, Mod.Sense. It would have helped a lot if you had mentioned this earlier, and even more if it were on the main page. As you can imagine, a lot of crunchers have this issue. Greetings, TJ. |
Dave Mickey Joined: 29 Dec 07 Posts: 33 Credit: 4,136,957 RAC: 0 |
Mod.Sense wrote: "I think that you are asking for validate errors with this method because it is quite possible that the server that a work unit is assigned to is the only one that can validate it. ..... To do other things risks corrupting files which potentially effects your whole boat of tasks. Aborting the transfer will be throwing away the work you've done, and the credit you've earned for that work. ...having said that, the suggestion below to hit an alternate upload server should be processed normally if you are comfortable achieving the redirection via the hosts file, etc."

I looked for and found my 13 holdouts. All were reported at about 1:00 UTC, and all are "Over, Success, Done" and granted credit. So maybe I dodged a bullet, but I would guess that a robust parallel system like BOINC would not have a fragile path such as work having to go back to one and only one IP address. I can't claim any expertise, though; maybe it was just luck. As for Mod.Sense's comment, there is no "etc."; it was just a matter of modifying the hosts file. Period.

Typical result record:
473457597 431993511 28 Dec 2011 17:41:51 UTC 8 Jan 2012 0:57:37 UTC Over Success Done 27,876.58 198.47 154.33

YMMV, I guess, but no sign of trouble here. Dave |
ukjohnd Joined: 22 Jul 06 Posts: 1 Credit: 696,728 RAC: 0 |
Perfect, that fixed the DNS issues for me. |
.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
I had this same problem on my Ubuntu Linux PC. I have been hitting the `update` and `retry now` buttons plenty over the last day, with no effect. The fix didn't need any hosts-file fiddling, though: in the end, just restarting the computer did it (something I usually only do for a kernel update). Then I hit the `retry now` button for each task, everything uploaded, and new work is downloading. Sorted :¬) |
Sid Celery Joined: 11 Feb 08 Posts: 2117 Credit: 41,158,554 RAC: 15,699 |
After 2 days of failed upload attempts on WUs due 11th & 12th I used this solution & it worked. I removed the line from HOSTS straight after as no other WUs had this problem. Thanks for details of the workaround - much appreciated. |
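Removing the line again once the stuck WUs have uploaded, as Sid did, is easy to forget. A minimal Python sketch for undoing the edit follows; `remove_entry` is a hypothetical helper, and it assumes the entry and the optional comment line look exactly like the ones Dave suggested. It drops only lines whose sole host name is `srv6.bakerlab.org` (plus that exact comment line), keeps every other line byte-for-byte including its line ending, and, like the original edit, needs administrator rights on Windows.

```python
from pathlib import Path

DEFAULT_HOST = "srv6.bakerlab.org"
DEFAULT_COMMENT = "# this entry is to fix a problem with rosetta"


def remove_entry(hosts_path: Path, host: str = DEFAULT_HOST,
                 comment: str = DEFAULT_COMMENT) -> int:
    """Delete 'ip host' lines mapping only `host`, plus the exact comment line.

    Returns the number of lines removed; the file is rewritten only if that is > 0.
    """
    # surrogateescape round-trips any non-UTF-8 bytes unchanged.
    text = hosts_path.read_bytes().decode("utf-8", errors="surrogateescape")
    kept, removed = [], 0
    for line in text.splitlines(keepends=True):  # keep original line endings
        fields = line.split("#", 1)[0].split()
        if (len(fields) == 2 and fields[1] == host) or line.strip() == comment:
            removed += 1
            continue
        kept.append(line)
    if removed:
        hosts_path.write_bytes("".join(kept).encode("utf-8", errors="surrogateescape"))
    return removed
```

Lines that map several names, including `srv6.bakerlab.org`, are left alone on purpose, so the sketch never deletes something you added for another reason.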
Ironworker16 Joined: 31 May 06 Posts: 3 Credit: 9,758,247 RAC: 0 |
Dave Mickey wrote: "Where it is might be variable based on your Windows rev, but"

Thanks, that was quick and easy. I did have to open Notepad as administrator to edit the file. |
©2024 University of Washington
https://www.bakerlab.org