Message boards : Number crunching : Upload errors.
Author | Message |
---|---|
entigy Send message Joined: 2 Nov 05 Posts: 5 Credit: 990,830 RAC: 0 |
This. 05/09/2017 08:22:12 | Rosetta@home | Started upload of rb_09_04_77236_119997__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_514924_697_0_r181725287_0 05/09/2017 08:22:12 | Rosetta@home | Started upload of cdfdc_SOL_jumping_SAVE_ALL_OUT_514877_4989_0_r1586011663_0 05/09/2017 08:22:14 | Rosetta@home | [error] Error reported by file upload server: [rb_09_04_77236_119997__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_514924_697_0_r181725287_0] locked by file_upload_handler PID=255 05/09/2017 08:22:14 | Rosetta@home | [error] Error reported by file upload server: [cdfdc_SOL_jumping_SAVE_ALL_OUT_514877_4989_0_r1586011663_0] locked by file_upload_handler PID=255 05/09/2017 08:22:14 | Rosetta@home | Temporarily failed upload of rb_09_04_77236_119997__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_514924_697_0_r181725287_0: transient upload error 05/09/2017 08:22:14 | Rosetta@home | Backing off 05:09:59 on upload of rb_09_04_77236_119997__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_514924_697_0_r181725287_0 05/09/2017 08:22:14 | Rosetta@home | Temporarily failed upload of cdfdc_SOL_jumping_SAVE_ALL_OUT_514877_4989_0_r1586011663_0: transient upload error 05/09/2017 08:22:14 | Rosetta@home | Backing off 04:08:07 on upload of cdfdc_SOL_jumping_SAVE_ALL_OUT_514877_4989_0_r1586011663_0 05/09/2017 08:22:15 | Rosetta@home | Started upload of 95dbf_SOL_jumping_SAVE_ALL_OUT_514877_4988_0_r1180513244_0 05/09/2017 08:22:16 | Rosetta@home | [error] Error reported by file upload server: [95dbf_SOL_jumping_SAVE_ALL_OUT_514877_4988_0_r1180513244_0] locked by file_upload_handler PID=-1 05/09/2017 08:22:16 | Rosetta@home | Temporarily failed upload of 95dbf_SOL_jumping_SAVE_ALL_OUT_514877_4988_0_r1180513244_0: transient upload error 05/09/2017 08:22:16 | Rosetta@home | Backing off 00:16:02 on upload of 95dbf_SOL_jumping_SAVE_ALL_OUT_514877_4988_0_r1180513244_0 |
googloo Send message Joined: 15 Sep 06 Posts: 133 Credit: 22,677,186 RAC: 4,532 |
Same here. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
[quote]rb_09_04_77236_119997__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_514924_697_0_r181725287_0: transient upload error[/quote] +1 |
Warped Send message Joined: 15 Jan 06 Posts: 48 Credit: 1,788,185 RAC: 0 |
Same for me. Tue 05 Sep 2017 17:22:55 SAST | Rosetta@home | Started upload of rb_09_03_77151_119988_ab_stage0_t000___robetta_IGNORE_THE_REST_05_09_514908_71_0_r1550815990_0 Tue 05 Sep 2017 17:22:59 SAST | Rosetta@home | [error] Error reported by file upload server: [rb_09_03_77151_119988_ab_stage0_t000___robetta_IGNORE_THE_REST_05_09_514908_71_0_r1550815990_0] locked by file_upload_handler PID=255 Tue 05 Sep 2017 17:22:59 SAST | Rosetta@home | Temporarily failed upload of rb_09_03_77151_119988_ab_stage0_t000___robetta_IGNORE_THE_REST_05_09_514908_71_0_r1550815990_0: transient upload error Tue 05 Sep 2017 17:22:59 SAST | Rosetta@home | Backing off 04:48:33 on upload of rb_09_03_77151_119988_ab_stage0_t000___robetta_IGNORE_THE_REST_05_09_514908_71_0_r1550815990_0 |
JohnH Send message Joined: 25 Mar 13 Posts: 43 Credit: 2,319,355 RAC: 0 |
Me too 9/5/2017 6:55:09 PM | Rosetta@home | [error] Error reported by file upload server: [6ea159939ab8b54604c3f5fbeadf8c01_C2_docking_big_job_17_08_03_12_20_globalDocking_0_SAVE_ALL_OUT_510695_8_0_r1334300603_0] locked by file_upload_handler PID=255 9/5/2017 6:55:09 PM | Rosetta@home | Temporarily failed upload of 6ea159939ab8b54604c3f5fbeadf8c01_C2_docking_big_job_17_08_03_12_20_globalDocking_0_SAVE_ALL_OUT_510695_8_0_r1334300603_0: transient upload error 9/5/2017 6:55:09 PM | Rosetta@home | Backing off 05:19:22 on upload of 6ea159939ab8b54604c3f5fbeadf8c01_C2_docking_big_job_17_08_03_12_20_globalDocking_0_SAVE_ALL_OUT_510695_8_0_r1334300603_0 |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
I have four of them on two machines, making a total of 8 tasks (AKA units) stuck in the "Uploading" status on the Tasks tab. I didn't see how the second machine got there, but on the first machine there was definitely an interval when only three units were stuck, and the fourth was added later. On the machine I can see now, all four of them are from different projects, with completion times from 4 to 8 hours. On the Transfers tab, the "Upload: retry in ..." times vary from 2 to 5 hours. Using the Retry Now button individually or collectively fails after about 2 seconds of "active" status. Not a new problem, but I think this is the first time I've seen it since the major server upgrade a few weeks back. There was at least one other peculiar behavior, but since the one I can recall right now involves the arbitrary and meaningless deadlines, I file it under "C'est la vie." At least the deadlines continue to appear arbitrary and without meaning from my perspective as a volunteer or donor... Their only significance is the demotivating feeling of well-intended contributions tossed in the bit bucket, which may happen to these frozen-in-Uploading units, too. Sometimes I feel like instead of saying "C'est la vie" I should be saying "Cela signifie la guerre!" (At least in this case I understand the circumstances which caused the deadlines to be missed, so I can basically dismiss the lost hours as a one-time failure affecting 2.5 machines.) Ah. Just finished another task (AKA work unit) on this machine, and it went to the "Ready to report" status. Going to the Projects tab and clicking Update works as expected. The newly finished task disappears, and the four "Uploading" tasks remain unaffected. Clicking on Retry Now from the Transfers tab fails. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Now up to 5 "Uploading" tasks frozen on this machine with total work time over 30 hours. Doesn't appear to be an immediate threat of lost credit, since the earliest deadline is the 11th, but looking at the other machine... It has also increased to 5 making a total of 10 jammed units. Checked two other machines at hand, and they don't have any yet. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
Maybe a restart of the server's daemons? |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,155,895 RAC: 16,061 |
[quote]On the Transfers tab, the "Upload: retry in ..." times vary from 2 to 5 hours. Using the Retry Now button individually or collectively fails after about 2 seconds of "active" status.[/quote] Same. If I remember correctly, a solution was found last time on the server side - one that shouldn't have solved anything, but it did. If someone searches for "transient" I think that solution will be found. |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Well, I can report that it has spread to two more computers. Since I had changed my network configuration recently, I went ahead and tested one machine with an alternate routing. Much slower connection, but no apparent effect on the problem. There seems to be some inconsistency in how quickly a "Retry Now" from the Transfers tab fails. On this machine, it goes back to the "Upload: retry in ..." status quite quickly, within just a few seconds. Other machines remain in "Upload: active" status for a long time. Quite annoying, but not surprising or anything... Which leads to the long history of project struggles. I do recall something about a network security configuration problem at the Baker Lab site. If this is similar, then it took them about a week to communicate the nature of the problem to the university's network people and get the fix, whatever it was. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
JohnH Send message Joined: 25 Mar 13 Posts: 43 Credit: 2,319,355 RAC: 0 |
All seems to be working for me this morning. Units that finished and were queued for retry on Sep 5th were credited on the 6th. One-way arrow of time strikes again ... oh well. Anybody know what happened, or what the resolution was? |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,155,895 RAC: 16,061 |
[quote]All seems to be working for me this morning.[/quote] Mine cleared up at almost exactly the same time - 2 minutes prior to this post. |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
"Which leads to the long history of project struggles." We've been up since late June with the new server front and back ends without any significant issues (knock on wood). This has been the first significant issue since then, and it may or may not have been related to recent network and power instability here at the UW. The file locking logic in the upload handler started to fail for the majority of upload requests. We rebooted the web servers and filesystem, but that didn't fix the issue. We had to modify the source code to comment out the file locking logic and rebuild the upload handler. This appears to have fixed the issue. The file locking logic is not necessary for our system, and things appear to be back to normal. On a positive note, I think our project has a long history of success, including research from our lab being runner-up to Science magazine's breakthrough of 2016 for protein design, success in using co-evolution sequence data from metagenomes to determine new protein structures at a cost significantly less than structural genomics initiatives, designing/modeling small cyclic peptides with non-canonical amino acids (much of which was modeled on mobile Android devices), and more. |
JohnH Send message Joined: 25 Mar 13 Posts: 43 Credit: 2,319,355 RAC: 0 |
Thanks for the update. :-) |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
The quote sounds like one of my asides... Anyway, all seems back to normal and mostly I don't care much these days. I continue to bubble with imaginary constructive suggestions and continue to feel the world at large will do what it darn well pleases. Just saw another one implemented today about 30 years after I first wrote about it... (If my memory wasn't so darned selective I would be better at remembering all my erroneous ideas, too.) #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
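
For readers curious about the fix David E K describes above, here is a rough sketch of how a lock that never gets released can make every upload come back as a "transient upload error", and why skipping the lock makes the uploads go through again. This is not the project's code: the real upload handler is part of the BOINC server software, and the thread does not say why its locking started to fail. The names `try_lock`, `handle_upload` and `use_locking` are made up for illustration, and `flock` is used here only because it is easy to demonstrate inside one process on a Unix-like system.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open `path` and try to take an exclusive advisory lock without waiting.

    Returns the open file descriptor on success, or None when some other
    open file already holds the lock. (Hypothetical helper, not BOINC code.)
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def handle_upload(path, data, use_locking=True):
    """Hypothetical upload handler: write `data` to `path`.

    With locking enabled, a held lock is reported back as a transient
    error, which is what makes a client back off and retry later.
    With locking disabled, the write is attempted unconditionally.
    """
    if use_locking:
        fd = try_lock(path)
        if fd is None:
            return "transient upload error"
    else:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)  # closing the descriptor also drops the flock lock
    return "ok"


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "result_0")
    stuck = try_lock(target)  # simulate a lock that is never released
    print(handle_upload(target, b"payload"))
    print(handle_upload(target, b"payload", use_locking=False))
    os.close(stuck)
    print(handle_upload(target, b"payload"))
```

The trade-off in the sketch is the usual one: without the lock, two writers hitting the same file at once could interleave their data. Per the post above, the project judged that the locking was not necessary for their setup.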
©2024 University of Washington
https://www.bakerlab.org