Message boards : Number crunching : Validator down... :-(
Author | Message |
---|---|
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,070,625 RAC: 2,159 |
Well, never a dull moment... Does anyone know what the issue is here or is this (just) another "it's weekend and no sysadmin is around" kind of typical R@H thing again? :-( Ralf |
rochester new york Send message Joined: 2 Jul 06 Posts: 2842 Credit: 2,020,043 RAC: 0 |
Well, never a dull moment... are the work units we do wasted until this is fixed?? |
Jesse Viviano Send message Joined: 14 Jan 10 Posts: 42 Credit: 2,700,472 RAC: 0 |
Well, never a dull moment... No. They are not wasted. They will simply build a backlog for the validator to process once it is restarted. |
Jesse Viviano Send message Joined: 14 Jan 10 Posts: 42 Credit: 2,700,472 RAC: 0 |
Well, never a dull moment... I would have to guess that server bk1 either has gone down or failed. All of the processes that are on bk1 are down, but the processes on the other servers are up. |
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,070,625 RAC: 2,159 |
Well, that's a very obvious guess... Well, never a dull moment... Problem is now that not only does the validator not run, but you cannot upload finished WUs either... :-( And of course this all just happens to happen when I added another workstation back to crunching for R@H... :? Ralf |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,158,554 RAC: 15,699 |
Well, that's a very obvious guess... Well, never a dull moment... Is that the case for you? Everything's uploaded here and new downloads coming down too. Just awaiting validation - I think for 14 hours. Because rah_validator_beta appears to be running on server bk2, does that mean that some WUs are being validated - just too slowly? |
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,070,625 RAC: 2,159 |
1 WU mysteriously made it past uploading and joined the rest of the previous WUs waiting for validation. At least the laptop where I had added R@H again, after it "seemed" that things had been working OK for a few weeks, has two previously finished WUs still sitting as "uploading"... Problem is now that not only the validator doesn't run but that you can not upload finished WU's either... :-( I get the occasional message that "there is no active Internet connection" (which is absolute bullcrap) on that one now too. I have stopped R@H from receiving new WUs and added WCG instead. Will see what the R@H WUs currently running do when they finish in about an hour; I already tried rebooting, to no avail... Ralf |
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,070,625 RAC: 2,159 |
At least the laptop where I had added R@H again after it "seemed" that things are working ok for a few weeks has two previous finished WU's still sitting as "uploading"... Subsequent WUs finished and uploaded fine, but those two just won't budge... :-( In the meantime, the pending list keeps growing... :-( Ralf |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,158,554 RAC: 15,699 |
1 WU made it mysteriously past the uploading and joined the rest of the previous WU waiting for validation. At least the laptop where I had added R@H again after it "seemed" that things are working ok for a few weeks has two previous finished WU's still sitting as "uploading"... Problem is now that not only the validator doesn't run but that you can not upload finished WU's either... :-( About 8 hours after my previous message my uploads started having a problem too. A manual update got 3 out of 5 to upload but the other 2 wouldn't shift. Receiving new WUs seems fine. There's no point switching to another project for me yet as crunching is fine & points are just saved up for later, not lost. I still have debt to Rosetta to catch up from several months ago - don't think I've touched WCG on this account for 3+ months. |
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
Ten hours ago the validator was not running and I had eight work units pending. These have now been validated, so something worked during the night, though I still see plenty of red on the server status page!! I think now that SETI has got rid of its latest gremlins they have set up home@Rosetta :-) It will get fixed. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,158,554 RAC: 15,699 |
All my uploads have gone through and all previous uploads have been validated now. Typically, the server status page shows bk1 is still down, but that's about par for the course. |
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,070,625 RAC: 2,159 |
All my uploads have gone through and all previous uploads have been validated now. The two WUs stuck on uploading finally moved and a couple of WUs have been validated overnight, but most of them still show "pending", so not much news on this end... Ralf |
Jesse Viviano Send message Joined: 14 Jan 10 Posts: 42 Credit: 2,700,472 RAC: 0 |
Well, that's a very obvious guess... Well, never a dull moment... I believe that rah_validator_beta is for beta work units, not Rosetta@home production units. |
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,070,625 RAC: 2,159 |
Well, after two days of being down, it apparently took someone less than two hours to fix the problem. All servers show status running and all but 3 WUs that were stuck as pending have been validated. Still wonder why the response from the R@H team has to be so abysmal compared to other scientific projects... :-( Ralf |
mt4cancer Send message Joined: 2 Mar 11 Posts: 1 Credit: 1,321,587 RAC: 0 |
Either the validator is malfunctioning and this is not registering on the Server Status page, or else it is badly behind. I have over 120 work units waiting to validate -- some are several days old. |
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,070,625 RAC: 2,159 |
Either the validator is malfunctioning and this is not registering on the Server Status page, or else it is badly behind. I have over 120 work units waiting to validate -- some are several days old. Yeah, something's still up. All WUs that were pending over the weekend went through, but since this morning/last night, WUs keep getting stuck as pending again here as well... Ralf |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,158,554 RAC: 15,699 |
Either the validator is malfunctioning and this is not registering on the Server Status page, or else it is badly behind. I have over 120 work units waiting to validate -- some are several days old. Yeah, something's still up, all WU's that were pending over the weekend went through but now since this morning/last night, WU's keep getting stuck as pending again here as well... I agree. My machines are fine with new WUs getting uploaded & validated quickly, but a couple of my team-mates haven't fared so well - many unvalidated WUs over a few days: Comp 1327856 Comp 1327862 Comp 1370235 |
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,070,625 RAC: 2,159 |
Pending WUs are definitely piling up again, and RALPH@Home WUs cannot be reported due to a server error either. On the server status page everything shows as running, which it certainly is not... :-( Ralf |
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
Everything normal here as of now: uploads, downloads, validation, no pendings. Though from what you are saying here, something else is not. |
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,070,625 RAC: 2,159 |
Everything normal here as of now, There's certainly something not right, not only with Rosetta@Home but with RALPH@Home as well. On R@H, WUs are uploaded and reported but then just sit as "pending". This was working at some point yesterday. And on RALPH@Home, you cannot upload any finished WUs due to a "can not attach to shared memory" error on the server(s). Don't know how many resources Rosetta@Home and RALPH@Home are sharing, but it looks to me as if whatever they fixed yesterday isn't in fact working properly... Ralf |
©2024 University of Washington
https://www.bakerlab.org