Validation errors prior to fileserver crash

Message boards : Number crunching : Validation errors prior to fileserver crash


Sid Celery

Joined: 11 Feb 08
Posts: 2117
Credit: 41,158,554
RAC: 15,699
Message 69026 - Posted: 8 Jan 2011, 1:21:29 UTC
Last modified: 8 Jan 2011, 1:24:06 UTC

A new team-mate has been running very successfully since joining, but on the 4th/5th he had 14 consecutive validation errors. No idea why, but it doesn't look like a problem at his end.

Was it the first hint of issues on the fileserver? Anyone else see this with their uploads? I didn't and neither did other team-mates.

Can these WUs be re-checked?

Edit: Oops! User is itnumberpi
Chris Holvenstot

Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 69027 - Posted: 8 Jan 2011, 1:25:28 UTC - in response to Message 69026.  

Hey Sid - how much run time did he have on these failed tasks? I had a few tasks fail shortly before the outage, and I think they ended up with validation errors, but since they only ran for 10-15 seconds each it was clear that they were sour tasks from the start.

CH
Sid Celery

Joined: 11 Feb 08
Posts: 2117
Credit: 41,158,554
RAC: 15,699
Message 69028 - Posted: 8 Jan 2011, 2:31:59 UTC

Looks to be the full 3 hours in the main, so it seems to me it's not a WU problem but a validation issue.

I'm not in contact with the guy to know more - he's a friend of a friend.
Chris Holvenstot

Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 69029 - Posted: 8 Jan 2011, 3:37:42 UTC - in response to Message 69028.  

If that's the case then clearly it's not the same issue. Have a great night!
Sid Celery

Joined: 11 Feb 08
Posts: 2117
Credit: 41,158,554
RAC: 15,699
Message 69030 - Posted: 8 Jan 2011, 4:29:20 UTC
Last modified: 8 Jan 2011, 4:30:14 UTC

Not sure why but I just took a look at your machines, seeing as you have such a high RAC. Take a look at your last results on these 3 machines. Same thing.

https://boinc.bakerlab.org/rosetta/results.php?hostid=1312275

https://boinc.bakerlab.org/rosetta/results.php?hostid=1346087

https://boinc.bakerlab.org/rosetta/results.php?hostid=1277775

No idea why it happens on those and not on your other ones...

Also, see KEL's message on the Rosetta front page. That's one guy who's not going to have a good night...
Bernd Schnitker

Joined: 2 Jan 09
Posts: 10
Credit: 62,009
RAC: 0
Message 69037 - Posted: 8 Jan 2011, 10:02:11 UTC

I also have 2 that failed to validate, from the 4th and 5th of Jan:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=357710503
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=357647672
I hope they are fixed in the end.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 69044 - Posted: 8 Jan 2011, 17:37:42 UTC

Some validation issues as the project restarted may be due to the restored database not being entirely in sync, as mentioned in the project news on the home page. This is probably why no validation is currently being done: it would do more harm than good to the databases.

Some validation issues prior to the crash may have been precursors to the final failure that occurred.
Rosetta Moderator: Mod.Sense



