Message boards : Number crunching : all wu's error on 1 system, but OK on another
Author | Message |
---|---|
JStateson Joined: 7 May 07 Posts: 15 Credit: 4,061,331 RAC: 0 |
This system had nothing but errors: Opteron 290 (fastest) with 2 GB of memory. This one is a slower Opteron 275 with 4 GB of memory, and almost all of its WUs are good. Looking at stderr_txt, I don't see anything exceptional except that all the errors are on k8ndre-1. I don't see what is causing the problem; maybe someone else can. I could try adding more memory. Both run the same version of Windows 7, but the failing mobo is an Asus with a GTX 650 Ti and the other is a Tyan with a pair of GTS 250s. The GTX 650 Ti is completing its PrimeGrid tasks with valid results, and the GTS 250s generate valid results too, so what gives? |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,135,082 RAC: 4,703 |
"This system had nothing but errors. Opteron290 (fastest) with 2gb of memory." Try rolling back the Nvidia drivers to 306.97 or earlier; the newer drivers seem to cause at least some of the problems. |
JStateson Joined: 7 May 07 Posts: 15 Credit: 4,061,331 RAC: 0 |
The Rosetta work units are all CPU tasks. I am using the GPUs for the PrimeGrid challenge. I don't see how processing PrimeGrid GPU tasks can cause all the Rosetta CPU tasks to fail. However, after 30 years of programming, I know that one can only be 99.999999... percent certain that software will behave as designed, i.e., one cannot rule out side effects, so there is a (slim) chance you might be correct. K8NDRE is using 306.97 and, except for the tasks I aborted, it seems to have validated tasks. S2877 is using 310.70 and all its tasks are failing... hmm... After the PrimeGrid challenge completes, I will roll back the driver. I am in 13th place in the challenge and it would be unlucky to roll it back now. |
JStateson Joined: 7 May 07 Posts: 15 Credit: 4,061,331 RAC: 0 |
I had these reversed because I was not paying attention to which web page was on the screen. Rosetta does not show GPU info, so I pulled up another project that shows the GPU driver version, but I got the two systems reversed. I have a (small) BOINC farm, and it is easy to get systems mixed up when they are not in front of me. K8NDRE, which fails all tasks, is running 306.97, and the one that seemingly is working just fine, S2877, has a later version, 310.70, so a rollback would not solve the problem here. If anything, I need to upgrade 306.97 to the latest drivers. It would be nice if the "show_host_detail" web page here showed the GPU and driver version even if the project does not use a GPU. Likewise, it would be nice to be able to filter task results by "valid", "invalid", or "error", etc., to get a quick count of problems. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,135,082 RAC: 4,703 |
"The Rosetta work units are all CPU tasks. I am using the GPUs for the PrimeGrid challenge. I don't see how processing PrimeGrid GPU tasks can cause all the Rosetta CPU tasks to fail. However, after 30 years of programming, I know that one can only be 99.999999... percent certain that software will behave as designed, i.e., one cannot rule out side effects, so there is a (slim) chance you might be correct." No one knows why, but it does! That is just part of the frustrating nature of these problems. The Rosetta admins PROMISED to help but have not; the ONLY info we have is from users who have found answers through MANY trial-and-error sessions. ALL Rosetta has said is that "it works just fine on the beta site"!! |
©2024 University of Washington
https://www.bakerlab.org