System Restarts Win 7 Intel i7

Message boards : Number crunching : System Restarts Win 7 Intel i7

Paul

Joined: 29 Oct 05
Posts: 193
Credit: 66,274,905
RAC: 6,798
Message 64124 - Posted: 22 Nov 2009, 10:58:34 UTC

All:

I have R@H running on several systems with no issues. I just installed BOINC on Win 7 running on a Core i7 and it is having lots of problems. The system restarts, most of my WUs go to 100% with computation error, nothing is working correctly here.

I did not see any posts so I assume everyone else is fine and I need to take a closer look at this system. I am not sure where to look. If you have similar issues, please post.

thx
Thx!

Paul

ID: 64124 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,135,730
RAC: 4,670
Message 64125 - Posted: 22 Nov 2009, 11:10:46 UTC - in response to Message 64124.  

All:

I have R@H running on several systems with no issues. I just installed BOINC on Win 7 running on a Core i7 and it is having lots of problems. The system restarts, most of my WUs go to 100% with computation error, nothing is working correctly here.

I did not see any posts so I assume everyone else is fine and I need to take a closer look at this system. I am not sure where to look. If you have similar issues, please post.
thx


I do not have these problems but will help start the troubleshooting process. How many projects are you running on that PC? It is best to stick with one until you get this solved, if possible. If you run more than one project, do you have the setting "Leave applications in memory while suspended? (suspended applications will consume swap space if 'yes')" set to yes? It is under your account, computing preferences, in the top section. Do you have hyper-threading enabled, meaning all 8 logical CPUs are in use, or are you using just 4?
ID: 64125 · Rating: 0
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 66,274,905
RAC: 6,798
Message 64128 - Posted: 22 Nov 2009, 13:45:46 UTC - in response to Message 64125.  

I only crunch R@H. When I look at the failed WUs, they all have some failure to find file message in them.

I have "Leave applications in memory while suspended" set to yes. I have 9GB of RAM and Win 7 64-bit so I don't think I am using much swap space. I am thinking about setting my swap space to 0K.

HyperThreading is on so I get 8 WUs.

thx for the help.
Thx!

Paul

ID: 64128 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 64130 - Posted: 22 Nov 2009, 16:40:26 UTC
Last modified: 22 Nov 2009, 16:44:05 UTC

Failure to find a file would tend to point to either a network problem, where the BOINC client was unable to download the file; or to an authority problem where the file downloaded, but now the BOINC client is not authorized to access it.

Wow Paul, that's a lot of machines! Which host ID is having problems? ...oh here it is, the only Win7 machine:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1187725

I see some tasks had the file problem as you indicated:
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>minirosetta_database_rev33769.zip</file_name>
<error_code>-120</error_code>
<error_message>signature verification failed</error_message>

But others ran for over an hour before generating an exception, which would imply that you did eventually get a good copy of the database. That would seem like progress.
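For anyone who wants to pull the useful fields out of an error fragment like the one quoted above, here is a minimal Python sketch. The fragment is copied from this post and is truncated (there is no closing tag), so the sketch uses a regular expression rather than a strict XML parser. The helper names `extract_fields` and `is_download_problem` are made up for illustration and are not part of BOINC.

```python
import re

# Fragment copied from the task output quoted in this post. It is truncated
# (no closing </file_xfer_error>), so a strict XML parser would reject it.
FRAGMENT = """<file_xfer_error>
<file_name>minirosetta_database_rev33769.zip</file_name>
<error_code>-120</error_code>
<error_message>signature verification failed</error_message>
"""


def extract_fields(text):
    """Return {tag: value} for every simple <tag>value</tag> pair in text."""
    return {tag: value.strip()
            for tag, value in re.findall(r"<(\w+)>([^<]*)</\1>", text)}


def is_download_problem(fields):
    """Hypothetical rule of thumb: a signature failure points at the
    downloaded file (damaged in transit or altered on disk), not at the
    science application itself."""
    return "signature verification failed" in fields.get("error_message", "")


fields = extract_fields(FRAGMENT)
print(fields["file_name"], fields["error_code"], is_download_problem(fields))
```

Tasks that fail this way never started computing, which is why they are a separate case from the tasks that ran for an hour and then hit an exception.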

At this point, the machine has consumed its full quota of work for the day, and won't be able to download more until a good result is returned, or a day passes.

Have you installed on Win7 elsewhere? There were some new "features" from M$ that were catching people there as I recall.
Rosetta Moderator: Mod.Sense
ID: 64130 · Rating: 0
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 66,274,905
RAC: 6,798
Message 64133 - Posted: 22 Nov 2009, 20:10:50 UTC - in response to Message 64130.  

Thx for the response.

I finally had a successful WU. Maybe I just had a bad batch, maybe it is an AV thing, I don't know. I currently have BOINC limited to 60% of the CPU. It looks like I have 3 more WUs that will complete in the next few min. If all of them succeed, I will increase the max CPU percentage to 80%. Maybe I have a CPU or heat issue.

I will keep watching things. I got my Q6600 back online today as well.

work work work


Thx!

Paul

ID: 64133 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 64134 - Posted: 22 Nov 2009, 21:41:20 UTC

No, shouldn't be a heat issue. If you think about it, none of the failures even ran long enough to make a heat issue :)

Yes, perhaps AV quarantined files (i.e. moved them from their expected location)
Rosetta Moderator: Mod.Sense
ID: 64134 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,135,730
RAC: 4,670
Message 64139 - Posted: 23 Nov 2009, 9:49:51 UTC - in response to Message 64133.  
Last modified: 23 Nov 2009, 9:51:19 UTC

Thx for the response.

I finally had a successful WU. Maybe I just had a bad batch, maybe it is an AV thing, I don't know. I currently have BOINC limited to 60% of the CPU. It looks like I have 3 more WUs that will complete in the next few min. If all of them succeed, I will increase the max CPU percentage to 80%. Maybe I have a CPU or heat issue.

I will keep watching things. I got my Q6600 back online today as well.

work work work


Limiting the CPU to a percentage like you do has been a problem in some cases. It is better to reduce your cache than to limit your CPU percentage. Open her up, but limit your cache to 0.01 days; that way you should only get 1 unit per CPU at a time. Hopefully you are not on a pay-as-you-go cable plan.
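If you would rather set this on the machine itself than on the website, here is a rough Python sketch that builds the text of a local override file. It assumes the BOINC client reads a `global_prefs_override.xml` file from its data directory and that the element names `work_buf_min_days` and `cpu_usage_limit` are correct. Both are from memory, so check them against the BOINC documentation before relying on this. The function name `build_override` is made up for the example.

```python
import xml.etree.ElementTree as ET


def build_override(work_buf_min_days=0.01, cpu_usage_limit=100.0):
    """Build the text of a preferences override file.

    Assumption: the element names below match what the BOINC client
    expects; they are not taken from this thread.
    """
    root = ET.Element("global_preferences")
    # Small work cache: roughly one task per CPU at a time.
    ET.SubElement(root, "work_buf_min_days").text = f"{work_buf_min_days:.2f}"
    # 100 means no CPU throttling.
    ET.SubElement(root, "cpu_usage_limit").text = f"{cpu_usage_limit:.1f}"
    return ET.tostring(root, encoding="unicode")


override_xml = build_override()
print(override_xml)
```

The sketch only prints the text. Where the file goes, and whether the client has to be told to re-read its preferences, depends on your BOINC install.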

On the AV issue, you should exclude the BOINC directories from your AV scans, as AV programs sometimes get overaggressive and cause problems. Periodically you will see threads on all the boards saying AV such-and-such is causing crashes with BOINC; the next month it is a different AV doing it.

On the heat issue, do you have 'Cool and Quiet', or whatever Intel calls it (SpeedStep), turned on in the BIOS? If so, turn it off; it too can be overaggressive.
ID: 64139 · Rating: 0
dcdc
Joined: 3 Nov 05
Posts: 1831
Credit: 119,526,853
RAC: 9,592
Message 64140 - Posted: 23 Nov 2009, 10:18:25 UTC

I'd say the first thing to check is prime95. There's a 64-bit version. It was failing for me on my most recent build because of a memory error that memtest86+ wasn't picking up. It ran fine when I clocked the memory down...
ID: 64140 · Rating: 0
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 66,274,905
RAC: 6,798
Message 64141 - Posted: 23 Nov 2009, 11:24:38 UTC - in response to Message 64140.  

This is really weird. I had no issues all day yesterday, but it looks like I had a reboot at 1:30 AM??

Does Win 7 have a way to record the crash? I looked at the failed work units and it does not look like a CPU issue but it is odd that the computer completely restarts.

I have smartfan enabled because the thing is super loud if I don't. I added two case fans to keep this cool and they don't make much noise. It looks like the CPUs are staying cool.

I am going to cap the cpu at 60% for a few days and see if that fixes the problem. If so, I will increase that limit over time to see where the problems begin.

Troubleshooting - ugh
Thx!

Paul

ID: 64141 · Rating: 0
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 66,274,905
RAC: 6,798
Message 64143 - Posted: 23 Nov 2009, 12:05:06 UTC - in response to Message 64141.  

I just looked at the failed units and hope someone can point me in the right direction.

Any insight is greatly appreciated
Thx!

Paul

ID: 64143 · Rating: 0
dcdc
Joined: 3 Nov 05
Posts: 1831
Credit: 119,526,853
RAC: 9,592
Message 64149 - Posted: 23 Nov 2009, 17:27:19 UTC
Last modified: 23 Nov 2009, 18:27:10 UTC

Rosetta might validate the results but it doesn't mean the computer isn't making mistakes. My Phenom II submitted a couple of Rosetta tasks successfully but would crash or restart occasionally, and wouldn't pass Prime95. Prime95 showed that the problem only went away when I dropped the DDR3 from 667MHz (auto) to 400 or 533MHz...

If it doesn't pass Prime95 then something is wrong and it is likely to be submitting incorrect results.

P.S. all versions (inc 64-bit) are here: http://www.mersenne.org/freesoft/
ID: 64149 · Rating: 0
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 66,274,905
RAC: 6,798
Message 64166 - Posted: 24 Nov 2009, 1:46:53 UTC - in response to Message 64149.  

no overclocking on this system at all

I will try prime95 and let you know.
Thx!

Paul

ID: 64166 · Rating: 0
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 66,274,905
RAC: 6,798
Message 64171 - Posted: 24 Nov 2009, 10:25:10 UTC - in response to Message 64167.  

No memtest on this computer yet. It is interesting that it can go for hours between reboots. 9GB of RAM provides lots of workspace.

BOINC now shares this computer with Folding@home because the ATI HD4850 needs something to do. No reboot last night but I did have 1 failed WU on R@H with a computation error.

Is there a good memtest tool for Win 7 that can check 9GB - 12GB of RAM?

It would be good to get to the root of the problem.

thx

Thx!

Paul

ID: 64171 · Rating: 0
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 66,274,905
RAC: 6,798
Message 64172 - Posted: 24 Nov 2009, 10:44:08 UTC - in response to Message 64171.  

The last failed WUs had the same message:

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x009946BF read attempt to address 0x98D67424

Engaging BOINC Windows Runtime Debugger...


Anyone have ideas as to what I can do to make this go away?

Thx!

Paul

ID: 64172 · Rating: 0
dcdc
Joined: 3 Nov 05
Posts: 1831
Credit: 119,526,853
RAC: 9,592
Message 64175 - Posted: 24 Nov 2009, 14:39:16 UTC - in response to Message 64171.  

No memtest on this computer yet. It is interesting that it can go for hours between reboots. 9GB of RAM provides lots of workspace.

BOINC now shares this computer with Folding@home because the ATI HD4850 needs something to do. No reboot last night but I did have 1 failed WU on R@H with a computation error.

Is there a good memtest tool for Win 7 that can check 9GB - 12GB of RAM?

It would be good to get to the root of the problem.

thx

There is - memtest86+ (note the +) will do 12GB:

http://www.memtest.org/download/4.00/memtest86+-4.00.iso.zip

However, it didn't find my memory errors: memtest reported the memory running at 475MHz (or something around there), while in Windows it was running at 667MHz, and it was the speed causing the problems!

Prime95 is the most reliable method to test for stability. If that fails then you can change settings or remove memory until it passes to isolate.
ID: 64175 · Rating: 0
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 66,274,905
RAC: 6,798
Message 64203 - Posted: 25 Nov 2009, 5:02:02 UTC - in response to Message 64175.  

Thanks for all of the suggestions.

I encountered a couple of STOP errors today and all of them had to do with page file corruption issues.

It looks like this is usually caused by an ill-behaved driver. All of the drivers are now current so I will let things run for a few days and see what happens.

Windows 7 does not find the most recent drivers, just drivers that worked once in the past.

Thanks again for all the help and keep crunching!

Thx!

Paul

ID: 64203 · Rating: 0
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 66,274,905
RAC: 6,798
Message 64213 - Posted: 25 Nov 2009, 12:30:49 UTC - in response to Message 64206.  

All of my failed work units indicate

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x005763AF write attempt to address 0x00000027

It looks like Minirosetta 1.98 had a similar issue with Win 7.

I will move my comments to the Minirosetta 2.00 bug thread.
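For what it is worth, here is a small Python sketch of how the two exception lines quoted in this thread can be picked apart. It is only a reading aid, not a diagnosis. The 64 KB cutoff is an assumption based on the usual Windows layout, where the lowest part of the address space is never mapped, and the function name `parse_violation` is made up for the example.

```python
import re

# Matches the "Reason:" line format shown in the failed work units.
PATTERN = re.compile(
    r"Access Violation \((0x[0-9a-fA-F]+)\) at address (0x[0-9a-fA-F]+) "
    r"(read|write) attempt to address (0x[0-9a-fA-F]+)"
)

# Assumption: addresses below 64 KB are never valid on Windows, so an
# access there usually means a null pointer plus a small offset.
NULL_PAGE_LIMIT = 0x10000


def parse_violation(line):
    """Split one exception line into its parts, or return None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    code, instruction, operation, target = match.groups()
    target_addr = int(target, 16)
    return {
        "code": code,
        "instruction": int(instruction, 16),
        "operation": operation,
        "target": target_addr,
        "near_null": target_addr < NULL_PAGE_LIMIT,
    }


lines = [
    "Reason: Access Violation (0xc0000005) at address 0x009946BF read attempt to address 0x98D67424",
    "Reason: Access Violation (0xc0000005) at address 0x005763AF write attempt to address 0x00000027",
]
results = [parse_violation(line) for line in lines]
for r in results:
    kind = "near null" if r["near_null"] else "not near null"
    print(r["operation"], hex(r["target"]), kind)
```

A write to an address as low as 0x27 is the pattern you would expect from a null pointer being used, which fits a bug in the application. The earlier read from 0x98D67424 does not fit that pattern, so this alone does not rule out a driver or memory problem.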
Thx!

Paul

ID: 64213 · Rating: 0
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 66,274,905
RAC: 6,798
Message 64229 - Posted: 26 Nov 2009, 0:41:29 UTC - in response to Message 64213.  

Updated my video and storage drivers and no reboots for 24 hours.

Now the Core i7 is starting to show some progress. It would be great to get this system up to 2,500 - 3,000 credits a day.

It might not make it with Folding@Home running but I can't ignore the ATI HD 4850 and it needs something to do.

What kind of credit should I expect from this system?

thx

Thx!

Paul

ID: 64229 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,135,730
RAC: 4,670
Message 64233 - Posted: 26 Nov 2009, 10:00:05 UTC - in response to Message 64229.  

Updated my video and storage drivers and no reboots for 24 hours.

Now the Core i7 is starting to show some progress. It would be great to get this system up to 2,500 - 3,000 credits a day.

It might not make it with Folding@Home running but I can't ignore the ATI HD 4850 and it needs something to do.

What kind of credit should I expect from this system?
thx


Well, since you are running BOINC 6.10.18, you can attach to Collatz and get that GPU crunching on their units. Here is the website:
http://boinc.thesonntags.com/collatz/

If you go into the website settings for them, you can say you only want GPU units, and then you can still crunch here with the CPU. Collatz is a math project and works through BOINC, so no extra software is needed as with Folding. As for credits, you should be able to get into the 30,000+ RAC range with your GPU alone over there. You can use all 8 of your CPUs here and your GPU there, for a total of 9 units crunching at once.
ID: 64233 · Rating: 0
DJStarfox

Joined: 19 Jul 07
Posts: 145
Credit: 1,250,162
RAC: 0
Message 64358 - Posted: 3 Dec 2009, 14:46:11 UTC - in response to Message 64128.  

I only crunch R@H. When I look at the failed WUs, they all have some failure to find file message in them.

I have "Leave applications in memory while suspended" set to yes. I have 9GB of RAM and Win 7 64-bit so I don't think I am using much swap space. I am thinking about setting my swap space to 0K.

HyperThreading is on so I get 8 WUs.

thx for the help.


9 GB is a lot of RAM... must be 3 GB per stick. What is your memory's speed? If you're overclocking your RAM, or the RAM has errors, it can cause a computer to spontaneously reboot (ECC exceptions).

I recommend:
Download a copy of memtest86 and burn the ISO to a CD. Let us know if it finds any errors (it will probably take 2 hours to run).
http://www.memtest86.com/download.html
ID: 64358 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org