Problems on old AMD processors (pre-Bulldozer)

Message boards : Number crunching : Problems on old AMD processors (pre-Bulldozer)


spRocket
Joined: 23 Mar 20
Posts: 22
Credit: 3,008,018
RAC: 0
Message 92363 - Posted: 26 Mar 2020, 20:49:43 UTC

I'm finding that I get signal 11 issues with a couple of older AMD processors, an Athlon II X4 630 and a Phenom II X2 550 Black Edition (the latter running with two unlocked cores). Both of these systems are running on ASUS M4A785-M motherboards with 4 GB of ECC RAM.

It seems that Rosetta Mini works OK, but the full Rosetta consistently gets errors on tasks.

An example from Task 1133622372:

<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process got signal 11</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting -parser:protocol jhr_boinc.xml @flags -in:file:silent 7hp5zr7e_jhr_design1_COVID-19.silent -in:file:silent_struct_type binary -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip 7hp5zr7e_jhr_design1_COVID-19.zip -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3696211
Starting watchdog...
Watchdog active.

</stderr_txt>
]]>
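
For anyone reading along: on Linux, signal 11 is SIGSEGV (a segmentation fault). Below is a minimal Python sketch for pulling the signal number out of a task's output and naming it. The function name `signal_from_message` is made up for illustration and is not part of BOINC or Rosetta.

```python
import re
import signal


def signal_from_message(text):
    """Return (number, name) for a 'process got signal N' line, or None."""
    match = re.search(r"process got signal (\d+)", text)
    if match is None:
        return None
    number = int(match.group(1))
    try:
        # Look up the symbolic name the running platform uses for this number.
        name = signal.Signals(number).name
    except ValueError:
        name = "unknown"
    return number, name


# Fragment copied from the task output quoted above.
sample = "<message>\nprocess got signal 11</message>"
print(signal_from_message(sample))
```

This only names the signal; it says nothing about why the application crashed.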


Both of these CPUs are shown as "Family 16" in the CPU type listing.
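
"Family 16" is decimal for AMD family 10h, the K10 generation. Here is a rough Python sketch for checking this from the text of /proc/cpuinfo on Linux. The helper names `cpu_family` and `is_k10` are hypothetical, and the sample text is illustrative rather than captured from either of these machines.

```python
def cpu_family(cpuinfo_text):
    """Return (vendor_id, cpu family) from the first processor entry."""
    vendor = None
    family = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key = key.strip()
        value = value.strip()
        if key == "vendor_id" and vendor is None:
            vendor = value
        elif key == "cpu family" and family is None:
            family = int(value)
        if vendor is not None and family is not None:
            break
    return vendor, family


def is_k10(vendor, family):
    # Assumption: decimal family 16 (10h) on an AMD CPU means K10.
    return vendor == "AuthenticAMD" and family == 16


# Illustrative sample in the /proc/cpuinfo layout, not real output.
sample = "processor\t: 0\nvendor_id\t: AuthenticAMD\ncpu family\t: 16\n"
vendor, family = cpu_family(sample)
print(vendor, family, is_k10(vendor, family))
```

On a real system, pass in the contents of /proc/cpuinfo instead of the sample string.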

In the meantime, I've shifted both of these systems over to World Community Grid, which is working as it should. On the other hand, my Ryzen 7 1700 is happily devouring Rosetta tasks, as is an old ThinkPad with an i7 L 640.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 92396 - Posted: 27 Mar 2020, 16:30:31 UTC

It looks like that task ran on your system with 4 CPU cores and 4GB of memory.

It seems the COVID tasks are consuming more memory than has been typical for other work. I believe you will find that running both WCG and R@h with the same resource share will leave you with enough memory to still run some R@h work if you wish. This tends to run half the cores on WCG and half on R@h, so a small-memory WCG task runs alongside each large-memory R@h task.
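
As a rough illustration of the arithmetic, here is a small Python sketch that splits installed RAM evenly across running tasks. The function name `memory_per_task_gb` is hypothetical, and the even split is a simplifying assumption: it ignores what the operating system and other programs use, so real headroom is lower.

```python
def memory_per_task_gb(total_ram_gb, concurrent_tasks):
    """Even split of installed RAM across concurrently running tasks."""
    if concurrent_tasks <= 0:
        raise ValueError("concurrent_tasks must be positive")
    return total_ram_gb / concurrent_tasks


# 4 GB shared by 4 tasks, one per core, as on the system described above.
print(memory_per_task_gb(4, 4))
```

The fewer large-memory tasks that run at once, the more of the 4 GB each one can use.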
Rosetta Moderator: Mod.Sense
spRocket
Joined: 23 Mar 20
Posts: 22
Credit: 3,008,018
RAC: 0
Message 92399 - Posted: 27 Mar 2020, 16:40:06 UTC - in response to Message 92396.  

I think I'll just give another one of my other older machines a cleaning. I tried it earlier and I started hearing a thermal warning tone from its speaker - but on the other hand, it has 8 GB of RAM installed.
William Albert

Joined: 22 Mar 20
Posts: 23
Credit: 1,069,070
RAC: 147
Message 92649 - Posted: 30 Mar 2020, 19:57:36 UTC

I'm also having some odd issues with a pre-Bulldozer AMD machine.

The problem machine is running an AMD Turion II Neo N40L (essentially a low-power K10 chip). At the time of this writing, out of the 120 WUs that have been issued to it, 93 have failed with "process got signal 11". On a positive note, the WUs that fail do so quickly, so the machine isn't wasting too much time processing WUs that end up getting thrown away. However, it also points away from overheating (or some other type of environmental factor), and toward some type of compatibility issue.
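
For scale, the failure rate implied by those counts can be worked out in a couple of lines of Python. The variable names are just for illustration.

```python
issued = 120  # work units issued to the machine
failed = 93   # of those, failed with "process got signal 11"

failure_rate = failed / issued
print(f"{failed} of {issued} work units failed ({failure_rate:.1%})")
```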

I also have another machine running an AMD Phenom II X6 1090T, which is also a K10 chip, and it hasn't had failed WUs yet.

The one thing both of our failing machines have in common is that they're running Ubuntu 18.04, whereas my working AMD machine is running Windows 10. Perhaps there's some type of bug or compatibility issue with Rosetta's Linux workers?
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 92657 - Posted: 30 Mar 2020, 21:09:04 UTC
Last modified: 30 Mar 2020, 21:10:23 UTC

I think the issue with AMD CPUs running Linux is now understood; see this post for details on AMD systems getting signal 11 failures. A fix will be tested on Ralph soon.
Rosetta Moderator: Mod.Sense
Tom M

Joined: 20 Jun 17
Posts: 87
Credit: 15,097,194
RAC: 46,563
Message 94231 - Posted: 12 Apr 2020, 11:28:10 UTC - in response to Message 92649.  


The one thing both of our failing machines have in common is that they're running Ubuntu 18.04, whereas my working AMD machine is running Windows 10. Perhaps there's some type of bug or compatibility issue with Rosetta's Linux workers?


I have run Rosetta under Ubuntu 18.04 with no problems.

I haven't fired up an A6 laptop that I have, so I can't say whether the problem shows up on hardware that old.

Tom
Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel.....




©2024 University of Washington
https://www.bakerlab.org