Message boards : Number crunching : Ryzen improvement with Linux 4.15.0-29
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0
Having had so much bad luck with my Ryzen 1700 on Rosetta under Linux, I thought I would mention the improvement after upgrading to the latest Linux kernel (4.15.0-29). I now get much more consistent output, and a low error rate (https://boinc.bakerlab.org/rosetta/results.php?hostid=3432628&offset=0&show_names=0&state=4&appid=). In fact, it is now better than my Intel i7-3770 and i7-4770 machines, which still show very inconsistent output unless I leave at least 3 cores (4 is better) free. However, Rosetta is now running on 4 cores of my Ryzen, while 11 cores run WCG (all projects) and another core supports a GPU on Folding, so I don't have to leave any free.
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 8,387
Jim1348 wrote: "Having had so much bad luck with my Ryzen 1700 on Rosetta under Linux, I thought I would mention the improvement after upgrading to the latest Linux kernel (4.15.0-29). I now get much more consistent output, and a low error rate."
I'd be curious to try it with a new Threadripper!!
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0
I have found that it still helps to limit it to running Rosetta on only 2 cores at a time for the most consistent output. But that is different from having to reserve cores: I can still use all my other cores on the various WCG projects without ill effects. It is quite nice. And you could probably run Rosetta on even more cores for a small reduction in output; it depends on what you want. I suspect it is similar to MIP (Microbiome Immunity Project) on WCG, where you have to limit it to only a few cores at a time for best output. MIP is based on Rosetta, so there are some similarities. PS: I will be building a Ryzen 2700 in a couple of months, and will try Rosetta on 8 full cores rather than 16 virtual cores. It might work. Threadripper would be great to try.
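Jim1348 does not say how he caps Rosetta at 2 concurrent tasks. One way to do it is a small app_config.xml in the Rosetta project folder, sketched below under a few assumptions: the folder path in the comment is the usual one for Debian/Ubuntu packages and may differ on your system, and `project_max_concurrent` needs a reasonably recent BOINC client. It limits how many of the project's tasks run at once, while the remaining cores stay available for WCG.

```xml
<!-- app_config.xml placed in the Rosetta project folder, for example
     /var/lib/boinc-client/projects/boinc.bakerlab.org_rosetta/
     (path assumed; check your own BOINC data directory). -->
<app_config>
    <!-- Run at most 2 Rosetta tasks at the same time.
         Other projects can still use the remaining cores. -->
    <project_max_concurrent>2</project_max_concurrent>
</app_config>
```

After saving the file, Options > Read config files in BOINC Manager (or a client restart) should apply it; tasks beyond the limit wait in the queue rather than being aborted.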
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,054,272 RAC: 7,218
Jim1348 wrote: "I have found that it still helps to limit it to running Rosetta on only 2 cores at a time for the most consistent output."
The name of the WCG MIP Linux binary on my machine is "wcgrid_mip1_rosetta_7.11_x86_64-pc-linux-gnu". Notice the "rosetta" in the name. 8-) Rosetta, like other projects, has stripped symbols from the binary. I disassembled the binary and used absolute addresses to see what Rosetta was doing while running. I ran Rosetta 4.07 on all cores and it did not even show any computation (work being done). It seemed to spend a huge chunk of its time spinning on the availability of a "LOCK". Maybe I should try the exercise again, but with increasing numbers of WUs, to confirm my initial findings. Designing code that uses a "LOCK" is not that hard. Designing EFFICIENT, high-performance code is trickier.
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0
rjs5 wrote: "Rosetta, like other projects, has stripped symbols from the binary. I disassembled the binary and used absolute addresses to see what Rosetta was doing while running. I ran Rosetta 4.07 on all cores and it did not even show any computation (work being done). It seemed to spend a huge chunk of its time spinning on the availability of a 'LOCK'."
I am not sure what a "LOCK" is for, but it does not sound promising. Maybe you can jog them into doing the right thing. Another issue is that BOINC has a bad habit of pausing one Rosetta work unit to run another. It could help to disable "Leave application in memory", as you suggested; I will try it in a couple of weeks.
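For reference, the "Leave application in memory" option can also be set by hand. A minimal sketch, assuming the standard `leave_apps_in_memory` preference in a global_prefs_override.xml file in the BOINC data directory (the path in the comment is an assumption). With a value of 0, a preempted task is removed from memory and restarts from its last checkpoint when it resumes, so any work done since that checkpoint is lost.

```xml
<!-- global_prefs_override.xml in the BOINC data directory,
     for example /var/lib/boinc-client/ on Debian/Ubuntu (path assumed).
     Values here override the web and Manager preferences. -->
<global_preferences>
    <!-- 0 = do not keep preempted tasks in memory;
         they restart from their last checkpoint when resumed. -->
    <leave_apps_in_memory>0</leave_apps_in_memory>
</global_preferences>
```

If the file already exists, add the element to it rather than replacing it; Options > Read local prefs file in BOINC Manager reloads it.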
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0
I should also mention that to keep the Rosettas from being suspended at all, I have increased the BOINC "Switch between applications every" setting to 1600 minutes (thanks to anniet on the BOINC forum). It seems to be working fine so far: all the Rosettas that have started are still running, with none paused. I think it will help the consistency too.
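The same switch interval can be written into global_prefs_override.xml, assuming the standard `cpu_scheduling_period_minutes` preference, which backs the Manager's "Switch between applications (tasks) every ... minutes" field. This is a sketch using the 1600-minute value from the post; the path is an assumption, and the element should go into an existing override file rather than replace it.

```xml
<!-- global_prefs_override.xml in the BOINC data directory (path assumed). -->
<global_preferences>
    <!-- Switch between applications every 1600 minutes (about 26.7 hours),
         so the client rarely preempts a running task just to rotate projects. -->
    <cpu_scheduling_period_minutes>1600</cpu_scheduling_period_minutes>
</global_preferences>
```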
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0
Bad luck. My Ryzen machine is now stuck with Universe "long runners" on all the cores. Nothing is working. Even my GTX 1070 on Folding has low output due to the core that supports it being taken over by the bad BHSpin v2 work units. I will not be able to fix it for a few days.
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,054,272 RAC: 7,218
Jim1348 wrote: "Bad luck. My Ryzen machine is now stuck with Universe 'long runners' on all the cores. Nothing is working. Even my GTX 1070 on Folding has low output due to the core that supports it being taken over by the bad BHSpin v2 work units. I will not be able to fix it for a few days."
Why won't you be able to fix it for a few days? It seems like you can:
- SUSPEND a couple of the Universe long-runner TASKS for a while to let other work run. Suspending a TASK will prevent more work from coming from that PROJECT.
- Set max_concurrent (the maximum number of concurrent TASKS) in the app_config.xml file to limit how many of those tasks can start.
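rjs5's second suggestion could look like this app_config.xml in the Universe project folder. It is a sketch: the app name `BHspin2` is a hypothetical placeholder (copy the exact short name from the `<app><name>` entries in client_state.xml), and the limit of 2 is an arbitrary example. A `max_concurrent` inside an `<app>` block limits only that application, not the whole project.

```xml
<!-- app_config.xml in the Universe project folder under the
     BOINC data directory's projects/ subdirectory (path assumed). -->
<app_config>
    <app>
        <!-- Hypothetical name: use the exact <name> of the BHSpin v2
             application as listed in client_state.xml. -->
        <name>BHspin2</name>
        <!-- Never run more than 2 of these tasks at the same time. -->
        <max_concurrent>2</max_concurrent>
    </app>
</app_config>
```

Tasks over the limit stay queued until a slot frees up, so the other projects get CPU time again.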
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0
rjs5 wrote: "Why won't you be able to fix it for a few days?"
I am 800 miles from the machine. It was a gamble. I lost. (The real solution is dumping Universe.)
PappaLitto Joined: 14 Nov 17 Posts: 17 Credit: 28,136,182 RAC: 1,168
You should install TeamViewer on your crunching machines; it makes life a lot easier.
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 8,387
boboviz wrote: "I'd be curious to try it with a new Threadripper!!"
Uh, Threadripper is better with Linux than with Win10, but also better than with Windows Server!! (link: Threadripper)
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 8,387
boboviz wrote: "I'd be curious to try it with a new Threadripper!!"
And on some OSes, this CPU is also better than Xeon. DragonFlyBSD: "The Threadripper 2990WX is a beast. It is at *least* 50% faster than both the quad-socket Opteron and the dual-socket Xeon system I tested against. The primary limitation for the 2990WX is likely its 4 channels of DDR4 memory, and like all Zen and Zen+ CPUs, memory performance matters more than CPU frequency (and costs almost no power to pump up the performance). That said, it still blows away a dual-socket Xeon with 3x the number of memory channels. That is impressive!"
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0
After returning home and dumping all the bad work units, the Ryzen 1700 is almost back to normal, though the BOINC scheduler is still running too many Rosettas at the moment. But the credit output is good and is returning to a consistent state (https://boinc.bakerlab.org/rosetta/results.php?hostid=3432628). However, I have had 3 errors on this machine (all of the PF type) out of 10 work units completed. That compares to no errors out of a total of 24 completed on my i7-3770 and i7-8700, so I think the Ryzen is still a little error-prone. When I last tried the Ryzen a year ago, I found that turning off SMT in the BIOS and running on full cores only eliminated the errors. Maybe someone could try it (especially on the Threadrippers), but I will be taking this Ryzen off after completing this test. Good luck.
©2024 University of Washington
https://www.bakerlab.org