Questions and Answers : Unix/Linux : boincmgr with rosetta downloaded lots of data and when I rebooted it seemed to start over
Author | Message |
---|---|
Sid Celery Joined: 11 Feb 08 Posts: 2207 Credit: 42,134,903 RAC: 21,285 |
My $85 Ryzen 5 1600 AF can handle higher RAM speeds even though AMD rather conservatively says that its rated speed is only 2667MHz. From what I've read the motherboard is more of a constraint than the CPU, at least up to about 3200MHz. My motherboard's QVL list mentions many kits that have been tested to run substantially faster than 2667MHz. https://www.asrock.com/mb/AMD/B450M%20Pro4/index.us.asp#Memory Of the 290 RAM kits that ASRock tested, 61 of them were rated at 3000 or better and none of them tested as running slower than 2933MHz, and all of the 29 3200MHz kits apparently ran at their rated speeds, and the 7 tested '2933MHz' sets also tested running at their rated speeds. I do so love playing with spreadsheets! I understand but istm the board can take a whole range of CPUs which will support faster speeds than your CPU can, but seeing as yours is at the lower end, it's the CPU providing a bottleneck, so your RAM is successful at 2800MHz with a CPU that's supposed to only handle 2667MHz - consider yourself ahead. Now you've found a stable speed, to eke out the last dregs you might want to run CPU-Z and check what timings your RAM supports at other speeds. I've found it's possible to tweak them just a little, especially if the RAM isn't running at full speed. E.g. my DDR3 RAM defaults at 8-8-8-24-36 2T but I've edged it down to 8-8-8-24-33 1T while I've increased my FSB to run the RAM 2.85% above its top speed (1645.6 for 1600MHz RAM). It's marginal, but it's also stable - 22 days up-time for my overclock (that even surprised me!) I'm not about to overclock my CPU, with the stock AMD processor fan I'm hitting rather high temps (80C) even at the rated default clock speed of 3200MHz (max burst speed is apparently 3700MHz without overclocking, but I've never seen the processor go faster than 3500MHz). On hot days I even reduce the CPU limits in BOINC preferences to keep the machine from overheating. 
Given my unimpressive performance I apparently wasn't too lucky in the hardware lottery, but what the heck, I built the system for about $300 and tax, if you don't count the case and power supply I recycled from an old Athlon XP 1700+ build. I'm running a small overclock but with permanent maximum speed (4525MHz instead of 4300MHz on an old FX8370) at 50C - also for 22 days - but if you have heat issues and a stock fan there's no point me suggesting anything. And if you're limiting CPU usage with Boinc settings, that probably explains why you're well within your RAM limits, so that's one bonus. Overall, even though you're not happy with how you're running, it doesn't sound like there's a lot more you can do while avoiding errors. And unfortunately, I've never noticed tweaking RAM improves task performance at all tbh. Sorry. |
Macuilxochitl Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0 |
Crud, I thought I had this working properly, but I'm still getting a few errors. The machine was not cranking through units nearly as fast as I would like, but my errors didn't seem to be going up, so I ignored it for a couple of weeks. But now since I think May 30th I've gone from 44 completed units to 77, but errors crept up from 136 to 143. So a ratio of about 4.7 complete units to each failed unit, but that is still too high, right? So in my ~/.BOINC directory there is a big (724.5 KiB) text file called stderrgui.txt. It has 12417 lines and almost all of them contain the words fatal, failure, error, invalid, WARNING, CRITICAL or failed, but I have no idea if this file is what I want to look at or how to grep through it to find significant hints about why I am still getting failed tasks. Most of the error messages seem to be 'drawing failure for widget' or other things that seem to refer to the GUI, which is not normally running. Here are a few snippets from the file. (firefox:8163): Gtk-WARNING **: 18:24:11.661: Theme parsing error: colors.css:74:53: Invalid number for color value (firefox:8163): Gtk-WARNING **: 18:24:11.661: Theme parsing error: colors.css:75:53: Invalid number for color value (firefox:8163): Gtk-WARNING **: 18:24:11.661: Theme parsing error: colors.css:76:56: Invalid number for color value (boincmgr:9675): Gtk-WARNING **: 17:49:40.458: drawing failure for widget 'wxPizza': invalid matrix (not invertible) (boincmgr:9675): Gtk-WARNING **: 17:49:40.458: drawing failure for widget 'GtkBox': invalid matrix (not invertible) (boincmgr:9675): Gtk-WARNING **: 17:49:40.458: drawing failure for widget 'GtkWindow': invalid matrix (not invertible) (boincmgr:9675): Gtk-WARNING **: 17:49:40.475: drawing failure for widget 'wxPizza': invalid matrix (not invertible) Memory pressure relief: Total: res = 14671872/14622720/-49152, res+swap = 10211328/10211328/0 Memory pressure relief: Total: res = 14622720/14622720/0, res+swap = 10158080/10158080/0 Memory pressure 
relief: Total: res = 14622720/14622720/0, res+swap = 10166272/10166272/0 Memory pressure relief: Total: res = 14618624/14630912/12288, res+swap = 10166272/10166272/0 Memory pressure relief: Total: res = 14561280/14561280/0, res+swap = 10108928/10108928/0 Memory pressure relief: Total: res = 14561280/14565376/4096, res+swap = 10113024/10113024/0 (boincmgr:170652): Gtk-CRITICAL **: 19:59:42.269: gtk_box_gadget_distribute: assertion 'size >= 0' failed in GtkScrollbar (boincmgr:170652): Gtk-CRITICAL **: 19:59:42.269: gtk_box_gadget_distribute: assertion 'size >= 0' failed in GtkScrollbar Gdk-Message: 20:00:16.658: WebKitWebProcess: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. Gdk-Message: 20:00:16.658: boincmgr: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. (boincmgr:460505): Gtk-CRITICAL **: 14:33:52.096: gtk_box_gadget_distribute: assertion 'size >= 0' failed in GtkScrollbar Temps seem OK (~70C now), memory usage is 5.35G/15.6G, and according to mpstat I'm only using about 35% of my CPU power even though my BOINC computing preferences Usage limits are "at most 100% of CPUs" and "at most 90% of CPU time"; maybe I should try and re-seat my CPU fan? This machine just really seems to be underperforming. |
Sid Celery Joined: 11 Feb 08 Posts: 2207 Credit: 42,134,903 RAC: 21,285 |
Crud, I thought I had this working properly, but I'm still getting a few errors. The machine was not cranking through units nearly as fast as I would like, but my errors didn't seem to be going up, so I ignored it for a couple of weeks. But now since I think May 30th I've gone from 44 completed units to 77, but errors crept up from 136 to 143. So a ratio of about 4.7 complete units to each failed unit, but that is still too high, right? I'm not sure it's quite as bad as you think. You say errors have increased by 7 and there are 7 boincmgr messages in your file snippets, all of which sound like errors within those tasks and not to do with your host machine (I'm not certain on that tbh). Your memory usage looks good - plenty of margin - and your temps are 10C lower than you reported before. But you've mentioned that you've set "at most 90% of CPU time", which I don't think you've mentioned before. Unintuitively, Boinc interprets this as running all cores at 100% for 90% of the time and 0% for 10% of the time, and this setting has been known to cause task errors. While you've gained some extra margin in your temps, bump this up to 100% and see how it goes. Temps are bound to go up, but with the benefit of an 11% improvement in CPU utilisation and hopefully some extra stability - maybe those weird errors will disappear. If temps become a problem, better to reduce "at most 100% of CPUs" to 92% (11 of 12 threads) to retain stability than reduce CPU time to anything below 100%. Aside from that, I have no idea why your CPU utilisation is reporting so low. Bear in mind that Rosetta is struggling to provide sufficient tasks to download at the moment - you've completed all your tasks right now. |
Macuilxochitl Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0 |
The snippet I quoted was just a few lines to give the flavor; the file had 1240+ lines, each one (that I noticed) with some sort of error message. As Sid seemed to suggest, I bumped my computing preferences up to 100% of CPUs 100% of the time and I've been running the machine a few more days without doing anything else. My failed tasks are still at 143 while my completed tasks have gone from 77 to 100, and temps are holding steady at 78-80C, which is higher than a lot of folks with my processor report, but still acceptable apparently. So unless I get a big increase in failed tasks I'm just going to assume the machine is behaving reasonably well and making a contribution and I just got a mucked up work unit download. |
Joined: 28 Mar 20 Posts: 1762 Credit: 18,534,891 RAC: 176 |
I would reduce the size of your cache, as it's taking you around 3 days to return work, and the deadlines are 3 days, but with the amount of work you are carrying you are occasionally missing deadlines. In your account settings, under Other: "Store at least 0.2 days of work" and "Store up to an additional 0.02 days of work". That should give you enough work to keep the system busy, and not miss deadlines. Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2207 Credit: 42,134,903 RAC: 21,285 |
As Sid seemed to suggest, I bumped my computing preferences up to 100% of CPUs 100% of the time and I've been running the machine a few more days without doing anything else and my failed tasks are still 143 while my completed tasks has gone from 77 to 100, and temps are holding steady at 78-80C, which is higher than a lot of folks with my processor report, but still acceptable apparently. So unless I get a big increase in failed tasks I'm just going to assume the machine is behaving reasonably well and making a contribution and I just got a mucked up work unit download. Good news. And a little bit of googling lets me bring some more. As I'm still on an AMD FX8370 I don't know too much about any of the Ryzens, so when you said you have a Ryzen 5 1600AF I didn't recognise the significance of the AF bit. I've discovered what it is from a table here and the good news is that while the original 1600 could only access 2667 RAM, the AF can access 2933. It might be we got sidetracked on RAM speeds, while looking up the wrong processor, until we realised you were using that "90% of the time" setting, which might've been the real cause of your issues. So I'm going to suggest you bump your RAM speed back up to 2933 and see how that goes. Fingers crossed. |
Macuilxochitl Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0 |
Grant, my Computing preferences settings are as follows: Store at least 0.1 days of work Store up to an additional 0.5 days of work Switch between tasks every 60 minutes Can you suggest better values? My machine is on typically 5-8 hours a day, but often missing a day or more. I had no idea I only had 3 days to return a work unit. That seems a little tight for folks that do not leave their computers on all the time or have very fast machines. |
Macuilxochitl Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0 |
Sid, I think I will ramp up my RAM speed back to 2933MHz. I passed memtest at that speed, but dropped my memory speed down to 2800MHz when I noticed that I was getting failed units. But now that I know that my failed units might have been caused by taking too long to complete my tasks, dropping the speed was probably a mistake. BTW, Mr. Celery, I'm amazed how much work you get done and how low your temps are with an AMD FX8370. My CPU is supposedly about twice as fast as your 6 year-old processor and uses half the amount of juice, but you have much lower temps and seem to be getting more crunching done. https://www.cpubenchmark.net/compare/AMD-Ryzen-5-1600-vs-AMD-FX-8370-Eight-Core/2984vs2347 |
Sid Celery Joined: 11 Feb 08 Posts: 2207 Credit: 42,134,903 RAC: 21,285 |
Grant, my Computing preferences settings are as follows: Boinc ought to account for your reduced up-time, but if you still think it's providing more tasks than you can complete within the relatively short deadlines there's nothing stopping you from manually reducing the additional days figure. There's no real rule here, apart from using a figure that allows you to be successful, while also calling down extra tasks before you run out, given you have slightly limited internet access. Edge your additional days down if you think you have too many tasks to complete by deadline, but if you find you have unused cores before grabbing more, tweak it back up again. Sid, I think I will ramp up my RAM speed back to 2933MHz. I passed memtest at that speed, but dropped my memory speed down to 2800MHz when I noticed that I was getting failed units. But now that I know that my failed units might have been caused by taking too long to complete my tasks dropping the speed was probably a mistake. Ha! The main reason for this is I'm in the Midlands of England and you're in Cuba. Ambient temperatures here give me a distinct advantage (the only one?) of letting me overclock and adjust my power settings so I'm running permanently at 4.525GHz and 24/7 while you're nearer 3.2GHz and 1/3 of not quite every day. I was in trouble last week when we had a mini heatwave with temps in the 90s, but now we're struggling to reach 70F I'm ok again. My CPU's max working temp is 62C though, not 95C like yours, so I don't have great margins and I also use a 280mm water-cooled CPU cooler to keep the temps right down. Once we get past what we laughingly call our summer, I'll be looking to increase my overclock back to 4.73GHz which I was running before I blew my last motherboard in March - running so fast comes at some cost. But I note your most recent 8hr tasks are crediting you over 410, while mine average 250-270. 
That seems to better reflect the power of your CPU over mine now all your settings seem optimised. |
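
Macuilxochitl asks above how to grep through `~/.BOINC/stderrgui.txt` for useful hints. One possible approach, sketched in Python, is to tally lines by program and message type so the GUI chatter can be set aside. The regular expression is inferred only from the snippets quoted in the thread, and the names `GLIB_LINE`, `classify`, `summarize` and `SAMPLE` are made up for illustration. Lines in any other format simply land in the `other` list for reading by hand.

```python
import re
from collections import Counter

# Assumed line shape, taken from the quoted snippets only:
# "(boincmgr:9675): Gtk-WARNING **: 17:49:40.458: drawing failure ..."
GLIB_LINE = re.compile(
    r"^\((?P<prog>[^:()]+):\d+\): (?P<domain>\w+)-(?P<level>[A-Z]+) \*\*: "
)

def classify(line):
    """Return (program, message type) for GUI toolkit lines, else None."""
    m = GLIB_LINE.match(line)
    if m:
        return (m.group("prog"), m.group("domain") + "-" + m.group("level"))
    if line.startswith("Gdk-Message:"):
        # These lines carry no "(prog:pid)" prefix in the quoted snippets.
        return ("unknown", "Gdk-Message")
    return None

def summarize(lines):
    """Count GUI toolkit messages and collect everything else for review."""
    gui = Counter()
    other = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        key = classify(line)
        if key is None:
            other.append(line)
        else:
            gui[key] += 1
    return gui, other

# A few lines copied from the post, used here as sample input.
SAMPLE = [
    "(firefox:8163): Gtk-WARNING **: 18:24:11.661: Theme parsing error: colors.css:74:53: Invalid number for color value",
    "(boincmgr:9675): Gtk-WARNING **: 17:49:40.458: drawing failure for widget 'wxPizza': invalid matrix (not invertible)",
    "Memory pressure relief: Total: res = 14671872/14622720/-49152, res+swap = 10211328/10211328/0",
    "(boincmgr:170652): Gtk-CRITICAL **: 19:59:42.269: gtk_box_gadget_distribute: assertion 'size >= 0' failed in GtkScrollbar",
    "Gdk-Message: 20:00:16.658: boincmgr: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.",
]

gui_counts, other_lines = summarize(SAMPLE)
for (prog, kind), n in sorted(gui_counts.items()):
    print(prog, kind, n)
print(len(other_lines), "line(s) left for manual review")
```

To run it on the real file, pass an open file object instead of `SAMPLE`, for example `summarize(open(os.path.expanduser("~/.BOINC/stderrgui.txt")))` after `import os`. If nearly everything falls into the Gtk/Gdk buckets, that would support the impression in the thread that this file mostly records GUI noise.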
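
Several figures quoted in the thread can be checked with simple arithmetic: the 4.7 completed-to-failed ratio, the "11% improvement" from raising CPU time from 90% to 100%, the 92% setting for 11 of 12 threads, and the 2.85% RAM overclock. The short Python sketch below reproduces them; the function names are illustrative only and all inputs are the numbers given by the posters.

```python
def ratio_completed_to_failed(done_before, done_after, err_before, err_after):
    """New completed tasks per new failed task over the same period."""
    return (done_after - done_before) / (err_after - err_before)

def gain_from_cpu_time(old_pct, new_pct=100.0):
    """Relative throughput gain (%) from raising the 'CPU time' limit."""
    return (new_pct / old_pct - 1.0) * 100.0

def pct_of_cpus(active_threads, total_threads):
    """'% of CPUs' value that corresponds to a given number of threads."""
    return active_threads / total_threads * 100.0

def overclock_pct(actual_mhz, rated_mhz):
    """How far above its rated speed the RAM is running, in percent."""
    return (actual_mhz / rated_mhz - 1.0) * 100.0

print(round(ratio_completed_to_failed(44, 77, 136, 143), 1))  # 33 / 7
print(round(gain_from_cpu_time(90.0), 1))                     # 100 / 90
print(round(pct_of_cpus(11, 12)))                             # 11 / 12
print(round(overclock_pct(1645.6, 1600.0), 2))                # 1645.6 / 1600
```

The gain figure assumes throughput scales linearly with the CPU time limit, which is a simplification.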
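
The cache-size discussion can also be put into rough numbers. The sketch below is a deliberately pessimistic back-of-envelope model, not a description of how the BOINC client schedules work: it assumes the buffered "days of work" are measured in continuous crunching time and that the client does not compensate for the machine being switched off (Sid suggests above that it should). The function name `wall_clock_days` is hypothetical; the inputs are the settings and up-time figures quoted in the thread.

```python
def wall_clock_days(buffer_days, hours_on_per_day):
    """Calendar days needed to clear a work buffer on a part-time machine.

    Assumption: buffer_days is continuous crunching time and the client
    applies no correction for time the machine spends powered off.
    """
    return buffer_days / (hours_on_per_day / 24.0)

DEADLINE_DAYS = 3.0  # deadline mentioned in the thread

# Settings from the thread: (minimum buffer + additional buffer) in days.
current = 0.1 + 0.5     # Macuilxochitl's settings
suggested = 0.2 + 0.02  # Grant's suggestion

for label, buf in (("current", current), ("suggested", suggested)):
    for hours in (5, 8):
        days = wall_clock_days(buf, hours)
        print(f"{label}: {hours} h/day -> {days:.2f} days "
              f"(deadline {DEADLINE_DAYS:.0f} days)")
```

Under these assumptions the current settings at 5 hours a day come out just under the deadline, so a single skipped day would be enough to miss it, while the smaller buffer leaves roughly two days of slack.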
©2025 University of Washington
https://www.bakerlab.org