boincmgr with rosetta downloaded lots of data and when I rebooted it seemed to start over

Questions and Answers : Unix/Linux : boincmgr with rosetta downloaded lots of data and when I rebooted it seemed to start over

Sid Celery

Joined: 11 Feb 08
Posts: 2207
Credit: 42,134,903
RAC: 21,285
Message 97298 - Posted: 9 Jun 2020, 5:30:48 UTC - in response to Message 97263.  

My $85 Ryzen 5 1600 AF can handle higher RAM speeds even though AMD rather conservatively says that its rated speed is only 2667MHz. From what I've read, the motherboard is more of a constraint than the CPU, at least up to about 3200MHz. My motherboard's QVL lists many kits that have been tested to run substantially faster than 2667MHz (https://www.asrock.com/mb/AMD/B450M%20Pro4/index.us.asp#Memory). Of the 290 RAM kits that ASRock tested, 61 were rated at 3000MHz or better and none of them tested as running slower than 2933MHz; all 29 of the 3200MHz kits apparently ran at their rated speed, and so did the 7 '2933MHz' kits. I do so love playing with spreadsheets!

I reclocked my memory down to 2800MHz and have not gotten any additional errors over the last few days, running maybe 6-8 hours a day, so I guess that is where I'll stay. I am a bit disappointed that my memory does so much worse than all the other relatively fast sticks tested by ASRock, but probably it won't hurt my folding unduly.

I understand, but it seems to me the board can take a whole range of CPUs that support faster RAM speeds than yours does. Seeing as yours is at the lower end, it's the CPU that's the bottleneck, so if your RAM is stable at 2800MHz with a CPU that's only supposed to handle 2667MHz, consider yourself ahead.

Now that you've found a stable speed, to eke out the last dregs you might want to run CPU-Z and check what timings your RAM supports at other speeds. I've found it's possible to tweak them just a little, especially if the RAM isn't running at full speed. E.g. my DDR3 RAM defaults to 8-8-8-24-36 2T, but I've edged it down to 8-8-8-24-33 1T while increasing my FSB to run the RAM 2.85% above its top speed (1645.6MHz for 1600MHz RAM). It's marginal, but it's also stable: 22 days of up-time on my overclock (that even surprised me!).
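The 2.85% figure is easy to check by hand, but here is a minimal Python sketch of the same sum (the function names are made up for illustration):

```python
def pct_above_rated(actual_mhz: float, rated_mhz: float) -> float:
    """How far above its rated clock a component is running, in percent."""
    return (actual_mhz / rated_mhz - 1.0) * 100.0


def clock_for_pct(rated_mhz: float, pct: float) -> float:
    """Clock reached by running pct percent above the rated clock."""
    return rated_mhz * (1.0 + pct / 100.0)


# DDR3-1600 pushed to 1645.6MHz by raising the FSB, as described above
print(f"{pct_above_rated(1645.6, 1600):.2f}% above rated")
print(f"{clock_for_pct(1600, 2.85):.1f} MHz")
```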

I'm not about to overclock my CPU; with the stock AMD processor fan I'm hitting rather high temps (80C) even at the rated default clock speed of 3200MHz (max boost speed is apparently 3700MHz without overclocking, but I've never seen the processor go faster than 3500MHz). On hot days I even reduce the CPU limits in BOINC preferences to keep the machine from overheating. Given my unimpressive performance I apparently wasn't too lucky in the hardware lottery, but what the heck, I built the system for about $300 plus tax, not counting the case and power supply I recycled from an old Athlon XP 1700+ build.

I'm only using 9GB of RAM now, and have never seen it go over 10GB on this (or any) machine, so I have 5.6GB in the bank. But if I ever see RAM usage go over 12GB I'll order another stick; RAM prices seem to be falling at the moment after climbing for a few months.

I'm running a small overclock, but at a permanent maximum speed (4525MHz instead of 4300MHz on an old FX8370) at 50C, also for 22 days. But if you have heat issues and a stock fan, there's no point in me suggesting anything.
And if you're limiting CPU usage with Boinc settings, that probably explains why you're well within your RAM limits, so that's one bonus.

Overall, even though you're not happy with how you're running, it doesn't sound like there's a lot more you can do while avoiding errors. And unfortunately, tbh, I've never noticed tweaking RAM improve task performance at all. Sorry.
ID: 97298 · Rating: 0
Macuilxochitl

Joined: 11 Oct 08
Posts: 13
Credit: 134,700
RAC: 0
Message 97505 - Posted: 22 Jun 2020, 22:06:27 UTC

Crud, I thought I had this working properly, but I'm still getting a few errors. The machine was not cranking through units nearly as fast as I would like, but my errors didn't seem to be going up, so I ignored it for a couple of weeks. But since (I think) May 30th I've gone from 44 completed units to 77, while errors crept up from 136 to 143. That's a ratio of about 4.7 completed units to each failed unit, but that failure rate is still too high, right?
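The ratio comes from the two pairs of counts: 77 - 44 = 33 new completions against 143 - 136 = 7 new errors. A small hypothetical helper (not part of BOINC) that does the same sum, and also expresses it as the share of newly finished tasks that failed:

```python
def completed_per_failure(done_before: int, done_after: int,
                          failed_before: int, failed_after: int) -> float:
    """Ratio of newly completed tasks to newly failed tasks over a period."""
    new_done = done_after - done_before
    new_failed = failed_after - failed_before
    if new_failed == 0:
        return float("inf")  # no new failures in the period
    return new_done / new_failed


# Counts quoted above: 44 -> 77 completed, 136 -> 143 errors since ~May 30th
ratio = completed_per_failure(44, 77, 136, 143)
print(f"{ratio:.1f} completed tasks per failed task")  # 33 / 7

new_failed = 143 - 136
new_finished = (77 - 44) + new_failed
print(f"{new_failed / new_finished:.1%} of newly finished tasks failed")
```

With the later counts in this thread (77 to 100 completed, failures still at 143) the helper returns infinity, i.e. no new failures in that stretch.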

So in my ~/.BOINC directory there is a big (724.5 KiB) text file called stderrgui.txt. It has 12417 lines and almost all of them contain the words fatal, failure, error, invalid, WARNING, CRITICAL or failed, but I have no idea if this file is what I want to look at, or how to grep through it to find significant hints about why I am still getting failed tasks. Most of the error messages seem to be 'drawing failure for widget' or other things that seem to refer to the GUI, which is not normally running.
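If the goal is just to see whether anything in stderrgui.txt is *not* GUI chatter, a small filter is easier than hand-grepping. This is a minimal Python sketch, not a BOINC tool: the noise patterns are guesses based only on the snippets quoted below, and the "Memory pressure relief" lines are treated as GUI-side on the assumption that they come from the Manager's embedded web view. Going by its name, stderrgui.txt is the Manager's own error output, so an empty result would fit the GUI-only guess above.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# Patterns for lines that look like desktop/GUI noise rather than task problems.
# These are assumptions based on the snippets quoted in this thread.
GUI_NOISE = [
    re.compile(r"Gtk-(WARNING|CRITICAL)"),
    re.compile(r"^Gdk-Message:"),
    re.compile(r"^Memory pressure relief:"),
    re.compile(r"^\(firefox:\d+\)"),
]

DEFAULT_PATH = Path.home() / ".BOINC" / "stderrgui.txt"


def classify(line: str) -> str:
    """Return 'gui' for lines matching a known GUI-noise pattern, else 'other'."""
    return "gui" if any(p.search(line) for p in GUI_NOISE) else "other"


def summarize(lines):
    """Count GUI-noise vs other lines and collect the 'other' ones for review."""
    counts = Counter()
    leftovers = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        kind = classify(line)
        counts[kind] += 1
        if kind == "other":
            leftovers.append(line)
    return counts, leftovers


def main(argv):
    # Use the path given as the first argument, else the one mentioned above.
    path = Path(argv[1]) if len(argv) > 1 else DEFAULT_PATH
    if not path.exists():
        print(f"{path} not found; pass the log path as the first argument")
        return
    with path.open(errors="replace") as f:
        counts, leftovers = summarize(f)
    print(f"GUI noise lines: {counts['gui']}, other lines: {counts['other']}")
    for line in leftovers:
        print(line)


if __name__ == "__main__":
    main(sys.argv)
```

Saved under a hypothetical name such as filter_stderrgui.py, it would be run as `python3 filter_stderrgui.py ~/.BOINC/stderrgui.txt`; only the lines it prints afterwards are worth reading closely.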

Here are a few snippets from the file.

(firefox:8163): Gtk-WARNING **: 18:24:11.661: Theme parsing error: colors.css:74:53: Invalid number for color value
(firefox:8163): Gtk-WARNING **: 18:24:11.661: Theme parsing error: colors.css:75:53: Invalid number for color value
(firefox:8163): Gtk-WARNING **: 18:24:11.661: Theme parsing error: colors.css:76:56: Invalid number for color value
(boincmgr:9675): Gtk-WARNING **: 17:49:40.458: drawing failure for widget 'wxPizza': invalid matrix (not invertible)
(boincmgr:9675): Gtk-WARNING **: 17:49:40.458: drawing failure for widget 'GtkBox': invalid matrix (not invertible)
(boincmgr:9675): Gtk-WARNING **: 17:49:40.458: drawing failure for widget 'GtkWindow': invalid matrix (not invertible)
(boincmgr:9675): Gtk-WARNING **: 17:49:40.475: drawing failure for widget 'wxPizza': invalid matrix (not invertible)
Memory pressure relief: Total: res = 14671872/14622720/-49152, res+swap = 10211328/10211328/0
Memory pressure relief: Total: res = 14622720/14622720/0, res+swap = 10158080/10158080/0
Memory pressure relief: Total: res = 14622720/14622720/0, res+swap = 10166272/10166272/0
Memory pressure relief: Total: res = 14618624/14630912/12288, res+swap = 10166272/10166272/0
Memory pressure relief: Total: res = 14561280/14561280/0, res+swap = 10108928/10108928/0
Memory pressure relief: Total: res = 14561280/14565376/4096, res+swap = 10113024/10113024/0
(boincmgr:170652): Gtk-CRITICAL **: 19:59:42.269: gtk_box_gadget_distribute: assertion 'size >= 0' failed in GtkScrollbar
(boincmgr:170652): Gtk-CRITICAL **: 19:59:42.269: gtk_box_gadget_distribute: assertion 'size >= 0' failed in GtkScrollbar
Gdk-Message: 20:00:16.658: WebKitWebProcess: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Gdk-Message: 20:00:16.658: boincmgr: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
(boincmgr:460505): Gtk-CRITICAL **: 14:33:52.096: gtk_box_gadget_distribute: assertion 'size >= 0' failed in GtkScrbar
Temps seem OK (~70C now), memory usage is 5.35G/15.6G, and according to mpstat I'm only using about 35% of my CPU power, even though my BOINC computing preferences usage limits are "at most 100% of CPUs" and "at most 90% of CPU time". Maybe I should try re-seating my CPU fan? This machine just really seems to be underperforming.
ID: 97505 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2207
Credit: 42,134,903
RAC: 21,285
Message 97520 - Posted: 23 Jun 2020, 8:08:33 UTC - in response to Message 97505.  

I'm not sure it's quite as bad as you think.
You say errors have increased by 7 and there are 7 boincmgr messages in your file snippets, all of which sound like errors within those tasks and not to do with your host machine (I'm not certain about that, tbh).
Your memory usage looks good, with plenty of margin, and your temps are 10C lower than you reported before.
But you say you've set "at most 90% of CPU time", which I don't think you've mentioned before.
Unintuitively, Boinc interprets this as running all cores at 100% for 90% of the time and 0% for 10% of the time, and this has been known to cause task errors.
While you have some extra temperature margin, bump this up to 100% and see how it goes.
Temps are bound to go up, but you get the benefit of an 11% improvement in CPU utilisation and hopefully some extra stability; maybe those weird errors will disappear.
If temps become a problem, it's better to reduce "at most 100% of CPUs" to 92% (11 of 12 threads) to retain stability than to reduce CPU time to anything below 100%.
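The comparison above can be checked with a little arithmetic. This is a hypothetical sketch, not BOINC code: it treats the CPU-time limit as the simple on/off duty cycle described above, and it assumes BOINC turns "at most N% of CPUs" into a thread count by rounding down (minimum one thread), which is an assumption rather than something confirmed in this thread.

```python
import math

THREADS = 12  # Ryzen 5 1600 AF: 6 cores / 12 threads


def busy_threads_with_time_limit(threads: int, cpu_time_pct: float) -> float:
    """Average busy threads when every thread runs but all are paused part of the time."""
    return threads * cpu_time_pct / 100.0


def threads_allowed(total_threads: int, cpus_pct: float) -> int:
    """Threads usable under 'use at most N% of the CPUs'.

    Assumption: the percentage is rounded down to a whole thread, minimum one.
    """
    return max(1, math.floor(total_threads * cpus_pct / 100.0))


def smallest_pct_for(threads_wanted: int, total_threads: int) -> int:
    """Smallest whole-number percentage that still allows threads_wanted threads."""
    return next(p for p in range(1, 101)
                if threads_allowed(total_threads, p) >= threads_wanted)


print(busy_threads_with_time_limit(THREADS, 90))  # 12 threads, paused 10% of the time
print(threads_allowed(THREADS, 92))               # 11 threads, never paused
print(smallest_pct_for(11, THREADS))              # 11/12 is about 91.7%, so 92
print(f"gain from 90% to 100% CPU time: {100 / 90 - 1:.1%}")
```

Under these assumptions, 11 threads at 100% CPU time keep slightly more work running (11 threads) than 12 threads at 90% (10.8 on average), on top of avoiding the pause/resume cycle suspected of causing errors.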
Aside from that, I have no idea why your CPU utilisation is reporting so low.
Bear in mind that Rosetta is struggling to provide sufficient tasks to download at the moment; right now you've completed all your tasks.
ID: 97520 · Rating: 0
Macuilxochitl

Joined: 11 Oct 08
Posts: 13
Credit: 134,700
RAC: 0
Message 97756 - Posted: 28 Jun 2020, 1:30:53 UTC - in response to Message 97520.  

The snippet I quoted was just a few lines to give the flavor; the file had 1240+ lines, each one (that I noticed) with some sort of error message.

As Sid seemed to suggest, I bumped my computing preferences up to 100% of CPUs 100% of the time. I've been running the machine a few more days without changing anything else, and my failed tasks are still at 143 while my completed tasks have gone from 77 to 100. Temps are holding steady at 78-80C, which is higher than a lot of folks with my processor report, but apparently still acceptable. So unless I get a big increase in failed tasks, I'm going to assume the machine is behaving reasonably well and making a contribution, and that I just got a mucked-up work unit download.
ID: 97756 · Rating: 0
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1762
Credit: 18,534,891
RAC: 176
Message 97758 - Posted: 28 Jun 2020, 2:27:44 UTC
Last modified: 28 Jun 2020, 2:29:13 UTC

I would reduce the size of your cache. It's taking you around 3 days to return work and the deadlines are 3 days, so with the amount of work you are carrying you are occasionally missing deadlines.
In your account settings, under Other:

   Store at least 0.2 days of work
   Store up to an additional 0.02 days of work

That should give you enough work to keep the system busy without missing deadlines.
Grant
Darwin NT
ID: 97758 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2207
Credit: 42,134,903
RAC: 21,285
Message 97759 - Posted: 28 Jun 2020, 3:09:45 UTC - in response to Message 97756.  

Good news. And a little bit of googling lets me bring you some more.
As I'm still on an AMD FX8370, I don't know much about any of the Ryzens, so when you said you have a Ryzen 5 1600AF I didn't recognise the significance of the AF bit.
I've discovered what it is from a table here, and the good news is that while the original 1600 could only access 2667MHz RAM, the AF can access 2933MHz.
It might be that we got sidetracked on RAM speeds while looking up the wrong processor, until we realised you were using that "90% of the time" setting, which might've been the real cause of your issues.
So I'm going to suggest you bump your RAM speed back up to 2933 and see how that goes.
Fingers crossed.
ID: 97759 · Rating: 0
Macuilxochitl

Joined: 11 Oct 08
Posts: 13
Credit: 134,700
RAC: 0
Message 97801 - Posted: 28 Jun 2020, 22:14:01 UTC - in response to Message 97758.  

Grant, my Computing preferences settings are as follows:
Store at least 0.1 days of work
Store up to an additional 0.5 days of work
Switch between tasks every 60 minutes

Can you suggest better values? My machine is on typically 5-8 hours a day, but often missing a day or more. I had no idea I only had 3 days to return a work unit. That seems a little tight for folks that do not leave their computers on all the time or have very fast machines.
ID: 97801 · Rating: 0
Macuilxochitl

Send message
Joined: 11 Oct 08
Posts: 13
Credit: 134,700
RAC: 0
Message 97802 - Posted: 28 Jun 2020, 22:23:06 UTC - in response to Message 97759.  

Sid, I think I will ramp my RAM speed back up to 2933MHz. I passed memtest at that speed, but dropped my memory speed down to 2800MHz when I noticed that I was getting failed units. But now that I know my failed units might have been caused by taking too long to complete my tasks, dropping the speed was probably a mistake.

BTW, Mr. Celery, I'm amazed at how much work you get done and how low your temps are with an AMD FX8370. My CPU is supposedly about twice as fast as your 6-year-old processor and uses half the amount of juice, but you have much lower temps and seem to be getting more crunching done. https://www.cpubenchmark.net/compare/AMD-Ryzen-5-1600-vs-AMD-FX-8370-Eight-Core/2984vs2347
ID: 97802 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2207
Credit: 42,134,903
RAC: 21,285
Message 97816 - Posted: 29 Jun 2020, 14:04:45 UTC - in response to Message 97802.  
Last modified: 29 Jun 2020, 14:13:02 UTC

Grant, my Computing preferences settings are as follows:
Store at least 0.1 days of work
Store up to an additional 0.5 days of work
Switch between tasks every 60 minutes

Can you suggest better values? My machine is on typically 5-8 hours a day, but often missing a day or more. I had no idea I only had 3 days to return a work unit. That seems a little tight for folks that do not leave their computers on all the time or have very fast machines.

Boinc ought to account for your reduced up-time, but if you still think it's providing more tasks than you can complete within the relatively short deadlines there's nothing stopping you from manually reducing the additional days figure.
There's no real rule here, apart from using a figure that allows you to be successful, while also calling down extra tasks before you run out, given you have slightly limited internet access.
Edge your additional days down if you think you have too many tasks to complete by deadline, but if you find you have unused cores before grabbing more, tweak it back up again.
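To put rough numbers on the cache-versus-deadline question, here is a back-of-envelope Python sketch. It is a deliberately pessimistic, hypothetical model: it assumes the cache is sized as if the machine ran 24 hours a day (BOINC is supposed to scale for up-time, as said above), measures the cache per thread, ignores the fact that tasks come in roughly 8-hour chunks, and uses the 3-day deadline and 5-8 hours/day figures from this thread.

```python
def days_to_clear_cache(cache_days: float, hours_on_per_day: float,
                        days_off: int = 0) -> float:
    """Calendar days to work through a per-thread cache sized for a 24h/day machine.

    Hypothetical worst case: the cache is treated as cache_days * 24 hours of
    compute per thread, done only while the machine is switched on, plus any
    whole days the machine stays off.
    """
    compute_hours = cache_days * 24.0
    return compute_hours / hours_on_per_day + days_off


DEADLINE_DAYS = 3.0  # Rosetta deadline mentioned in this thread

SETTINGS = {
    "current (0.1 + 0.5)": 0.1 + 0.5,
    "Grant's (0.2 + 0.02)": 0.2 + 0.02,
}

for label, cache_days in SETTINGS.items():
    for hours_on in (5, 8):
        for days_off in (0, 1):
            days = days_to_clear_cache(cache_days, hours_on, days_off)
            verdict = "ok" if days <= DEADLINE_DAYS else "misses deadline"
            print(f"{label}: {hours_on}h/day, {days_off} day(s) off -> "
                  f"{days:.1f} days ({verdict})")
```

Under these assumptions the current settings only miss the deadline in the 5 hours/day case with a day off, while Grant's numbers leave plenty of slack; real behaviour will differ because BOINC does adjust for up-time.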

Sid, I think I will ramp up my RAM speed back to 2933MHz. I passed memtest at that speed, but dropped my memory speed down to 2800MHz when I noticed that I was getting failed units. But now that I know that my failed units might have been caused by taking too long to complete my tasks dropping the speed was probably a mistake.

BTW, Mr. Celery, I'm amazed how much work and how low your temps are with an AMD FX8370. My CPU is supposedly about twice as fast as your 6 year-old processor and uses half the amount of juice, but you have much lower temps and seem to be getting more crunching done https://www.cpubenchmark.net/compare/AMD-Ryzen-5-1600-vs-AMD-FX-8370-Eight-Core/2984vs2347

Ha! The main reason for this is that I'm in the Midlands of England and you're in Cuba. Ambient temperatures here give me a distinct advantage (the only one?), letting me overclock and adjust my power settings so I'm running permanently at 4.525GHz, 24/7, while you're nearer 3.2GHz for 1/3 of not-quite-every day. I was in trouble last week when we had a mini heatwave with temps in the 90s, but now that we're struggling to reach 70F I'm OK again.
My CPU's max working temp is 62C though, not 95C like yours, so I don't have great margins and I also use a 280mm water-cooled CPU cooler to keep the temps right down. Once we get past what we laughingly call our summer, I'll be looking to increase my overclock back to 4.73GHz which I was running before I blew my last motherboard in March - running so fast comes at some cost.

But I note your most recent 8hr tasks are crediting you over 410, while mine average 250-270. That seems to better reflect the power of your CPU over mine now all your settings seem optimised.
ID: 97816 · Rating: 0



©2025 University of Washington
https://www.bakerlab.org