Message boards : Number crunching : Memory and CPU problems with Ubuntu 16.04?
Author | Message |
---|---|
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Just upgraded and it appears that BOINC has serious problems running in that environment. Appears to be 50% of its pre-upgrade performance. Starts with 3 threads running on a 4-thread machine, but after a few minutes one of those threads will be suspended in a waiting for memory status. System monitor says half of the memory is available, so... My initial hypothesis is teething problems in 16.04, but it might be more bugginess of Rosetta, too. (On my Mac, there's a work unit that has run for 24 days so far (well past deadline) with 10 days estimated remaining, and I'm betting no credit will be recognized, too.) So anyone else have similar observations to share? #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
Timo Send message Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
Kind of off topic, but I think restarting your Mac would force that work unit to call it a day and report in any models completed thus far. **38 cores crunching for R@H on behalf of cancercomputer.org - a non-profit supporting High Performance Computing in Cancer Research |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
How long a task has run is not the only thing to look at, in fact it is a minor consideration. The thing to observe is the CPU time the task has consumed (select the task and click the properties button). I'd have to guess that with a task running 24 days, that it is not getting any CPU. Otherwise the watchdog should have already shut it down. On your Ubuntu upgrade, is it possible you now actually have a bank of memory that is not physically reporting in? To say a machine is 50% of performance prior to the upgrade is unclear. But it sounds like, in the end, you are saying that only half of the cores are active when compared to what used to run. Yes, if the machine is approaching the bounds imposed by your runtime preferences, the BOINC Manager is going to throttle back on number of active tasks. Were all of your BOINC preferences retained through the upgrade? Or did some things perhaps get reset or updated from a different project? Rosetta Moderator: Mod.Sense |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
How long a task has run is not the only thing to look at, in fact it is a minor consideration. The thing to observe is the CPU time the task has consumed (select the task and click the properties button). I'd have to guess that with a task running 24 days, that it is not getting any CPU. Otherwise the watchdog should have already shut it down. Near as I can tell, all of the BOINC preferences were carried across the upgrade. I don't think the memory is failing to report in, because Ubuntu seems to know the memory is there and reports that 50% is available. It's a stock Toshiba with the original memory, but only 4 GB RAM, making it one of my smallest machines these days. Minor detail I noticed after booting to Linux this morning is that only two work units were active, but later on a third work unit was in the waiting for memory status. I had been in Windows 10, and things look quite normal there, with 4 work units actively running. From Ubuntu 16.04, two are running, one is waiting for memory (even though half the memory is apparently available), and the others are just ready to start. No one else reporting anything similar here? Only a couple of possibly related reports on the Ubuntu websites, but I didn't find any other references to BOINC as a possible diagnostic... Another negative data point is no Ubuntu patches or upgrades this morning. Not sure how to get more useful diagnostic data. I usually just rely on the basic System Monitor to see what is going on. It says that two of the four processors are running at 100%, but I think it is actually a dual core machine with two threads each. Hmm... Just spotted another peculiarity that may be indicative of something. Seems to be some confusion in the windowing system. I can only access the Menu bar for Firefox, not the other tasks/programs that are running... Slight evidence for a Linux-level bug in the window manager? On the Mac problem, it should be forked, but I did look at the ancient post on scoring and models. 
My reaction is also ancient: I don't care about complicated wrinkles and I think my donated computing time should not be rendered valueless because of complicated bugs on the OTHER side of things. I'm doing my part as well as I can. The scoring stuff doesn't even mention the deadlines, which is the part that seems most confusing, unfair, and unneeded. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Just got some more data and can't even imagine what it indicates... Since the ready-to-start units were already old and the deadlines make it better to start the freshest units possible, I went ahead and aborted those three. It pulled down some fresh work units. When I suspended the two running work units, I wound up with four running work units, even after resuming the two old ones... I tweaked it a bit more trying to find out if one of the old units was causing a problem, and failed... Right now it seems to be working normally, with four units running, two old ones and two of the new ones. WTF? As near as I can tell, I didn't do anything that should make any difference in such a way... Memory is still around 50% available, but now 100% of all four CPUs are in use. The Menu bar is still weird, apparently owned in some sense by Firefox without regard to who has the focus... I still suspect some kind of problem in Ubuntu 16.04, but I certainly can't say I have the bug in a can yet. I'll continue watching and trying to can it. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Sorry about the stutter, but that seems to be some kind of bug in the Message boards. It may have zapped the OP, too. Anyway, latest bizarre data is that one of the work units went back to the waiting for memory status again, while somehow throwing the fourth work unit over to the waiting to run status, so I was back in the condition of 2 running units... So I suspended the running psh unit, and it went back to 4 running units, even after resuming it... I've already abbreviated my feelings rudely on all of this... New theory is that the bug is somehow related to having two psh units running at the same time. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
In your computing preferences, what have you set for "Use at most XX% of memory when computer is in use" and "Use at most XX% of memory when computer is not in use"? . |
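For anyone who cannot reach the preferences dialog, the two limits asked about here can also be read from the client's preferences file. What follows is a minimal sketch, not an official BOINC tool. It assumes the values are stored in elements named `ram_max_used_busy_pct` and `ram_max_used_idle_pct`, and that on an Ubuntu package install the local override file is `/var/lib/boinc-client/global_prefs_override.xml`; both should be checked against your own installation. The sample XML and its numbers are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a BOINC preferences file; the values are invented.
SAMPLE_PREFS = """<global_preferences>
   <ram_max_used_busy_pct>50.000000</ram_max_used_busy_pct>
   <ram_max_used_idle_pct>90.000000</ram_max_used_idle_pct>
</global_preferences>"""


def memory_limits(xml_text):
    """Return the two memory limits (percent of RAM) found in the XML text.

    A missing element is reported as None rather than guessed.
    """
    root = ET.fromstring(xml_text)
    limits = {}
    for tag in ("ram_max_used_busy_pct", "ram_max_used_idle_pct"):
        node = root.find(tag)
        limits[tag] = float(node.text) if node is not None and node.text else None
    return limits


if __name__ == "__main__":
    # To inspect a real machine, read the file instead of SAMPLE_PREFS, e.g.
    # open("/var/lib/boinc-client/global_prefs_override.xml").read()
    # (path is an assumption; it may differ or be absent on your system).
    print(memory_limits(SAMPLE_PREFS))
```

If the "in use" value comes back around 50 on a 4 GB machine, that alone would be consistent with the behaviour described in this thread.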
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 22,994,271 RAC: 9,725 |
Sorry about the stutter, but that seems to be some kind of bug in the Message boards. It may have zapped the OP, too. Can you post the relevant lines of the EVENT LOG? I have no problems with Ubuntu 16.04. |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Sorry about the stutter, but that seems to be some kind of bug in the Message boards. It may have zapped the OP, too. Right now I apparently can't access the EVENT LOG or the computing preferences... Apparently the Menu bar cannot be accessed by most applications and Firefox is the odd man out because it can still access the Menu bar. That is pretty obviously an Ubuntu-level problem of some sort, but there is still no upgrade since I updated the machine... It seems that it can't be a problem with the computing preferences since it definitely can run 4 work units at the same time. Sometimes it just prefers not to. However within the BOINC Manager, I can still suspend work units and do other minor things, and it is pretty clear that the waiting-for-memory problem is linked to an attempt to run two psh work units at the same time. Once two psh work units start running, it more or less quickly transitions to a state where one work unit is waiting for memory, two work units are running, and the others are either waiting to run or ready to start. To get from that state to a normal 4-running state, it is necessary to at minimum suspend both of the psh work units. Since there are no other reports (here) of problems with Ubuntu 16.04 (so far), I'm thinking the BOINC part may be a subtle problem... The current psh units are new, but I'm going to try to prevent other psh units from starting on the weird theory that my upgrade may have somehow created a weird state that is being passed among psh work units... Maybe clearing all of them will clear it. On the Menu bar thing, I see that there were some pre-release reports of similar-sounding bugs on launchpad, so the best approach may be to open a fresh bug report on it with the suggestion that it might be related to the BOINC bug which might be related to the memory leak bug that was supposed to be related to the video drivers... 
I strongly suspect the waiting-for-memory status reported in the BOINC Manager is some kind of red herring, and that the System Monitor is reporting correctly that half the memory is available. (That's because changing the status to 4 running tasks increases memory usage by only a small amount.) #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
After visiting the Ubuntu support website, I think I better confirm that by EVENT LOG you mean the log that I used to access from the Menu bar at the top of the screen. There used to be several options displayed there, but now I see only the name of the program that has the focus. The sole exception (I've found) is Firefox, which still offers the menu options when it has the focus. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 22,994,271 RAC: 9,725 |
I meant the BOINC MANAGER "TOOLS" submenu "EVENT LOG". I built an Ubuntu 16.04 VM and played around a little. It appears to me that IF YOU HAVE the BOINC EVENT LOG open and it is the FOCUS (meaning you have selected the EVENT LOG with your mouse) the MENU at the top will not become active with the MOUSE OVER like you expect it to. When I SELECTED the BOINC MANAGER window, the menus at the top were available. When I SELECTED other BOINC windows like the EVENT LOG, the top menus would not respond to the mouse over. That seems different than earlier BOINC Ubuntu implementations. |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
In the current state of this machine (an oldish Toshiba notebook), I cannot get to that Tools menu because no options appear on the Menu bar. So far the only exception appears to be Firefox, where the menus starting from File will appear. The other programs only change the program name on the Menu bar, but no other Menu options will appear. Did find some strong evidence in favor of the memory leak hypothesis... Noticed that it was back to the 2-running state, so I paused them and watched the System Monitor. Memory in use dropped to almost 30 percent, even though 4 units were now running, but the value of usage started creeping up until it passed 60%. Just did it again, dropped it to 30%, and now it's already past 55% and still creeping upward. Now that I have the bug in the can, it is time for apport-cli on boinc-manager... Maybe I have something the Ubuntu people can use? #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
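The "creeping" memory figure described here can be logged rather than watched by eye. Below is a small sketch, assuming a Linux kernel recent enough to expose the `MemTotal` and `MemAvailable` fields in `/proc/meminfo`. The function name and the sample text are hypothetical. It only measures system-wide usage, so on its own it cannot tell a leak apart from tasks growing toward their normal working set.

```python
def used_percent(meminfo_text):
    """Compute percent of RAM in use from the text of /proc/meminfo.

    "In use" is taken here as MemTotal minus MemAvailable, which is one
    reasonable definition; System Monitor may compute its figure differently.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


# Made-up sample resembling a 4 GB machine with 30% of memory in use.
SAMPLE_MEMINFO = (
    "MemTotal:        4000000 kB\n"
    "MemFree:          500000 kB\n"
    "MemAvailable:    2800000 kB\n"
)

if __name__ == "__main__":
    print(round(used_percent(SAMPLE_MEMINFO), 1))
```

On a real system the input would come from `open("/proc/meminfo").read()`, sampled once a minute or so and written to a file alongside the number of running tasks.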
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
Once two psh work units start running, it more or less quickly transitions to a state where one work unit is waiting for memory, two work units are running, and the others are either waiting to run or ready to start. To get from that state to a normal 4-running state, it is necessary to at minimum suspend both of the psh work units. The psh WUs are pretty memory hungry, hence I asked you above to post what you allow BOINC to use. Increasing the values should solve the issue. . |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Once two psh work units start running, it more or less quickly transitions to a state where one work unit is waiting for memory, two work units are running, and the others are either waiting to run or ready to start. To get from that state to a normal 4-running state, it is necessary to at minimum suspend both of the psh work units. The psh units are clearly not the problem, though it is likely they make the memory leak faster. I definitely think I have the bug in a can now, though I don't think it is a BOINC or Rosetta problem, but almost surely a major problem in Ubuntu 16.04. No problems on any of my other machines, including an even older Lenovo running Ubuntu 15.10. Starts with 4 units running and memory utilization from 25% to 30%, but from that state the memory usage gradually increases whether or not other things are going on. Somewhere above 60% BOINC Manager starts throwing units, though it might be stable with 2 running. Suspending the 4 running units and then resuming them takes the memory usage back to the 25% area... Reported the bug to the Ubuntu people from two angles, but no patches yet. It will probably help if other people start reporting it, but the bug is tricky enough that most people probably aren't even noticing it. May also be linked to certain kinds of hardware, though I think this is a quite generic machine, a Toshiba T350. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
... but there is still no upgrade since I updated the machine... Whoops. Should have been "... but there is still no update since I upgraded the machine..." (What's the time limit on editing messages?) #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I believe the time limit on editing your own messages is 1 hour. I want to just explain something here to try and help you avoid spending a lot of time chasing in the wrong direction. You seem very focused on how many tasks are running. But BOINC Manager has shown you that it is able to run as many as four tasks when it feels all of the other runtime preferences can be met. So that is not the issue. And, from what I gather, any time you see fewer than 4 tasks running, you see others with the waiting for memory status. This means the BOINC Manager has stopped processing on the task because too much (as determined by your runtime preferences) of your memory is being used by BOINC processes. So, I think you should be focused on the amount of memory associated with the active and suspended BOINC tasks. The other thing to note is that it is normal for most any application that runs on your machine to start with zero memory, and ramp up the amount of memory used. This does not indicate a memory leak. It is just the nature of applications that do a lot of in-memory processing. They have to get so far through the processing before they are ready to allocate more memory and begin another phase of processing. For R@h tasks, they can often take 5 minutes or so to reach what will prove to be their ongoing working set of memory. So yes, you will see 4 tasks running as BOINC first starts. With all of the tasks starting at zero memory, there is plenty for all of them to run. But as the tasks progress further into their processing, they start to collectively reach the memory preference you have set for the BOINC Manager. So, it picks an active task and suspends it. It also knows the amount of memory it will take to activate it again. Sometimes the BOINC Manager will see some available memory and try to start another task, because one (or more) of those it has suspended previously are too large to fit in the space it has available. 
So, a new task may begin and (oftentimes within the 5 minutes mentioned above) grow to consume more memory than your preference. This leaves the BOINC Manager to again find something to suspend. So focus on memory used by each task on the system, and specifically by the tasks that are actively getting CPU time. And focus on your BOINC preferences for how much memory you allow it to use. It pretty clearly sounds like your machine now has a setting in place that indicates you do not want BOINC Manager to use more than 50% of the system's memory. Rosetta Moderator: Mod.Sense |
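The ramp-up-then-suspend behaviour described in this post can be illustrated with a toy model. This is only a sketch under loud assumptions and is not BOINC's actual scheduler: memory is assumed to grow linearly to a peak over 5 minutes, only running tasks count against the limit, the task suspended is simply the last one in the list, and suspended tasks are never resumed. The peak sizes (900 MB for two "hungry" tasks, 500 MB for the others) are invented.

```python
def working_set_mb(peak_mb, minutes_running, ramp_minutes=5):
    """Assumed growth curve: linear ramp from 0 to peak_mb, then flat."""
    return peak_mb * min(minutes_running, ramp_minutes) / ramp_minutes


def simulate(peaks_mb, total_ram_mb, limit_pct, minutes):
    """Return the number of running tasks at each minute from 0 to `minutes`.

    All tasks start together at minute 0. Whenever the combined working set
    of the running tasks exceeds limit_pct of total RAM, tasks are suspended
    one at a time (last in the list first; an arbitrary, hypothetical rule).
    """
    limit_mb = total_ram_mb * limit_pct / 100
    running = list(range(len(peaks_mb)))
    history = []
    for minute in range(minutes + 1):
        while running and sum(
            working_set_mb(peaks_mb[i], minute) for i in running
        ) > limit_mb:
            running.pop()  # this task would show as "waiting for memory"
        history.append(len(running))
    return history


if __name__ == "__main__":
    peaks = [900, 900, 500, 500]  # invented peak working sets in MB
    print(simulate(peaks, 4096, 50, 6))  # 50% limit on a 4 GB machine
    print(simulate(peaks, 4096, 75, 6))  # same tasks with a 75% limit
```

Under these made-up numbers the 50% limit starts with four running tasks and settles at two after about five minutes, while the 75% limit keeps all four running, which matches the pattern of symptoms reported earlier in the thread without any leak being involved.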
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
"The psh units are clearly not the problem" That's right, your settings in combination with the 4GB of RAM for up to 4 WUs are the issue. There's no other obvious reason why BOINC would suspend tasks into the waiting for memory state. So instead of writing long stories here and asking the Ubuntu people for some patches, I suggest once again to increase the values I posted above to something like 75% to start with, that's done in half a minute. If that won't help, then we might want to spend more than this half minute on something else. . |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
I believe the time limit on editing your own messages is 1 hour. Thanks for that detail, but I'm afraid the rest of it was mostly CS 101, and my second degree was in computer science, so I cut it here. I haven't mentioned GC yet even though I am increasingly convinced that it is the most likely root of the problem. I do have access to the menus for all programs now, though that access returned before I found the new setting in the Appearance settings. Not sure if it is a new feature, but in 16.04 the menus default to a dynamic behavior that was obviously broken when most of the menus refused to appear. Alternatively, I now wonder if Firefox is invoking its menus in a nonstandard way, and perhaps that was causing that Menu bar problem. Anyway, I have set it for static behavior and the menus are stable, I can see my Computing preferences from the Options menu of BOINC Manager, and I can now confirm that they appear to be unchanged. The machine is unchanged hardware and BOINC Manager operates normally when it is running under Windows 10 on the same machine. Under Ubuntu 16.04, memory consumption gradually increases, usually over a period much longer than 5 minutes, and eventually the machine starts suspending them, though it seems stable at two, or sometimes three units. Right now it is not using the waiting for memory status. I don't know what that signifies. (Near as I can recall, I had never seen that status before upgrading to Ubuntu 16.04.) There have been several small Ubuntu updates, but I have been checking them to see if any of them might be related to the problems, and none of them had any obvious relationship to either of the symptoms I've noticed. I loosened up the memory restrictions in BOINC Manager, so it is running longer before shutting down work units. Maybe the real problem is that 16.04 is just using much more memory for something else? 
Another possibility is that some non-obvious or even hidden setting of BOINC Manager was changed in the upgrade to 16.04. The pattern I have now is to wait until it starts suspending work units. Then I suspend the running units, 4 other work units start running and memory drops back to the area around 30%--and then usage starts creeping up. Eventually it stops running 4 units and I can repeat the process. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
Chilean Send message Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
Sounds like a memory usage limit. EACH Rosetta WU uses on average AT LEAST 0.5 GB of RAM (I have 3 right now using 600+ MB). This RAM usage increases as the WU progresses up to a certain maximum. It doesn't start using the maximum amount of RAM it'll eventually use right at the start... thus this slow increase in RAM usage. This means that having 4 normal WUs running at the same time will show up in your system monitor as 50% RAM usage JUST from Rosetta. Your BOINC preferences are probably set up to only allow 50-60% of the maximum RAM, thus BOINC suspends WUs until the RAM usage is below this threshold. |
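The arithmetic behind this estimate is easy to check. The sketch below uses the figures from this post (roughly 0.5 GB to 600 MB per work unit) and the 4 GB machine mentioned earlier in the thread; the function names are hypothetical, and real per-task memory varies by work unit type.

```python
def boinc_share_pct(n_tasks, mb_per_task, total_ram_mb):
    """Percent of total RAM used by n_tasks of equal size."""
    return 100.0 * n_tasks * mb_per_task / total_ram_mb


def max_tasks_under_limit(mb_per_task, total_ram_mb, limit_pct):
    """How many equal-sized tasks fit under a 'use at most X% of memory' limit."""
    return int((total_ram_mb * limit_pct / 100) // mb_per_task)


if __name__ == "__main__":
    total = 4096  # 4 GB machine, in MB
    print(boinc_share_pct(4, 512, total))         # four tasks at 0.5 GB each
    print(boinc_share_pct(4, 600, total))         # four tasks at 600 MB each
    print(max_tasks_under_limit(600, total, 50))  # tasks that fit under a 50% limit
    print(max_tasks_under_limit(600, total, 75))  # tasks that fit under a 75% limit
```

Four tasks at 0.5 GB each already account for 50% of a 4 GB machine, and at 600 MB each only three fit under a 50% limit, so a fourth task being held back is what the preference would be expected to produce.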
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Sounds like a memory usage limit. EACH Rosetta WU uses on average AT LEAST 0.5 GB of RAM (I have 3 right now using 600+ MB). This RAM usage increases as the WU progresses up to a certain maximum. It doesn't start using the maximum amount of RAM it'll eventually use right at the start... thus this slow increase in RAM usage. Unless I have invisible friends, it sounds like a reasonable or at least sufficiently plausible explanation. If so, 16.04 is moving in the direction of bloatware, but that is certainly no surprise these days. Further so, it is plausible that few people are running similarly old machines and fewer of them are noticing the performance changes, which could explain the paucity of reports from other observers. Then again, a lack of further comments from me may only indicate that I've given up and I'm running the machine under Windows 10. Much as I dislike Microsoft, I have to say at least this one isn't a flaming lemon. Yes, there are a couple of things I prefer doing from Linux, but nothing urgent right now. Oh yeah and by the way, the menu bar problem seems to be widely reported over on Launchpad and they have consolidated most of the reports (including mine) into one giant thread there. Not clear how much progress they are making, but after reading most of it, I'd estimate the probability that it is related to this BOINC problem at under 25%. My own guess is that it involves a new dynamic menu feature that doesn't work correctly, but under Appearance settings I switched it back to static menus and I'm not seeing it now. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
©2024 University of Washington
https://www.bakerlab.org