Message boards : Number crunching : Huge RAM usage by some of latest WUs
Author | Message |
---|---|
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 25,841,158 RAC: 12,408 |
Hello. One of my computers crashed today. When I started digging into why, it turned out it had run out of RAM. A second one was in a "swap of death" state (swapping non-stop for hours while doing almost no useful work). More digging showed that the cause of both the out-of-RAM crash and the non-stop swapping was Rosetta. I see HUGE RAM usage by some of the latest WUs: from 1.5 to 3.5 GB of RAM per running WU. You can see a lot of tasks currently using 1400-1600 MB of RAM, with ~2800 MB as a peak value. Before the crash and reboot, a few tasks peaked at ~3200-3500 MB, and the system went down after running out of both RAM and disk swap space. Usual consumption for R@H is in the 300-1000 MB range. Are these WUs something completely new? Or is it just a bug like a memory leak? They are all Rosetta 4.07 WUs and their names start with "rb_02_xx" (where xx = 29, 08, 08 and 10). I guess they are Robetta WUs generated on 29 JAN, 08 FEB, 09 FEB, 10 FEB. I was forced to limit the maximum number of concurrently running R@H units using the "max concurrency" setting in app config. Some example WUs: https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861215 https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861165 https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861118 https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861128 https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861130 https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861138 https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861090 https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861114 https://boinc.bakerlab.org/rosetta/result.php?resultid=1121613378 |
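For anyone wanting to pick a concurrency limit rather than guess, here is a rough sketch of the arithmetic. It assumes you know your total RAM, how much you want to keep free for the OS and other programs, and the peak RAM of one task (the ~3500 MB peak reported above is used as the example). The function name and the 16 GB / 12 thread machine are hypothetical. The result is the kind of number one would enter as the concurrency limit, presumably the `max_concurrent` element of BOINC's `app_config.xml`.

```python
def max_concurrent_tasks(total_ram_mb, reserved_mb, peak_per_task_mb, cpu_threads):
    """Estimate how many tasks can run at once without exceeding physical RAM.

    All sizes are in MB. The result is capped at the number of CPU threads,
    since running more tasks than threads gains nothing.
    """
    if peak_per_task_mb <= 0:
        raise ValueError("peak_per_task_mb must be positive")
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    by_memory = usable_mb // peak_per_task_mb
    return int(min(by_memory, cpu_threads))


# Hypothetical machine: 16 GB RAM, 12 CPU threads, 2 GB kept free for the OS.
# Peak per task taken from the ~3500 MB figure reported in the post above.
print(max_concurrent_tasks(16 * 1024, 2 * 1024, 3500, 12))  # memory-bound
# With the usual R@H footprint (~1000 MB) the CPU thread count is the limit.
print(max_concurrent_tasks(16 * 1024, 2 * 1024, 1000, 12))
```

This is deliberately conservative: it budgets for every task hitting its peak at the same moment, which may not happen in practice.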
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 25,841,158 RAC: 12,408 |
The longer they run, the more RAM they consume. Now > 3000 MB per WU after ~5 hours of running. rb_02_08_15652_15556__t000__0_C3_SAVE_ALL_OUT_IGNORE_THE_REST_891233_7217 and rb_02_08_15652_15556__t000__0_C3_SAVE_ALL_OUT_IGNORE_THE_REST_891233_7469 Looks much like a memory leak. But it is non-linear: RAM usage jumps after each stage of computation finishes and a new one begins. Smells like data/objects not being released properly after use. |
Trotador Joined: 30 May 09 Posts: 108 Credit: 291,214,977 RAC: 2 |
Yes, I see the same behavior for all rb_02_08_15652_15556__xx units |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I think the moderator says that happens on the development versions. In any case, I am glad to see my memory used. I have 16 GB on a Ryzen 2600 (using 11 cores) and 32 GB on a Ryzen 3700x (using 15 cores), and haven't run out yet, though I see over 3 GB used on several of them. Thanks for the warning. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I just got my first work unit suspended "waiting for memory" on my Ryzen 2600 (with 16 GB). There was about 1 GB available. So I will continue on my Ryzen 3700x (32 GB). That should work for the foreseeable future. |
Admin Project administrator Joined: 1 Jul 05 Posts: 4805 Credit: 0 RAC: 0 |
These are likely jobs that are modeling the Spike complex (http://new.robetta.org/results.php?id=15652) of 2019-nCoV_S, the corona virus. The genome has been sequenced and there is a mad rush to determine structures for possible drug targets. We are collaborating with a number of different research groups to model corona virus proteins that may be possible drug targets, including the NIH/NIAID and SSGCID https://www.ssgcid.org/. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"We are collaborating with a number of different research groups to model corona virus proteins that may be possible drug targets, including the NIH/NIAID and SSGCID https://www.ssgcid.org/." Great! I couldn't ask for more. I have re-arranged my machines so that Rosetta has plenty of memory. Throw them at us, though I won't be surprised if it causes a lot of problems. I hope people check here for what is going on. |
Nick Name Joined: 12 Aug 09 Posts: 3 Credit: 2,487,614 RAC: 0 |
"These are likely jobs that are modeling the Spike complex (http://new.robetta.org/results.php?id=15652) of 2019-nCoV_S, the corona virus. The genome has been sequenced and there is a mad rush to determine structures for possible drug targets." This is exciting, but these types of jobs should be accompanied by a News notice so that users aren't surprised. I'd also suggest they be put in a special category in project preferences requiring users to explicitly allow tasks this large. Most users are not going to be able to run these without problems. |
Admin Project administrator Joined: 1 Jul 05 Posts: 4805 Credit: 0 RAC: 0 |
Agreed, sorry we didn't add a memory requirement for these jobs. |
jringo Joined: 15 Aug 17 Posts: 12 Credit: 2,628,933 RAC: 0 |
We're using BOINC Network to spread the word that R@H is working on corona virus problems. This is the sort of news that would be a great public driver! This news will not only bring cycles from other BOINC projects to yours (likely only temporary -- to solve an immediate and tangible problem -- so don't feel guilty), but would likely bring a significant number of people into the BOINC network at large. Always feel free to reach out if you'd like help getting a PR made up. Good luck on the project! email: boinc.network@gmail.com discord: https://discord.gg/wPRafUq twitter: @BOINCNetwork |
retalaznstyle Joined: 18 Feb 20 Posts: 1 Credit: 0 RAC: 0 |
Hi, mod here from coronavirus subreddit. Do you have a post for new users who want to sign up for the coronavirus research efforts via rosetta? |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"I'd also suggest they be put in a special category in project preferences requiring users to explicitly allow tasks this large." Yes, exactly. That would allow the use of machines with more memory where they are needed, while the ordinary machines can do the ordinary work. Also, is more capacity needed? Just ask and we will do it, but we need to know what the need is. |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,518,540 RAC: 9,764 |
Hi! Can you ask people just to install BOINC and choose Rosetta, and explain the following? There is a huge pool of Rosetta tasks, so if some people were to pick out and run only the Coronavirus tasks, the rest of us would just end up running more of the other tasks, as that is all that would be left. Does that make sense? Danny |
JP Joined: 18 Feb 20 Posts: 2 Credit: 40,554 RAC: 0 |
Hi Danny, -I am a brand new user of both BOINC & Rosetta. I was brought here from the COVID-19 Reddit post. -I do not have a clear understanding of how tasks are distributed and/or prioritized among users. -If it is possible, I am trying to clarify the best course of action, with the simplest set of instructions to communicate to a wide, non-technical audience, on how to best use Rosetta for COVID-19 related tasks. -I may have misunderstood, but I believe what you are saying is that it is not possible to prioritize particular tasks in Rosetta because a user's resources are distributed among many tasks at once. Therefore, people wishing to commit processing resources to COVID-19 tasks should just run Rosetta. In the course of running Rosetta their computing power will be added to the pool working on all tasks, including COVID-19 related jobs. Further, if there were the capability to allocate one's own computing power to specific (COVID-19) tasks, it would force the resources of other users to be allocated to other, non-specified (non-COVID-19) tasks, rendering the power of any task specification moot. -I am running Rosetta now and, unless I missed it, I do not see where I could specify or prioritize particular tasks. It appears that this is not an option anyway. -It seems the best course of action is to simply download BOINC & run Rosetta? Thank you for any clarification you could provide. Best, -JP |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"-I may have misunderstood, but I believe what you are saying is that it is not possible to prioritize particular tasks in Rosetta because a user's resources are distributed among many tasks at once. Therefore, people wishing to commit processing resources to COVID-19 tasks should just run Rosetta. In the course of running Rosetta their computing power will be added to the pool working on all tasks - including COVID-19 related jobs." As a long-time Rosetta user, I can answer that. Yes, you just work on the pool of all the tasks. In fact, unless you can figure out their obscure nomenclature, you don't even know which ones are for COVID-19. That is fine with me. It doesn't matter which machine a particular task runs on, as long as it has enough resources. And if they run out of work, then they have more than enough. |
JP Joined: 18 Feb 20 Posts: 2 Credit: 40,554 RAC: 0 |
Thank you Jim1348! Cheers, -JP |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Yes, with the expectation that your contributed effort will benefit research teams that use Rosetta to study COVID-19, as well as other protein structures. Your efforts also benefit the team at University of Washington that is developing improvements to Rosetta, which makes this type of computational structure prediction possible. Rosetta Moderator: Mod.Sense |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 25,841,158 RAC: 12,408 |
"These are likely jobs that are modeling the Spike complex (http://new.robetta.org/results.php?id=15652) of 2019-nCoV_S, the corona virus. The genome has been sequenced and there is a mad rush to determine structures for possible drug targets." So it is not a memory leak, just an abnormally big (compared to average R@H work) protein model? 1273 amino acid residues, if I get it right? Is there any work on developing a multi-threaded app for such big targets, so as not to waste huge amounts of RAM on a complete copy of the dataset for each working thread? Modern computers are getting more and more CPU cores/threads, and just running a separate copy on each thread means more and more "overhead" in RAM, disk and Internet (bandwidth) usage, because the use of all these resources is multiplied by the number of tasks running. A multi-threaded app, by contrast, shares all of this and only needs multiple CPU cores/threads. The usual (common) setup for non-server computers is about 1 GB of RAM per CPU thread. 2 GB per thread is much rarer. And there are almost no "consumer", "office" or "home" computers with >2 GB of RAM per CPU thread. So you cannot just throw out tasks which consume >=3 GB of RAM per thread and expect that everything will work OK. There WILL be problems on the majority of computers. On the other hand, if a multi-threaded app were available, then using even 5-10 GB of RAM for a single large model would be acceptable for most volunteer computers. It would also help with the runtimes of the biggest models on older CPUs: really big models often get aborted by the watchdog on old (or just slow, like Intel Atom or AMD Puma/Jaguar/Bobcat) CPUs for exceeding the maximum allowed runtime (8+4 = 12 hours MAX by default) before the very first model/decoy is calculated, and the CPU time spent is wasted. |
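To put rough numbers on the per-thread argument above, here is a small sketch comparing the two layouts. Everything in it is an assumption for illustration: the 8-thread machine, the 100 MB of private working memory per thread, and the idea that a multi-threaded app could share one copy of the model at all (the thread does not say such an app exists). Only the ~3000 MB per task and the "about 1 GB of RAM per CPU thread" figures come from the posts.

```python
def ram_one_task_per_thread(threads, per_task_mb):
    """Total RAM when every CPU thread runs its own full copy of the model."""
    return threads * per_task_mb


def ram_shared_model(threads, shared_model_mb, per_thread_overhead_mb):
    """Total RAM for a hypothetical multi-threaded app: one shared copy of
    the model plus a small private working set per thread."""
    return shared_model_mb + threads * per_thread_overhead_mb


threads = 8                    # hypothetical consumer CPU
typical_ram_mb = threads * 1024  # "about 1 GB of RAM per CPU thread"

separate = ram_one_task_per_thread(threads, 3000)
shared = ram_shared_model(threads, 3000, 100)  # 100 MB/thread is a guess

print(f"typical RAM installed:   {typical_ram_mb} MB")   # 8192 MB
print(f"one task per thread:     {separate} MB")         # 24000 MB
print(f"shared multi-threaded:   {shared} MB")           # 3800 MB
```

Under these assumptions the per-thread layout needs roughly three times the installed RAM, while the shared layout fits with room to spare. The real per-thread overhead of such an app could be very different.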
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"Is any work on developing of multi-threaded app for such big targets? To not to waste huge amounts of RAM for complete dataset copy for each working thread." Isn't the Internet bandwidth the same? With multi-threading you run fewer work units at a time, but you download/upload correspondingly more often. I think the only real saving is memory. Most multi-threaded projects now allow you to select how many threads (cores) you want to use on a single work unit. I usually select "1" or "2", since that is usually more efficient. Most MT projects run less efficiently the more threads you use. I am not sure why that is the case, but it is said that on some of them one thread may finish before the others and then have nothing to do. There may be other reasons. I usually have plenty of memory, though having a choice is nice. But I expect that not all tasks are suitable for MT. |
©2024 University of Washington
https://www.bakerlab.org