Message boards : Number crunching : Tells us your thoughts on granting credit for large protein, long-running tasks
Author | Message |
---|---|
spRocket Send message Joined: 23 Mar 20 Posts: 22 Credit: 3,008,018 RAC: 0 |
I think I've picked up a 4 GB work unit on one of my systems - it has 8 GB RAM, but at the moment, only a single task is running, and I haven't touched its settings. The "top" command shows a resident size of 2.874GB. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,625,551 RAC: 6,845 |
"We only have a few top notch 32+ cores machines with beefy GPUs around the world,"
"Try hundreds of thousands, at the least."
Yeap, see here |
bkil Send message Joined: 11 Jan 20 Posts: 97 Credit: 4,433,288 RAC: 0 |
Yes, I agree that we should not crunch on everything. I meant to say on every computer where it is worth it, as per my thread "The most efficient cruncher rig possible". Sorry this part of the sentence got lost - I had to retype this message because no drafts are saved on this forum. We should do exact computations on this, but my gut feeling is that crunching on normal, non-extreme, non-server hardware can be at least somewhat efficient if it is:
- more recent than 10 years underclocked
- more recent than 10 years portable
|
allen Send message Joined: 14 Apr 20 Posts: 1 Credit: 61,472 RAC: 0 |
Hello all: I'm new here and am wondering how Rosetta determines the amount of wu's to send each computer. The reason I ask is because I have had wu's cancelled before they are finished since they ran out of time. I have a system that is receiving 8 hour wu's that are continuously taking over 24 hours to run. Hopefully one of you will fill me in on what's happening here. Thanks a bunch, Allen |
bkil Send message Joined: 11 Jan 20 Posts: 97 Credit: 4,433,288 RAC: 0 |
I've processed the CPU list table from the above post. Because the sum is much less than the one on the homepage, I think this may include any registered member on the project, not only the active members. Also note that HT CPUs are overrated by at least 50% in the total stats (they simply multiply thread count by per-thread flops). As HT is much more prominent at the high end than the low end (envision Celerons/Pentiums), this skews the stats even more towards the right.
21428.9 TFlops; 97.8928 GFlops/host mean; 218902 host
20.34 GFlops/host median
64915 < 5 GFlops
10834 < 10 GFlops
19593 < 15 GFlops
12726 < 20 GFlops
11298 < 25 GFlops
10666 < 30 GFlops
8273 < 35 GFlops
5766 < 40 GFlops
1993 < 45 GFlops
2626 < 50 GFlops
1451 < 55 GFlops
3696 < 60 GFlops
2406 < 65 GFlops
1783 < 70 GFlops
1363 < 75 GFlops
1437 < 80 GFlops
2547 < 85 GFlops
1959 < 90 GFlops
4437 < 95 GFlops
198 < 100 GFlops
332 < 105 GFlops
133 < 110 GFlops
22 < 115 GFlops
298 < 120 GFlops
28904 < 125 GFlops
102 < 135 GFlops
452 < 140 GFlops
404 < 145 GFlops
228 < 150 GFlops
22 < 160 GFlops
355 < 165 GFlops
14 < 175 GFlops
15 < 180 GFlops
23 < 195 GFlops
21 < 200 GFlops
20 < 205 GFlops
20 < 210 GFlops
11 < 215 GFlops
19 < 220 GFlops
16 < 225 GFlops
174 < 245 GFlops
126 < 250 GFlops
12 < 275 GFlops
19 < 290 GFlops
30 < 315 GFlops
55 < 335 GFlops
135 < 380 GFlops
14 < 405 GFlops
11 < 630 GFlops
47 < 645 GFlops
16686 < 830 GFlops |
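For anyone who wants to play with the histogram in the post above, here is a minimal Python sketch that turns the bins into cumulative shares and finds the bin holding the median host. The counts and bin edges are copied from the post; the function names are made up for illustration, and the listed bins need not add up exactly to the host total quoted in the post, so shares are computed against the sum of the listed bins only.

```python
# (host count, upper bound of the bin in GFlops), copied from the post above.
# Bins the post does not list are simply absent here.
HISTOGRAM = [
    (64915, 5), (10834, 10), (19593, 15), (12726, 20), (11298, 25),
    (10666, 30), (8273, 35), (5766, 40), (1993, 45), (2626, 50),
    (1451, 55), (3696, 60), (2406, 65), (1783, 70), (1363, 75),
    (1437, 80), (2547, 85), (1959, 90), (4437, 95), (198, 100),
    (332, 105), (133, 110), (22, 115), (298, 120), (28904, 125),
    (102, 135), (452, 140), (404, 145), (228, 150), (22, 160),
    (355, 165), (14, 175), (15, 180), (23, 195), (21, 200),
    (20, 205), (20, 210), (11, 215), (19, 220), (16, 225),
    (174, 245), (126, 250), (12, 275), (19, 290), (30, 315),
    (55, 335), (135, 380), (14, 405), (11, 630), (47, 645),
    (16686, 830),
]


def cumulative_share(histogram):
    """Return [(upper_bound, share of listed hosts at or below that bound)]."""
    total = sum(count for count, _ in histogram)
    running = 0
    shares = []
    for count, upper in histogram:
        running += count
        shares.append((upper, running / total))
    return shares


def median_bin_upper(histogram):
    """Upper bound of the first bin whose cumulative share reaches 50%."""
    for upper, share in cumulative_share(histogram):
        if share >= 0.5:
            return upper
    raise ValueError("empty histogram")


if __name__ == "__main__":
    print("median host is in the bin below", median_bin_upper(HISTOGRAM), "GFlops")
    for upper, share in cumulative_share(HISTOGRAM)[:5]:
        print(f"< {upper:>3} GFlops: {share:.1%} of listed hosts")
```

The median bin found this way should agree with the 20.34 GFlops/host median quoted in the post, which is a handy check that the table was copied correctly.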
bkil Send message Joined: 11 Jan 20 Posts: 97 Credit: 4,433,288 RAC: 0 |
I think your question is off-topic here, but let me give a TL;DR. I can see under your account that you have dozens of in-progress WUs. Please visit computing preferences under your account and reduce your "store at least ..." and "store up to additional ..." values. They should probably sum to less than 1 day, even down to 0.1+0.1 days during debugging while BOINC is learning your processing rate. According to this task, it indeed took 24 hours of CPU to complete 195 decoys: https://boinc.bakerlab.org/rosetta/result.php?resultid=1153332354 Please double check the target CPU runtime in your Rosetta@home preferences under your account. It defaults to 8 hours, although 24 hours should still be doable. Deadlines are around 3 days I think. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1684 Credit: 17,933,837 RAC: 22,604 |
"I'm new here and am wondering how Rosetta determines the amount of wu's to send each computer. The reason I ask is because I have had wu's cancelled before they are finished since they ran out of time."
It's pretty much the same for all projects- they send a rough Estimate of how long it thinks it will take your system to return work. But since you're new to the project, it doesn't have any history for work done, and so that estimate can be way off. Since you are running more than 1 project, you would be much better off with no cache at all. At the very least, an extremely small one.
On the top of this page, click on your name at the top right, then in your Account, under Preferences, When and how BOINC uses your computer, click on "Computing preferences." Down the bottom is a link to Edit.
Computing
Usage limits
- Use at most 100% of the CPUs
- Use at most 100% of CPU time
When to suspend
- Suspend when computer is on battery (not selected)
- Suspend when computer is in use (not selected)
- Suspend GPU computing when computer is in use (not selected)
- 'In use' means mouse/keyboard input in last 3 minutes
- Suspend when no mouse/keyboard input in last --- minutes
- Suspend when non-BOINC CPU usage is above --- %
- Compute only between ---
Other
- Store at least 0.1 days of work
- Store up to an additional 0.02 days of work
- Switch between tasks every 60 minutes
- Request tasks to checkpoint at most every 60 seconds
Disk
- Use no more than 20 GB
- Leave at least 2 GB free
- Use no more than 60 % of total
Memory
- When computer is in use, use at most 95 %
- When computer is not in use, use at most 95 %
- Leave non-GPU tasks in memory while suspended (not selected)
- Page/swap file: use at most 75 %
Click on "Update changes."
In the BOINC Manager, View, Advanced. Select Rosetta in the Project tab, then update. Those changes will then take effect. See how those settings go, particularly the Other settings.
"I have a system that is receiving 8 hour wu's that are continuously taking over 24 hours to run."
In your account, under Preferences for this project, click on "Rosetta@home preferences". Set the Target CPU run time to "not selected" and Update to save them. That way it will use the default which is presently 8 hours*. Any currently running tasks will use the old value, any non-running Tasks will use the new value when they start (once the Manager has contacted the Scheduler, or you have pressed Update in the Manager).
* Some Tasks will run longer than their Target CPU Runtime. They are able to run for up to 10 more hours, after which time the Watchdog timer will end the Task.
Grant
Darwin NT |
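The cache advice in the posts above can be sanity-checked with some back-of-the-envelope arithmetic. The Python sketch below is a rough model only, not how the BOINC scheduler actually computes work requests: it assumes the queue is simply sized as threads × buffer days ÷ estimated runtime, and all function names are hypothetical. It uses the numbers from the thread: an 8 hour target runtime, up to 10 extra hours before the watchdog ends a task, and tasks that in practice take 24 hours.

```python
import math


def tasks_to_fill_buffer(threads, buffer_days, est_runtime_hours):
    """Rough number of tasks needed to keep `threads` busy for `buffer_days`,
    if every task really took `est_runtime_hours` (simplifying assumption)."""
    return math.ceil(threads * buffer_days * 24 / est_runtime_hours)


def hours_to_drain(tasks, threads, actual_runtime_hours):
    """Wall-clock hours to finish `tasks` when each one actually takes
    `actual_runtime_hours` and `threads` tasks run at once."""
    return tasks * actual_runtime_hours / threads


def worst_case_runtime_hours(target_hours, watchdog_extra_hours=10):
    """Target CPU run time plus the extra hours allowed before the watchdog
    ends the task (10 hours according to the post above)."""
    return target_hours + watchdog_extra_hours


if __name__ == "__main__":
    # Small cache as suggested above: 0.1 + 0.02 days on an 8-thread host.
    print(tasks_to_fill_buffer(threads=8, buffer_days=0.12, est_runtime_hours=8))

    # A 3-day cache sized on the 8 hour estimate, but tasks really take 24 hours.
    queued = tasks_to_fill_buffer(threads=4, buffer_days=3, est_runtime_hours=8)
    print(queued, "tasks queued,", hours_to_drain(queued, 4, 24) / 24, "days to drain")

    print(worst_case_runtime_hours(8), "hours worst case per task")
```

The point of the second example is that a cache sized from an optimistic runtime estimate can take several times longer to work through than the buffer setting suggests, which is how tasks end up past their deadline.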
Michael E.@ team Carl Sagan Send message Joined: 5 Apr 08 Posts: 16 Credit: 1,942,656 RAC: 781 |
I use a lot of BOINC projects. PrimeGrid applies a bonus for long-running tasks because most people like short-running tasks. For example, looking at CPU-only tasks:
- Subprojects with a 10% long job credit bonus have recent average CPU time of 41:29:00 and 60:40:12 hours
- Subprojects with a 20% long job credit bonus have a recent average CPU time of 107:29:32 and 125:37:06 hours
- Other subprojects with longer run-times have long job and conjecture bonuses.
To see details, create a PrimeGrid account and choose Your Account > PrimeGrid Preferences. Or send me a message and ask for a text/screen cap. The preferences also show completion times. I used to choose projects in part by measuring the points per CPU hour to find those with a high reward. Now I am concerned about medical science more than points. |
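To make the bonus idea and the "points per CPU hour" comparison above concrete, here is a small Python sketch. It assumes the bonus is a simple percentage added on top of a base credit; the post does not say how PrimeGrid actually applies it, and the base credit figure in the example is purely hypothetical. Only the bonus percentages and the CPU times come from the post.

```python
def credit_with_bonus(base_credit, bonus_pct):
    """Base credit plus a percentage bonus (assumed to be multiplicative)."""
    return base_credit * (1 + bonus_pct / 100)


def hms_to_hours(hms):
    """Convert an 'HHH:MM:SS' CPU time string into fractional hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


def credit_per_cpu_hour(credit, cpu_time_hms):
    """Points earned per CPU hour for one task."""
    return credit / hms_to_hours(cpu_time_hms)


if __name__ == "__main__":
    base = 1000  # hypothetical base credit, not taken from the post
    for bonus_pct, cpu_time in [(10, "41:29:00"), (20, "107:29:32")]:
        total = credit_with_bonus(base, bonus_pct)
        rate = credit_per_cpu_hour(total, cpu_time)
        print(f"{bonus_pct}% bonus, {cpu_time} CPU time: {rate:.2f} credit/hour")
```

With the same base credit, a longer task still earns less per hour even with the larger bonus, so a bonus only compensates if the base credit itself scales with the work done.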
RME Send message Joined: 4 Mar 20 Posts: 12 Credit: 1,211,010 RAC: 0 |
I can't wait to get to 1,000,000 points so I can get my reward. |
teacup_DPC Send message Joined: 3 Apr 20 Posts: 6 Credit: 2,744,282 RAC: 0 |
Just to nuance this, I know people getting their old phones from below a layer of dust out of the chest of drawers and setting them to work. As I've understood it, they can only be functional with their display turned off, so I doubt that such a phone is available for normal use at all.
"but if we contributed every phone, tablet and low-mid end office machine, typically with 2-4 cores, our computing capacity could increase by orders of magnitude. (I.e., we have way less than a million hosts and there exist billions of personal computing devices in the world)"
"For as many of those devices there are, many are of such low capability they are of no use to many projects. And you need to keep in mind efficiency isn't actually about low peak or maximum power use- it is about energy used over time to complete a task."
I read your point, and it sounds logical, but that coin has two sides. Phone hardware is tailored as well to be super efficient, since it continuously needs to run on battery. Desktop hardware does not necessarily have this efficiency pedigree, though large steps have been made miniaturizing the processor circuits. This phone sideline is a bit off topic perhaps, I admit. But your remark made me a bit curious, I need to search somewhere for a GFLOP/W ratio or so. Maybe you're completely right after all, I only caught myself on the thought that I was not able to quantify your argumentation. I think it is an interesting topic in itself.
But no need to marginalize our beloved Behemoth machines. I am always impressed by what their work throughput is in my team (Dutch Power Cows), saliva dripping from the corners of my mouth looking at those numbers. My older i5 and i7 processors stand their ground, but they are from another order. Independent from this 4GB discussion my next processor becomes a big Ryzen, that's for sure. Behemoths and more potent desktops will always remain a pillar in the capacity of distributed computing.
Rosetta is stretching herself by trying to meet the phone clients and the potent desktop clients with those 4GB jobs. Whether support of phones proves to be a long term investment, time will tell, but there are a lot of (old) phones out there, and they represent a huge capacity. That an attempt is made to harvest this I can fully understand. (sorry, a bit off topic I fear) |
Tom M Send message Joined: 20 Jun 17 Posts: 87 Credit: 15,285,842 RAC: 40,786 |
"I expect you do not want to end up with a bias toward 1GB or 4GB jobs, while both are needed. For the clients that can handle the 4GB jobs the bias should be neutral. Unless you expect a tendency towards more 4GB jobs with respect to 1 GB jobs, or the other way around, then you want a bias."
That's the thinking. +1
Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel..... |
bkil Send message Joined: 11 Jan 20 Posts: 97 Credit: 4,433,288 RAC: 0 |
I've already answered some of your questions above regarding efficiency and whatnot: - https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13833&postid=95140 If battery use is an issue for you, see also:
- Running Rosetta on Raspberry Pi 3B+ (how to guide)
- How to Recycle Android Phones for BOINC or Folding Rig Without Using Batteries, also runs on Amlogic Smart TV Boxes
|
Ged Send message Joined: 17 Apr 06 Posts: 2 Credit: 1,034,115 RAC: 0 |
For me, personally, I'm not driven by the credits granted for running work units; it's about contributing to the science, either by running work units which model a particular behaviour or sheer crunching of data for further treatment or research candidate selection/rejection.
Mod.Sense: Not to control the deadline of received WUs nor only accepting 8-day deadline WUs, it's more the latter case but some means to ensure that a WU has a realistic deadline for a given WU's payload. |
teacup_DPC Send message Joined: 3 Apr 20 Posts: 6 Credit: 2,744,282 RAC: 0 |
Hi sangaku, I found your https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13791#94266 thread and read some of its first posts. I liked the questioning approach of it, and will direct responses concerning what hardware to use to that topic. Your Raspberry Pi 4 remark did set me thinking. Without doing the math I got a vision of a stack of these things, each taking 2 or 3 threads. Being Dutch, my financial domain, like yours, is Euros, and a Pi 4 can be fetched in Holland for around 50-60 Euros. Storage and PSU for all those Pis should be approached in some clever combined way. First I will read that topic completely; probably the math will not add up, making a Pi 4 a no go. But only fantasizing about that pile of Pis made my morning a good one, though it probably was not the aim of your post :|. Thanks! |
Millenium Send message Joined: 20 Sep 05 Posts: 68 Credit: 184,283 RAC: 0 |
I don't really care about credits; as long as they are consistent so we can use them to judge the performance of different computers, it's fine. Instead, the main problem for WUs whose models take too much time is the checkpointing. Shutting down a PC and losing 6 hours of work isn't good. To solve this problem, other than of course changing how checkpointing works, a good idea is to let us choose whether we want to get these WUs where checkpointing is problematic. If someone keeps his PC running 24/7 then they can get these WUs without problems. If instead someone shuts it down every day then it's better to avoid them. Sure, if checkpointing can be changed to save the progress no matter if a model is completed or not then no problem. |
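As an illustration of what "save the progress no matter if a model is completed or not" could look like in general, here is a generic Python sketch of step-based checkpointing with an atomic file replace. This is not how Rosetta or BOINC implement checkpoints; the state layout, file name and function names are all hypothetical, and the "work" is a stand-in sum.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then atomically replace the old checkpoint,
    so a shutdown mid-write never leaves a half-written file behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default


def run(path, total_steps, checkpoint_every):
    """Resume from the last checkpoint and save every `checkpoint_every` steps,
    independent of whether a whole unit of work (a 'model') has finished."""
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        state["acc"] += state["step"]  # stand-in for real computation
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    checkpoint_file = os.path.join(tempfile.mkdtemp(), "task.ckpt.json")
    print(run(checkpoint_file, total_steps=10, checkpoint_every=4))
```

With this pattern the work lost on shutdown is bounded by the checkpoint interval rather than by the length of a model, at the cost of having to serialize intermediate state, which may be the hard part for a real science application.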
lazyacevw Send message Joined: 18 Mar 20 Posts: 12 Credit: 93,576,463 RAC: 0 |
My question about credits is, what is up with this guy? Within 3 days, he has the top three "fastest" computers by nearly a factor of 6. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1684 Credit: 17,933,837 RAC: 22,604 |
"My question about credits is, what is up with this guy? Within 3 days, he has the top three "fastest" computers by nearly a factor of 6."
They are returning a lot of Tasks for such a small number of core/threads.
0.72 day turn around. 8 hour runtime. 4,600 Tasks in progress on one system, over 6000 Valid.
0.72 day turn around. 8 hour runtime. 1,300 Tasks in progress on the others, roughly 1,650 each Valid on the others.
Number of times client has contacted the server, 3 for one system. 0 for the others?
Some sort of CPU compute cluster feeding its results through those host IDs?
Grant
Darwin NT |
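The figures quoted above allow a quick plausibility check. The Python sketch below estimates how many tasks must be running at the same time to sustain them, assuming a steady state in which each in-progress task spends its 8 hour runtime computing somewhere within the 0.72 day turnaround. That steady-state assumption and the 128-thread comparison figure are mine, not from the post, and the function name is hypothetical.

```python
def implied_running_tasks(tasks_in_progress, turnaround_days, runtime_hours):
    """Estimate concurrently running tasks under a steady-state assumption:
    each in-progress task is actually computing for runtime/turnaround of
    the time it spends on the host."""
    turnaround_hours = turnaround_days * 24
    return tasks_in_progress * runtime_hours / turnaround_hours


if __name__ == "__main__":
    threads_on_large_host = 128  # illustrative comparison point, an assumption
    for label, in_progress in [("busiest host", 4600), ("other hosts", 1300)]:
        running = implied_running_tasks(in_progress, 0.72, 8)
        print(f"{label}: ~{running:.0f} tasks running at once, "
              f"{running / threads_on_large_host:.1f}x a {threads_on_large_host}-thread machine")
```

Under these assumptions the busiest host would need on the order of a couple of thousand tasks running at once, far more than any single machine has threads, which fits the guess that a cluster is reporting through those host IDs.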
strongboes Send message Joined: 3 Mar 20 Posts: 27 Credit: 5,394,270 RAC: 0 |
Yes, pretty clearly using those hosts to somehow feed work to other cpus. Very clever but clearly not actually the top cpu. Does anyone know how he can do that out of interest? |
Millenium Send message Joined: 20 Sep 05 Posts: 68 Credit: 184,283 RAC: 0 |
Over 750.000 RAC with a single computer? Even a dual EPYC 7702 computer has no way to get such a high RAC. And his pc seems to have a single EPYC 7702P |
[DPC]_Fatal_Error_Group~Bubbles Send message Joined: 17 Mar 06 Posts: 1 Credit: 382,602 RAC: 0 |
Someone from DPC over here: we've notified the guy running the Nifhack account of this thread and asked if he wants, and is able to, clarify this. He's known for having access to huge amounts of computational power (at work, I believe) but can't deploy all of it all the time. He's also known to rarely part with specifics. My guess as well is that those machines are indeed some sort of hosts to the computers behind them. |
©2024 University of Washington
https://www.bakerlab.org