Division of Labor

Message boards : Number crunching : Division of Labor

Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 66203 - Posted: 18 May 2010, 23:46:36 UTC

Good Evening -

I have a fairly basic question, not specific to Rosetta@home, but rather a question about how BOINC manages the sharing of CPU resources. I am fairly new to this concept of grid processing and am still trying to learn how to tune my systems to get the desired results. Frankly, turning the knobs is not giving me the results I expected.

First a little about my current configuration.

At this time I am running 6 systems on the network. All AMD, almost all quad core processors. I have parts to build several more systems but am scaling up slowly until I get a feel for how fast the old electric meter spins.

All of these systems are running Linux (sorry, no Windows, I am a Linux systems guy by trade and it's what I am comfortable with).

All of these systems are running the current BOINC software with the standard non-optimized applications provided by the projects.

I am attached to two projects: Rosetta@home and SETI@home, with the preponderance of cycles dedicated to Rosetta@home. I believe that your project is better grounded, has greater potential to impact lives, and frankly, I appreciate the professional manner in which the project is run and the level of communication you maintain with your users.

However, I wanted a second project to provide a flow of work units on the off chance your system went down.

I am currently set up to give 75% of my resources to Rosetta@home with the remaining 25% going to SETI@home. I take the default setting of switching applications every 60 minutes. On both projects I am set up to maintain 1 day's worth of work in my queue.

What I would have expected to see on a quad core processor is three cores running Rosetta@home and one core running the SETI@home application, with the understanding that this division of labor would be temporarily altered if a project went down and I exhausted my supply of that type of work unit. SETI@home has been down hard several times this past week.

On several of my systems I see this pattern, but on others I seem to be doing a 50-50 split on the work types. I will run 4 Rosetta tasks for a while, and then they will go into a waiting state while SETI tasks run.

All SETI tasks in my queue are “fresh” (hey, they've been down a lot) with deadlines at least 10 days away.

Can you provide any insight as to what I am doing wrong?

Thanks
ID: 66203 · Rating: 0
Divide Overflow
Joined: 17 Sep 05
Posts: 82
Credit: 921,382
RAC: 0
Message 66208 - Posted: 19 May 2010, 3:50:00 UTC - in response to Message 66203.  

Can you provide any insight as to what I am doing wrong?

Chris, you're guilty of the same thing many of the rest of us are: expecting a little common sense from BOINC. :)
You're not doing anything wrong. BOINC will make an attempt to satisfy your project prioritization over the long term. Just try to avoid the temptation of micromanaging and it will work on projects back and forth. On a macro scale (weeks to months) the time it spends on Rosetta & SETI should be close to your preferences. Welcome and happy crunching to you!
ID: 66208 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 66209 - Posted: 19 May 2010, 4:17:45 UTC

I usually sum it up by suggesting that you look at the ratio of work over the course of 100 hours, not 100 minutes. And by that time it will generally have settled in to match your resource shares.

If you really just want SETI as a backup project, you can get the newest BOINC version and set SETI to a resource share of zero. BOINC will then request work from SETI only when it is unable to get work from projects with a non-zero resource share.

If you would like to conserve a little bandwidth both for yourself and the R@h servers, you might consider installing a caching proxy such as Squid. That way, if any of the required files are the same for tasks on more than one machine (fairly common), you download them only once.
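As a rough illustration of the backup behaviour described above, here is a small Python sketch that decides which project to ask for work. The function name, the dictionary fields and the `server_up` flag are hypothetical, invented for this example; BOINC's real work-fetch logic is considerably more involved.

```python
def pick_fetch_target(projects):
    """Return the name of the project to ask for work, or None.

    Toy rule: prefer reachable projects with a non-zero resource share;
    fall back to zero-share (backup) projects only when none of those
    can supply work.
    """
    reachable = [p for p in projects if p["server_up"]]
    primary = [p for p in reachable if p["share"] > 0]
    pool = primary or reachable  # backups are used only if primary is empty
    if not pool:
        return None
    return max(pool, key=lambda p: p["share"])["name"]


projects = [
    {"name": "rosetta", "share": 100, "server_up": True},
    {"name": "seti", "share": 0, "server_up": True},
]
print(pick_fetch_target(projects))  # Rosetta is up, so SETI is not asked
projects[0]["server_up"] = False
print(pick_fetch_target(projects))  # Rosetta is down, so the backup is used
```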

The one thing that can really kill your power bill is if you are paying for air conditioning where the machines are located.
Rosetta Moderator: Mod.Sense
ID: 66209 · Rating: 0
Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 66231 - Posted: 19 May 2010, 18:59:01 UTC - in response to Message 66209.  

Hey - I live on the Texas Gulf Coast - right on the water - air conditioning is a way of life. I know of what you speak.

Thanks for the response.
ID: 66231 · Rating: 0
Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 66232 - Posted: 19 May 2010, 19:02:37 UTC - in response to Message 66208.  

Mr. Overflow -

Gotta love that name. I'll take your word for it at this point and wait a bit more than a few days before I scratch my head.

One further question - does BOINC try to divide resources on a per machine basis or does it aggregate across all of the systems on an account?

Chris
ID: 66232 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 66236 - Posted: 20 May 2010, 3:20:51 UTC

The resource shares are per machine. If you think about it, you might not choose to attach all of the same projects on all of your machines, and so one machine is really all it can keep straight :)
Rosetta Moderator: Mod.Sense
ID: 66236 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,135,082
RAC: 4,703
Message 66242 - Posted: 20 May 2010, 11:30:56 UTC - in response to Message 66232.  

Mr. Overflow -

Gotta love that name. I'll take your word for it at this point and wait a bit more than a few days before I scratch my head.

One further question - does BOINC try to divide resources on a per machine basis or does it aggregate across all of the systems on an account?

Chris


You have to think about resource share based on all the BOINC projects out there, and there are a ton compared to just a couple of years ago! Here is a website with a list of the active distributed computing projects (the BOINC ones are noted): http://distributedcomputing.info/projects.html

Some projects have deadlines of 3 days, some have deadlines of YEARS! There is no way a crunching percentage can be maintained on a multi-core PC with a combination like that and still run one project on one CPU core and a different project on a different CPU core. So BOINC does it by trying to conform to your percentage settings over the long run. I mean, if you have a project that can take years to finish one unit, i.e. Climate Prediction, set at 10% of your total, how would Rosetta maintain 90% with only a dual core system? In short, it can't, so BOINC does it by swapping projects every 60 minutes and over the long run getting pretty close to your assigned percentage. It can't do it with a quad core system either, or a single core system; the numbers just don't add up. BOINC maintains a Long Term Debt system and a Short Term Debt system too. BOINC is very complicated, and to some it seems quirky, but to be honest it does work, and for the most part quite well! Of course, the longer you crunch and the more machines you have, the more situations you will see where BOINC is just quirky.
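The 90%/10% dual core case can be made concrete with a toy model. The Python sketch below is a deliberate simplification and not BOINC's actual debt code: every hour each project is credited with its entitled slice of CPU time, and whole cores go to whichever project is currently owed the most. The function name and the one-hour granularity are assumptions chosen to match the 60 minute switch interval mentioned above.

```python
def simulate(shares, cores, hours):
    """Toy debt-based scheduler with whole-core, one-hour time slices.

    shares: dict of project name -> resource share (any positive scale).
    Returns (total core-hours per project, per-hour allocation list).
    """
    total = sum(shares.values())
    debt = {p: 0.0 for p in shares}   # core-hours currently owed
    used = {p: 0 for p in shares}     # core-hours actually granted
    timeline = []
    for _ in range(hours):
        # Credit every project with its entitled slice of this hour.
        for p, s in shares.items():
            debt[p] += cores * s / total
        this_hour = {p: 0 for p in shares}
        # Hand out whole cores, one at a time, to the most-owed project.
        for _ in range(cores):
            p = max(debt, key=debt.get)
            debt[p] -= 1.0
            this_hour[p] += 1
            used[p] += 1
        timeline.append(this_hour)
    return used, timeline


used, timeline = simulate({"rosetta": 90, "cpdn": 10}, cores=2, hours=100)
print(timeline[:5])  # the first five hourly splits
print(used)          # totals over 100 hours
```

In this model no single hour matches 90/10 (the split is always either 2/0 or 1/1), yet CPDN ends up with one core-hour out of every ten, which is the long-run behaviour described in the post.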
ID: 66242 · Rating: 0
Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 66259 - Posted: 21 May 2010, 16:37:03 UTC

The run-time spread is actually wider than Mikey's post indicates... because he was illustrating deadlines rather than run times ... On my fastest GPU a MW task takes about 90 seconds ... were I to run that same task on the CPU side it would take about 4 hours and change... yet I also run CPDN where the tasks run for about 300 hours ...

RaH also allows you to select a run time ... so ... how do you balance all of that?

There is a complicated rule set that governs which task to run, when to stop, when to run alternative tasks etc. The problem with complicated rule sets is that the outcome is not always what one expects. An additional complication is that the developers don't quite see a blind spot they have developed about the internals of BOINC... that is that the internal operational "model" is still that of a single processing element ... they apply scaling factors to be sure to handle quad and 8 core systems ... but that is not the same thing ... so, on quads and wider we see artifacts of the system not making optimal decisions on what to do when ...

But the key is as has already been stated ... the resource share is honored over time, more or less, ... with the projects you have selected it should be more ...
ID: 66259 · Rating: 0
Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 66263 - Posted: 21 May 2010, 20:07:34 UTC - in response to Message 66259.  

Paul -

Since I am in the process of building a new system centered on the new AMD X6 processor I am a little curious about your comment about "on quads and wider we see artifacts of the system not making optimal decisions on what to do when"

Can you expand on that? Is it something I can expect to see with 6 "real" cores or is it something only seen on an Intel HT system where you have "8 backed by 4"?

The older I get, the more I realize there are a lot of things I don't know ...

Thanks for taking the time to respond to my post.

Chris
ID: 66263 · Rating: 0
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 66266 - Posted: 21 May 2010, 21:07:00 UTC

If you have SETI as a backup, then I'd set up an account manager such as GridRepublic, add all my PCs there, and add SETI whenever RaH is down... (I add Docking@Home whenever RaH is down...)

Just my two cents :)
ID: 66266 · Rating: 0
Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 66277 - Posted: 22 May 2010, 13:44:04 UTC - in response to Message 66263.  
Last modified: 22 May 2010, 13:45:19 UTC

Paul -

Since I am in the process of building a new system centered on the new AMD X6 processor I am a little curious about your comment about "on quads and wider we see artifacts of the system not making optimal decisions on what to do when"

Mostly what happens is you will see BOINC moving along and then suddenly "panic" and start to run things in high priority mode... Or to run tasks that have later deadlines first out of a long list instead of running those that have the shortest deadline...

Most people will never see these things for a couple of reasons... first they don't use the BOINC Tasks page as their screen saver (as I do on my second monitor) ... or they do not run enough projects (well over 50% of users run only one project, I think the number for less than 5 covers 80-90% of all participants).

There are other odd things that happen, and they will happen with any Quad or better. Some of them are version dependent, meaning as you change versions, the oddities change some... because as frustrated as I get with UCB, on occasion they actually fix a bug or two that really makes BOINC work oddly ...

The bank teller analogy is the best I can suggest ... in the old days if you had 8 tellers, you had 8 lines and real frustration if you got behind the guy that was counting pennies ... that is why most banks use a single feeder line to feed the 8 tellers so that one long running customer does not hold up a select few that get angry at the bank ...
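The teller analogy can be put into numbers with a short simulation. This is only a sketch of the analogy, not of anything inside BOINC: all customers are assumed to arrive at time zero, and the service times (one 60-unit customer among 23 two-unit customers) are made up for illustration.

```python
import heapq


def waits_separate_lines(service_times, tellers):
    """Each customer joins a fixed line (assigned round-robin) and waits
    for everyone ahead of them in that one line."""
    busy_until = [0.0] * tellers
    waits = []
    for i, t in enumerate(service_times):
        line = i % tellers
        waits.append(busy_until[line])
        busy_until[line] += t
    return waits


def waits_single_line(service_times, tellers):
    """One feeder line: the next customer goes to whichever teller
    becomes free first."""
    free_at = [0.0] * tellers
    heapq.heapify(free_at)
    waits = []
    for t in service_times:
        start = heapq.heappop(free_at)   # earliest available teller
        waits.append(start)
        heapq.heappush(free_at, start + t)
    return waits


# One penny-counter (60 time units) followed by 23 quick customers.
customers = [60.0] + [2.0] * 23
print(max(waits_separate_lines(customers, 8)))
print(max(waits_single_line(customers, 8)))
```

With these made-up numbers the worst wait is 62 time units with eight separate lines and 6 with a single feeder line, because only the people stuck behind the slow customer suffer in the first case.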

One of the other more common issues with BOINC is inappropriate queue fill ... I have a queue of 1 day to tide me over Comcast outages which can occur at almost any time... they usually are short, but I can have an outage that lasts 4-6 hours (one or two times a year) ... OK, two immediate problems here ... with the GPU being able to run off a full load of MW tasks in about an hour, that means that BOINC likely has not queued up enough work to tide me over ... mostly because UCB is wedded to the idea of a GPU Strict FIFO rule ... so BOINC does not cache and run work from multiple GPU-only projects well ... (try it and watch closely; it can cache, but rarely has a balanced queue. The best way to illustrate this is to attach to MW and Collatz with equal shares and watch as BOINC "lurches" between the two projects on a cycle that is as long as your cache size. One of the reasons my Collatz is higher than MW is because Collatz fills to 150 tasks and MW to only 48 and they run faster; in other words, it is easier to get work from Collatz) ...

On W02 which I am watching right now I have a queue that is "filled" with tasks from a couple projects, mostly 11 tasks from RCN each listed at 25 hours ... yet half those tasks are likely to take seconds to minutes only ... because the run time is so variable (same issue on ABC and a couple other projects) ... So, BOINC thinks it has plenty of work on hand ... on my Mac it has 4 CPDN models ... same issue ... it thinks it has plenty of work on hand ... but it really doesn't ... not with 8 cores to keep busy ...
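A quick calculation shows why the estimate is misleading. The sketch below runs a list of tasks on a fixed number of cores and reports when a core first has nothing left to start. The 8 cores and the 11 tasks at 25 hours come from the post; the 5 minute run time for the short tasks and the order in which tasks sit in the queue are assumptions picked for illustration (the post only says "seconds to minutes").

```python
import heapq


def hours_until_first_idle_core(task_hours, cores):
    """Run tasks in list order on `cores` cores, one task per core.

    Returns the time (in hours) at which a core finishes a task and
    finds the queue empty.
    """
    if len(task_hours) < cores:
        return 0.0  # some core never gets a task at all
    running = list(task_hours[:cores])    # finish times of the first batch
    heapq.heapify(running)
    for t in task_hours[cores:]:
        finished = heapq.heappop(running)      # earliest core to free up
        heapq.heappush(running, finished + t)  # it starts the next task
    return running[0]  # the next completion finds nothing queued


estimated = [25.0] * 11                     # what the client believes
short = 5 / 60                              # assumed: 5 minutes
actual = [25.0, short] * 5 + [short]        # assumed: 5 long, 6 short tasks

print(hours_until_first_idle_core(estimated, 8))
print(hours_until_first_idle_core(actual, 8))
```

On the estimates, all eight cores look busy for 25 hours, which is more than the 1 day cache target, so no new work is requested. With the assumed real run times a core goes idle after about five minutes.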

Anyway, it takes long hours of patient watching to see these patterns ... The best way to learn about issues like this is to watch the BOINC Alpha mailing list ...
ID: 66277 · Rating: 0
Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 66284 - Posted: 22 May 2010, 15:32:56 UTC

Side note, there are three modules in BOINC that determine these issues I pontificated about. They are:

RR Sim, Resource Scheduler, and Work Fetch

RR Sim essentially models the computer and the work on it to determine if the right mix of work is on hand and about how long it will take to run. Resource Scheduler drives what gets run when and where, and Work Fetch gets new work ... But, though they are conceptually distinct entities, all three interact in determining what is done and when ...

Back on topic ... W02, which I have been watching for the last hour or two, is running a GPU Grid task on the Nvidia card, and on the ATI card it has been running MW work ... and because I am not getting a full boatload of MW tasks, I get 10-12 and run them off, idle the GPU, fetch more and repeat ... so every 15 minutes or so I lose 30-60 seconds of GPU work because BOINC will not pre-fetch enough MW work to prevent the GPU from running dry ... this is one of those effects I talked about ...
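To give a feel for what a round-robin simulation does, here is a heavily simplified Python sketch. It is an illustration of the general idea only, not the BOINC client's code: it ignores projects, resource shares and GPUs, and simply lets all unfinished tasks share the cores equally, then reports each task's predicted finish time and whether that is past its deadline. The function name and the task data are hypothetical.

```python
def rr_simulate(tasks, cores):
    """Predict finish times under equal time-slicing.

    tasks: dict of name -> (remaining_hours, deadline_hours_from_now).
    Returns dict of name -> (predicted_finish_hours, misses_deadline).
    """
    remaining = {name: hours for name, (hours, _) in tasks.items()}
    now = 0.0
    result = {}
    while remaining:
        # Each task gets a full core if there are enough cores,
        # otherwise an equal fraction of one.
        rate = min(1.0, cores / len(remaining))
        # Jump ahead to the moment the smallest task completes.
        name = min(remaining, key=remaining.get)
        step = remaining[name] / rate
        now += step
        for other in remaining:
            remaining[other] -= step * rate
        del remaining[name]
        result[name] = (now, now > tasks[name][1])
    return result


# Hypothetical workload: two tasks competing for a single core.
print(rr_simulate({"a": (2.0, 3.0), "b": (4.0, 10.0)}, cores=1))
```

In this example task "a" needs only 2 hours but, sharing the core with "b", is predicted to finish at hour 4, after its 3 hour deadline. A predicted miss of this kind is, as far as I understand it, what leads the client to stop time-slicing and run the endangered task first.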
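For scale, the loss described above works out as follows. This is plain arithmetic on the figures in the post, assuming the 15 minute cycle includes the idle gap; the function names are just for this example.

```python
def idle_fraction(idle_seconds, cycle_minutes):
    """Fraction of each cycle during which the GPU sits idle."""
    return idle_seconds / (cycle_minutes * 60.0)


def idle_minutes_per_day(idle_seconds, cycle_minutes):
    """Total idle minutes accumulated over 24 hours of such cycles."""
    cycles_per_day = 24 * 60 / cycle_minutes
    return cycles_per_day * idle_seconds / 60.0


for idle in (30, 60):
    pct = 100 * idle_fraction(idle, 15)
    print(f"{idle} s idle per 15 min: {pct:.1f}% lost, "
          f"{idle_minutes_per_day(idle, 15):.0f} min per day")
```

That is roughly 3% to 7% of the GPU's time, or between about three quarters of an hour and an hour and a half per day.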

It happens more rarely on the CPU side ... but as the man said in the movie "I seen it done..."
ID: 66284 · Rating: 0




©2024 University of Washington
https://www.bakerlab.org