Message boards : Number crunching : BOINC not requesting work
Author | Message |
---|---|
PMH_UK Joined: 9 Aug 08 Posts: 16 Credit: 1,243,749 RAC: 0 |
1 PC will not request work, even if other waiting units and 1 executing are suspended. BOINC 7.16.6 on Ubuntu 20.04, newly built. WCG CPU units are working OK. Can anyone help interpret and debug? Log extract with work_fetch_debug on below:
13412 Rosetta@home 16/04/2021 17:42:21 work fetch resumed by user
13413 16/04/2021 17:42:21 [work_fetch] Request work fetch: project work fetch resumed by user
13414 16/04/2021 17:42:22 choose_project(): 1618591342.513777
13415 16/04/2021 17:42:22 [work_fetch] ------- start work fetch state -------
13416 16/04/2021 17:42:22 [work_fetch] target work buffer: 103680.00 + 8640.00 sec
13417 16/04/2021 17:42:22 [work_fetch] --- project states ---
13418 Rosetta@home 16/04/2021 17:42:22 [work_fetch] REC 0.000 prio -0.000 can request work
13419 Einstein@Home 16/04/2021 17:42:22 [work_fetch] REC 0.168 prio -0.000 can't request work: "no new tasks" requested via Manager
13420 World Community Grid 16/04/2021 17:42:22 [work_fetch] REC 197.084 prio -24.357 can't request work: scheduler RPC backoff (106.08 sec)
13421 WUProp@Home 16/04/2021 17:42:22 [work_fetch] REC 0.001 prio -0.000 can't request work: non CPU intensive
13422 16/04/2021 17:42:22 [work_fetch] --- state for CPU ---
13423 16/04/2021 17:42:22 [work_fetch] shortfall 272807.27 nidle 0.00 saturated 38425.40 busy 0.00
13424 Rosetta@home 16/04/2021 17:42:22 [work_fetch] share 1.000
13425 Einstein@Home 16/04/2021 17:42:22 [work_fetch] share 0.000
13426 World Community Grid 16/04/2021 17:42:22 [work_fetch] share 0.000
13427 16/04/2021 17:42:22 [work_fetch] --- state for Intel GPU ---
13428 16/04/2021 17:42:22 [work_fetch] shortfall 112320.00 nidle 1.00 saturated 0.00 busy 0.00
13429 Rosetta@home 16/04/2021 17:42:22 [work_fetch] share 0.000 no applications
13430 Einstein@Home 16/04/2021 17:42:22 [work_fetch] share 0.000
13431 World Community Grid 16/04/2021 17:42:22 [work_fetch] share 0.000
13432 16/04/2021 17:42:22 [work_fetch] ------- end work fetch state -------
13433 Rosetta@home 16/04/2021 17:42:22 choose_project: scanning
13434 Rosetta@home 16/04/2021 17:42:22 can fetch CPU
13435 Rosetta@home 16/04/2021 17:42:22 can't fetch Intel GPU: no applications
13436 Einstein@Home 16/04/2021 17:42:22 choose_project: scanning
13437 Einstein@Home 16/04/2021 17:42:22 skip: "no new tasks" requested via Manager
13438 WUProp@Home 16/04/2021 17:42:22 choose_project: scanning
13439 WUProp@Home 16/04/2021 17:42:22 skip: non CPU intensive
13440 World Community Grid 16/04/2021 17:42:22 choose_project: scanning
13441 World Community Grid 16/04/2021 17:42:22 skip: scheduler RPC backoff
13442 16/04/2021 17:42:22 [work_fetch] No project chosen for work fetch
13443 Rosetta@home 16/04/2021 17:42:25 update requested by user
13444 16/04/2021 17:42:25 [work_fetch] Request work fetch: project updated by user
13445 Rosetta@home 16/04/2021 17:42:27 piggyback_work_request()
13446 16/04/2021 17:42:27 [work_fetch] ------- start work fetch state -------
13447 16/04/2021 17:42:27 [work_fetch] target work buffer: 103680.00 + 8640.00 sec
13448 16/04/2021 17:42:27 [work_fetch] --- project states ---
13449 Rosetta@home 16/04/2021 17:42:27 [work_fetch] REC 0.000 prio -0.000 can request work
13450 Einstein@Home 16/04/2021 17:42:27 [work_fetch] REC 0.168 prio -0.000 can't request work: "no new tasks" requested via Manager
13451 World Community Grid 16/04/2021 17:42:27 [work_fetch] REC 197.084 prio -24.357 can't request work: scheduler RPC backoff (101.04 sec)
13452 WUProp@Home 16/04/2021 17:42:27 [work_fetch] REC 0.001 prio -0.000 can't request work: non CPU intensive
13453 16/04/2021 17:42:27 [work_fetch] --- state for CPU ---
13454 16/04/2021 17:42:27 [work_fetch] shortfall 272837.98 nidle 0.00 saturated 38420.16 busy 0.00
13455 Rosetta@home 16/04/2021 17:42:27 [work_fetch] share 1.000
13456 Einstein@Home 16/04/2021 17:42:27 [work_fetch] share 0.000
13457 World Community Grid 16/04/2021 17:42:27 [work_fetch] share 0.000
13458 16/04/2021 17:42:27 [work_fetch] --- state for Intel GPU ---
13459 16/04/2021 17:42:27 [work_fetch] shortfall 112320.00 nidle 1.00 saturated 0.00 busy 0.00
13460 Rosetta@home 16/04/2021 17:42:27 [work_fetch] share 0.000 no applications
13461 Einstein@Home 16/04/2021 17:42:27 [work_fetch] share 0.000
13462 World Community Grid 16/04/2021 17:42:27 [work_fetch] share 0.000
13463 16/04/2021 17:42:27 [work_fetch] ------- end work fetch state -------
13464 Rosetta@home 16/04/2021 17:42:27 piggyback: resource CPU
13465 Rosetta@home 16/04/2021 17:42:27 [work_fetch] using MC shortfall 0.000000 instead of shortfall 272837.976021
13466 Rosetta@home 16/04/2021 17:42:27 [work_fetch] set_request() for CPU: ninst 4 nused_total 0.00 nidle_now 0.00 fetch share 1.00 req_inst 0.00 req_secs 0.00
13467 Rosetta@home 16/04/2021 17:42:27 piggyback: resource Intel GPU
13468 Rosetta@home 16/04/2021 17:42:27 piggyback: can't fetch Intel GPU: no applications
13469 Rosetta@home 16/04/2021 17:42:27 [work_fetch] request: CPU (0.00 sec, 0.00 inst) Intel GPU (0.00 sec, 0.00 inst)
13470 Rosetta@home 16/04/2021 17:42:27 Sending scheduler request: Requested by user.
13471 Rosetta@home 16/04/2021 17:42:27 Not requesting tasks: don't need (CPU: ; Intel GPU: )
13472 Rosetta@home 16/04/2021 17:42:30 Scheduler request completed
13473 Rosetta@home 16/04/2021 17:42:30 Project requested delay of 31 seconds
13474 16/04/2021 17:42:30 [work_fetch] Request work fetch: RPC complete
13475 Rosetta@home 16/04/2021 17:42:30 work fetch suspended by user
Paul. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,162,382 RAC: 4,112 |
1 PC will not request work, even if other waiting units and 1 executing are suspended.
A couple of things. First, your cache size on the PC comes into play, and that list says nothing about it, i.e. the 'store at least ___ days of work' and the 'store up to an additional ___ days of work' settings. If you already have enough work from your other projects, then Rosetta can't send you any more work or your cache would overfill. The other thing is that you seem to be using a zero resource share for a lot of projects and a ONE for Rosetta. There's not a whole lot of wiggle room there, so I would bump Rosetta up to 100, and then you should get Rosetta work almost every time they have work for your PC, again depending on your cache sizes. |
PMH_UK Joined: 9 Aug 08 Posts: 16 Credit: 1,243,749 RAC: 0 |
Thanks Mikey. Resource share is 900 for WCG and 100 for Rosetta, with a cache of 1.2 plus 0.1 days of work. Currently about 0.5 days of work is loaded, as WCG units are limited by number to 8. I also tried suspending the waiting WCG units and 1 running unit, and it is still not requesting. Paul. |
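The 1.2 plus 0.1 days quoted here is the same setting that the log reports in seconds on its "target work buffer" line. A quick arithmetic check, assuming a plain 86,400 seconds per day; the helper name `days_to_seconds` is made up for illustration and is not part of BOINC:

```python
SECONDS_PER_DAY = 86_400  # assumption: plain 24-hour days


def days_to_seconds(days: float) -> float:
    """Convert a cache setting in days to seconds."""
    return days * SECONDS_PER_DAY


min_buffer = days_to_seconds(1.2)    # "store at least" setting
extra_buffer = days_to_seconds(0.1)  # "store up to an additional" setting

# Same layout as the "target work buffer" line in the log extract.
print(f"target work buffer: {min_buffer:.2f} + {extra_buffer:.2f} sec")
print(f"sum of both: {min_buffer + extra_buffer:.2f} sec")
```

The sum, 112320 seconds, equals the Intel GPU shortfall shown in the log, where one instance is reported idle. Reading that as "one idle instance is short by the whole buffer" is an interpretation, not something the log states.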
Grant (SSSF) Joined: 28 Mar 20 Posts: 1679 Credit: 17,800,088 RAC: 22,741 |
1 PC will not request work, even if other waiting units and 1 executing are suspended.
BOINC will not request work for a project when a Task is suspended, because it has no way of knowing when it will be unsuspended, and no way of knowing if that Task can be returned in time, let alone any new work. With the number of projects you are running, the large size of your cache, the limited number of cores/threads your systems have, and the Resource Share settings you are using, it's very unlikely that Rosetta will be running on your systems at all times. With the number of projects you are attached to, you would be better off with no cache at all. I'd suggest 0.01 days and 0.0 additional days. If you wish to do more Rosetta work, you need to increase its Resource Share (from memory the largest possible value is 1000), or reduce the Resource Share of the others (WCG doesn't follow the usual BOINC method for adjusting this, so I've no idea how you would accomplish that). Keep in mind Resource Share is a ratio, not a percentage. The other problem is a Rosetta one: for some time now new Tasks have been misconfigured to require way more RAM than they actually need to run, resulting in many systems with smaller amounts of system RAM no longer being able to get work, or only being able to get a limited number of Tasks, even if they have plenty of available cores/threads. Several of your systems fall into this category. Increasing the amount of RAM available to BOINC may help. In your account, under Computing preferences, Memory:
When computer is in use, use at most: 95 %
When computer is not in use, use at most: 95 %
Leave non-GPU tasks in memory while suspended: leave unselected
Page/swap file: use at most: 75 %
But until the researchers sending out these misconfigured Tasks fix it at their end, it's going to remain a problem.
Also your most recently added system needs to have the BOINC Benchmarks run in order for it to receive a reasonable amount of Credit for work done; it's still using the default values. Grant Darwin NT |
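To make the "ratio, not a percentage" point concrete, here is a minimal sketch using the 900 and 100 values quoted earlier in the thread. The function name `share_fractions` is hypothetical, and the calculation only shows the long-run split implied by the numbers; it does not model how the BOINC client actually schedules work.

```python
def share_fractions(shares: dict[str, float]) -> dict[str, float]:
    """Turn raw Resource Share values into fractions of the total."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive share")
    return {name: value / total for name, value in shares.items()}


# Values as stated by the original poster; other attached projects omitted.
shares = {"World Community Grid": 900, "Rosetta@home": 100}

for name, fraction in share_fractions(shares).items():
    print(f"{name}: {fraction:.0%} of the long-run total")
```

With these numbers Rosetta's 100 works out to a tenth of the total, so raising it only helps relative to what the other projects are set to.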
PMH_UK Joined: 9 Aug 08 Posts: 16 Credit: 1,243,749 RAC: 0 |
Thanks Grant. The only tasks I have are 8 WCG, 4 of them running, so I tried suspending those to force Rosetta to fetch. The system has been running for several days; I just forced a benchmark to be sure, and it still won't request. My other systems request and get some units, but often show memory or disk messages as they are modest PCs. Paul. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1679 Credit: 17,800,088 RAC: 22,741 |
Only tasks I have are 8 WCG, 4 running so I tried suspending those to force Rosetta to fetch.
The default Runtime for Rosetta work is 8 hours, and the deadlines are 3 days. With that many WCG tasks loaded up, and only 4 cores/threads, it's not going to request more work until it can be sure it will be able to return it in time, and it won't request more Rosetta work if it has to complete more work for other projects in order to meet your Resource Share settings. All it is doing is trying to meet the settings you have made. The smaller your cache, and the fewer projects you run, the sooner your Resource Share settings will be met (as in weeks). The larger the cache and the more projects, the longer it will take (as in months). Micromanaging this actually makes things worse. Set your cache to zero, change your Resource Share settings to favour Rosetta, then once it has cleared the present backlog of WCG work it will (when it can) start doing more Rosetta work. Just let it do its thing once you have set your preferences. The fact that there are issues with the configured requirements for Rosetta Tasks at present is just going to result in less Rosetta work being done at times, then times when it's mostly Rosetta work. But over time, your Resource Share settings will be met. Whether that time frame is weeks or many months depends on the settings you choose and whether or not you micromanage things. Grant Darwin NT |
PMH_UK Joined: 9 Aug 08 Posts: 16 Credit: 1,243,749 RAC: 0 |
Sorted. I had edited app_config with BoincTasks to limit Rosetta to 1 running task. This was OK on earlier versions of BOINC, but this version silently failed. Paul. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1679 Credit: 17,800,088 RAC: 22,741 |
Sorted.
Yeah, I guess that would do it. Although I can't see any sign in the work request logs where it mentions such a limit. The only thing is this line:
13465 Rosetta@home 16/04/2021 17:42:27 [work_fetch] using MC shortfall 0.000000 instead of shortfall 272837.976021
where it replaces the 272837.976021 shortfall with a value of 0, i.e. no shortfall. So no need for new work. Grant Darwin NT |
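Log line 13465 is the one place in the extract where the request visibly gets zeroed. A minimal sketch for spotting such lines in a saved log, assuming the message wording stays exactly as in the extract; the function name `find_capped_requests` is hypothetical, and reading "MC" as "max concurrent" (which would tie it to the app_config limit) is an assumption the log itself does not confirm.

```python
import re

# Pattern copied from the wording of log line 13465 in this thread.
MC_LINE = re.compile(
    r"using MC shortfall (?P<mc>[\d.]+) instead of shortfall (?P<raw>[\d.]+)"
)


def find_capped_requests(log_text: str) -> list[tuple[float, float]]:
    """Return (mc_shortfall, raw_shortfall) pairs found in the log text."""
    return [
        (float(m["mc"]), float(m["raw"])) for m in MC_LINE.finditer(log_text)
    ]


sample = (
    "13465 Rosetta@home 16/04/2021 17:42:27 [work_fetch] "
    "using MC shortfall 0.000000 instead of shortfall 272837.976021"
)

for mc, raw in find_capped_requests(sample):
    if mc < raw:
        print(f"shortfall reduced from {raw:.2f} to {mc:.2f} seconds")
```

A reduced shortfall next to a "don't need" message for the same project is a hint to look at whatever is limiting that project locally before blaming cache size or Resource Share.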
PMH_UK Joined: 9 Aug 08 Posts: 16 Credit: 1,243,749 RAC: 0 |
It was not clear to me in the logs, but checking showed app_config entries with GPU sections added and mangled. That was OK in previous BOINC versions for CPU projects, but the bad format caused 7.16.6 to not request work. |
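Since the root cause was a malformed app_config, a quick well-formedness check before restarting the client may catch this class of problem. The sketch below is illustrative only: the sample file content and the helper `check_app_config` are assumptions, not the poster's actual file. BOINC uses its own parser, so passing this check does not guarantee that the client accepts the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical example content, not the file from this thread.
SAMPLE_APP_CONFIG = """<app_config>
    <project_max_concurrent>1</project_max_concurrent>
</app_config>"""


def check_app_config(xml_text: str) -> tuple[bool, str]:
    """Report whether the text is well-formed XML rooted at <app_config>."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, f"not well-formed XML: {err}"
    if root.tag != "app_config":
        return False, f"unexpected root element <{root.tag}>"
    return True, "ok"


print(check_app_config(SAMPLE_APP_CONFIG))
# An unclosed <app> section, similar in spirit to a mangled edit:
print(check_app_config("<app_config><app></app_config>"))
```

This only catches structural damage such as unclosed or mismatched tags. A file can be well-formed and still contain sections the client does not expect.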
©2024 University of Washington
https://www.bakerlab.org