Message boards : Number crunching : Some WU take a long time
Author | Message |
---|---|
MichaelHe Joined: 2 Oct 09 Posts: 4 Credit: 380,105 RAC: 0 |
Most of my WUs take around 3 hours to finish, but sometimes one takes 6-10 hours. I've currently got one that's at the 6:30 mark. It appears to be "stuck", updating only once every half minute and only in very small increments. When I check the credit granted for such WUs, it is usually very low. Is this normal? |
MichaelHe Joined: 2 Oct 09 Posts: 4 Credit: 380,105 RAC: 0 |
Also, is it just me or is granted credit consistently lower than claimed credit? For me, 90% of the time my granted credit is lower, and when it's higher it doesn't make up for the times when it's lower. You can check my results at https://boinc.bakerlab.org/rosetta/results.php?userid=352530 |
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
This task 334979307 ( rs_stg0_lrlx_t447__boincid_SAVE_ALL_OUT_19714_1598_0 ) eventually validated, producing a single decoy, but the graphics looked strange. I have a screenshot (one protein looks crunched up in a ball) but can't figure out how to upload it. |
Jochen Joined: 6 Jun 06 Posts: 133 Credit: 3,847,433 RAC: 0 |
"Also, is it just me or is granted credit consistently lower than claimed credit? For me, 90% of the time my granted credit is lower, and when it's higher it doesn't make up for the times when it's lower. You can check my results at https://boinc.bakerlab.org/rosetta/results.php?userid=352530" I have basically asked the same question in this thread: Credit always low. There is some detailed information on how credits are granted near the end of that thread. I have the same 'issue' with my i7. My other computer has a Q9650, and it usually gets granted what it claims. I wonder if hyper-threading is causing the low granted credits. But I don't dare turn it off, since it is certainly a benefit at the end of the day. Jochen |
SFCC Joined: 3 Sep 09 Posts: 10 Credit: 227,659 RAC: 0 |
I just aborted a WU that had been running 40+ hours and showed 92 hours remaining! CPU was running at only 13% capacity, so it appears that the WU was just sitting in some "do nothing" loop. I aborted another one that had run 20+ hours with 30+ hours shown as remaining. Normally WUs take 3-4 hours on this machine. I had encountered this problem occasionally in the past, but now it happens with most WUs. I was told some time ago that it is an "occasional" problem with Windoz machines, but now it appears to be quite common on my machine. I have suspended the project until someone can tell me how to fix the problem if it is on my end, or get it fixed if it is on the project's end of the pipe. I'm running BOINC version 6.10.18 on a 2.0 GHz dual-core AMD machine running Windows XP Media Center Edition with SP3. |
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
"I just aborted a WU that had been running 40+ hours and showed 92 hours remaining! CPU was running at only 13% capacity, so it appears that the WU was just sitting in some 'do nothing' loop. I aborted another one that had run 20+ hours with 30+ hours shown as remaining. Normally WUs take 3-4 hours on this machine. I had encountered this problem occasionally in the past, but now it happens with most WUs. I was told some time ago that it is an 'occasional' problem with Windoz machines, but now it appears to be quite common on my machine. I have suspended the project until someone can tell me how to fix the problem if it is on my end, or get it fixed if it is on the project's end of the pipe. I'm running BOINC version 6.10.18 on a 2.0 GHz dual-core AMD machine running Windows XP Media Center Edition with SP3." The workaround for tasks that are 'hanging' on Windows (apparently stuck and getting 0% CPU time in the Task Manager) is to quit and restart BOINC. |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,526,853 RAC: 9,592 |
"I just aborted a WU that had been running 40+ hours and showed 92 hours remaining! CPU was running at only 13% capacity, so it appears that the WU was just sitting in some 'do nothing' loop. I aborted another one that had run 20+ hours with 30+ hours shown as remaining. Normally WUs take 3-4 hours on this machine. I had encountered this problem occasionally in the past, but now it happens with most WUs. I was told some time ago that it is an 'occasional' problem with Windoz machines, but now it appears to be quite common on my machine. I have suspended the project until someone can tell me how to fix the problem if it is on my end, or get it fixed if it is on the project's end of the pipe. I'm running BOINC version 6.10.18 on a 2.0 GHz dual-core AMD machine running Windows XP Media Center Edition with SP3." Newer versions of BOINC Manager suspend processing when CPU usage is 25% or greater by default, I believe - make sure the jobs aren't suspended before canceling! |
Jochen Joined: 6 Jun 06 Posts: 133 Credit: 3,847,433 RAC: 0 |
"Newer versions of BOINC Manager suspend processing when CPU usage is 25% or greater by default, I believe - make sure the jobs aren't suspended before canceling!" What would be the state displayed in this case? Still 'active'? Jochen |
SFCC Joined: 3 Sep 09 Posts: 10 Credit: 227,659 RAC: 0 |
"Newer versions of BOINC Manager suspend processing when CPU usage is 25% or greater by default, I believe - make sure the jobs aren't suspended before canceling!" BOINC is configured to use 100% of idle computer time. The WU is still running, with the indicated CPU time increasing. Another post mentions stopping and restarting BOINC to 'cure' the problem; that works sometimes, sometimes not. The problem with that is that the machine is located 'off site' in our computer club's computer room and I administer it remotely from home. Due to time constraints, I don't log into it on a daily basis, so when it gets into this strange mode it can just sit there spinning its wheels and the display goes blank. We are trying to interest our club members in participating in BOINC projects, and when they see one of our 'display' machines hung up, that is NOT good press... So, I have suspended Rosetta and am running other projects. |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,526,853 RAC: 9,592 |
"Newer versions of BOINC Manager suspend processing when CPU usage is 25% or greater by default, I believe - make sure the jobs aren't suspended before canceling!" No, they're displayed as 'Suspended- CPU usage too high'. |
Jochen Joined: 6 Jun 06 Posts: 133 Credit: 3,847,433 RAC: 0 |
"No, they're displayed as 'Suspended- CPU usage too high'." Thanks. I have not seen a WU with this state. But I found a couple of log entries stating 'CPU usage too high, suspending computation' and a second or two later 'Resuming computation'. But these long-running models do have other side effects. I had two WUs yesterday that ran approx. 7 hours (I am running 3-hour WUs). When the first one was done, all other running WUs were 'suspended' and other WUs (all with a later expiration date) were started with the state 'Active, high priority'. I would have guessed that the ones to do first were the ones with the nearest expiration date. It looks like the BOINC client was trying to find WUs that could be finished in time given the new estimated duration for a single WU. The bad thing about this is that I was running out of memory, because it kept the suspended ones in memory (yes, I know I could change this, but that would mean losing some computation time when reverting to the latest save point). I actually cannot see the reason for this behaviour. When there is one long-running model, the estimated duration is set to this value for all other WUs instantly, but the estimated duration decreases only slowly. I do not like this. Jochen |
Jochen Joined: 6 Jun 06 Posts: 133 Credit: 3,847,433 RAC: 0 |
"In that version a new option is used." Yes, I found it and set it to 'no restriction'. Thanks. Jochen |
[VENETO] M@cro Joined: 23 Jul 09 Posts: 3 Credit: 84,790 RAC: 0 |
Hi folks! Is anybody crunching this Rosetta WU? rs_stg0_lrlx_t459_casp8_SAVE_ALL_OUT_20813_1841_0 I ask because it starts as an 18h WU (as I set in prefs), then time-to-complete slowly goes up to 36h, so total time is 53h. The other WUs I'm crunching are 18h long! M@cro - BOINC.Italy |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
M@cro, please check that you are looking at the actual CPU time used by the task, and not the elapsed time. You can do this by going to the advanced view, to the tasks tab, highlight the task you mentioned, and then click the properties button over on the left. Rosetta Moderator: Mod.Sense |
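
The CPU-time versus elapsed-time check suggested above can be sketched roughly as follows. This is only an illustration, not part of BOINC or Rosetta: the `TaskTimes` class, the `cpu_efficiency` and `looks_stalled` helpers, and the 0.5 threshold are all hypothetical, and the two times are assumed to be read by hand from the task's properties dialog.

```python
from dataclasses import dataclass


@dataclass
class TaskTimes:
    """Hypothetical record of the two times read from a task's properties."""
    name: str
    cpu_seconds: float      # CPU time the task has actually consumed
    elapsed_seconds: float  # wall-clock time since the task started


def cpu_efficiency(task: TaskTimes) -> float:
    """Fraction of wall-clock time the task spent on the CPU."""
    if task.elapsed_seconds <= 0:
        return 0.0
    return task.cpu_seconds / task.elapsed_seconds


def looks_stalled(task: TaskTimes, min_efficiency: float = 0.5) -> bool:
    """Flag a task whose CPU time lags far behind its elapsed time.

    The 0.5 default is an arbitrary illustrative threshold, not a BOINC setting.
    """
    return cpu_efficiency(task) < min_efficiency


HOUR = 3600.0

# Made-up numbers: one task using about 13% of a core over 40 hours,
# and one that has been on the CPU for nearly all of its 18 hours.
slow = TaskTimes("slow_task", cpu_seconds=5.2 * HOUR, elapsed_seconds=40 * HOUR)
healthy = TaskTimes("healthy_task", cpu_seconds=17.5 * HOUR, elapsed_seconds=18 * HOUR)

for task in (slow, healthy):
    print(f"{task.name}: efficiency={cpu_efficiency(task):.2f} stalled={looks_stalled(task)}")
```

A low ratio only suggests that the task is mostly waiting rather than computing; it does not say whether the cause is a hung task, a suspension, or other load on the machine.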
©2024 University of Washington
https://www.bakerlab.org