Message boards : Number crunching : Only 20 credits for 25,000 seconds
Author | Message |
---|---|
Francois Racine Joined: 17 Nov 09 Posts: 9 Credit: 3,658,771 RAC: 0 |
Hello, for a while now I have been getting only 20 credits for tasks that ran for 25,000 seconds. You can see this for task 557922201 (WU 506964570). I have other tasks that ran for 20,000 seconds and obtained 90 credits. It seems there is a magic limit beyond which the credits drop. I have 4 machines running Rosetta, and they all face the same problem. Thank you for reading this thread. François |
Polian Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
Here are the key lines from one of your stderr_out logs:
BOINC:: CPU time: 25230.3s, 14400s + 10800s[2013- 1-23 18:46:33:] :: BOINC WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 25230.3 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish
I can't say that I've seen it happen as frequently as you are, but I have seen it, very rarely, on some tasks (hybrid ones, I believe: the tasks whose names start with hyb, which also sometimes have the increased run times you can see above). Just some ideas: hardware/software issues? An unstable overclock? Frequent reboots? |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 25,844,006 RAC: 12,287 |
It is just another long-known buggy WU series. Almost all of them have names like hyb_.._bench_... As I understand it, 20 credits means no useful work was done in the WU. |
Francois Racine Joined: 17 Nov 09 Posts: 9 Credit: 3,658,771 RAC: 0 |
Polian, these 4 computers are all different, with no BIOS options forced (such as overclocking). All are running Ubuntu Server 12.04 with the latest BOINC for Linux version (7.0.28). These computers are rebooted once every two to four weeks, when an Ubuntu upgrade requires it. I'm talking about these 4 computers because they all get the problem after running a 25,000-second task. Until now I did not believe the problem could come from the computer/OS; I thought it could come from Rosetta or the BOINC program itself. I'm surprised I'm the only one getting the problem, since I have several computers that are all different. The only things these computers have in common are the BOINC version and the OS. Please let me know if you have more specific questions I could answer. Thank you for your help on this. Francois |
trigggl Joined: 20 Apr 09 Posts: 4 Credit: 102,177 RAC: 0 |
Francois Racine wrote: "These 4 computers are all different, with no BIOS options forced (like overclocking). All are running Ubuntu Server 12.04 with the latest BOINC for Linux version (7.0.28). These computers are rebooted once every two-four weeks, when an Ubuntu upgrade requires it. I'm talking about these 4 computers because they all get the problem after running a 25,000 seconds task." Actually, I was getting this problem on my one host that was still validating. I use Gentoo and 7.0.29 on all of mine. |
Francois Racine Joined: 17 Nov 09 Posts: 9 Credit: 3,658,771 RAC: 0 |
Unfortunately, I'm getting more and more of these tasks. If the problem does not get solved, I will have to switch to another project. It is sad, because Rosetta is my favourite, but I still want to be appreciated for the effort. |
Polian Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
|
Francois Racine Joined: 17 Nov 09 Posts: 9 Credit: 3,658,771 RAC: 0 |
I do not know whether the problematic tasks have been removed, but all tasks completed in the last 24-48 hours are fine. |
Ananas Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
The problem is not solved; they are still being sent out. I guess I'll abort all "bench*IGNORE_THE_REST" tasks before they start. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,134,655 RAC: 4,716 |
I got one too: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=508740380 The workunit's web page says: "errors: Too many error results". Two of us tried to crunch it and it failed; this is NOT a user problem!! This has happened on several, BUT NOT NEARLY ALL, of my units, and I am using BOINC 7.0.45. |
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
mikey wrote: "I got one too:" This is one of those "1201 cpu seconds" WUs, and both computers got exactly the credit they asked for (not 20), so it's the wrong thread, I guess. |
googloo Joined: 15 Sep 06 Posts: 133 Credit: 22,677,186 RAC: 4,532 |
Getting really tired of long-running tasks that award only 20 credits. |
Ananas Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
... |
Ananas Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
Same here :-( Not only bench*ignore... tasks are affected; other series have the problem too: rb_02_02_36194_68641__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_73531_121_0
The common element is always "Stream information inconsistent." in stderr and only one decoy in the result, plus this warning: "WARNING! cannot get file size for default.out.gz: could not open file." So it has probably generated nothing at all. Unfortunately, all of these symptoms appear only after the time has already been wasted, so they cannot be used to abort the task before the calculation starts. There is one thing that might help, though:
OK:
Watchdog active.
Starting work on structure: _00001 <= *** difference ***
# cpu_run_time_pref: 28800
damaged:
Watchdog active.
# cpu_run_time_pref: 28800
So "Starting work" is missing completely, quite close to the start already. |
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
Ananas wrote: "so Starting work is missing completely quite close to the start already" It is also missing on many WUs that are not affected by this issue (it's actually nothing uncommon). Here is one example from your tasks: 559976267. Or from mine: 558392756, 558605706, 558900949, 559432297, 559647034 and 559856339. That's 6 of the 14 tasks currently in my list, and none of them is affected by the 20-credit issue. |
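Ananas's failure signature, together with Link's caveat, can be written down as a quick offline check on a finished task's stderr. This is a hypothetical sketch: the function name and the idea of pasting a task's stderr text into it are assumptions, not a Rosetta@home or BOINC tool, and the marker strings are simply copied from the logs quoted in this thread. Like the symptoms themselves, it only works after the task has finished, so it cannot be used to abort a task before it starts.

```python
import re

# Marker strings copied from the stderr excerpts quoted in this thread.
# Treating them as a combined "20-credit" signature is an assumption based
# on forum reports, not an official Rosetta@home rule.
INCONSISTENT = "Stream information inconsistent."
MISSING_OUTPUT = "cannot get file size for default.out.gz"
DECOY_COUNT = re.compile(r"This process generated (\d+) decoys from (\d+) attempts")


def looks_like_empty_result(stderr_text: str) -> bool:
    """Hypothetical helper: return True if stderr shows all three symptoms
    Ananas describes (the inconsistent-stream message, the missing
    default.out.gz warning, and exactly one decoy reported).

    A missing "Starting work on structure" line is deliberately NOT used,
    since Link found it absent on many unaffected tasks as well.
    """
    if INCONSISTENT not in stderr_text or MISSING_OUTPUT not in stderr_text:
        return False
    match = DECOY_COUNT.search(stderr_text)
    return match is not None and int(match.group(1)) == 1


if __name__ == "__main__":
    # Abbreviated excerpt of the stderr Polian quoted earlier in the thread.
    sample = """\
BOINC:: CPU time: 25230.3s, 14400s + 10800s[2013- 1-23 18:46:33:] :: BOINC WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
Stream information inconsistent.
DONE :: 1 starting structures 25230.3 cpu seconds
This process generated 1 decoys from 1 attempts
called boinc_finish
"""
    print(looks_like_empty_result(sample))  # True
```

Requiring all three markers at once, rather than any single one, is a deliberate choice to keep false positives down, given how often individual log lines (such as "Starting work") turn out to be absent on healthy tasks.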
googloo Joined: 15 Sep 06 Posts: 133 Credit: 22,677,186 RAC: 4,532 |
Yet another one: task 564377097. |
TPCBF Joined: 29 Nov 10 Posts: 111 Credit: 5,070,625 RAC: 2,159 |
Even worse :( Whiskey Tango Foxtrot??? 0.77 credits for 11k seconds (and apparently 73 decoys detected, only for it to reset itself...). Anyone care to explain? Ralf |
TPCBF Joined: 29 Nov 10 Posts: 111 Credit: 5,070,625 RAC: 2,159 |
"Even worse :(" And another one. First it crunches happily and generates 59/59 decoys, then it starts over to report another, single one for a mere 0.69 credits... It would really be nice if someone had a reasonable explanation for this or, even better, would try to fix it. This is really getting old... :( Ralf |
Polian Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
This appears to be unrelated to the original problem in this thread. That said, I don't remember an issue like the one your stderr_out shows being reported before. Try updating your BOINC core client; many users (myself included) are running 7.0.52, and it appears stable. I'm wondering whether the watchdog is sensing that the task is hung and restarting it. |
Francois Racine Joined: 17 Nov 09 Posts: 9 Credit: 3,658,771 RAC: 0 |
I'm back. The problem didn't show up for a few weeks, but it has returned. I now see 3 tasks that ran for 25,000 seconds and obtained 20 credits. One of these is task 571259532. Thank you for looking at this. Regards, François |
©2024 University of Washington
https://www.bakerlab.org