Message boards : Number crunching : WUs estimated time way off to elapsed time
Author | Message |
---|---|
San-Fernando-Valley Joined: 16 Mar 16 Posts: 12 Credit: 143,229 RAC: 0 |
Very annoying: the estimated time at the start of each WU is stated as about 4 hours, BUT the actual elapsed time is around 1 DAY (24 hours)!!!! This happens on all six of my PCs, for every WU completed in the last few days. ABSOLUTELY unacceptable. I'm sure someone can tell me what I'm doing wrong OR what I'm misunderstanding (on this project). |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
It sounds like you recently changed your R@h runtime preference to 1 day. This is set on the website: click your user name at the upper right corner and select the "Rosetta@home preferences" link. There is a preference for each location (default, home, work, school). You can see which location a given host is associated with by looking at the event log as BOINC Manager starts up. So either the preference changed, or the location associated with the host has changed to one with a different runtime preference. Either way, BOINC Manager will learn how long the tasks are taking to complete and adjust within a couple of days. While the estimates are off, BOINC Manager's requests for new work are off as well: it requests work based on its current estimated time remaining, so to the extent that estimate disagrees with your actual runtimes, the work requests will be off. Rosetta Moderator: Mod.Sense |
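A rough illustration of why the work requests go wrong: the toy model below (not BOINC's actual work-fetch algorithm, which also weighs resource share, deadlines and more) assumes the client fills its work buffer by dividing the buffered CPU-hours by its current per-task estimate. The buffer size and CPU count are hypothetical; the 4-hour estimate and 24-hour actual runtime come from the first post.

```python
import math


def tasks_fetched(buffer_days: float, ncpus: int, est_hours: float) -> int:
    """Toy model: tasks requested to keep every CPU busy for the buffer period,
    using the client's current (possibly wrong) per-task estimate."""
    buffer_hours = buffer_days * 24
    return math.ceil(ncpus * buffer_hours / est_hours)


def hours_to_drain(ntasks: int, ncpus: int, actual_hours: float) -> float:
    """Wall-clock hours to finish ntasks when each one really takes actual_hours."""
    rounds = math.ceil(ntasks / ncpus)  # tasks run ncpus at a time
    return rounds * actual_hours


if __name__ == "__main__":
    n = tasks_fetched(buffer_days=1, ncpus=4, est_hours=4)
    print(n, "tasks fetched")  # 24 tasks fetched
    print(hours_to_drain(n, 4, actual_hours=24), "hours to finish")  # 144 hours to finish
```

With a 1-day buffer the client expects to clear its queue in a day but actually needs six, the same 6x ratio as 24 h / 4 h; the gap closes as the estimate converges on the real runtime.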
Paul Joined: 29 Oct 05 Posts: 193 Credit: 66,269,494 RAC: 6,685 |
I had a number of very long-running tasks I recently had to abort. Several jobs were over 24 hours, and others were at 3% after 12 hours. These workunits are running on a Linux server dedicated to Rosetta@Home. Any ideas? Task / WorkUnit: 931190869 / 840070369, 930920244 / 839826801, 930919923 / 839826521, 930912635 / 839819841, 930986536 / 839886162. Thx! Paul |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"I had a number of very long-running tasks I recently had to abort. Several jobs were over 24 hours, and others were at 3% after 12 hours." Paul, I'm not certain whether BOINC Manager has the same quirk on Linux that it does on Windows. Sometimes on Windows it will show tasks with a state of "running" even though they are not actually getting CPU dispatched to them. You could see that with top or another utility that shows the most active processes on your machine; or, given the number of CPUs, it may be easier to review the task properties for actual CPU time and confirm that the number reported is increasing. If you see some tasks running long and confirm they are not getting CPU time, exiting and restarting BOINC Manager seems to reset things. Unfortunately, as with any time you exit BOINC Manager and its tasks, you lose the work done since the last checkpoint on each active task. If the tasks are getting CPU time and running long, the "watchdog" should step in and mark them as completed once they run 4 hours (CPU hours, not wall-clock hours) past your runtime preference. Notes: It appears Paul's runtime preference is set to 8 hours for this host. Rosetta Moderator: Mod.Sense |
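The two checks described above can be written down as a small, hypothetical helper: sample a task's CPU time twice a few minutes apart (from the task properties or top), then compare. The 4-CPU-hour grace period is taken from the moderator's description of the watchdog; the function names are made up for illustration.

```python
WATCHDOG_GRACE_HOURS = 4.0  # grace past the runtime preference, per the post above


def watchdog_limit_hours(runtime_pref_hours: float) -> float:
    """CPU hours after which the watchdog should end a task."""
    return runtime_pref_hours + WATCHDOG_GRACE_HOURS


def classify(cpu_t0: float, cpu_t1: float, runtime_pref_hours: float) -> str:
    """cpu_t0, cpu_t1: CPU seconds of one task, sampled a few minutes apart."""
    if cpu_t1 <= cpu_t0:
        # Shown as "running", but no CPU is being dispatched to it.
        return "stalled"
    if cpu_t1 / 3600 > watchdog_limit_hours(runtime_pref_hours):
        # Getting CPU, yet already past the point where the watchdog should act.
        return "past watchdog limit"
    return "ok"


if __name__ == "__main__":
    print(watchdog_limit_hours(8))  # 12.0 for Paul's 8-hour preference
    print(classify(5 * 3600, 5 * 3600, 8))  # stalled
```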
Paul Joined: 29 Oct 05 Posts: 193 Credit: 66,269,494 RAC: 6,685 |
Is BOINC Manager required?? I have BOINC Client set to launch at startup. If I close BOINC Manager and let the tasks continue to run, will I avoid the bug? Can BOINC Client send & receive tasks without BOINC Manager?? Just looking for a workaround. Thx! Paul |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"Is BOINC Manager required?? I have BOINC Client set to launch at startup. If I close BOINC Manager and let the tasks continue to run, will I avoid the bug? Can BOINC Client send & receive tasks without BOINC Manager??" Well, I guess you're right: technically all of these logistics are handled in the client, and the manager just presents them and interacts with the client to carry out changes you specify (suspending a task, performing a scheduler request, etc.). So, unfortunately, no, I would not expect any change from not using the manager. Rosetta Moderator: Mod.Sense |
Paul Joined: 29 Oct 05 Posts: 193 Credit: 66,269,494 RAC: 6,685 |
All, I really need some help here. I have at least 5 cores that are idle because of this bug, and I have never experienced this in the past. Is there a version of BOINC that does not have this bug? I hate waiting 8 hours and then aborting tasks, though I can restart BOINC every 8 hours if I have to. Please help. Thx! Paul |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
All, I hadn't remembered this until now, but I believe that, at least on Windows, the issue where BOINC shows tasks as running but does not give them CPU is related to the CPU percentage preference. There are two similar preferences: one for the percentage of CPUs to use, and the other for the percentage of CPU time to spend running tasks. From what we've seen before, the problem only seems to occur when you use the CPU-time percentage preference. So the workaround is to set that one to use up to 100% of CPU time and instead, if your environment requires it, use less than 100% of the CPUs. Rosetta Moderator: Mod.Sense |
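On a dedicated cruncher the same two settings can also be applied locally. BOINC reads local overrides from a global_prefs_override.xml file in its data directory; as far as I know the two preferences above map to the max_ncpus_pct and cpu_usage_limit elements, but verify that against the documentation for your client version. The sketch below only builds that file's contents following the suggested workaround (CPU time left at 100%, CPU count reduced instead); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def prefs_override_xml(max_ncpus_pct: float, cpu_usage_limit: float = 100.0) -> str:
    """Build global_prefs_override.xml contents.

    max_ncpus_pct   -> percentage of CPUs BOINC may use (reduce this one)
    cpu_usage_limit -> percentage of CPU time; left at 100 per the workaround
    """
    for value in (max_ncpus_pct, cpu_usage_limit):
        if not 0 < value <= 100:
            raise ValueError(f"percentage out of range: {value}")
    root = ET.Element("global_preferences")
    ET.SubElement(root, "max_ncpus_pct").text = f"{max_ncpus_pct:.6f}"
    ET.SubElement(root, "cpu_usage_limit").text = f"{cpu_usage_limit:.6f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # e.g. leave a quarter of the cores free, but never throttle CPU time
    print(prefs_override_xml(max_ncpus_pct=75))
```

The output would be saved as global_prefs_override.xml in the BOINC data directory (its location depends on the install, e.g. /var/lib/boinc-client for the Debian packages), after which restarting the client or running boinccmd --read_global_prefs_override makes it reread the file.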
Paul Joined: 29 Oct 05 Posts: 193 Credit: 66,269,494 RAC: 6,685 |
I think I got it fixed. I aborted several tasks that ran over, and I also killed a few minirosetta processes that were running but using 0% CPU. When I counted the minirosetta processes I found 50 of them, but I only have 48 cores, so I killed the 2 at 0%. So far everything has been back to normal for about 48 hours. Wish I knew exactly what fixed it. Thx! Paul |
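On Linux, the check Paul did by hand (count the minirosetta processes, compare with the core count, and look for ones at 0% CPU) can be scripted roughly as below. This is a sketch that reads /proc/<pid>/stat directly; the "minirosetta" name prefix is an assumption based on this thread (the kernel truncates process names to 15 characters, hence a prefix match), and the script only reports PIDs rather than killing anything.

```python
from __future__ import annotations

import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second (Linux)


def cpu_seconds_from_stat(stat_line: str) -> float:
    """User + system CPU seconds from one /proc/<pid>/stat line."""
    # Field 2 (comm) is in parentheses and may contain spaces, so split after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(rest[11]), int(rest[12])
    return (utime + stime) / CLK_TCK


def find_procs(name_prefix: str = "minirosetta") -> dict[int, float]:
    """Map PID -> CPU seconds for processes whose name starts with name_prefix."""
    procs: dict[int, float] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                line = f.read()
        except OSError:  # process exited between listdir() and open()
            continue
        comm = line[line.index("(") + 1 : line.rindex(")")]
        if comm.startswith(name_prefix):
            procs[int(entry)] = cpu_seconds_from_stat(line)
    return procs


def idle_procs(interval: float = 60.0, name_prefix: str = "minirosetta") -> list[int]:
    """PIDs whose CPU time did not increase over `interval` seconds."""
    before = find_procs(name_prefix)
    time.sleep(interval)
    after = find_procs(name_prefix)
    return sorted(pid for pid, cpu in after.items() if pid in before and cpu <= before[pid])


if __name__ == "__main__":
    running = find_procs()
    print(f"{len(running)} minirosetta processes on {os.cpu_count()} CPUs")
    print("no CPU time in the last 60 s:", idle_procs())
```

If the process count exceeds the core count, as in Paul's case (50 processes on 48 cores), the PIDs reported as idle are the first candidates to look at.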
Paul Joined: 29 Oct 05 Posts: 193 Credit: 66,269,494 RAC: 6,685 |
I can't find any information on this bug and the problem is back. Can you tell me how to get this reported to the BOINC team? In all the years I have run BOINC I have never seen this before. It is a dedicated cruncher so I have it set to 100% CPU utilization & 100% CPUs. Thx! Paul |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Specifically, the bug I've seen is where the R@h tasks do not get CPU time. Given your description of long runtimes, that is one explanation. Another is that the specific R@h task itself is running long; this has been seen with some recent protocols under development. So, if the tasks are indeed not getting CPU time, you could post on the BOINC message boards here: https://boinc.berkeley.edu/dev/forum_forum.php?id=2 Rosetta Moderator: Mod.Sense |
furukitsune Joined: 19 Mar 16 Posts: 9 Credit: 7,194,306 RAC: 3,364 |
When I recently upgraded to BOINC 7.6.33, I started seeing the same problems: clients using 0% CPU, tasks never finishing, etc. I reverted to BOINC 7.2.42 and all the problems disappeared. You can see CPU usage in the Windows Resource Monitor, Memory tab. I did have a problem with cc_config not being reset when I reinstalled BOINC, so do a clean install. fk |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"Estimated time at start for each WU is stated as about 4 hours."
The estimates are always inaccurate when you start, but they eventually correct themselves. To speed up this correction, you can use an app_config.xml file placed in the Rosetta project folder. It looks like this:

<app_config>
  <app>
    <name>minirosetta</name>
    <fraction_done_exact/>
  </app>
</app_config>

I assume you are familiar with the app_config file. If not, just create it in Notepad and save it as "app_config.xml" (not as a .txt file). |
Paul Joined: 29 Oct 05 Posts: 193 Credit: 66,269,494 RAC: 6,685 |
It sounds like 7.2.42 is the fix I need. I am running on Linux so I know how to apt-get but I don't know how to ask for an older version. Is there a way to apt-get a specific version? How do I tell software updates that I don't want updates on that app? Thx! Paul |
Paul Joined: 29 Oct 05 Posts: 193 Credit: 66,269,494 RAC: 6,685 |
When I look at the processes, they are getting 0% CPU so I think it is a bug in BOINC. Maybe it will be fixed soon. Thx! Paul |
LarryMajor Joined: 1 Apr 16 Posts: 22 Credit: 31,533,212 RAC: 0 |
apt-get install package_name=version is the syntax for it. I'm running Debian, and apt only wants to use version 7.6.33 of boinc, boinc-client and boinc-manager after I upgraded to kernel 4.9.0, if that's any help. When you find the versions and packages that work for you, apt-mark hold package_name should keep the system from upgrading them. My machine is very similar to yours and has been on 7.6.33 for close to a month with no problems, so if there's anything I can check on it that might help, let me know. |
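Building on that syntax: Debian version strings usually carry a packaging suffix (for example a hypothetical 7.2.42+dfsg-1), so package_name=7.2.42 on its own may not match anything. The sketch below picks the exact string out of apt-cache madison output (lines of the form "package | version | repository") and prints the install and hold commands. The helper names and the file name pin_boinc.py are hypothetical, it assumes boinc-client and boinc-manager share one version string, and an old version can only be installed if some configured repository still carries it.

```python
from __future__ import annotations

import re
import sys
from typing import Optional


def madison_versions(output: str, package: str) -> list[str]:
    """Collect version strings from `apt-cache madison <package>` output,
    whose lines look like ' package | version | repository info'."""
    versions: list[str] = []
    for line in output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) >= 3 and parts[0] == package and parts[1] not in versions:
            versions.append(parts[1])
    return versions


def pick_version(versions: list[str], upstream: str) -> Optional[str]:
    """First full Debian version whose upstream part is exactly `upstream`,
    ignoring an optional 'epoch:' prefix and suffixes like '+dfsg-1'."""
    pattern = re.compile(re.escape(upstream) + r"(?![0-9.])")
    for version in versions:
        core = version.split(":", 1)[-1]  # drop an optional epoch
        if pattern.match(core):
            return version
    return None


def downgrade_and_hold(packages: list[str], version: str) -> list[str]:
    """apt commands to install `version` of every package, then hold them."""
    install = "sudo apt-get install " + " ".join(f"{p}={version}" for p in packages)
    hold = "sudo apt-mark hold " + " ".join(packages)
    return [install, hold]


if __name__ == "__main__":
    # usage: apt-cache madison boinc-client | python3 pin_boinc.py
    wanted = pick_version(madison_versions(sys.stdin.read(), "boinc-client"), "7.2.42")
    if wanted is None:
        print("7.2.42 is not offered by any configured repository")
    else:
        for cmd in downgrade_and_hold(["boinc-client", "boinc-manager"], wanted):
            print(cmd)
```

Once the packages are held, apt-mark unhold with the same package names releases them again when a fixed version is available.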
©2024 University of Washington
https://www.bakerlab.org