Message boards : Number crunching : Minirosetta 3.73-3.78
Author | Message |
---|---|
sgaboinc Send message Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
actually, i'm wondering if limiting the number of concurrent tasks may help. for r@h, i normally see one task/thread running per core, so it nicely uses all 8 cores (incl. HT cores) of my i7 4771 cpu with 8 tasks/threads. i'm running on 16 GB of ram in linux. i've yet to encounter the 'needs xxx MB of RAM' message with r@h, but with a different project (atlas@home from cern), the memory requirements are quite huge and i often see only 4 threads/tasks running before hitting the memory limit. coming to think about ram, i think linux and windows o/s are able to use swap for virtual memory, i.e. disk space as swap memory, if you have allocated sufficient space for that. but for atlas@home, i think the use of virtualbox probably limits what can be swapped. you may like to see if disk swap space is tunable in that respect. the other thing i think has to do with the boinc client itself: an updated or more recent boinc client may possibly resolve some of these issues, as what you are seeing is probably a behavior of the boinc client rather than r@h |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
[snip] BOINC tasks usually have swapping turned off, in an effort to make them run faster. This means that there is often no effort to make the applications able to stand the address changes caused by swapping something out of memory, and then swapping it back in at a different address because the original address is still in use by some other program. |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 22,994,067 RAC: 9,928 |
[snip] The OS (Windows, all variants of Linux, macOS, ... ) provides the program with VIRTUAL memory. The virtual memory is translated into PHYSICAL memory using the TLB translations. A virtual page of memory can get swapped to disk and then be relocated into a different PHYSICAL memory location by setting the TLB entry properly. The executing program does not even know if the page has been swapped out to disk. The last time I looked, Windows allocated a disk swap file the same size as memory ( C:\pagefile.sys ). You can explicitly set the size of this file, even to 0 bytes .... but when you run low on memory, the OS will kill stuff "Out of Memory". Virtualbox is just a program in memory that runs on top of your OS and you set the memory size that virtualbox is allowed to use. I usually set virtualbox to be able to use about 50% of my physical memory BUT I have 16 GB or more on my systems. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
[snip] It's hard to get Virtualbox working correctly - once the versions of BOINC available so far have detected that virtualization is not enabled in the BIOS or the UEFI, they will remember this forever and prevent the test of whether it is enabled from being run again. Also, all of the Virtualbox workunits I've seen much about so far seize 4 GB of physical memory, and won't allow any of it to be paged. I'm hoping that a new version of Virtualbox will remove this restriction. As far as I know, Virtualbox can handle 32-bit workunits, but not 64-bit workunits. |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 22,994,067 RAC: 9,928 |
[snip] I regularly use Virtualbox to build Linux images on machines, and none of my comments were about the pre-configured BOINC VIRTUALBOX implementation. I have no experience with BOINC-packaged Virtualbox. I imagine that BOINC projects choose to use the BOINC Virtualbox so they can control the execution environment and the quality of the data generated very closely. 32-bit only probably makes sense for BOINC Virtualbox in that case. |
[FI] OIKARINEN Send message Joined: 16 Nov 13 Posts: 6 Credit: 131,483 RAC: 0 |
I've been running the 3.71 version of rosetta for 2 days, and I just noticed a lot of crashing workunits running on different computers. All of those WUs have this attached: ERROR: unrecognized residue AX1 ERROR:: Exit from: ..\..\..\src\core\io\pdb\file_data.cc line: 2077 BOINC:: Error reading and gzipping output datafile: default.out Life is too short to live concerned about its mysteries. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
It's hard to get Virtualbox working correctly - once the versions of BOINC available so far have detected that virtualization is not enabled in the BIOS or the UEFI, they will remember this forever and prevent the test of whether it is enabled from being run again. They have a solution to that problem in the Cosmology FAQs: I enabled VT-x/AMD-v but jobs say “Scheduler wait: Please upgrade BOINC” Also, all of the Virtualbox workunits I've seen much about so far seize 4 GB of physical memory, and won't allow any of it to be paged. I'm hoping that a new version of Virtualbox will remove this restriction. I think that just depends on the application. ATLAS and vLHC take a lot of memory, but Cosmology does not that I recall. I have had some problems with VirtualBox interfering with some other programs (both CPU and GPU, even non-BOINC ones), but not with the VBox programs themselves. I just use the pre-packaged versions on the CERN projects and Cosmology, but they all went easily enough, though you do need to watch the memory. If VBox would be of any use for Rosetta, I would be willing to try it here. |
Snags Send message Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
Both of my computers received 24 hour back-offs after a single request for work resulted in this reply: Sun Mar 6 03:51:29 2016 | rosetta@home | Rosetta Mini for Android is not available for your type of computer. When I noticed the problem several hours after the back-off began I simply hit the update button and successfully retrieved new tasks. *****Wild Speculation Alert***** If the number of Android tasks exceeds the number of available devices by too great a margin and/or they fail at too high a rate, then the new tasks/resends could be clogging the queue. As long as there are in fact plenty of cpu tasks to crunch, a 24 hour back-off would seem excessive. I should add that I only became concerned because I recently reduced my preferred cpu runtime and my cache and set other projects to no new tasks (preparing for a possible imminent shut down of computers for an indeterminate period of time), so this 24 hour back-off actually led to no tasks crunching at all. Otherwise I might have noticed but not been concerned enough to explore the possible causes or to comment. I only comment now in the hope that this back-off interval could be changed to something shorter. I know the project doesn't want a bunch of computers asking every 5 minutes while there's a clog, but if it is a predictable clog and you can see how long it typically lasts, perhaps you could adjust the back-off accordingly. Would anything longer than the 6 hour default target runtime really be necessary? Although not a big deal in the overall scheme of things, would things run somewhat smoother, on both sides of the connection, if crunchers weren't left to go idle unnecessarily for such a long time? Best, Snags edit: I just saw additional posts in this thread that suggest rosie really did run out of cpu tasks. Ah, well. I suppose I should see if I can find BOINC documentation on the back-off settings (documentation that I could actually understand, that is) : / |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
Both of my computers received 24 hour back-offs after a single request for work resulted in this reply: I've seen a similar problem twice. I have an Android device in addition to my Windows devices, but so far I have BOINC installed only on the Windows devices. |
iriemon Send message Joined: 16 Jan 16 Posts: 6 Credit: 770,637 RAC: 186 |
Any news on getting the communication problem fixed? Been sitting here for 2 days without any new work being available..... |
iriemon Send message Joined: 16 Jan 16 Posts: 6 Credit: 770,637 RAC: 186 |
Any news on getting the communication problem fixed? Been sitting here for 2 days without any new work being available..... For some reason, I decided to clear my IE cache and then tried to dl a new work unit and to my surprise IT WORKED! Happily crunching...... |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
Any news on getting the communication problem fixed? Been sitting here for 2 days without any new work being available..... I decided to try that on my Windows 10 computer. Surprise - if Windows 10 even includes IE, it is very well hidden. I told BOINC Manager to update for Rosetta@home anyway - it downloaded a workunit. It looks likely that the problem is fixed on the server and IE is not involved. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
801194890 Starting work on structure: _00002 |
ArcSedna Send message Joined: 23 Oct 11 Posts: 14 Credit: 69,190,403 RAC: 19,011 |
Some workunits hang for many hours until manually terminated. They have a string like EN_MAP_hyb_cst, EN_MAP_cst, RE_MAP_hyb_cst or RE_MAP_cst in the middle of the name. Sample (already aborted) Their behavior is 'do nothing for a long time'. It looks like this: Elapsed real time: 32 hours; Elapsed CPU time: 15 minutes. This is happening on my Mac computers. Windows and Linux seem to be OK. OS: Mac OS X 10.11.3; BOINC: 7.2.42; Memory: 8 GB to 16 GB. Thanks. |
James Adrian Send message Joined: 27 Apr 12 Posts: 5 Credit: 1,801,535 RAC: 0 |
Has anyone else gotten work units for Minirosetta 3.71 that are estimated to run 14 days? I'm running on an old (2009) Mac with 8GB of memory and lately I've gotten these here and there. Thanks Boinc 7.6.22 Mac OS 10.11.4 |
James Adrian Send message Joined: 27 Apr 12 Posts: 5 Credit: 1,801,535 RAC: 0 |
ArcSedna, I just saw your post, once I sorted to see newest first. My problem seems slightly different but like you I see the problem with work units named as in your post. One other observation: I have a newer Mac laptop but so far I have not seen the problem with the work units on it, just on my older iMac. |
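
Editor's note: the hung workunits reported above show a large gap between elapsed real time and CPU time (32 hours against 15 minutes). A minimal sketch of how that symptom could be checked automatically is given here. The function names, the one-hour minimum and the 5% efficiency threshold are all illustrative assumptions, not part of BOINC or Rosetta@home, and the two time values would have to be read by hand from the task properties in BOINC Manager.

```python
def cpu_efficiency(elapsed_seconds: float, cpu_seconds: float) -> float:
    """Return the fraction of wall-clock time the task spent on the CPU."""
    if elapsed_seconds <= 0:
        # A task that has not run yet cannot be judged; treat it as healthy.
        return 1.0
    return cpu_seconds / elapsed_seconds


def looks_stalled(
    elapsed_seconds: float,
    cpu_seconds: float,
    min_elapsed: float = 3600.0,    # assumed: ignore tasks younger than 1 hour
    min_efficiency: float = 0.05,   # assumed: below 5% CPU use counts as stalled
) -> bool:
    """Flag a task whose CPU time lags far behind its elapsed time."""
    if elapsed_seconds < min_elapsed:
        return False
    return cpu_efficiency(elapsed_seconds, cpu_seconds) < min_efficiency


# Figures from the report above: 32 hours elapsed, 15 minutes of CPU time.
reported = looks_stalled(32 * 3600, 15 * 60)

# A hypothetical healthy task: 6 hours elapsed, 5.8 hours of CPU time.
healthy = looks_stalled(6 * 3600, 5.8 * 3600)

print(f"reported task stalled: {reported}, healthy task stalled: {healthy}")
```

The check uses a ratio rather than an absolute CPU time so that long-running but busy tasks are not flagged. The thresholds are guesses and would need tuning on a machine that is also throttled or suspended by preferences.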
©2024 University of Washington
https://www.bakerlab.org