Message boards : Number crunching : minirosetta 2.05
Author | Message |
---|---|
Admin Joined: 13 Apr 07 Posts: 42 Credit: 260,782 RAC: 0 |
Seems if I give it some time it finds the protein structure again; it was quite strange. I also wanted to give a heads-up that I'm having a huge issue with the boinc_filtered_lookbuild_threading WUs. Most of the new ones I have received have stalled at about 5 percent and I've had to abort them. Are we any closer to fixing this issue? It seems to be getting worse. I'll give you some info on my current one, though: protein: t385, CPU time at last checkpoint 33:20, CPU time 34:24, elapsed time 14:21:01. |
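The numbers above show why the task looks stalled: CPU time barely moves while elapsed time keeps climbing. Here is a minimal Python sketch that converts BOINC-style time strings to seconds and computes the ratio of CPU time to elapsed time. It assumes "34:24" means minutes:seconds and "14:21:01" means hours:minutes:seconds; the function names are hypothetical and not part of BOINC.

```python
def to_seconds(t: str) -> int:
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' to a number of seconds."""
    total = 0
    for part in t.split(":"):
        total = total * 60 + int(part)
    return total


def cpu_efficiency(cpu_time: str, elapsed: str) -> float:
    """Fraction of wall-clock time the task actually spent on the CPU."""
    return to_seconds(cpu_time) / to_seconds(elapsed)


# Figures from the post above (the "34:24" = mm:ss reading is an assumption).
ratio = cpu_efficiency("34:24", "14:21:01")
print(f"CPU efficiency: {ratio:.1%}")  # CPU efficiency: 4.0%
```

A task that is actually progressing on its own core would typically show a ratio much closer to 1, so a value around 4% is consistent with the task being stuck rather than just slow.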
Admin Joined: 13 Apr 07 Posts: 42 Credit: 260,782 RAC: 0 |
Access violation error: lr15clusfa_opt_.1bgf.1bgf.IGNORE_THE_REST.c.85.0.pdb.pdb.JOB_17562_3_0 Link: https://boinc.bakerlab.org/rosetta/result.php?resultid=315684378 Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x006D2D46 read attempt to address 0x00000000. Debug info is in the link as usual. My wingman also had the same error. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This one errored on Ubuntu x64 after 10 sec.
lr15clusfa_opt_.1hz6.1hz6.IGNORE_THE_REST.c.3.21.pdb.pdb.JOB_17586_2_1
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=288116061

```
<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
Watchdog active.

SIGSEGV: segmentation violation
Stack trace (8 frames):
[0x96c49b3]
[0x96ee888]
[0xffffe500]
[0x80a8721]
[0x808fcc1]
[0x804985f]
[0x974c15c]
[0x8048121]

Exiting...
</stderr_txt>
```
|
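The three numbers in "process exited with code 193 (0xc1, -63)" are the same exit status written three ways: decimal, hexadecimal, and the low byte reinterpreted as a signed 8-bit integer. The following short Python check of that arithmetic uses a hypothetical helper name:

```python
def exit_code_forms(code: int) -> tuple[int, str, int]:
    """Return (decimal, hex, signed 8-bit) views of a 0-255 exit status."""
    low = code & 0xFF                          # exit statuses are one byte
    signed = low - 256 if low >= 128 else low  # two's-complement reading
    return code, hex(code), signed


print(exit_code_forms(193))  # (193, '0xc1', -63)
```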
Max DesGeorges Joined: 1 Oct 05 Posts: 35 Credit: 942,527 RAC: 0 |
I have an “igfhum ProteinInterfaceDesign” that takes from 280 MB up to 1.2 GB of memory!! Is that normal? If it is, I think that in the future most people will not have enough memory to run Rosetta anymore… |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Another error after 10 sec.
lr15clusfa_opt_.1bgf.1bgf.IGNORE_THE_REST.c.12.1.pdb.pdb.JOB_17562_9_0
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=288315508

```
<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
Watchdog active.

SIGSEGV: segmentation violation
Stack trace (8 frames):
[0x96c49b3]
[0x96ee888]
[0xffffe500]
[0x80a8721]
[0x808fcc1]
[0x804985f]
[0x974c15c]
[0x8048121]

Exiting...
</stderr_txt>
```
|
Max DesGeorges Joined: 1 Oct 05 Posts: 35 Credit: 942,527 RAC: 0 |
> I have an “igfhum ProteinInterfaceDesign” that takes from 280 MB up to 1.2 GB of memory!! Is that normal?

The name of the WU is: igfhum_looprefine_placestub2_2dsrI_1B6E_ProteinInterfaceDesign_2Feb2010_17660_331_0

After 45 minutes I restarted BOINC and the WU restarted from zero. Now, after 2 hours, the properties still show no value for CPU time at last checkpoint (“---”), as if the WU had worked for only a few minutes. Looking at the Task Manager, it seems that the WU continuously asks for more memory until it reaches the limit set in the preferences. Then it drops rapidly to 280 MB and climbs again to around 1.2 GB.

Vista 32-bit, Core Duo T7250, 2 GB DDR2, BOINC 6.10.29 |
Max DesGeorges Joined: 1 Oct 05 Posts: 35 Credit: 942,527 RAC: 0 |
> I have an “igfhum ProteinInterfaceDesign” that takes from 280 MB up to 1.2 GB of memory!! Is that normal?

UPDATE: The WU finished without errors. Looking at the graphics, I noticed that whenever the WU froze in the “request memory loop”, it was always in the “kic_refine_r2” stage and the accepted energy didn’t change. I hope this info is useful. :) |
johndad5 Joined: 12 Aug 09 Posts: 7 Credit: 2,729,604 RAC: 0 |
> This app update includes a fix for checkpointing.

For some reason I am not getting new work. When I update the project it simply says "Not reporting or requesting tasks". I am using BOINC version 6.10.18. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
> For some reason I am not getting new work. When I update the project it simply says "Not reporting or requesting tasks". I am using BOINC version 6.10.18.

John, it sounds like BOINC has decided to schedule work from other projects for the near term on your machine. It is trying to stay within the resource shares you have established between projects. This is normal; once some work for the other projects has been done, it will come back and ask Rosetta for work automatically.

Rosetta Moderator: Mod.Sense |
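The resource-share idea can be illustrated with a toy model: each project is entitled to its share divided by the sum of all shares, and the project furthest below its entitlement is served next. The Python sketch below is a deliberately simplified model, not BOINC's actual scheduler, which also accounts for deadlines, separate CPU/GPU resources, and its own bookkeeping of past usage. The function names, project names, and numbers are hypothetical.

```python
def target_fractions(shares: dict[str, float]) -> dict[str, float]:
    """Each project's entitled fraction = its share / sum of all shares."""
    total = sum(shares.values())
    return {name: share / total for name, share in shares.items()}


def next_project(shares: dict[str, float], used: dict[str, float]) -> str:
    """Pick the project whose actual usage is furthest below its target.

    `used` holds CPU time already delivered to each project (any unit).
    Toy model only: no deadlines, no per-resource accounting.
    """
    targets = target_fractions(shares)
    total_used = sum(used.values()) or 1.0  # avoid dividing by zero
    deficit = {p: targets[p] - used.get(p, 0.0) / total_used for p in shares}
    return max(deficit, key=deficit.get)


# Hypothetical example: equal shares, but Rosetta already got most of the CPU.
shares = {"rosetta": 100, "seti": 100}
used = {"rosetta": 90.0, "seti": 10.0}
print(target_fractions(shares))    # {'rosetta': 0.5, 'seti': 0.5}
print(next_project(shares, used))  # seti
```

Even in this toy version, a project that has been ahead of its share simply waits until the others catch up, which matches the "Not reporting or requesting tasks" behavior described above.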
Neo2 Joined: 3 Feb 10 Posts: 2 Credit: 811,111 RAC: 117 |
Hi! I don't know if this also happened with older versions of Rosetta, since I started computing on the 3rd of February. I'm running an amd64 Linux system, a pretty powerful one. Looking at my task log, I have had about 120 WUs assigned up to today, but only 3-4 of them completed successfully. The others show "Outcome - Client error" / "Client state - Compute error". Looking at boinc.log gave me no information, because it doesn't contain any error line except "output file .... absent", which the FAQ tells me is safe to ignore.

I'm running LHC, SETI, MilkyWay, Einstein, RALPH and Cosmology. With the exception of Einstein tasks, which also seem to end up in computation errors, every other program is running fine. MilkyWay in particular granted me 2500 credits in the last four days (from which I assume the machine is stable). I have never observed problems with the machine itself (occasional lockups, strange sudden shutdowns, etc.).

This is the /proc/cpuinfo output (I've omitted the other 3 cores):

```
# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 4
model name : AMD Phenom(tm) II X4 920 Processor
stepping : 2
cpu MHz : 2800.000
cache size : 512 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt
bogomips : 5619.47
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
```

The machine is equipped with 8 GB of RAM. Everything is running at stock speed; I'm not overclocking.
If any other information is needed I can provide it and I'm not scared to do some debugging. :) Thanks Neo2 |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Neo2, thanks for joining Rosetta. I see you have two machines. The 4-core one that you described is here, and at present it doesn't show any successfully completed work units. If you look at the task details for that host, such as this one, they each report an error opening a file. The file name seems to vary with each task. This implies a security setup problem on your machine: the executable and the user running the BOINC core client need access to the files that are downloaded. Is it possible your BOINC installation is conflicting with some anti-virus software, or other security measures?

Rosetta Moderator: Mod.Sense |
Neo2 Joined: 3 Feb 10 Posts: 2 Credit: 811,111 RAC: 117 |
I don't think so. Currently I have clamd up and running, but it is only a daemon that serves requests from userspace programs, not real-time antivirus software. I'm running a 2.6.33 git kernel without any extra security measures: no grsecurity, no firewall, no external security hooks of any sort, no SELinux. The directory in which BOINC runs is owned by user and group boinc (both exist), and no file in the directory is owned by other users. Every file (except executables, which are 0755) has permission 0644, while the directories are 0755. The BOINC executable also runs as boinc:boinc. Before starting BOINC for the first time I tuned the directory parameters, so every file in the BOINC directory has been created by BOINC itself.

Gentoo by default installs a stock /etc/conf.d file through which the BOINC service is started. I only modified the paths for data storage and logging, nothing else. The file is the following:

```
# Config file for /etc/init.d/boinc

# Owner of BOINC process (must be existing)
USER="boinc"
GROUP="boinc"

# Directory with runtime data: Work units, project binaries, user info etc.
RUNTIMEDIR="/mnt/storage/boinc"

# Location of the boinc command line binary
BOINCBIN="/usr/bin/boinc_client"

# Logfile (/dev/null for nowhere)
LOGFILE="/mnt/storage/boinc/boinc.log"

# Allow remote gui RPC yes or no
ALLOW_REMOTE_RPC="yes"

# nice level
NICELEVEL="17"

# scheduling parameters, arguments to chrt(1)
SCHED_PARAM="--batch 0"

# Relative CPU allocation for boinc user, default is 1024,
# requires CONFIG_FAIR_GROUP_SCHED and CONFIG_USER_SCHED,
# see /usr/src/linux/Documentation/scheduler/sched-design-CFS.txt
CPU_SHARE="768"
```

Now I'm a bit disappointed. Would the manual removal of the Rosetta files and the re-sync with the project be of any use? |
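The permission scheme described above (files 0644, executables and directories 0755) can be checked mechanically rather than by eye. Below is a minimal Python sketch, assuming the data directory is the RUNTIMEDIR from the config above and that the client runs as the owning user. It walks the tree and reports regular files the owner cannot read and directories the owner cannot list or traverse. The function name `audit_permissions` is hypothetical. The sketch only looks at owner mode bits. It does not verify file ownership, ACLs, or mandatory access control.

```python
import os
import stat

# Owner must be able to list (read) and enter (execute) a directory.
DIR_NEEDS = stat.S_IRUSR | stat.S_IXUSR


def audit_permissions(root: str) -> list[tuple[str, str]]:
    """Return (path, problem) pairs for entries the owner cannot use."""
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for d in dirnames:
            path = os.path.join(dirpath, d)
            mode = os.stat(path).st_mode
            if mode & DIR_NEEDS != DIR_NEEDS:
                problems.append((path, f"directory mode {oct(stat.S_IMODE(mode))}"))
        for f in filenames:
            path = os.path.join(dirpath, f)
            mode = os.lstat(path).st_mode  # don't follow symlinks
            if stat.S_ISREG(mode) and not mode & stat.S_IRUSR:
                problems.append((path, f"file mode {oct(stat.S_IMODE(mode))}"))
    return problems
```

Running `audit_permissions("/mnt/storage/boinc")` as the boinc user and getting an empty list would suggest that plain Unix permissions are not what blocks the file opens.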
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This errored after 47 min.
igfhum_looprefine_placestub2_2dsrI_1P6F_ProteinInterfaceDesign_2Feb2010_17660_271_0
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=288505225

```
<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
Maximum memory exceeded
</message>

Mon 08 Feb 2010 22:14:56 EST|rosetta@home|Aborting task igfhum_looprefine_placestub2_2dsrI_1P6F_ProteinInterfaceDesign_2Feb2010_17660_271_0: exceeded memory limit 918.79MB > 909.78MB
Mon 08 Feb 2010 22:14:59 EST|rosetta@home|Output file igfhum_looprefine_placestub2_2dsrI_1P6F_ProteinInterfaceDesign_2Feb2010_17660_271_0_0 for task absent
```
|
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
> Would the manual removal of the rosetta files and the re-sync with the project be of any use?

...doubtful. I would have suggested that if I felt it stood a good chance of helping your situation. But it can't hurt anything (it costs you some bandwidth to reload everything). Now that I think about it, if security setup were the problem, you should see the same issue with other projects. Does anyone else have any ideas why Linux would be unable to open an application file?

Rosetta Moderator: Mod.Sense |
jcorn Joined: 27 Jan 06 Posts: 6 Credit: 198,437 RAC: 0 |
Hi Manuel and P.P.L. The large memory requirements are a once-in-a-while occurrence, but not entirely unexpected. These jobs occasionally find a very interesting possible solution and spend a lot of resources testing it. I had submitted these jobs with a requirement of 512 MB of RAM allocated to BOINC, but based on your observations I'll increase that requirement to 1 GB in the future. Thanks very much for the reports! |
Craig Dickinson Joined: 7 May 07 Posts: 8 Credit: 1,021,887 RAC: 0 |
Is anyone else seeing the following consistent error? The file minirosetta_graphics_1.92_windows_x86_64.exe stops downloading at 4.57/5.10 MB. The Messages section shows this as an HTTP error followed by "Internet access OK - project servers may temporarily be down." I have reset the project (more than once), and I have also detached and waited until the next PC boot to re-attach. None of this had any impact, and it's been doing this for several days now, so I am unable to process any work units because the application hasn't finished downloading.

Running BOINC 6.10.18 for Windows 64-bit on Windows 7, AMD 64-bit dual core, 4 GB RAM. I am also running SETI@home, and it is running error-free in both the standard and Astropulse projects. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi jcorn.

> I have an “igfhum ProteinInterfaceDesign” that takes from 280 MB up to 1.2 GB of memory!! Is that normal?

Going by this, if I can make a suggestion, you might want to raise the memory limit to 1.5 GB for those tasks. My rig that had the error has 1 GB total; once the OS takes its share, that's not going to be enough. |
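The reasoning above can be written as a rough feasibility check: a task fits only if its peak memory stays below both the fraction of RAM the BOINC preferences allow and what is left after the operating system and other programs take their part. The helper below is hypothetical. The 90% preference and the 600 MB OS estimate are example values, not figures from this thread. The peak of about 1.2 GB is the figure reported above, taken here as 1200 MB.

```python
def task_fits(peak_mb: float, ram_mb: float, max_pct: float, os_mb: float = 0.0) -> bool:
    """Rough check: does a task's peak memory fit on this host?

    max_pct: the "use at most X% of memory" preference.
    os_mb:   optional estimate of memory already used by the OS and
             other programs (an assumption, not something BOINC reports).
    """
    allowed = min(ram_mb * max_pct / 100.0, ram_mb - os_mb)
    return peak_mb <= allowed


PEAK_MB = 1200  # ~1.2 GB peak reported in this thread

print(task_fits(PEAK_MB, ram_mb=1024, max_pct=90))             # False: 921.6 MB allowed
print(task_fits(PEAK_MB, ram_mb=2048, max_pct=90, os_mb=600))  # True: 1448 MB allowed
```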
Snags Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
lr15clusfa_opt_.1hz6.1hz6.IGNORE_THE_REST.c.0.34.pdb.pdb.JOB_17586_4 Exit status 193 (0xc1) SIGBUS: bus error |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
> Anyone else seeing the following consistent error:-

It should recover the transfer from where it left off and get the rest of the file, but it seems it must hit a hiccup along the way. Are you using a caching proxy server or something? It sounds like you've enabled HTTP tracing. Which Rosetta server does it say it is trying to get the file from? It should actually cycle through all of them as it retries, which should confuse a proxy enough that it would start fresh.

You could always download the file with your browser and drop it in the Rosetta project directory. Here is one of the direct URLs: http://srv4.bakerlab.org/download/minirosetta_graphics_1.92_windows_x86_64.exe

Rosetta Moderator: Mod.Sense |
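Recovering a transfer "from where it left off" is normally done with an HTTP range request: the client asks the server only for the bytes after the size of the partial file. The following is a minimal sketch of the same idea done by hand with Python's standard `urllib`, as an alternative to re-downloading in a browser. The function names and local file name are hypothetical, and only the srv4 URL comes from the post. It assumes the server honors `Range` requests. A `206 Partial Content` reply is appended to the partial file, while a plain `200` reply is written from scratch. If the local file is already complete, servers typically answer `416` and `urlopen` raises `HTTPError`, which this sketch does not handle.

```python
import os
import urllib.request


def range_header(existing_bytes: int) -> dict[str, str]:
    """Ask only for the bytes we don't have yet (open-ended byte range)."""
    return {"Range": f"bytes={existing_bytes}-"}


def resume_download(url: str, dest: str, chunk_size: int = 64 * 1024) -> int:
    """Fetch the missing tail of `url` into `dest`; return the final size."""
    have = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url, headers=range_header(have) if have else {})
    with urllib.request.urlopen(req) as resp:
        # 206 = server sent only the requested range; 200 = the whole file.
        mode = "ab" if resp.status == 206 else "wb"
        with open(dest, mode) as out:
            while chunk := resp.read(chunk_size):
                out.write(chunk)
    return os.path.getsize(dest)
```

For example, run from inside the Rosetta project directory: `resume_download("http://srv4.bakerlab.org/download/minirosetta_graphics_1.92_windows_x86_64.exe", "minirosetta_graphics_1.92_windows_x86_64.exe")`.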
Max DesGeorges Joined: 1 Oct 05 Posts: 35 Credit: 942,527 RAC: 0 |
> Hi Manuel and P.P.L.

This is a good idea, but I think the specific WU I mentioned had another problem. It kept taking memory until the maximum available was reached, so it might have taken even more RAM if I had more in my PC. So far I'm the only one who has noticed this problem; maybe it is an isolated case. |
©2024 University of Washington
https://www.bakerlab.org