minirosetta 2.05

Message boards : Number crunching : minirosetta 2.05



Admin

Joined: 13 Apr 07
Posts: 42
Credit: 260,782
RAC: 0
Message 65211 - Posted: 5 Feb 2010, 0:41:07 UTC

It seems that if I give it some time, it finds the protein structure again; it was quite strange. Also, I wanted to give a heads-up that I'm having a huge issue with the boinc_filtered_lookbuild_threading WUs. Most of the new ones I have received have stalled at about 5 percent, and I've had to abort them. Are we any closer to fixing this issue? It seems to be getting worse. Here is some info on my current one: protein: t385, CPU time at last checkpoint: 33:20, CPU time: 34:24, elapsed time: 14:21:01.
ID: 65211 · Rating: 0
Admin
Joined: 13 Apr 07
Posts: 42
Credit: 260,782
RAC: 0
Message 65213 - Posted: 5 Feb 2010, 17:40:07 UTC

Access Violation Error - lr15clusfa_opt_.1bgf.1bgf.IGNORE_THE_REST.c.85.0.pdb.pdb.JOB_17562_3_0

Link: https://boinc.bakerlab.org/rosetta/result.php?resultid=315684378

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x006D2D46 read attempt to address 0x00000000

Debug info in link as usual - Wingman also had same error
ID: 65213 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 65219 - Posted: 6 Feb 2010, 20:52:26 UTC

This one errored on Ubuntu x64 after 10 seconds.

lr15clusfa_opt_.1hz6.1hz6.IGNORE_THE_REST.c.3.21.pdb.pdb.JOB_17586_2_1

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=288116061

<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>

Watchdog active.
SIGSEGV: segmentation violation
Stack trace (8 frames):
[0x96c49b3]
[0x96ee888]
[0xffffe500]
[0x80a8721]
[0x808fcc1]
[0x804985f]
[0x974c15c]
[0x8048121]

Exiting...

</stderr_txt>
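The three numbers in "process exited with code 193 (0xc1, -63)" are one value in three notations: decimal, hexadecimal, and the same byte read as a signed 8-bit integer. A minimal Python sketch of that conversion follows; the helper name describe_exit_code is made up for illustration. In BOINC's error-number list, 193 appears to be the code for an exit caused by a signal, which would fit the SIGSEGV line, but treat that mapping as an assumption to check against the BOINC source.

```python
def describe_exit_code(code: int) -> str:
    """Render an exit status in the decimal / hex / signed-byte form used above."""
    low_byte = code & 0xFF
    # Reinterpret the low byte as a signed 8-bit value (two's complement).
    signed = low_byte - 256 if low_byte >= 128 else low_byte
    return f"{code} ({hex(low_byte)}, {signed})"


print(describe_exit_code(193))
```

The arithmetic is simply 193 - 256 = -63, so the negative number carries no extra information beyond the exit code itself.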


ID: 65219 · Rating: 0
Max DesGeorges
Joined: 1 Oct 05
Posts: 35
Credit: 942,527
RAC: 0
Message 65221 - Posted: 7 Feb 2010, 6:28:32 UTC

I have an “igfhum ProteinInterfaceDesign” WU that takes from 280 MB up to 1.2 GB of memory! Is that normal?
If it is, I think that in the future most people will not have enough memory to run Rosetta anymore…

ID: 65221 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 65222 - Posted: 7 Feb 2010, 6:41:29 UTC

Another error after 10 seconds.

lr15clusfa_opt_.1bgf.1bgf.IGNORE_THE_REST.c.12.1.pdb.pdb.JOB_17562_9_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=288315508

<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>

Watchdog active.
SIGSEGV: segmentation violation
Stack trace (8 frames):
[0x96c49b3]
[0x96ee888]
[0xffffe500]
[0x80a8721]
[0x808fcc1]
[0x804985f]
[0x974c15c]
[0x8048121]

Exiting...
</stderr_txt>

ID: 65222 · Rating: 0
Max DesGeorges
Joined: 1 Oct 05
Posts: 35
Credit: 942,527
RAC: 0
Message 65223 - Posted: 7 Feb 2010, 9:16:09 UTC - in response to Message 65221.  

I have an “igfhum ProteinInterfaceDesign” WU that takes from 280 MB up to 1.2 GB of memory! Is that normal?
If it is, I think that in the future most people will not have enough memory to run Rosetta anymore…

The name of the WU is:
igfhum_looprefine_placestub2_2dsrI_1B6E_ProteinInterfaceDesign_2Feb2010_17660_331_0
After 45 minutes I restarted BOINC and the WU restarted from zero. Now, after 2 hours, the properties show that the CPU time at last checkpoint is still blank (“---”), as if the WU had only worked for a few minutes.
Looking at the Task Manager, it seems that the WU continuously asks for more memory until it reaches the limit set in the preferences. Then it drops rapidly to 280 MB and climbs again to around 1.2 GB.

Vista 32 bit, Core Duo T7250, 2 GB DDR2, BOINC 6.10.29

ID: 65223 · Rating: 0
Max DesGeorges
Joined: 1 Oct 05
Posts: 35
Credit: 942,527
RAC: 0
Message 65225 - Posted: 7 Feb 2010, 12:12:14 UTC - in response to Message 65223.  

UPDATE: The WU finished without errors.
Looking at the graphics, I noticed that whenever the WU froze in the “request memory loop”, it was always in the “kic_refine_r2” stage and the accepted energy didn’t vary.
I hope this info is useful. :)

ID: 65225 · Rating: 0
johndad5
Joined: 12 Aug 09
Posts: 7
Credit: 2,729,604
RAC: 0
Message 65226 - Posted: 7 Feb 2010, 13:14:59 UTC - in response to Message 64951.  

This app update includes a fix for checkpointing.

Please report issues and bugs here!

thanks,

DK

For some reason I am not getting new work. When I update the project, it simply says "Not reporting or requesting tasks". I am using BOINC version 6.10.18.
ID: 65226 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 65227 - Posted: 7 Feb 2010, 16:38:19 UTC - in response to Message 65226.  

For some reason I am not getting new work. When I update the project, it simply says "Not reporting or requesting tasks". I am using BOINC version 6.10.18.

John, it sounds like BOINC has decided to schedule work from other projects for the near term on your machine. It is trying to honor the resource shares that you have established between projects. That's normal, and once some work for the other projects has been done, it will come back and request work from Rosetta automatically.
Rosetta Moderator: Mod.Sense
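As a rough illustration of the resource-share idea above: each attached project has a share number, and its long-run target fraction of processing is its share divided by the sum of all shares. The sketch below is Python with hypothetical project names and share values. It shows only the proportion arithmetic, not BOINC's actual scheduler, which also weighs deadlines and recently completed work.

```python
def share_fractions(shares: dict[str, float]) -> dict[str, float]:
    """Turn per-project resource shares into target fractions of CPU time."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive share")
    return {name: value / total for name, value in shares.items()}


# Hypothetical shares, chosen only to make the arithmetic easy to follow.
example = {"rosetta@home": 100, "seti@home": 100, "einstein@home": 200}
for name, fraction in share_fractions(example).items():
    print(f"{name}: {fraction:.0%}")
```

If one project has recently received more than its fraction, the client can hold off requesting work from it for a while, which looks like "Not reporting or requesting tasks" from the user's side.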
ID: 65227 · Rating: 0
Neo2
Joined: 3 Feb 10
Posts: 2
Credit: 811,111
RAC: 117
Message 65229 - Posted: 8 Feb 2010, 8:52:51 UTC

Hi!
I don't know whether this also happened with older versions of Rosetta, since I only started computing on the 3rd of February.
I'm running on an amd64 Linux system, a pretty powerful one. Looking at my task log, I have had about 120 WUs assigned so far, but only 3-4 of them completed successfully. The others show "Outcome - Client error" / "Client state - Compute error". Looking at boinc.log gave me no information, because it contains no error lines except "output file .... absent", which the FAQ says is safe to ignore. I'm also running lhc, seti, milkyway, einstein, ralph and cosmology, and with the exception of the einstein tasks, which also seem to end in computation errors, every other project is running fine. Milkyway in particular granted me 2500 credits in the last four days (from which I assume the machine is stable). I have never observed problems with the machine itself (occasional lockups, strange sudden shutdowns, etc.).

This is the /proc/cpuinfo file (I've omitted the other 3 cores):
# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 4
model name : AMD Phenom(tm) II X4 920 Processor
stepping : 2
cpu MHz : 2800.000
cache size : 512 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt
bogomips : 5619.47
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

The machine is equipped with 8 GB of RAM. Everything is running at stock speed; I'm not overclocking. If any other information is needed, I can provide it, and I'm not scared to do some debugging. :)
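For anyone comparing hosts, a /proc/cpuinfo dump like the one above is easy to reduce to the few fields that matter. Here is a small Python sketch, assuming the usual "key : value" layout with a blank line between processor blocks. The function name is illustrative only, and the sample text is a shortened copy of the dump above with a second core added.

```python
def parse_cpuinfo(text: str) -> list[dict[str, str]]:
    """Split /proc/cpuinfo-style text into one dict per processor block."""
    processors, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor block.
            if current:
                processors.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        processors.append(current)
    return processors


# Shortened sample; the flags list is truncated for readability.
sample = """processor : 0
model name : AMD Phenom(tm) II X4 920 Processor
cpu cores : 4
flags : fpu vme de pse tsc msr pae mce cx8 sse sse2

processor : 1
model name : AMD Phenom(tm) II X4 920 Processor
cpu cores : 4
"""
cpus = parse_cpuinfo(sample)
print(len(cpus), cpus[0]["model name"], "sse2" in cpus[0]["flags"].split())
```

On a real Linux host the input would come from reading /proc/cpuinfo instead of the sample string.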

Thanks
Neo2
ID: 65229 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 65231 - Posted: 8 Feb 2010, 17:23:26 UTC

Neo2, thanks for joining Rosetta. I see you have two machines. The 4 core that you described is here. At present, it doesn't show any successfully completed work units. If you look at the task details for that host, such as this one, they each report an error opening a file. The file name seems to vary with each task.

This implies a security setup problem on your machine. The executable, and the user that is running the BOINC core client, need permission to access the files that are downloaded. Is it possible your BOINC installation is conflicting with some anti-virus software? Or other security measures?
Rosetta Moderator: Mod.Sense
ID: 65231 · Rating: 0
Neo2
Joined: 3 Feb 10
Posts: 2
Credit: 811,111
RAC: 117
Message 65232 - Posted: 8 Feb 2010, 18:52:55 UTC - in response to Message 65231.  
Last modified: 8 Feb 2010, 18:57:14 UTC

I don't think so. Currently I have clamd up and running, but it is only a daemon that fulfills requests from userspace programs, not real-time antivirus software.
I'm running a 2.6.33 git kernel without any extra security measures: no grsecurity, no firewall, no external security hooks of any sort, no SELinux.
The directory in which BOINC runs is owned by user and group boinc, both of which exist, and no file in the directory is owned by other users. Every file has permission 0644 (except executables, which have 0755), while the directories have 0755. The BOINC executable also runs as boinc:boinc.
Before starting BOINC for the first time I tuned the directory parameters, so every file in the BOINC directory has been created by BOINC itself.
Gentoo by default installs a stock /etc/conf.d file through which the BOINC service is started. I only modified the paths for data storage and logging, nothing else.

The file is the following:
# Config file for /etc/init.d/boinc

# Owner of BOINC process (must be existing)
USER="boinc"
GROUP="boinc"

# Directory with runtime data: Work units, project binaries, user info etc.
RUNTIMEDIR="/mnt/storage/boinc"

# Location of the boinc command line binary
BOINCBIN="/usr/bin/boinc_client"

# Logfile (/dev/null for nowhere)
LOGFILE="/mnt/storage/boinc/boinc.log"

# Allow remote gui RPC yes or no
ALLOW_REMOTE_RPC="yes"

# nice level
NICELEVEL="17"

# scheduling parameters, arguments to chrt(1)
SCHED_PARAM="--batch 0"

# Relative CPU allocation for boinc user, default is 1024,
# requires CONFIG_FAIR_GROUP_SCHED and CONFIG_USER_SCHED,
# see /usr/src/linux/Documentation/scheduler/sched-design-CFS.txt
CPU_SHARE="768"
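To double-check which values a file like this actually sets, the simple KEY="value" lines can be read with a few lines of Python. This is only a sketch for inspection: the real init script sources the file as shell, so anything beyond plain assignments (variable expansion, command substitution) is not handled here, and the function name parse_conf is invented for the example.

```python
import shlex


def parse_conf(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        # shlex strips the quotes and drops everything after a # comment marker.
        for token in shlex.split(line, comments=True):
            key, sep, value = token.partition("=")
            if sep and key.isidentifier():
                settings[key] = value
    return settings


conf = '''
# Owner of BOINC process (must be existing)
USER="boinc"
RUNTIMEDIR="/mnt/storage/boinc"
SCHED_PARAM="--batch 0"
'''
print(parse_conf(conf))
```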

Now I'm a bit disappointed.
Would the manual removal of the rosetta files and the re-sync with the project be of any use?
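The ownership and permission layout described above (everything owned by boinc, files readable, directories enterable) can be verified mechanically rather than by eye. Below is a minimal Python sketch for POSIX systems; audit_tree is a hypothetical helper, not part of BOINC, and it checks only the owner bits, not group access, ACLs, or mount options.

```python
import os
import stat


def audit_tree(root: str, owner_uid: int) -> list[str]:
    """List entries under root that the given uid may be unable to use."""
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            mode = stat.S_IMODE(info.st_mode)
            if info.st_uid != owner_uid:
                problems.append(f"{path}: owned by uid {info.st_uid}")
            elif not mode & stat.S_IRUSR:
                problems.append(f"{path}: owner cannot read (mode {mode:04o})")
            elif stat.S_ISDIR(info.st_mode) and not mode & stat.S_IXUSR:
                problems.append(f"{path}: owner cannot enter (mode {mode:04o})")
    return problems
```

For the setup in this post, the call would be along the lines of audit_tree("/mnt/storage/boinc", pwd.getpwnam("boinc").pw_uid), run as root or as the boinc user. An empty result would support the view that file permissions are not the cause.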
ID: 65232 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 65236 - Posted: 8 Feb 2010, 21:21:01 UTC

This errored after 47 minutes.

igfhum_looprefine_placestub2_2dsrI_1P6F_ProteinInterfaceDesign_2Feb2010_17660_271_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=288505225

<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
Maximum memory exceeded
</message>

Mon 08 Feb 2010 22:14:56 EST|rosetta@home|Aborting task igfhum_looprefine_placestub2_2dsrI_1P6F_ProteinInterfaceDesign_2Feb2010_17660_271_0: exceeded memory limit 918.79MB > 909.78MB

Mon 08 Feb 2010 22:14:59 EST|rosetta@home|Output file igfhum_looprefine_placestub2_2dsrI_1P6F_ProteinInterfaceDesign_2Feb2010_17660_271_0_0 for task absent
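When collecting several of these aborts, it can help to pull the task name and the two memory figures out of the log line and see how far over the limit the task went. Here is a small Python sketch that assumes the message format shown above stays the same; the regular expression and function name are my own, not anything BOINC provides.

```python
import re

LIMIT_RE = re.compile(
    r"Aborting task (?P<task>\S+): exceeded memory limit "
    r"(?P<used>[\d.]+)MB > (?P<limit>[\d.]+)MB"
)


def parse_memory_abort(line: str):
    """Return (task, used_mb, limit_mb), or None if the line does not match."""
    match = LIMIT_RE.search(line)
    if match is None:
        return None
    return match["task"], float(match["used"]), float(match["limit"])


line = (
    "Mon 08 Feb 2010 22:14:56 EST|rosetta@home|Aborting task "
    "igfhum_looprefine_placestub2_2dsrI_1P6F_ProteinInterfaceDesign_2Feb2010_17660_271_0: "
    "exceeded memory limit 918.79MB > 909.78MB"
)
task, used, limit = parse_memory_abort(line)
print(f"{task} went over the limit by {used - limit:.2f} MB")
```

In this case the task was only about 9 MB over the limit when the client stopped it, so the figure says more about the limit on this host than about the task's peak demand.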


ID: 65236 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 65237 - Posted: 8 Feb 2010, 21:25:39 UTC - in response to Message 65232.  

Would the manual removal of the rosetta files and the re-sync with the project be of any use?


...doubtful. I would have suggested that if I felt it stood a good chance of helping your situation. But it can't hurt anything (costs you some bandwidth to reload everything).

Now that I think about it, if security setup were the problem, you should have the same issue with other projects.

Anyone else have any ideas why Linux would be unable to open an application file?
Rosetta Moderator: Mod.Sense
ID: 65237 · Rating: 0
jcorn
Joined: 27 Jan 06
Posts: 6
Credit: 198,437
RAC: 0
Message 65238 - Posted: 8 Feb 2010, 21:30:19 UTC

Hi Manuel and P.P.L.

The large memory requirements are a once-in-a-while occurrence, but not something entirely unexpected. These jobs occasionally find a very interesting possible solution and spend a lot of resources testing it. I had submitted these jobs with the requirement for 512 MB RAM allocated for boinc. But based on your observations, I'll increase that requirement to 1 GB in the future. Thanks very much for the reports!
ID: 65238 · Rating: 0
Craig Dickinson
Joined: 7 May 07
Posts: 8
Credit: 1,021,887
RAC: 0
Message 65239 - Posted: 8 Feb 2010, 22:23:47 UTC

Is anyone else seeing the following consistent error?

File - minirosetta_graphics_1.92_windows_x86_64.exe stops downloading at 4.57/5.10 MB

The Messages section shows this as an HTTP error followed by "Internet access OK - project servers may temporarily be down."

I have reset the project (more than once), and also detached and waited until the next PC boot to re-attach. None of this had any impact, and it's been doing this for several days now. So I am unable to process any work units, as the application hasn't finished downloading.

Running BOINC 6.10.18 for Windows 64-bit on Windows 7, AMD 64-bit dual core, 4 GB RAM

I am also running Seti@Home and this is running error free in both the standard and astropulse projects.
ID: 65239 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 65240 - Posted: 8 Feb 2010, 22:24:56 UTC - in response to Message 65221.  

Hi jcorn.

I have an “igfhum ProteinInterfaceDesign” WU that takes from 280 MB up to 1.2 GB of memory! Is that normal?
If it is, I think that in the future most people will not have enough memory to run Rosetta anymore…

===============================================================================
Going by this, if I can make a suggestion, you might want to raise the memory limit to 1.5 GB for those tasks.

My rig that had the error has 1 GB total; with the O.S. taken out, that's not going to be enough.




ID: 65240 · Rating: 0
Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 65242 - Posted: 8 Feb 2010, 23:09:16 UTC

ID: 65242 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 65244 - Posted: 9 Feb 2010, 4:27:01 UTC - in response to Message 65239.  

It should resume the transfer from where it left off and get the rest of the file. But it seems it must be hitting a hiccup along the way. Are you using a caching proxy server or something?

Sounds like you've enabled the http tracing. Which Rosetta server does it say it is trying to get the file from? It should actually cycle through all of them as it does the retries. This should confuse a proxy enough that it would start fresh.

You could always download it with your browser and drop it in the rosetta project directory. Here is one of the direct URLs:
http://srv4.bakerlab.org/download/minirosetta_graphics_1.92_windows_x86_64.exe
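If the browser download also stalls partway, the "resume from where it left off" behaviour mentioned above can be reproduced by hand with a standard HTTP Range request. The following Python sketch uses only the standard library. It is not BOINC's own transfer code, both function names are made up, and it assumes the server supports range requests (it falls back to a full download if the server ignores the header).

```python
import os
import urllib.request


def build_resume_request(url: str, partial_path: str) -> urllib.request.Request:
    """Build a GET request that asks only for the bytes not yet on disk."""
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if have > 0:
        # "bytes=N-" means: send everything from offset N to the end of the file.
        request.add_header("Range", f"bytes={have}-")
    return request


def resume_download(url: str, partial_path: str, chunk_size: int = 65536) -> None:
    """Append the missing tail to partial_path, or restart if ranges are ignored."""
    request = build_resume_request(url, partial_path)
    with urllib.request.urlopen(request) as response:
        # 206 means the range was honored; 200 means the whole file is being sent.
        mode = "ab" if response.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
```

Calling resume_download with the URL above and the path of the partial file would fetch only the missing tail. If the local file is already complete, the server is likely to answer with an error status, which urlopen raises as an HTTPError.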
Rosetta Moderator: Mod.Sense
ID: 65244 · Rating: 0
Max DesGeorges
Joined: 1 Oct 05
Posts: 35
Credit: 942,527
RAC: 0
Message 65248 - Posted: 9 Feb 2010, 13:54:46 UTC - in response to Message 65238.  

Hi Manuel and P.P.L.

The large memory requirements are a once-in-a-while occurrence, but not something entirely unexpected. These jobs occasionally find a very interesting possible solution and spend a lot of resources testing it. I had submitted these jobs with the requirement for 512 MB RAM allocated for boinc. But based on your observations, I'll increase that requirement to 1 GB in the future. Thanks very much for the reports!

This is a good idea, but I think the specific WU I mentioned had another problem. It continued to take memory until the maximum available was reached. So maybe it would have taken even more RAM if I had more in my PC.
So far I'm the only one who has noticed this problem, so maybe it is an isolated case.

ID: 65248 · Rating: 0




©2024 University of Washington
https://www.bakerlab.org