Message boards : Number crunching : Client error
Author | Message |
---|---|
anton_P6 Send message Joined: 19 Oct 10 Posts: 5 Credit: 114,624 RAC: 0 |
I have been running Rosetta@home on this PC for a month now. The first 3 weeks went fine, but in the 4th week the computer suddenly started returning only client errors. Here is an example of an error message: <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> Maximum memory exceeded </message> <stderr_txt> [2010-11-22 9:56:13:] :: BOINC:: Initializing ... ok. [2010-11-22 9:56:13:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/1NKU_pcs_cst_files.r2.v1.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. Unhandled Exception Detected... - Unhandled Exception Record - Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E Engaging BOINC Windows Runtime Debugger... 
******************** BOINC Windows Runtime Debugger Version 6.5.0 Dump Timestamp : 11/22/10 17:20:55 Install Directory : C:Program FilesBOINC Data Directory : C:Documents and SettingsAll UsersApplication DataBOINC Project Symstore : https://boinc.bakerlab.org/rosetta/symstore Loaded Library : C:Program FilesBOINC\dbghelp.dll Loaded Library : C:Program FilesBOINC\symsrv.dll Loaded Library : C:Program FilesBOINC\srcsrv.dll LoadLibraryA( C:Program FilesBOINC\version.dll ): GetLastError = 126 Loaded Library : version.dll Debugger Engine : 4.0.5.0 Symbol Search Path: C:Documents and SettingsAll UsersApplication DataBOINCslots ;C:Documents and SettingsAll UsersApplication DataBOINCprojectsboinc.bakerlab.org_rosetta;srv*C:Documents and SettingsAll UsersApplication DataBOINCprojectsboinc.bakerlab.org_rosettasymbols*http://msdl.microsoft.com/download/symbols;srv*C:Documents and SettingsAll UsersApplication DataBOINCprojectsboinc.bakerlab.org_rosettasymbols*https://boinc.bakerlab.org/rosetta/symstore;srv*C:Documents and SettingsAll UsersApplication DataBOINCprojectsboinc.bakerlab.org_rosettasymbols*http://boinc.berkeley.edu/symstore ModLoad: 00400000 00d20000 C:Documents and SettingsAll UsersApplication DataBOINCprojectsboinc.bakerlab.org_rosettaminirosetta_2.17_windows_intelx86.exe (-nosymbols- Symbols Loaded) Linked PDB Filename : D:boinc_buildminirosetta_2.17miniVisual StudioBoincReleaseminirosetta_2.17_windows_intelx86.pdb ModLoad: 7c900000 000b8000 C:WINDOWSsystem32ntdll.dll (5.1.2600.5755) (PDB Symbols Loaded) Linked PDB Filename : ntdll.pdb File Version : 5.1.2600.5755 (xpsp_sp3_gdr.090206-1234) Company Name : Microsoft Corporation Product Name : Besturingssysteem Microsoft� Windows� Product Version : 5.1.2600.5755 ModLoad: 7c7d0000 00100000 C:WINDOWSsystem32kernel32.dll (5.1.2600.5781) (PDB Symbols Loaded) Linked PDB Filename : kernel32.pdb File Version : 5.1.2600.5781 (xpsp_sp3_gdr.090321-1317) Company Name : Microsoft Corporation Product Name : Besturingssysteem 
Microsoft� Windows� Product Version : 5.1.2600.5781 ModLoad: 7e390000 00091000 C:WINDOWSsystem32USER32.dll (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : user32.pdb File Version : 5.1.2600.5512 (xpsp.080413-2105) Company Name : Microsoft Corporation Product Name : Besturingssysteem Microsoft� Windows� Product Version : 5.1.2600.5512 ModLoad: 77e40000 00049000 C:WINDOWSsystem32GDI32.dll (5.1.2600.5698) (PDB Symbols Loaded) Linked PDB Filename : gdi32.pdb File Version : 5.1.2600.5698 (xpsp_sp3_gdr.081022-1932) Company Name : Microsoft Corporation Product Name : Microsoft� Windows� Operating System Product Version : 5.1.2600.5698 ModLoad: 77f40000 000ab000 C:WINDOWSsystem32ADVAPI32.dll (5.1.2600.5755) (PDB Symbols Loaded) Linked PDB Filename : advapi32.pdb File Version : 5.1.2600.5755 (xpsp_sp3_gdr.090206-1234) Company Name : Microsoft Corporation Product Name : Besturingssysteem Microsoft� Windows� Product Version : 5.1.2600.5755 ModLoad: 77da0000 00093000 C:WINDOWSsystem32RPCRT4.dll (5.1.2600.6022) (PDB Symbols Loaded) Linked PDB Filename : rpcrt4.pdb File Version : 5.1.2600.6022 (xpsp_sp3_gdr.100813-1643) Company Name : Microsoft Corporation Product Name : Microsoft� Windows� Operating System Product Version : 5.1.2600.6022 ModLoad: 77f10000 00011000 C:WINDOWSsystem32Secur32.dll (5.1.2600.5834) (PDB Symbols Loaded) Linked PDB Filename : secur32.pdb File Version : 5.1.2600.5834 (xpsp_sp3_gdr.090624-1305) Company Name : Microsoft Corporation Product Name : Microsoft� Windows� Operating System Product Version : 5.1.2600.5834 ModLoad: 76330000 0001d000 C:WINDOWSsystem32IMM32.DLL (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : imm32.pdb File Version : 5.1.2600.5512 (xpsp.080413-2105) Company Name : Microsoft Corporation Product Name : Microsoft� Windows� Operating System Product Version : 5.1.2600.5512 ModLoad: 77650000 00021000 C:WINDOWSsystem32NTMARTA.DLL (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : ntmarta.pdb File Version : 
5.1.2600.5512 (xpsp.080413-2113) Company Name : Microsoft Corporation Product Name : Besturingssysteem Microsoft� Windows� Product Version : 5.1.2600.5512 ModLoad: 77be0000 00058000 C:WINDOWSsystem32msvcrt.dll (7.0.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : msvcrt.pdb File Version : 7.0.2600.5512 (xpsp.080413-2111) Company Name : Microsoft Corporation Product Name : Microsoft� Windows� Operating System Product Version : 7.0.2600.5512 ModLoad: 774a0000 0013e000 C:WINDOWSsystem32ole32.dll (5.1.2600.6010) (-exported- Symbols Loaded) Linked PDB Filename : ole32.pdb File Version : 5.1.2600.6010 (xpsp_sp3_gdr.100712-1633) Company Name : Microsoft Corporation Product Name : Besturingssysteem Microsoft� Windows� Product Version : 5.1.2600.6010 ModLoad: 71b80000 00013000 C:WINDOWSsystem32SAMLIB.dll (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : samlib.pdb File Version : 5.1.2600.5512 (xpsp.080413-2113) Company Name : Microsoft Corporation Product Name : Microsoft� Windows� Operating System Product Version : 5.1.2600.5512 ModLoad: 76f20000 0002d000 C:WINDOWSsystem32WLDAP32.dll (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : wldap32.pdb File Version : 5.1.2600.5512 (xpsp.080413-2113) Company Name : Microsoft Corporation Product Name : Besturingssysteem Microsoft� Windows� Product Version : 5.1.2600.5512 ModLoad: 19ce0000 00115000 C:Program FilesBOINCdbghelp.dll (6.8.4.0) (PDB Symbols Loaded) Linked PDB Filename : dbghelp.pdb File Version : 6.8.0004.0 (debuggers(dbg).070515-1751) Company Name : Microsoft Corporation Product Name : Debugging Tools for Windows(R) Product Version : 6.8.0004.0 ModLoad: 19e00000 00048000 C:Program FilesBOINCsymsrv.dll (6.8.4.0) (PDB Symbols Loaded) Linked PDB Filename : symsrv.pdb File Version : 6.8.0004.0 (debuggers(dbg).070515-1751) Company Name : Microsoft Corporation Product Name : Debugging Tools for Windows(R) Product Version : 6.8.0004.0 ModLoad: 19e50000 0003b000 C:Program FilesBOINCsrcsrv.dll (6.8.4.0) 
(PDB Symbols Loaded) Linked PDB Filename : srcsrv.pdb File Version : 6.8.0004.0 (debuggers(dbg).070515-1751) Company Name : Microsoft Corporation Product Name : Debugging Tools for Windows(R) Product Version : 6.8.0004.0 ModLoad: 77bd0000 00008000 C:WINDOWSsystem32version.dll (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : version.pdb File Version : 5.1.2600.5512 (xpsp.080413-2105) Company Name : Microsoft Corporation Product Name : Microsoft� Windows� Operating System Product Version : 5.1.2600.5512 *** Dump of the Process Statistics: *** - I/O Operations Counters - Read: 9425, Write: 0, Other 3956 - I/O Transfers Counters - Read: 0, Write: 50613, Other 0 - Paged Pool Usage - QuotaPagedPoolUsage: 65124, QuotaPeakPagedPoolUsage: 65124 QuotaNonPagedPoolUsage: 3832, QuotaPeakNonPagedPoolUsage: 3832 - Virtual Memory Usage - VirtualSize: 459423744, PeakVirtualSize: 459423744 - Pagefile Usage - PagefileUsage: 283504640, PeakPagefileUsage: 283504640 - Working Set Size - WorkingSetSize: 260259840, PeakWorkingSetSize: 260259840, PageFaultCount: 13630254 *** Dump of thread ID 2988 (state: Waiting): *** - Information - Status: Wait Reason: UserRequest, , Kernel Time: 2343750.000000, User Time: 5312500.000000, Wait Time: 1927386.000000 - Unhandled Exception Record - Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E - Registers - eax=00000000 ebx=00000000 ecx=00e0f9ca edx=0357613c esi=00000001 edi=00000000 eip=7c90120e esp=0357fba0 ebp=0357ffec cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246 - Callstack - ChildEBP RetAddr Args to Child 0357fb9c 0040784d 7c7d2446 6f62613c 3e2f7472 3e2f7400 ntdll!_DbgBreakPoint@0+0x0 FPO: [0,0,0] 0357ffec 00000000 00408710 00000000 00000000 000000c8 minirosetta_2.17_windows_intelx!+0x0 *** Dump of thread ID 1548 (state: Ready): *** - Information - Status: Base Priority: Above Normal, Priority: Above Normal, , Kernel Time: 1326562560.000000, User Time: 213556412416.000000, Wait Time: 1927387.000000 - 
Registers - eax=000b3144 ebx=12822478 ecx=0d8d0020 edx=00000001 esi=1248e008 edi=12822cc8 eip=00b1bcbf esp=01aad630 ebp=01aad66c cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000216 - Callstack - ChildEBP RetAddr Args to Child 01aad66c 00aed2f4 12822470 12822cc0 120fc988 01aad6b0 minirosetta_2.17_windows_intelx!+0x0 01aad6f8 0058f78f 0ab191d8 0ab20c98 12ebf358 12ebf4d0 minirosetta_2.17_windows_intelx!+0x0 01aad730 00696e45 13250768 0ab191d8 0ab20c98 12ebf358 minirosetta_2.17_windows_intelx!+0x0 01aad7ac 00697cf1 01aaeb34 01aadc2c 120fc980 01aad894 minirosetta_2.17_windows_intelx!+0x0 01aad830 0065ad10 01aaeb34 01aadc2c 120fc980 01aad9ec minirosetta_2.17_windows_intelx!+0x0 01aadcb8 00a30df1 00e55c58 00000000 000007d0 00000000 minirosetta_2.17_windows_intelx!+0x0 01aadd2c 0082c9d0 01aaeb34 0aee37a0 120fc980 132c90b8 minirosetta_2.17_windows_intelx!+0x0 01aadd54 007b392c 00e55b54 00000000 00000000 01aadd98 minirosetta_2.17_windows_intelx!+0x0 00000000 00000000 00000000 00000000 00000000 00000000 minirosetta_2.17_windows_intelx!+0x0 *** Dump of thread ID 3796 (state: Waiting): *** - Information - Status: Wait Reason: ExecutionDelay, , Kernel Time: 156250.000000, User Time: 156250.000000, Wait Time: 1927287.000000 - Registers - eax=11cdfe28 ebx=085c0201 ecx=11cde734 edx=000033db esi=00000000 edi=11cdfdf8 eip=7c90e514 esp=11cdfdc8 ebp=11cdfe20 cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202 - Callstack - ChildEBP RetAddr Args to Child 11cdfdc4 7c90d21a 7c7d23f1 00000000 11cdfdf8 0000004e ntdll!_KiFastSystemCallRet@0+0x0 FPO: [0,0,0] 11cdfdc8 7c7d23f1 00000000 11cdfdf8 0000004e 00002a30 ntdll!_NtDelayExecution@8+0x0 FPO: [2,0,0] 11cdfe20 7c7d2455 000007d0 00000000 7c7d2446 00724081 kernel32!_SleepEx@8+0x0 11cdfe30 00724081 000007d0 d15a4ed6 ffffffff 085c02d0 kernel32!_Sleep@4+0x0 11cdfe38 d15a4ed6 ffffffff 085c02d0 11cdff6c 085c02d0 minirosetta_2.17_windows_intelx!+0x0 11cdfe3c ffffffff 085c02d0 11cdff6c 085c02d0 00000001 
minirosetta_2.17_windows_intelx!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'd15a4ed6' 11cdff3c 7c917e09 7c917ec0 7c7d0000 11cdff7c 00000000 minirosetta_2.17_windows_intelx!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'ffffffff' 11cdffe0 7c7db72f 00000000 00000000 00000000 00414d32 ntdll!_LdrpGetProcedureAddress@20+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '7c917e09' 11cdffe4 00000000 00000000 00000000 00414d32 085c02d0 kernel32!_BaseThreadStart@8+0x0 FPO: [0,0,0] SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '7c7db72f' *** Debug Message Dump **** *** Foreground Window Data *** Window Name : Window Class : Window Process ID: 0 Window Thread ID : 0 Exiting... </stderr_txt> ]]> Do you guys have any idea what could be wrong? The only sensible thing I could come up with is that the computer doesn't have enough memory (768 MB RAM). |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
<message> This should indicate that the task has exceeded a configured "reasonable" maximum memory limit; it is nothing specific to your machine. When tasks are created, reasonable limits for memory, output disk space, and CPU time are established. BOINC monitors these as tasks run to ensure they are behaving normally. Have you looked at the tasks in Task Manager as they run to see how much memory they are using? Link to anton's host with many such work units. Link to a WU: 1NKU_R2_LESSPCSCST2_BOINC_abrelax.default.v1_SAVE_ALL_OUT_22545_22164_0 That one was then reissued to another user, who was also unable to process it. Most of your others were reissued and ran OK on the other machine, though. So, a bit of a mixed bag there. Of the ones I picked at random where another machine ran the same task successfully, the other machine always had at least 1 GB per core. Could just be coincidence. In short, I don't think it is your machine. These tasks seem to be having some problems running well. But you've had 17 in a row now. Another machine failed with "<message> too many exit(0)s </message>". I myself have noticed a few that issue the BOINC message about no output file being present and then start again. I'm not sure what to make of it. I had been thinking this might be what has been behind the "double headers" I've seen, but the tasks eventually completed and did not have double headers. Rosetta Moderator: Mod.Sense |
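As a rough illustration of the per-task limits described above, here is a minimal sketch of comparing a task's resource usage against limits fixed when the task was created. The class name, the field names, and every number are assumptions made up for this example; this is not BOINC's actual code or data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resources:
    """Resource figures for one task (hypothetical structure)."""
    memory_mb: float
    disk_mb: float
    cpu_hours: float


def exceeded_limits(usage: Resources, limits: Resources) -> List[str]:
    """Return the names of the limits that the task's usage has gone past."""
    exceeded = []
    if usage.memory_mb > limits.memory_mb:
        exceeded.append("memory")
    if usage.disk_mb > limits.disk_mb:
        exceeded.append("disk")
    if usage.cpu_hours > limits.cpu_hours:
        exceeded.append("cpu time")
    return exceeded


# Made-up numbers, for illustration only.
limits = Resources(memory_mb=250, disk_mb=100, cpu_hours=12)
usage = Resources(memory_mb=270, disk_mb=40, cpu_hours=3)
print(exceeded_limits(usage, limits))
```

In this toy model, a task that goes past any one limit would be reported as an error instead of being left to run.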
anton_P6 Send message Joined: 19 Oct 10 Posts: 5 Credit: 114,624 RAC: 0 |
ok, thanks for the quick reply! But don't you think it's weird that my other 2 hosts are running fine and only this one is having client errors? Edit: What I actually want to ask is: is there anything I can do about these client errors? If not, I'm just puzzled why this particular host receives 17 client errors in a row. |
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
Edit: What I actually want to ask is: is there anything I can do about these client errors? If not, I'm just puzzled why this particular host receives 17 client errors in a row. How much memory are you allowing BOINC to use? If BOINC is trying to use more than your maximum memory, then imposing a lower limit will encourage BOINC to download tasks with lower requirements. It may or may not fix the problem, but I would suggest trying a lower limit for a few days and seeing what happens. Edit: The reason only one host is having the problem is that your other two computers have 2 GB of RAM, so they can cope with almost any task Rosetta sends them. |
anton_P6 Send message Joined: 19 Oct 10 Posts: 5 Credit: 114,624 RAC: 0 |
OK, I'll try that :) Right now the host can use at most 33% of the memory, which is about 250 MB. So you're suggesting I lower that number to, let's say, 20%? (Just to be clear.) |
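For reference, the arithmetic behind the 250 MB figure can be checked with a few lines. This is only a sketch: the function name is made up for this example, and it assumes the percentage applies to the full 768 MB of installed RAM.

```python
def memory_budget_mb(total_ram_mb: float, percent: float) -> float:
    """Memory BOINC may use when limited to `percent` of total RAM."""
    return total_ram_mb * percent / 100.0


TOTAL_RAM_MB = 768  # RAM of the host discussed in this thread

# Compare the current setting (33%) with a lower and a higher one.
for pct in (20, 33, 50):
    budget = memory_budget_mb(TOTAL_RAM_MB, pct)
    print(f"{pct}% of {TOTAL_RAM_MB} MB is about {budget:.0f} MB")
```

So 33% works out to roughly 253 MB, while 20% would leave only about 154 MB for BOINC.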
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
OK, I'll try that :) I was working on the assumption that you had the memory settings on a high percentage, so lowering it might have helped. However, as you are already down to 33%/250 MB, lowering it further will probably have the effect transient describes. If you do that, I don't think you will get any work. So that should solve the problem of the errors. ;) 750 MB for the entire computer seems a bit on the small side for a modern computer, to me. Upping the percentage would be more useful, I think. Maybe a check on the RAM for that computer would also be a good idea. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Ah! So perhaps "maximum memory" can also mean that the task has exceeded what BOINC is configured to use. That would make more sense in this case. So the task was running along, grew to need more than 250 MB of memory, and BOINC figured out that it was never going to be able to complete. At least, that is what would happen if you have the same % for both "in use" and "idle" memory. Otherwise, the task would go to a status of "waiting for memory" and basically wait until the machine is idle to try to finish. ...or perhaps it did that as well, and the task grew to exceed the maximum memory setting for when the machine is idle too. If you haven't already tried it, you might set it to something more like 50% or 65% when in use and 80% when idle. This should let the tasks run (even if only one at a time when the memory usage of two tasks grows large at the same time), and yet leave some memory for your other work, which helps avoid sluggishness when you first sit down to use the machine. A single task can grow beyond 250 MB. If the above slows your machine enough that you notice, then the previous suggestion to limit BOINC to just one CPU should help. Then you could reduce the above by 10-15% and still get some good work done both for R@h and for yourself. Short of all of that, this particular machine doesn't have much memory available, so if running R@h interferes with your use of the machine, there are other BOINC projects with lower typical memory usage that you might consider running instead. Rosetta Moderator: Mod.Sense |
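The guess above about how the "in use" and "idle" percentages interact can be written down as a small sketch. This models only the reasoning in the post, not BOINC's real scheduling code; the function, the enum, and the 270 MB and 450 MB task sizes are assumptions chosen for illustration.

```python
from enum import Enum


class TaskState(Enum):
    RUNNING = "running"
    WAITING_FOR_MEMORY = "waiting for memory"
    MAX_MEMORY_EXCEEDED = "maximum memory exceeded"


def classify(task_mb: float, total_ram_mb: float,
             in_use_pct: float, idle_pct: float,
             user_active: bool) -> TaskState:
    """Guess the fate of a task under the two memory percentages."""
    in_use_limit = total_ram_mb * in_use_pct / 100.0
    idle_limit = total_ram_mb * idle_pct / 100.0
    current_limit = in_use_limit if user_active else idle_limit

    if task_mb <= current_limit:
        return TaskState.RUNNING
    if task_mb <= max(in_use_limit, idle_limit):
        # Too big right now, but it would fit under the other setting.
        return TaskState.WAITING_FOR_MEMORY
    # Larger than both settings allow, so it can never complete.
    return TaskState.MAX_MEMORY_EXCEEDED


# 33% for both settings on a 768 MB host: a 270 MB task cannot fit.
print(classify(270, 768, 33, 33, user_active=True).value)
# 50% in use / 80% idle: the same task runs.
print(classify(270, 768, 50, 80, user_active=True).value)
# A larger task waits while the machine is in use.
print(classify(450, 768, 50, 80, user_active=True).value)
```

With identical percentages there is no "waiting" outcome at all, which matches the idea that the task simply errors out.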
anton_P6 Send message Joined: 19 Oct 10 Posts: 5 Credit: 114,624 RAC: 0 |
Thanks for all the replies :) I think we have figured out what the problem is: the computer doesn't have enough memory to complete the task (both in-use and idle memory usage are set to 33%), so it returns a client error. I'll set the memory usage to 50% and see what happens :). However, the host has used 50% memory before and the machine became sluggish, so I'll have to see whether it becomes sluggish again. As the computer is already a single-core machine, limiting BOINC to one core isn't really a solution. So, if I have to switch to another project, I would like a project in the same area of research. Do you guys have any suggestions? What about the Human Genome Project? |
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
So, if I have to switch to another project, I would like a project in the same area of research. Do you guys have any suggestions? What about the Human Genome Project? The closest project to Rosetta would be the Human Proteome Folding project at World Community Grid, as it makes use of the Rosetta software and the scientists share information with the Rosetta team. World Community Grid has a number of medical and other projects that you can opt in or out of at your discretion. Some other similar projects on BOINC are:
|
anton_P6 Send message Joined: 19 Oct 10 Posts: 5 Credit: 114,624 RAC: 0 |
I changed the maximum amount of memory BOINC could use to 50% and watched the task in Task Manager. The task used +/- 270 MB (slightly more than 33% of the RAM) and finished correctly. So I think we were right: the host returned client errors because it didn't have enough memory available. Because the task only uses +/- 270 MB, the computer doesn't become sluggish, so I decided to keep running Rosetta@home on this host :). Thanks for your help! |
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
Because the task only uses +/- 270 MB, the computer doesn't become sluggish... Different Rosetta tasks require different amounts of memory, depending on the size of the protein you are crunching. Hopefully most of your tasks will remain below 270 MB, but don't be surprised if you get some that want even more memory. |
©2024 University of Washington
https://www.bakerlab.org