Message boards : Number crunching : Computation errors
Author | Message |
---|---|
David703 Send message Joined: 17 Jul 17 Posts: 5 Credit: 64,485 RAC: 0 |
Hi, since I've come back to this project I've been seeing some strange errors in some of my WUs, especially in the ones that study big proteins, here are a few examples: -https://boinc.bakerlab.org/rosetta/result.php?resultid=1065314770 -https://boinc.bakerlab.org/rosetta/result.php?resultid=1065314768 -https://boinc.bakerlab.org/rosetta/result.php?resultid=1065460662 How can I keep these errors from happening? |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 23,027,336 RAC: 7,074 |
Hi, since I've come back to this project I've been seeing some strange errors in some of my WUs, especially in the ones that study big proteins, here are a few examples: Rosetta developers were quite sloppy in their allocation and use of memory. Task 1065460662 ran out of memory: https://boinc.bakerlab.org/rosetta/result.php?resultid=1065460662 The other two errored out with "Funzione non corretta" ("Incorrect function"). When one WU runs out of memory, other WUs may get strange error messages from function calls, because the developers don't always check the return values of all system calls. The WUs you are running are 64-bit and sometimes take large amounts of memory, frequently over 1 GB each. 8 GB should be enough to run 4 Rosetta 64-bit WUs, so I would examine how memory is being used and change the workload. Buy more memory if practical. Otherwise, lower the number of Rosetta WUs running simultaneously with app_config.xml or BOINC -> OPTIONS -> COMPUTING PREFERENCES -> USAGE LIMITS |
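The advice above about capping simultaneous tasks can be sketched in a few lines. This is a minimal sketch under stated assumptions, not an official tool: the 2 GB set aside for the operating system and the 1.5 GB per task are illustrative guesses based on "frequently over a GB each", and the helper names `max_tasks` and `app_config_xml` are hypothetical. The `<project_max_concurrent>` element is a BOINC `app_config.xml` option that limits how many tasks of one project run at once.

```python
import xml.etree.ElementTree as ET

def max_tasks(total_gb, per_task_gb, reserve_gb=2.0):
    """How many tasks fit once reserve_gb is set aside (never fewer than 1)."""
    usable = total_gb - reserve_gb
    return max(1, int(usable // per_task_gb))

def app_config_xml(limit):
    """Build a minimal app_config.xml body that caps concurrent project tasks."""
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(limit)
    return ET.tostring(root, encoding="unicode")

# Assumed figures: 8 GB machine, about 1.5 GB per Rosetta task.
limit = max_tasks(total_gb=8, per_task_gb=1.5)
config = app_config_xml(limit)
print(config)
```

The resulting text would be saved as `app_config.xml` in the project's folder inside the BOINC data directory (the `projects/boinc.bakerlab.org_rosetta` folder), after which the client needs to re-read its config files or be restarted.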
David703 Send message Joined: 17 Jul 17 Posts: 5 Credit: 64,485 RAC: 0 |
Ok, thank you! |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Seems unlikely they've ever addressed this problem, eh? I see these errors pretty often. Especially annoying when they have run up 8 hours of effort before crashing, presumably with no points earned. And no, at this point I don't care enough to do the searching to try to figure out if the points were granted. I don't even care enough to read the rest of the thread beyond the Subject: and glancing at a couple of the posts. Latest example: Application Rosetta Mini 3.78 Name start_close_HHH_rd4_0056.min_rise1.83_whole_pass_aagb.bp_20190406150644_0001_0001_0001_0003_0001_0001_fragments_fold_SAVE_ALL_OUT_833066_1053 State Computation error Received 2019-07-22 08:13:16 Report deadline 2019-07-30 08:13:11 Estimated computation size 80,000 GFLOPs CPU time 07:49:11 Elapsed time 07:59:03 Executable minirosetta_3.78_x86_64-pc-linux-gnu #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,598,427 RAC: 8,864 |
Latest example: Rosetta Mini 3.78 was released in October 2017. Since then, there have been a lot of errors and problems. No debugging, no new version. Nothing. |
mmonnin Send message Joined: 2 Jun 16 Posts: 58 Credit: 23,723,672 RAC: 47,662 |
I'd rather have the Rosetta mini tasks vs the Rosetta version that runs for 5h then has an error when the set run time is 1hr. |
blyons123 Send message Joined: 8 Apr 14 Posts: 4 Credit: 118,348 RAC: 0 |
Happened again after resetting project!? 9/12/2019 10:20:17 PM | Rosetta@home | Task bc96_EHEE_hb1_2413_fold_SAVE_ALL_OUT_857815_1040_0 exited with zero status but no 'finished' file 9/12/2019 10:20:17 PM | Rosetta@home | If this happens repeatedly you may need to reset the project. 9/12/2019 10:22:08 PM | Rosetta@home | Task Longxing_ems_ferrM_2260.11745_fold_SAVE_ALL_OUT_863531_13_0 exited with zero status but no 'finished' file 9/12/2019 10:22:08 PM | Rosetta@home | If this happens repeatedly you may need to reset the project. 9/12/2019 10:23:49 PM | Rosetta@home | Task Longxing_ems_4hM_2152.10077_fold_SAVE_ALL_OUT_861867_13_0 exited with zero status but no 'finished' file 9/12/2019 10:23:49 PM | Rosetta@home | If this happens repeatedly you may need to reset the project. 9/12/2019 10:24:33 PM | Rosetta@home | Task bc96_4h_hb1_1620_fold_SAVE_ALL_OUT_857813_1040_0 exited with zero status but no 'finished' file 9/12/2019 10:24:33 PM | Rosetta@home | If this happens repeatedly you may need to reset the project. 9/12/2019 10:25:54 PM | Rosetta@home | Task bc96_EHEE_hb1_2413_fold_SAVE_ALL_OUT_857815_1040_0 exited with zero status but no 'finished' file 9/12/2019 10:25:54 PM | Rosetta@home | If this happens repeatedly you may need to reset the project. 9/12/2019 10:30:08 PM | Rosetta@home | Task bc96_EHEE_hb1_2413_fold_SAVE_ALL_OUT_857815_1040_0 exited with zero status but no 'finished' file 9/12/2019 10:30:08 PM | Rosetta@home | If this happens repeatedly you may need to reset the project. 9/12/2019 10:34:11 PM | Rosetta@home | Task bc96_EHEE_hb1_2413_fold_SAVE_ALL_OUT_857815_1040_0 exited with zero status but no 'finished' file 9/12/2019 10:34:11 PM | Rosetta@home | If this happens repeatedly you may need to reset the project. 
9/12/2019 10:36:06 PM | Rosetta@home | work fetch suspended by user 9/12/2019 10:36:15 PM | Rosetta@home | Task Longxing_ems_4hM_2152.10077_fold_SAVE_ALL_OUT_861867_13_0 exited with zero status but no 'finished' file 9/12/2019 10:36:15 PM | Rosetta@home | If this happens repeatedly you may need to reset the project. 9/12/2019 10:36:32 PM | Rosetta@home | task bc96_4h_hb1_1620_fold_SAVE_ALL_OUT_857813_1040_0 suspended by user 9/12/2019 10:36:35 PM | Rosetta@home | Starting task foldit_2007855_0007_fold_and_dock_SAVE_ALL_OUT_849408_1557_0 9/12/2019 10:36:35 PM | Rosetta@home | task Longxing_ems_4hM_2152.10077_fold_SAVE_ALL_OUT_861867_13_0 suspended by user 9/12/2019 10:36:35 PM | Rosetta@home | task Longxing_ems_4hM_2152.10077_fold_SAVE_ALL_OUT_861867_13_0 resumed by user 9/12/2019 10:36:37 PM | Rosetta@home | task Longxing_ems_4hM_2152.10077_fold_SAVE_ALL_OUT_861867_13_0 suspended by user 9/12/2019 10:36:39 PM | Rosetta@home | Starting task Longxing_ems_ferrM_3025.11863_fold_SAVE_ALL_OUT_863659_13_0 9/12/2019 10:36:40 PM | Rosetta@home | task Longxing_ems_ferrM_2260.11745_fold_SAVE_ALL_OUT_863531_13_0 suspended by user |
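To see which tasks keep hitting this message, the event log can be tallied per task name. This is a rough sketch that assumes the log lines have exactly the "date | project | message" shape pasted above; the function name `count_no_finished` is hypothetical and the sample lines are copied from the post.

```python
import re
from collections import Counter

# Matches the message text shown in the event log lines above.
PATTERN = re.compile(r"Task (\S+) exited with zero status but no 'finished' file")

def count_no_finished(lines):
    """Count "no 'finished' file" exits per task name."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    "9/12/2019 10:20:17 PM | Rosetta@home | Task bc96_EHEE_hb1_2413_fold_SAVE_ALL_OUT_857815_1040_0 exited with zero status but no 'finished' file",
    "9/12/2019 10:20:17 PM | Rosetta@home | If this happens repeatedly you may need to reset the project.",
    "9/12/2019 10:22:08 PM | Rosetta@home | Task Longxing_ems_ferrM_2260.11745_fold_SAVE_ALL_OUT_863531_13_0 exited with zero status but no 'finished' file",
    "9/12/2019 10:25:54 PM | Rosetta@home | Task bc96_EHEE_hb1_2413_fold_SAVE_ALL_OUT_857815_1040_0 exited with zero status but no 'finished' file",
]

counts = count_no_finished(sample)
for task, n in counts.most_common():
    print(n, task)
```

In the log above the same task names recur within minutes of each other, which suggests the client keeps restarting the same few tasks rather than failing on fresh ones.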
blyons123 Send message Joined: 8 Apr 14 Posts: 4 Credit: 118,348 RAC: 0 |
every mini task gives me this error. 9/23/2019 6:22:43 PM | Rosetta@home | Task Longxing_ems_ferrM_5178.12181_fold_SAVE_ALL_OUT_863970_24_0 exited with zero status but no 'finished' file |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 62 |
I've also had two work units crash out today, one with this... Exit status 1 (0x00000001) Unknown error code ... the other with this... Exit status -529697949 (0xE06D7363) Unknown error code No new tasks set for now. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
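The two forms of the second exit status are the same 32-bit value, once read as a signed integer and once as hex. A small sketch of the conversion follows; the helper name `to_unsigned32` is hypothetical. 0xE06D7363 is the exception code the Microsoft C++ runtime uses when a C++ exception is thrown, and its low three bytes spell "msc" in ASCII, so the code by itself only says "an uncaught C++ exception", not what went wrong.

```python
def to_unsigned32(code):
    """Reinterpret a signed 32-bit exit status as an unsigned value."""
    return code & 0xFFFFFFFF

code = -529697949                      # value reported in the post above
unsigned = to_unsigned32(code)
hex_form = f"0x{unsigned:08X}"

# Drop the leading 0xE0 byte and decode the remaining three bytes as ASCII.
tag = unsigned.to_bytes(4, "big")[1:].decode("ascii")

print(hex_form, tag)
```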
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
rb_09_19_8636_8623_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_03_05_867741_37 failed with an invalid chi angle on Windows: File: C:\cygwin64\home\boinc\Rosetta\main\source\src\core/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306 chi angle must be between -180 and 180: -nan(ind) </stderr_txt> Task 1094602971 https://boinc.bakerlab.org/rosetta/result.php?resultid=109460297 Workunit 985919585 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=985919585 |
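The value in that message, `-nan(ind)`, is how the Microsoft C runtime prints an indefinite NaN, so the angle was not out of range by some amount: it was not a number at all. The sketch below is not Rosetta's code, only an illustration of why a NaN trips a range check like the one quoted; `chi_in_range` is a hypothetical name.

```python
import math

def chi_in_range(chi_degrees):
    """Range check in the spirit of 'chi angle must be between -180 and 180'."""
    return -180.0 <= chi_degrees <= 180.0

# An ordinary angle passes the check.
print(chi_in_range(179.5))

# NaN compares false against everything, so it fails the check
# even though it is neither below -180 nor above 180.
print(chi_in_range(math.nan))
```

A NaN usually comes from an earlier invalid operation (for example 0/0 or the square root of a negative number), so the reported line is likely where the problem was detected rather than where it started.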
wolfman1360 Send message Joined: 18 Feb 17 Posts: 72 Credit: 18,450,036 RAC: 0 |
Some errored out tasks, 10 in total. That's a lot of computing time gone. I'm not sure which file is in use or if it was Rosetta or Boinc. This machine has been up solid. Suspending for now. https://boinc.bakerlab.org/rosetta/result.php?resultid=1095086805 https://boinc.bakerlab.org/rosetta/result.php?resultid=1095124487 https://boinc.bakerlab.org/rosetta/result.php?resultid=1095064719 |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 62 |
I have had four units crash out in recent days. One with "Aborted by Server", so I discount that one. The other three with "Out of Memory". I think this is because I was sent "Rosetta v4.07 windows_intelx86" to run the job, and not "Rosetta v4.07 windows_intelx86_64". Of the wingmen on the failing jobs, all but one crashed with the same error; the one who completed the unit was running Rosetta v4.07 windows_intelx86_64. Obviously, a 64-bit system can address a much greater memory range than a 32-bit one. The question that arises though, is why was I sent x86 and not x86_64? My system runs 64-bit Windows and has more memory installed and available to BOINC than the chap that completed the job without error. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 62 |
I've got another couple of weird ones now. One is 0.586% done but has 16:08:53 elapsed and 114d 02:32:50 remaining increasing quite rapidly, the other 0.259% after 06:15:56 elapsed and 12:42:49 with the last digit flipping 48 - 49 - 48 - 49. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
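The first of those remaining-time figures is roughly what a plain linear extrapolation from the progress percentage gives. The sketch below assumes nothing about how BOINC actually computes its estimate; it only checks the arithmetic, and the function names are hypothetical.

```python
def hms_to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def remaining_seconds(elapsed_s, fraction_done):
    """Linear extrapolation: projected total time minus time already spent."""
    return elapsed_s / fraction_done - elapsed_s

elapsed = hms_to_seconds("16:08:53")              # first task in the post above
remaining = remaining_seconds(elapsed, 0.586 / 100)

days, rest = divmod(int(remaining), 86400)
print(f"about {days} days and {rest // 3600} hours remaining")
```

That lands within an hour or so of the reported 114d 02:32:50, the gap being plausibly due to rounding in the displayed percentage. The second task's 12:42:49 does not follow the same formula, so something else appears to feed into that estimate.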
wolfman1360 Send message Joined: 18 Feb 17 Posts: 72 Credit: 18,450,036 RAC: 0 |
I have two errored-out tasks over here. Both ran for a day before showing up as invalid, which is amazingly frustrating. The message appears to be "finish file too long" for both. I just attached more machines (granted, with more memory, able to handle the occasional usage of up to 1 GB per core). I just hope they don't receive the same errors. https://boinc.bakerlab.org/rosetta/result.php?resultid=1116966347 https://boinc.bakerlab.org/rosetta/result.php?resultid=1116965297 |
kaancanbaz Send message Joined: 28 Apr 06 Posts: 23 Credit: 3,045,052 RAC: 0 |
Maybe they are only sending previously failed WUs. There are not many new work units available. I hope that they are about to release a new version and that's why they slowed down the WUs. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1679 Credit: 17,803,499 RAC: 22,548 |
I got a few "Out of memory" errors when i first started Rosetta, so i doubled my system RAM. I've now got 32GB, 95% available to BOINC, using all 6c/12t on the CPU. Even so, i just had another WU end in error with an "Out of memory" Unhandled Exception error. Edit- So far, all computation errors have occurred with the Rosetta v4.07 windows_intelx86 application. <core_client_version>7.6.33</core_client_version> <![CDATA[ <message> (unknown error) - exit code -529697949 (0xe06d7363) </message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.07_windows_intelx86.exe -run:protocol jd2_scripting -parser:protocol jhr_boinc_v2.xml @flags -in:file:silent 1uq3vg3u_Junior_HalfRoid_design2_COVID-19.silent -in:file:silent_struct_type binary -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip 1uq3vg3u_Junior_HalfRoid_design2_COVID-19.zip @1uq3vg3u_Junior_HalfRoid_design2_COVID-19.flags -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2037194 Starting watchdog... Watchdog active. Unhandled Exception Detected... - Unhandled Exception Record - Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x76484192 Engaging BOINC Windows Runtime Debugger... 
******************** BOINC Windows Runtime Debugger Version 7.9.0 Dump Timestamp : 03/31/20 17:11:54 Install Directory : Data Directory : C:ProgramDataBOINC Project Symstore : https://boinc.bakerlab.org/rosetta/symstore LoadLibraryA( C:ProgramDataBOINCdbghelp.dll ): GetLastError = 126 Loaded Library : dbghelp.dll LoadLibraryA( C:ProgramDataBOINCsymsrv.dll ): GetLastError = 126 LoadLibraryA( symsrv.dll ): GetLastError = 126 LoadLibraryA( C:ProgramDataBOINCsrcsrv.dll ): GetLastError = 126 LoadLibraryA( srcsrv.dll ): GetLastError = 126 LoadLibraryA( C:ProgramDataBOINCversion.dll ): GetLastError = 126 Loaded Library : version.dll Debugger Engine : 4.0.5.0 Symbol Search Path: C:ProgramDataBOINCslots12;C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosetta;srv*C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosettasymbols*http://msdl.microsoft.com/download/symbols;srv*C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosettasymbols*https://boinc.bakerlab.org/rosetta/symstore ModLoad: 0000000000200000 0000000003413000 C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosettarosetta_4.07_windows_intelx86.exe (-exported- Symbols Loaded) Linked PDB Filename : C:cygwinhomeboincRosetta_4.07mainsourceideVisualStudioBoincReleaserosetta_4.07_windows_intelx86.pdb ModLoad: 0000000077030000 000000000019a000 C:WINDOWSSYSTEM32ntdll.dll (6.2.18362.719) (-exported- Symbols Loaded) Linked PDB Filename : wntdll.pdb File Version : 10.0.18362.329 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.329 ModLoad: 0000000075b60000 00000000000e0000 C:WINDOWSSystem32KERNEL32.DLL (6.2.18362.329) (-exported- Symbols Loaded) Linked PDB Filename : wkernel32.pdb File Version : 10.0.18362.329 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.329 ModLoad: 0000000076370000 00000000001fe000 C:WINDOWSSystem32KERNELBASE.dll (6.2.18362.719) 
(-exported- Symbols Loaded) Linked PDB Filename : wkernelbase.pdb File Version : 10.0.18362.329 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.329 ModLoad: 0000000076880000 000000000005e000 C:WINDOWSSystem32WS2_32.dll (6.2.18362.387) (-exported- Symbols Loaded) Linked PDB Filename : ws2_32.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 0000000076a50000 00000000000bb000 C:WINDOWSSystem32RPCRT4.dll (6.2.18362.628) (-exported- Symbols Loaded) Linked PDB Filename : wrpcrt4.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 0000000074800000 0000000000020000 C:WINDOWSSystem32SspiCli.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : wsspicli.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 00000000747f0000 000000000000a000 C:WINDOWSSystem32CRYPTBASE.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : cryptbase.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 0000000076b10000 000000000005f000 C:WINDOWSSystem32bcryptPrimitives.dll (6.2.18362.295) (-exported- Symbols Loaded) Linked PDB Filename : bcryptprimitives.pdb File Version : 10.0.18362.295 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.295 ModLoad: 0000000075ea0000 0000000000076000 C:WINDOWSSystem32sechost.dll (6.2.18362.693) (-exported- Symbols Loaded) Linked PDB 
Filename : sechost.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 0000000074eb0000 0000000000197000 C:WINDOWSSystem32USER32.dll (6.2.18362.719) (-exported- Symbols Loaded) Linked PDB Filename : wuser32.pdb File Version : 10.0.17134.343 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.17134.343 ModLoad: 0000000075180000 0000000000017000 C:WINDOWSSystem32win32u.dll (6.2.18362.719) (-exported- Symbols Loaded) Linked PDB Filename : wwin32u.pdb File Version : 10.0.18362.719 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.719 ModLoad: 0000000075310000 0000000000021000 C:WINDOWSSystem32GDI32.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : wgdi32.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 00000000751b0000 000000000015a000 C:WINDOWSSystem32gdi32full.dll (6.2.18362.719) (-exported- Symbols Loaded) Linked PDB Filename : wgdi32full.pdb File Version : 10.0.18362.719 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.719 ModLoad: 0000000075050000 000000000007c000 C:WINDOWSSystem32msvcp_win.dll (6.2.18362.387) (-exported- Symbols Loaded) Linked PDB Filename : msvcp_win.pdb File Version : 10.0.18362.387 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.387 ModLoad: 0000000075340000 000000000011f000 C:WINDOWSSystem32ucrtbase.dll (6.2.18362.387) (-exported- Symbols Loaded) Linked PDB Filename : ucrtbase.pdb File Version : 10.0.18362.387 
(WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.387 ModLoad: 00000000767c0000 0000000000079000 C:WINDOWSSystem32ADVAPI32.dll (6.2.18362.329) (-exported- Symbols Loaded) Linked PDB Filename : advapi32.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 0000000074820000 00000000000bf000 C:WINDOWSSystem32msvcrt.dll (7.0.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : msvcrt.pdb File Version : 7.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 7.0.18362.1 ModLoad: 0000000076c70000 0000000000025000 C:WINDOWSSystem32IMM32.DLL (6.2.18362.387) (-exported- Symbols Loaded) Linked PDB Filename : wimm32.pdb File Version : 10.0.18362.387 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.387 ModLoad: 0000000075da0000 000000000000f000 C:WINDOWSSystem32kernel.appcore.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : Kernel.Appcore.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 00000000747c0000 0000000000029000 C:WINDOWSSYSTEM32ntmarta.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : ntmarta.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 00000000744c0000 000000000018f000 C:WINDOWSSYSTEM32dbghelp.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : dbghelp.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation 
Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 00000000744b0000 0000000000008000 C:WINDOWSSYSTEM32version.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : version.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 *** Dump of the Process Statistics: *** - I/O Operations Counters - Read: 47711, Write: 0, Other 13591 - I/O Transfers Counters - Read: 0, Write: 207023, Other 0 - Paged Pool Usage - QuotaPagedPoolUsage: 247216, QuotaPeakPagedPoolUsage: 247520 QuotaNonPagedPoolUsage: 32152, QuotaPeakNonPagedPoolUsage: 38544 - Virtual Memory Usage - VirtualSize: 2110238720, PeakVirtualSize: 2137595904 - Pagefile Usage - PagefileUsage: 445714432, PeakPagefileUsage: 1451794432 - Working Set Size - WorkingSetSize: 454291456, PeakWorkingSetSize: 1457319936, PageFaultCount: 20195291 *** Dump of thread ID 1932 (state: Waiting): *** - Information - Status: Wait Reason: UserRequest, , Kernel Time: 367187488.000000, User Time: 173045153792.000000, Wait Time: 6001909.000000 - Unhandled Exception Record - Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x76484192 - Registers - eax=0995d9a8 ebx=0995da54 ecx=00000003 edx=00000000 esi=02971c60 edi=02da4c54 eip=76484192 esp=0995d9a8 ebp=0995da00 cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000212 - Callstack - ChildEBP RetAddr Args to Child 0995da00 004dac4b e06d7363 00000001 00000003 0995da38 KERNELBASE!RaiseException+0x0 0995da44 004e2854 0995da54 02da4c54 029729a0 029729a8 rosetta_4.07_windows_intelx86!xmlParserInputRead+0x0 0995da60 004e1b6e 0995da7c 00b5df10 0006b9b0 3dab83e0 rosetta_4.07_windows_intelx86!xmlParserInputRead+0x0 0995da68 00b5df10 0006b9b0 3dab83e0 0995dc30 0995daa8 rosetta_4.07_windows_intelx86!xmlParserInputRead+0x0 0995da7c 00b5de44 0995dc30 7d27702a 3dab83e0 0b100380 
rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 0995daa8 02496c54 0995dc30 7d27715e 3dab83e0 3dab83d8 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 0995dbdc 0083db19 0b1017d0 0b100380 1fdac898 4843b188 rosetta_4.07_windows_intelx86!cppdb::mutex::mutex+0x0 0995dc08 00b14595 0b1017d0 0b100380 1fdac898 0995dc30 rosetta_4.07_windows_intelx86!cppdb::backend::statement::cache+0x0 0995dd34 00b115a7 1fdac898 4843b188 5e90a170 6f203200 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 0995dda8 00b2e198 1fdac898 4843b188 5e90a170 6f203200 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 0995ddec 00b2de34 0995de20 717bbdf0 5c2c2d70 1fdac898 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 0995de50 00b0b7f1 0995de98 717bbdf0 5c2c2d70 1fdac898 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 0995def8 00c6d009 1fdac898 4843b188 717bbdf0 6f826738 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 0995df78 00c6ba06 1fdac898 7d274a0a 46118a30 1fdac898 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 0995e088 00c8a95b 1fdac898 7d275946 5330f1b0 46118a30 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 0995f3c4 00c3979a 1fdac898 7d275eee 5330f1b0 616751ac rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 0995f46c 00c38ce7 1fdac898 616751ac 7d275faa 09fe5c80 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 0995f528 00bfaf25 1fdac898 7d275c4e 5e82afe7 09fe5c80 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 0995f6cc 00bf86df 0995f7b4 5e82afe7 00000000 0995f76c rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 0995f7ac 00c3fa05 5330f1b0 484a9d50 7d275d5a 00004e21 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 0995f7d8 00c33dcf 09f83870 0a012f98 7d27529a 09c94298 
rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 0995f818 004d94aa 09f83870 0a012f98 7d275656 032a14a4 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 0995fcd4 004e2267 00000027 09aa39f0 09a9af20 7d27579e rosetta_4.07_windows_intelx86!xmlParserInputRead+0x0 0995fd1c 75b76359 039e4000 75b76340 0995fd88 77097b74 rosetta_4.07_windows_intelx86!xmlParserInputRead+0x0 0995fd2c 77097b74 039e4000 1537adee 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 0995fd88 77097b44 ffffffff 770b8f3c 00000000 00000000 ntdll!RtlGetAppContainerNamedObjectPath+0x0 0995fd98 00000000 004e22dd 039e4000 00000000 00000000 ntdll!RtlGetAppContainerNamedObjectPath+0x0 *** Dump of thread ID 7456 (state: Waiting): *** - Information - Status: Wait Reason: ExecutionDelay, , Kernel Time: 156250.000000, User Time: 156250.000000, Wait Time: 6001909.000000 - Registers - eax=00000000 ebx=0000000a ecx=00000000 edx=00000000 esi=00000000 edi=2ef2fc94 eip=770a20bc esp=2ef2fc54 ebp=2ef2fcb8 cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202 - Callstack - ChildEBP RetAddr Args to Child 2ef2fcb8 7647f32f 00000064 00000000 2ef2fef4 016ed11b ntdll!ZwDelayExecution+0x0 2ef2fcc8 016ed11b 00000064 016ed0f0 016ed0f0 00000000 KERNELBASE!Sleep+0x0 2ef2fef4 75b76359 00000000 75b76340 2ef2ff60 77097b74 rosetta_4.07_windows_intelx86!xmlMutexLock+0x0 2ef2ff04 77097b74 00000000 3250af06 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 2ef2ff60 77097b44 ffffffff 770b8f3c 00000000 00000000 ntdll!RtlGetAppContainerNamedObjectPath+0x0 2ef2ff70 00000000 016ed0f0 00000000 00000000 00000000 ntdll!RtlGetAppContainerNamedObjectPath+0x0 *** Dump of thread ID 4484 (state: Waiting): *** - Information - Status: Wait Reason: ExecutionDelay, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 6001798.000000 - Registers - eax=00000000 ebx=0a919501 ecx=00000000 edx=00000000 esi=00000000 edi=366afd74 eip=770a20bc esp=366afd34 ebp=366afd98 cs=0023 ss=002b ds=002b es=002b 
fs=0053 gs=002b efl=00000202 - Callstack - ChildEBP RetAddr Args to Child 366afd98 7647f32f 000007d0 00000000 366afe90 00f68d91 ntdll!ZwDelayExecution+0x0 366afda8 00f68d91 000007d0 42d85412 0a919570 00f68f70 KERNELBASE!Sleep+0x0 366afe90 00f68f77 00000000 016da2f5 00000000 42d85456 rosetta_4.07_windows_intelx86!xmlMutexLock+0x0 366afed4 75b76359 0a919570 75b76340 366aff40 77097b74 rosetta_4.07_windows_intelx86!xmlMutexLock+0x0 366afee4 77097b74 0a919570 2ac8af26 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 366aff40 77097b44 ffffffff 770b8f3c 00000000 00000000 ntdll!RtlGetAppContainerNamedObjectPath+0x0 366aff50 00000000 016da29e 0a919570 00000000 00000000 ntdll!RtlGetAppContainerNamedObjectPath+0x0 *** Debug Message Dump **** *** Foreground Window Data *** Window Name : Window Class : Window Process ID: 0 Window Thread ID : 0 Exiting... </stderr_txt> ]]> Grant Darwin NT |
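The process statistics in that dump hint at why doubling system RAM did not help. The sketch below compares the reported sizes with the 2 GiB of user address space that Windows normally gives a 32-bit process. Whether this particular executable is built large-address-aware (which would raise that ceiling) is not stated in the thread, so the 2 GiB limit is an assumption.

```python
GIB = 2**30

# Assumption: default user address space for a 32-bit Windows process.
DEFAULT_32BIT_USER_SPACE = 2 * GIB

# Values copied from the "Dump of the Process Statistics" section above.
stats = {
    "VirtualSize": 2110238720,
    "PeakVirtualSize": 2137595904,
    "PeakWorkingSetSize": 1457319936,
}

for name, value in stats.items():
    share = value / DEFAULT_32BIT_USER_SPACE
    print(f"{name}: {value / GIB:.2f} GiB ({share:.1%} of 2 GiB)")

peak_share = stats["PeakVirtualSize"] / DEFAULT_32BIT_USER_SPACE
```

If the assumption holds, the task ran out of address space rather than physical memory: the peak virtual size sits just under 2 GiB while the peak working set is well below that. This would fit the observation that the errors show up with the windows_intelx86 application regardless of installed RAM.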
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 62 |
I've had 5 crashes in the last couple of days, a 1 hour, a 2 hour, a 3 hour, a 6 hour and a 12 hour. All zero credit, so from another thread, I assume this means none of them actually did anything at all. All Rosetta's, not mini's. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Keith Myers Send message Joined: 29 Mar 20 Posts: 97 Credit: 332,619 RAC: 363 |
I have had 13 errors in the past two days. Only one was "cancelled by server". The rest were compute errors (exit status 139), with nothing of much use in the stderr.txt output other than a "Starting watchdog... Watchdog active." statement. So what does a "watchdog" do at this project? |
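An exit status of 139 is often a signal in disguise. On Unix-like systems a status above 128 conventionally means "terminated by signal N", with the status equal to 128 + N. The sketch below decodes it on that assumption; the post does not say which operating system the host runs, and `signal_from_exit_status` is a hypothetical helper.

```python
import signal

def signal_from_exit_status(status):
    """Return the signal name if status follows the 128 + N convention, else None."""
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            return None
    return None

name = signal_from_exit_status(139)
print(name)
```

Under that convention 139 maps to signal 11, a segmentation fault, which would explain why stderr holds nothing useful: the process was killed before it could report anything.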
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"watchdog" is the name given to the mechanism that watches over the work unit and detects if it runs for more than 4 hours longer than the preferred runtime preference. Unfortunately, the message "watchdog ending" is NOT an indication that the watchdog took action and ended the task. It is simply reporting that, as the task ends, the watchdog is ending as well. So it can be hard to tell if the watchdog stepped in or not. Rosetta Moderator: Mod.Sense |
Keith Myers Send message Joined: 29 Mar 20 Posts: 97 Credit: 332,619 RAC: 363 |
"watchdog" is the name given to the mechanism that watches over the work unit and detects if it runs for more than 4 hours longer than the preferred runtime preference. Very strange then. NONE of my errors ran longer than the specified 8 hours. For example: https://boinc.bakerlab.org/rosetta/result.php?resultid=1136159909 24,686 seconds, or about 6.86 hours. Well within the stock 8-hour target CPU time. |
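The watchdog rule described by the moderator can be written as a one-line check. This is only a sketch of the rule as stated in the thread ("more than 4 hours longer than the preferred runtime"); whether the real watchdog measures CPU time or elapsed time is an assumption here, and the names are hypothetical.

```python
WATCHDOG_GRACE_S = 4 * 3600  # "more than 4 hours longer", per the moderator's post

def watchdog_would_fire(run_seconds, preferred_runtime_s):
    """True if the task has overrun the preferred runtime by more than the grace period."""
    return run_seconds > preferred_runtime_s + WATCHDOG_GRACE_S

run_seconds = 24686          # the failed task cited above
preferred = 8 * 3600         # the stock 8-hour target

print(round(run_seconds / 3600, 2), "hours")
print(watchdog_would_fire(run_seconds, preferred))
```

With an 8-hour target the threshold would be 12 hours, so a task that failed after under 7 hours was not stopped by the watchdog under this rule, and the error has some other cause.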
©2024 University of Washington
https://www.bakerlab.org