Message boards : Number crunching : Computation errors
Author | Message |
---|---|
Buckeye4lf Joined: 29 Aug 08 Posts: 43 Credit: 8,534,757 RAC: 2,364 |
I just had a whole batch of jobs error out; all of them had the error "Too many total results". Task details: name rb_04_01_20095_19938_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_03_06_904919_9; application Rosetta; created 2 Apr 2020, 0:50:45 UTC; minimum quorum 1; initial replication 1; max # of error/total/success tasks 1, 1, 1; errors: Too many total results. Stderr output: <core_client_version>7.17.0</core_client_version> <![CDATA[ <message> process got signal 11</message> <stderr_txt> command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.12_x86_64-pc-linux-gnu @rb_04_01_20095_19938_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -psipred_ss2 t000_.spider3_ss2 -kill_hairpins t000_.nobuformat.spider3_ss2 -jumps:pairing_file t000_.fasta.bbcontacts.jumps -abinitio::use_filters false -skip_convergence_check -jumps:overlap_chainbreak -seq_sep_stages 1 1 1 -ramp_chainbreaks -sep_switch_accelerate 0.8 -jumps:random_sheets 7 2 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_04_01_20095_19938_ab_t000__robetta.zip -frag3 rb_04_01_20095_19938_ab_t000__robetta.200.3mers.index.gz -fragA rb_04_01_20095_19938_ab_t000__robetta.200.6mers.index.gz -fragB rb_04_01_20095_19938_ab_t000__robetta.200.3mers.index.gz -nstruct 10000 -cpu_run_time 57600 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3752306 Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. BOINC:: CPU time: 18587.7s, 14400s + 3600s[2020- 4- 2 16:44:30:] :: BOINC WARNING! cannot get file size for default.out.gz: could not open file. Output exists: default.out.gz Size: -1 InternalDecoyCount: 0 (GZ) ----- 0 ----- Stream information inconsistent. 
Writing W_0000001 ====================================================== DONE :: 1 starting structures 18587.7 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== 16:44:30 (10035): called boinc_finish(0) </stderr_txt> ]]> What does this indicate? I do not want to spend all my time running jobs only to have them error out at the end. |
Sid Celery Joined: 11 Feb 08 Posts: 2122 Credit: 41,194,697 RAC: 9,774 |
"I just had a whole batch of jobs error out; all of them had the error 'Too many total results'." I don't know, but could the message really mean "too many error results to list"? Signal 11 gets reported a lot. Is that memory again? (But I see you've got loads.) "Stream information inconsistent": no idea. And the watchdog had to cut in and shut the task down after 1 hr of runtime plus 4 hrs of watchdog allowance. Certainly lots going wrong. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
It shows you are on BOINC 7.17.0 (the current recommended version is 7.4.22). It also indicates that the task did not complete the first model within the 1 hr preferred runtime plus the 4 hr watchdog timeout. The task was created in a way that keeps it from going out to another host for validation, so the one error was "too many", and the WU (which can sometimes be more than just the task that went to you) was ended. Then, I guess, as the watchdog went to end the task, it found no output file. In a nutshell, you hit a long-running model against the smallest possible runtime, and it was ended for you. Rosetta Moderator: Mod.Sense |
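The runtime arithmetic in the reply above can be checked against the log line quoted in the first post. The following is only a rough sketch: it assumes the two trailing numbers in `BOINC:: CPU time: 18587.7s, 14400s + 3600s` are the 4 hr watchdog allowance and the 1 hr preferred runtime, which is how the moderator's explanation reads. The helper name `watchdog_summary` and the regular expression are made up for illustration and are not part of BOINC or Rosetta.

```python
import re

# Line copied from the stderr output quoted in the first post of this thread.
LOG_LINE = "BOINC:: CPU time: 18587.7s, 14400s + 3600s"

# Assumed layout (not documented here): "<cpu seconds>s, <allowance A>s + <allowance B>s".
PATTERN = re.compile(r"CPU time:\s*([\d.]+)s,\s*(\d+)s\s*\+\s*(\d+)s")


def watchdog_summary(line):
    """Return (cpu_seconds, limit_seconds, overrun_seconds), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    cpu = float(match.group(1))
    limit = int(match.group(2)) + int(match.group(3))
    return cpu, limit, cpu - limit


summary = watchdog_summary(LOG_LINE)
if summary is not None:
    cpu, limit, overrun = summary
    print(f"CPU time {cpu / 3600:.2f} h, limit {limit / 3600:.2f} h, overrun {overrun:.1f} s")
```

Under that reading, the task used a little more CPU time than the combined 18000 s allowance, which fits the description of the watchdog stepping in.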
Tom Rinehart Joined: 28 Mar 20 Posts: 7 Credit: 1,637,467 RAC: 0 |
I posted this on ralph@home, but this is probably better. All the 4.12 work units on my Mac quickly end in a computation error: <core_client_version>7.14.3</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255)</message> <stderr_txt> command: rosetta_4.12_x86_64-apple-darwin -run:protocol jd2_scripting -parser:protocol predictor_v11_boinc--fuse--covid_spike_design_boinc_v1.xml @flags_jhr_cv -in:file:silent 3xc3uf2h_Junior_HalfRoid_vs_COVID-19_design1.silent -in:file:silent_struct_type binary -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip 3xc3uf2h_Junior_HalfRoid_vs_COVID-19_design1.zip @3xc3uf2h_Junior_HalfRoid_vs_COVID-19_design1.flags -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 Starting watchdog... Watchdog active. error: zipfile probably corrupt (illegal instruction) </stderr_txt> ]]> One of my other Macs is getting this error: <core_client_version>7.14.3</core_client_version> <![CDATA[ <message> process got signal 4</message> <stderr_txt> </stderr_txt> ]]> I have two linux boxes running Debian Buster that are working fine, so it looks like a Mac app problem. |
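For readers unsure what the numbers in "process got signal 11" and "process got signal 4" refer to, they are operating-system signal numbers. A minimal sketch using Python's standard `signal` module shows the symbolic names; the helper `describe_signal` is a made-up name for illustration. On Linux and macOS, 11 is SIGSEGV (segmentation fault) and 4 is SIGILL (illegal instruction), the latter matching the "(illegal instruction)" text in the Mac error above. This only names the signals; it does not say why the application raised them.

```python
import signal


def describe_signal(number):
    """Return the symbolic name for a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a known signal on this platform.
        return f"unknown signal {number}"


# The two signal numbers reported in this thread.
for number in (11, 4):
    print(number, describe_signal(number))
```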
Buckeye4lf Joined: 29 Aug 08 Posts: 43 Credit: 8,534,757 RAC: 2,364 |
"It shows you are on BOINC 7.17.0 (the current recommended version is 7.4.22)" Okay, thanks. I just wanted to make sure it was not a hardware issue, so I can prevent problems in the future. This round was a lot of wasted computation time. |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 62 |
>>> (the current recommended version is 7.4.22) 7.14.2. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Keith Myers Joined: 29 Mar 20 Posts: 97 Credit: 332,619 RAC: 363 |
"It shows you are on BOINC 7.17.0 (the current recommended version is 7.4.22)" That information is incorrect. It only reflects that no official BOINC developer has compiled a current version. Linux BOINC has been neglected and abandoned for the past 6 years, with no official BOINC distribution, because no Linux developer has been active since 2014. Current Linux BOINC versions can be compiled by the end user, or the user can install one from one of the official PPAs or from their Linux distribution's repository. That is usually at least version 7.9.3 in most Debian distros and up to version 7.14.2 in others. |
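The version numbers traded in this thread (7.17.0, 7.4.22, 7.14.2, 7.9.3, 7.14.3) are easy to misorder when read as plain text, since "7.4.22" looks larger than "7.14.2" at a glance. A small sketch that compares them numerically, component by component, is below; `parse_version` is a made-up helper and assumes every version is three dot-separated integers, as all the ones quoted here are.

```python
def parse_version(text):
    """Turn '7.14.2' into (7, 14, 2) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


# Client versions mentioned by posters in this thread.
versions = ["7.17.0", "7.4.22", "7.14.2", "7.9.3", "7.14.3"]

ordered = sorted(versions, key=parse_version)
print(" < ".join(ordered))
```

Sorted this way, 7.4.22 is the oldest of the versions mentioned and 7.17.0 the newest.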
©2024 University of Washington
https://www.bakerlab.org