If You Don't Know Where to Put it, Post it here.

Profile Siran d'Vel'nahr
Joined: 15 Nov 06
Posts: 72
Credit: 2,674,678
RAC: 0
Message 93620 - Posted: 6 Apr 2020, 9:33:19 UTC

Greetings,

Here's a weird issue that I haven't seen before in my 16+ years of using BOINC:

I just noticed that when I restarted BOINC, after logging back in, all the tasks I was previously working on ran back to 0 (zero), starting over. Is this a Rosetta thing?

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr XO
USS Vre'kasht NCC-33187

"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1681
Credit: 17,854,150
RAC: 20,118
Message 93623 - Posted: 6 Apr 2020, 9:51:50 UTC

Losing WU progress
Grant
Darwin NT
Profile Siran d'Vel'nahr
Joined: 15 Nov 06
Posts: 72
Credit: 2,674,678
RAC: 0
Message 93625 - Posted: 6 Apr 2020, 10:25:52 UTC - in response to Message 93623.  

Losing WU progress

Hi Grant,

Thanks! I didn't see that thread. I did change my memory and disk usage per your post over on SETI. I have 32GB RAM and 1TB M.2 NVMe SSD. I'll see if this fixes it. :)

It just seemed weird when I started BOINC and all 6 tasks started over at zero.

Thanks again and have a great day! :)

Siran
Millenium

Joined: 20 Sep 05
Posts: 68
Credit: 184,283
RAC: 0
Message 93669 - Posted: 6 Apr 2020, 19:34:01 UTC

Yup, had the same problem with a bunch of WUs that eventually took over 12 hours to complete, with an 8-hour target time. A single decoy of these WUs simply took 12 hours, so the task could neither stop earlier nor checkpoint. Very rare, anyway.
Profile Siran d'Vel'nahr
Joined: 15 Nov 06
Posts: 72
Credit: 2,674,678
RAC: 0
Message 93737 - Posted: 7 Apr 2020, 18:07:47 UTC

Greetings,

I'm done with Rosetta! I have it set to NNT and if and when I get done with the current tasks, I will be detaching from Rosetta.

I'm tired of having these 7 hr tasks get to 5+ hours just to rewind to zero % done when I log out and log back in and then have to make up for the 5+ hours already done that's lost. I have a dual boot system and I log into Windows 10 once or twice a day to play World of Warcraft. If Rosetta cannot set/or respect a checkpoint so that I can begin where I left off, then this project is NOT for me. That, in my opinion, is disrespectful to the user and the device doing the work.

And to think I really, really wanted to do this to help with COVID-19... :(

Have a great day! :)

Siran
Profile yoerik
Joined: 24 Mar 20
Posts: 128
Credit: 169,525
RAC: 0
Message 93739 - Posted: 7 Apr 2020, 18:13:37 UTC - in response to Message 93737.  

I'm tired of having these 7 hr tasks get to 5+ hours just to rewind to zero % done when I log out and log back in and then have to make up for the 5+ hours already done that's lost. I have a dual boot system and I log into Windows 10 once or twice a day to play World of Warcraft. If Rosetta cannot set/or respect a checkpoint so that I can begin where I left off, then this project is NOT for me. That, in my opinion, is disrespectful to the user and the device doing the work.


That sounds like a glitch. What is your "request tasks to checkpoint at most every" set to, in computing preferences in BOINC Manager?
Profile dcdc

Joined: 3 Nov 05
Posts: 1831
Credit: 119,627,225
RAC: 10,243
Message 93745 - Posted: 7 Apr 2020, 18:34:07 UTC

If you can hibernate instead of shutting down then it will restart from where it left off.

If the models are large then it might not be able to checkpoint very often.
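
As a rough illustration of the point about large models, here is a minimal Python sketch. It assumes, hypothetically, that a task can only save its state at the moment a model (decoy) finishes, and that the client's "checkpoint at most every" preference only sets a minimum spacing between saves. The function name and the numbers are made up for illustration; this is not how Rosetta or the BOINC client is actually implemented.

```python
def simulate_lost_work(model_minutes, interval_minutes, stop_at_minute):
    """Return the minutes of work lost if the task is stopped at stop_at_minute.

    Assumptions (hypothetical, for illustration only):
    - the application can only checkpoint when a model finishes;
    - it skips a checkpoint if fewer than interval_minutes have passed
      since the previous one.
    """
    last_checkpoint = 0  # minute at which state was last saved
    elapsed = 0          # minutes of completed models so far
    while elapsed + model_minutes <= stop_at_minute:
        elapsed += model_minutes  # another model finished: a safe point
        if elapsed - last_checkpoint >= interval_minutes:
            last_checkpoint = elapsed
    # Everything after the last saved state has to be redone on restart.
    return stop_at_minute - last_checkpoint


# Small models (5 minutes each), stopped after 5 hours 2 minutes.
print(simulate_lost_work(model_minutes=5, interval_minutes=1, stop_at_minute=302))    # 2
# One huge model (12 hours), stopped after 5 hours: nothing was ever saved.
print(simulate_lost_work(model_minutes=720, interval_minutes=1, stop_at_minute=300))  # 300
```

Under these assumptions the checkpoint preference hardly matters once a single model runs for hours: the task has no safe point to save at, so a restart sends it back to zero.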
Profile Siran d'Vel'nahr
Joined: 15 Nov 06
Posts: 72
Credit: 2,674,678
RAC: 0
Message 93754 - Posted: 7 Apr 2020, 19:19:52 UTC - in response to Message 93739.  

Hi Yoerik

That sounds like a glitch. What is your "request tasks to checkpoint at most every" set to, in computing preferences in BOINC Manager?

It is set to 600 seconds (10 minutes). There was a reason, which I cannot remember off hand now, for setting it to 600 when all I was doing was SETI. I don't remember ever seeing SETI tasks running at high priority or losing the checkpoints.

I don't think they (the Rosetta crew) set a long enough deadline. I'm also back to running high priority again. I have 16 tasks due tomorrow and 8 due the day after. If I were to be running more than one BOINC project, no other project would get a chance to get tasks done.

Anyway, it's all a moot point right now since I've got it set to NNT and will detach if and when the tasks get done.

Have a great day! :)

Siran
Profile Siran d'Vel'nahr
Joined: 15 Nov 06
Posts: 72
Credit: 2,674,678
RAC: 0
Message 93755 - Posted: 7 Apr 2020, 19:22:34 UTC - in response to Message 93745.  

If you can hibernate instead of shutting down then it will restart from where it left off.

If the models are large then it might not be able to checkpoint very often.

It's a dual boot system; hibernation does not work when booting into another OS. At least not that I know of.

Have a great day! :)

Siran
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1681
Credit: 17,854,150
RAC: 20,118
Message 93793 - Posted: 8 Apr 2020, 0:39:34 UTC - in response to Message 93754.  
Last modified: 8 Apr 2020, 0:41:11 UTC

It is set to 600 seconds (10 minutes). There was a reason, which I cannot remember off hand now, for setting it to 600 when all I was doing was SETI. I don't remember ever seeing SETI tasks running at high priority or losing the checkpoints.
Set the checkpoint to every 60 seconds.
It doesn't necessarily checkpoint at that time; it just asks the Application to checkpoint if it is able to.


I don't think they (the Rosetta crew) set a long enough deadline. I'm also back to running high priority again. I have 16 tasks due tomorrow and 8 due the day after. If I were to be running more than one BOINC project, no other project would get a chance to get tasks done.
4 days is plenty. Of course the more projects you run, the smaller your cache should be.
The fact you have just joined up to another project means it will have to sort out just how long the Tasks run for (yes, they are set for 8 hours by default, but the BOINC Manager and the Rosetta servers need to work things out so the Estimated times match reality). The more projects you have, and the larger your cache, the longer it will take for the Manager to sort things out.
Tasks running High priority isn't an error, it isn't a problem (normally). It's just the Manager trying to honour your cache & resource share settings.


Anyway, it's all a moot point right now since I've got it set to NNT and will detach if and when the tasks get done.
Or you can just leave and not help.
*shrug*
Grant
Darwin NT
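
The deadline discussion above comes down to simple arithmetic, sketched here in Python. The 8-hour run time and the 4-day deadline are taken from the thread; the 6 CPUs in use and the 2 hours a day spent booted into another OS are assumptions picked for illustration. The function is hypothetical and much cruder than the BOINC client's real scheduler, which also has to cope with partly finished tasks and several projects.

```python
def tasks_completable(n_cpus, task_hours, hours_until_deadline, hours_off=0):
    """Upper bound on the tasks that can finish before a deadline.

    Assumes one task per CPU at a time, every task running for exactly
    task_hours, and no work lost on restarts (all simplifying assumptions).
    """
    usable_hours = max(hours_until_deadline - hours_off, 0)
    tasks_per_cpu = int(usable_hours // task_hours)  # only whole tasks count
    return n_cpus * tasks_per_cpu


print(tasks_completable(6, 8, 24))               # 18 tasks in one uninterrupted day
print(tasks_completable(6, 8, 24, hours_off=2))  # 12 tasks if the host is off for 2 hours
print(tasks_completable(6, 8, 96))               # 72 tasks within a 4-day deadline
```

With these numbers, 16 tasks due within a day only fit if the machine never stops, which would be consistent with the Manager switching to high priority; over the full 4 days there is plenty of headroom as long as the cache stays small.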
Profile Siran d'Vel'nahr
Joined: 15 Nov 06
Posts: 72
Credit: 2,674,678
RAC: 0
Message 93847 - Posted: 8 Apr 2020, 10:21:29 UTC - in response to Message 93793.  

It is set to 600 seconds (10 minutes). There was a reason, which I cannot remember off hand now, for setting it to 600 when all I was doing was SETI. I don't remember ever seeing SETI tasks running at high priority or losing the checkpoints.
Set the checkpoint to every 60 seconds.
It doesn't necessarily checkpoint at that time; it just asks the Application to checkpoint if it is able to.


I don't think they (the Rosetta crew) set a long enough deadline. I'm also back to running high priority again. I have 16 tasks due tomorrow and 8 due the day after. If I were to be running more than one BOINC project, no other project would get a chance to get tasks done.
4 days is plenty. Of course the more projects you run, the smaller your cache should be.
The fact you have just joined up to another project means it will have to sort out just how long the Tasks run for (yes, they are set for 8 hours by default, but the BOINC Manager and the Rosetta servers need to work things out so the Estimated times match reality). The more projects you have, and the larger your cache, the longer it will take for the Manager to sort things out.
Tasks running High priority isn't an error, it isn't a problem (normally). It's just the Manager trying to honour your cache & resource share settings.


Anyway, it's all a moot point right now since I've got it set to NNT and will detach if and when the tasks get done.
Or you can just leave and not help.
*shrug*

Hi Grant,

I set the checkpoint setting to 60 seconds. Rosetta is the only project I am running. I still have SETI set for any GPU resends, but have not gotten anything since about the 31st.

Actually I ran Rosetta way back in late 2006. I don't remember why I quit Rosetta back whenever I did. I don't know, perhaps to do SETI solo?

I don't think I have had more than 30 tasks this time around at any given time since I restarted doing Rosetta. 4 days? The deadlines for SETI were in the weeks and the tasks took MUCH less time to do than here, even the CPU tasks I was doing in about 40 minutes give or take. Wasn't it 11 of each type of app, at SETI, that was needed to figure out the estimated run time? I have done WAY more than that on some of these here and it still takes 7 to 15 hours to do them. Rosetta-mini is my lowest, I've done 1 of those. The rest are 10, 15, 28, 39 (not in that order). How many do I have to do before the servers and BOINC figure out this PC can do them faster?

Last night before bed, I saw that my tasks had been running about 8 hours each with 3 hours left. This morning new tasks were being processed so I logged into Windows 10 and when done there logged back into here and 4 of the 6 tasks restarted at zero % and 2 continued from where they left off. This is ridiculous. It's like I'm running on a 486 instead of an i7 8th gen.

This PC is no slouch. It's a Gen 8 8086K running at 4 GHz with 32GB RAM. I was doing SETI GPU tasks in seconds to a minute and as I mentioned CPU tasks in about 40 minutes give or take.

It's NOT that I don't want to help with COVID-19, it's the fact that I'm tired of redoing and redoing and redoing work that takes too long to do on this PC when it should be doing the work in much less time and NOT starting over because BOINC "forgets" the checkpoints.

If I see a major difference between now and the last tasks listed, perhaps I will stay with Rosetta a while longer. If, however, checkpoints are still not honored and it still takes 8 to 15 hours to do a task, I'm gone. I'm sorry, but my PC's time is worth a LOT more than this.

And as an aside, I really hate ALL white websites and fora... ;)

Have a great day! :)

Siran
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1681
Credit: 17,854,150
RAC: 20,118
Message 93852 - Posted: 8 Apr 2020, 11:13:21 UTC - in response to Message 93847.  

I have done WAY more than that on some of these here and it still takes 7 to 15 hours to do them. Rosetta-mini is my lowest, I've done 1 of those. The rest are 10, 15, 28, 39 (not in that order). How many do I have to do before the servers and BOINC figure out this PC can do them faster?
Still around 10 non-error Tasks; however, there have been new applications in the last few days, so everything starts from scratch for them. Likewise, different Tasks may take more or less processing, so it takes time for things to adjust to those as well.
The fact is Tasks run for a set time, the default being 8 hours. It takes a while for the Estimated times to match up with the actual times.


This PC is no slouch. It's a Gen 8 8086K running at 4 GHz with 32GB RAM. I was doing SETI GPU tasks in seconds to a minute and as I mentioned CPU tasks in about 40 minutes give or take.
Once again, Tasks run for a set time.
Some may bail out early. Some may run longer (but there is a 4 hour cutoff). There are some where the default time was set longer than usual, but most of those have gone now.

But the fact is Tasks run for a set time.


It's NOT that I don't want to help with COVID-19, it's the fact that I'm tired of redoing and redoing and redoing work that takes too long to do on this PC when it should be doing the work in much less time and NOT starting over because BOINC "forgets" the checkpoints.
I have no idea what settings you have on your system to make checkpoints not work.
I've set it for the default of 60 seconds, and the most time I have lost on a restart has been 5 minutes. Usually it's only a couple of minutes.

I've got 6c/12t all in use, 32GB RAM, with no issues. These are my settings:

Computing preferences-
Usage limits
Use at most 100 % of the CPUs
Use at most 100 % of CPU time


When to suspend
Basically, never.


Other
Store at least              1 days of work
Store up to an additional   0.02 days of work
Switch between tasks every  60 minutes
Request tasks to checkpoint at most every 60 seconds


Disk
Use no more than 20 GB
Leave at least    2 GB free
Use no more than 60% of total


Memory
When computer is in use, use at most     95 %
When computer is not in use, use at most 95 %
Leave non-GPU tasks in memory while suspended (not selected)
Page/swap file: use at most              75 %



Rosetta@home preferences
Percentage of CPU time used for graphics      not selected
Number of frames per second for graphics      not selected
Target CPU run time                           not selected

Grant
Darwin NT
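
For anyone who would rather set these locally than click through the Manager, the computing preferences listed above can also be written to a `global_prefs_override.xml` file in the BOINC data directory. The Python sketch below generates such a file from the values in this post. The tag names are the ones I recall from BOINC's documented override file, so treat them as assumptions and check them against the documentation for your client version; `PREFS` and `build_override` are hypothetical names used only here.

```python
import xml.etree.ElementTree as ET

# Values copied from the settings listed in the post; tag names are assumed
# to match BOINC's global_prefs_override.xml format.
PREFS = {
    "max_ncpus_pct": 100,                 # Use at most 100 % of the CPUs
    "cpu_usage_limit": 100,               # Use at most 100 % of CPU time
    "work_buf_min_days": 1,               # Store at least 1 days of work
    "work_buf_additional_days": 0.02,     # Store up to an additional 0.02 days
    "cpu_scheduling_period_minutes": 60,  # Switch between tasks every 60 minutes
    "disk_interval": 60,                  # Checkpoint at most every 60 seconds
    "disk_max_used_gb": 20,               # Use no more than 20 GB
    "disk_min_free_gb": 2,                # Leave at least 2 GB free
    "disk_max_used_pct": 60,              # Use no more than 60 % of total
    "ram_max_used_busy_pct": 95,          # Memory when computer is in use
    "ram_max_used_idle_pct": 95,          # Memory when computer is not in use
    "leave_apps_in_memory": 0,            # Leave tasks in memory: not selected
    "vm_max_used_pct": 75,                # Page/swap file: use at most 75 %
}


def build_override(prefs):
    """Return the preferences as a <global_preferences> XML string."""
    root = ET.Element("global_preferences")
    for tag, value in prefs.items():
        ET.SubElement(root, tag).text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9 or newer
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


print(build_override(PREFS))
```

Saving the output as `global_prefs_override.xml` and then running `boinccmd --read_global_prefs_override` (or restarting the client) should make the client pick it up. Local overrides take precedence over the web preferences, so delete the file to go back to those.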
Profile Siran d'Vel'nahr
Joined: 15 Nov 06
Posts: 72
Credit: 2,674,678
RAC: 0
Message 93866 - Posted: 8 Apr 2020, 13:54:54 UTC - in response to Message 93852.  

Hi Grant,

With the exception of 3 settings, all were basically the same as yours. The ones I changed were:
Use at most 60 % of CPUs - Changed to 100 %

Store up to an additional 0.05 days of work - Changed to 0.02

My disk usage is set to 100GB since I have a 1TB SSD.

The graphics settings are now set to "Not selected" even though it says that "Not selected" will default to 10.

So, basically there should be no problem with checkpoints and running high priority since my settings were basically identical to yours. ;)

We'll see what happens in a few hours when I log into Windows 10 to do some World of Warcraft stuff for about an hour and 1/2. ;)

Have a great day! :)

Siran
Profile Siran d'Vel'nahr
Joined: 15 Nov 06
Posts: 72
Credit: 2,674,678
RAC: 0
Message 93896 - Posted: 8 Apr 2020, 18:58:01 UTC

Greetings,

Ok. I just logged back in and 11 of my 12 tasks started over at ZERO! A few were getting somewhat close to finish when I shut down BOINC and logged into Windows 10.

For those here, including Grant, that think I might be blowing smoke about this, I have a little test for you to perform and I bet that what happens when I log back in will happen to many of you.

It's really simple. Take note of where your tasks are at in elapsed and remaining time. No need to be precise, just a mental note. Shut down BOINC including the app(s), wait a few seconds or so, then restart BOINC. I'll bet 10 bucks that some, if not all, of your tasks will restart at zero.

@Grant: My settings are damn near identical to yours.

I really would like to continue with Rosetta, but if something isn't done about the checkpoints... forget it. I can live with the tasks running in high priority, I'm just tired of the wasted work because no checkpoints are being set.

Have a great day! :)

Siran
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93899 - Posted: 8 Apr 2020, 19:14:58 UTC

Are all of the machines discussed in this thread running the i686 Linux application? It seems to presently have an issue where it runs long enough that the watchdog ends it, and doesn't complete the first model.
Rosetta Moderator: Mod.Sense
JohnDK
Joined: 6 Apr 20
Posts: 33
Credit: 2,390,240
RAC: 0
Message 93903 - Posted: 8 Apr 2020, 20:27:12 UTC

So, has anybody been getting new work in the last few hours?
Millenium

Joined: 20 Sep 05
Posts: 68
Credit: 184,283
RAC: 0
Message 93905 - Posted: 8 Apr 2020, 20:41:40 UTC

It is dry, we crunched everything, time to wait for new work.
JohnDK
Joined: 6 Apr 20
Posts: 33
Credit: 2,390,240
RAC: 0
Message 93906 - Posted: 8 Apr 2020, 20:48:07 UTC

Server status says 13355 tasks ready to send, if you can count on that.
Profile Siran d'Vel'nahr
Joined: 15 Nov 06
Posts: 72
Credit: 2,674,678
RAC: 0
Message 93908 - Posted: 8 Apr 2020, 21:00:58 UTC - in response to Message 93899.  

Are all of the machines discussed in this thread running the i686 Linux application? It seems to presently have an issue where it runs long enough that the watchdog ends it, and doesn't complete the first model.

Hi Mod,

Ok, now we seem to be getting somewhere. I checked the properties on a couple of the tasks I have running and they are running on the i686 app.

Is this why no checkpoints are set and the tasks start over from zero % when restarted?

In case it helps this is my current system.

Have a great day! :)

Siran
JohnDK
Joined: 6 Apr 20
Posts: 33
Credit: 2,390,240
RAC: 0
Message 93911 - Posted: 8 Apr 2020, 21:06:47 UTC

Got work now :)



©2024 University of Washington
https://www.bakerlab.org