Communication during recent downtime

Message boards : Number crunching : Communication during recent downtime

Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 69471 - Posted: 22 Jan 2011, 11:10:11 UTC - in response to Message 69470.  
Last modified: 22 Jan 2011, 11:10:30 UTC


Here is some news on the hardware front from Dr. Baker's journal: "First, I would like to thank everybody for bearing with us while we recovered from a critical server hardware failure. Over the next month or two we will be installing more powerful and more robust hardware so hopefully this will not happen again."
This quote is old news, check the date... ;-)

Ralf


Looks fairly recent to me.


For those of you not wanting to track down the quote, it was written by Dr Baker on 20 Jan 2011 and quoted in this thread on 21 Jan 2011.
TPCBF

Joined: 29 Nov 10
Posts: 111
Credit: 5,070,625
RAC: 2,159
Message 69473 - Posted: 22 Jan 2011, 16:49:37 UTC - in response to Message 69471.  


Sorry, but I could swear that I have seen the very same words in a post from him more than a year ago... :?

Ralf
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 69474 - Posted: 22 Jan 2011, 21:12:11 UTC - in response to Message 69473.  





You might be thinking of this comment.
Message 60267 - Posted 22 Mar 2009 5:47:20 UTC

Rosetta@home has received a substantial monetary contribution from an anonymous donor! Following the suggestion of the donor, the University of Washington has used the money to start a special “Rosetta@home fund” that will be used to pay part of David Kim’s salary (David is the architect of Rosetta@home and the person who keeps the project running), upgrade the servers as needed, and allow us to make more rapid progress on the disease-related research Rosetta@home is carrying out. If you would like to make a (tax-deductible) contribution to the project, the link is Rosetta@home fund. David will be adding a link to this from the Rosetta@home home page in the next day or two. Thank you for your contributions to the project!

Otherwise there is no mention of the word "server" anywhere in his journal archives for 2009, and nothing in 2010-11 other than the post I quoted in the current thread.
TPCBF

Joined: 29 Nov 10
Posts: 111
Credit: 5,070,625
RAC: 2,159
Message 69480 - Posted: 24 Jan 2011, 18:08:25 UTC - in response to Message 69474.  



No, that's not it. I looked briefly myself, but the search wouldn't let me go back more than a year in the posts.

One thing I know for sure though: I could not have seen the post from DB on 2011-01-20, as I was not anywhere near a computer with free Internet access on Thursday/Friday. But the wording immediately looked familiar when I checked the "Number crunching" forum after I got back home on Saturday...
Anyway, kind of a moot point by now, so let's move on...

Ralf
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 69485 - Posted: 25 Jan 2011, 12:27:00 UTC - in response to Message 69480.  




Yep, no point in hashing this to death.
Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 69514 - Posted: 29 Jan 2011, 12:28:54 UTC

It is a bit late but I just noticed the new message on the front page:

Jan 26, 2011
Outage Notice: We will be offline for a brief 1-2 hour period tomorrow, Thursday the 27th, starting at around 10am PST for maintenance.


Thank you to the Project Team for listening to our concerns and I hope you are able to continue with this level of communication. A short sentence like that is very quick to write but makes a major difference to people's perceptions of the project.
Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 69523 - Posted: 29 Jan 2011, 18:45:58 UTC

I think that the issue here can be boiled down to two different points - which are in direct conflict with each other.

First, the academic environment is - what is the word I'm looking for - casual. I make an annual trip up to Indiana University to destroy a truckload of PCs using their cyclotron.

At times it drives me crazy - the good folks I work with up there are probably smarter than my whole family tree combined - but things are relaxed - if the facility goes down for a few days, it goes down for a few days. Nothing to get your underwear in a wad over.

I, on the other hand, have been a systems programmer for most of my adult life, and uptime was king. I suspect that many of the others donating time to the Rosetta project have the same background. The data center was put up on a pedestal, and if the system went down, holidays, birthdays, weddings and other minor life events were put on hold.

So I think that what we are dealing with here is the collision of two distinct yet diametrically opposed cultures. Neither is right and neither is wrong but we have to find some middle ground.

I for one am willing to let them run to the Men's Room for a quick BIO-Break as long as they in turn make an effort to keep us in the loop. The message about the scheduled maintenance time is an example of communication done right.
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,135,730
RAC: 4,670
Message 69529 - Posted: 30 Jan 2011, 12:26:13 UTC - in response to Message 69523.  



I think you are seeing the difference between work and university thinking: in a work environment uptime is king, while in a university setting, if it is up by tomorrow, it is okay. A laissez-faire attitude, if you can figure out what I am trying to say. In the past, when Seti for instance went down when someone dug up and stole the copper cables, projects thought of us users purely as people who donated when we could but did not have any 'investment' in a project. Nowadays we users do have an 'investment' in a project and we are 'involved' in the outcome of the research! When the project admins understand this fundamental change in our thinking, their communications towards us will change. I think it might even take a whole ton of people leaving a project due to a problem before projects 'get it', but in the end they will 'get it'. The project admins think of the project as 'their baby', but if 'their baby' suddenly died due to their lack of acknowledging the users' need for simple communications, they might rethink things.
BILL

Joined: 16 Jan 06
Posts: 1
Credit: 320,083
RAC: 0
Message 69542 - Posted: 31 Jan 2011, 7:58:22 UTC - in response to Message 69529.  

<snippage has occurred> Nowadays we users do have an 'investment' in a project and we are 'involved' in the outcome of the research! When the project admins understand this fundamental change in our thinking, their communications towards us will change. I think it might even take a whole ton of people leaving a project due to a problem before projects 'get it', but in the end they will 'get it'. The project admins think of the project as 'their baby', but if 'their baby' suddenly died due to their lack of acknowledging the users' need for simple communications, they might rethink things.


Here's some shocking news for you. You are never going to see a Boinc project die because all the crunchers up and quit because they didn't get the "stroking" they thought they deserved. Most crunchers don't care if there is an outage as long as it gets fixed, and repeated reports saying "We're working on it." aren't necessary. There are crunchers out there that don't even know their machines are still running Boinc projects.

The project managers are fully aware that it's only a small vocal percentage that complain about not having their hands held and their leaving would not be a problem. There are a few that would probably wish the complainers bon voyage because the number that they do have is straining the equipment/system to the point of repeated failures. Milkyway@home comes to mind.

So don't trot out that "if we don't get what we want we'll all quit" argument. The project admins know it's not going to happen, it's childish, and you're just embarrassing yourself.
Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 69548 - Posted: 31 Jan 2011, 14:59:14 UTC - in response to Message 69542.  

The project managers are fully aware that it's only a small vocal percentage that complain about not having their hands held and their leaving would not be a problem. There are a few that would probably wish the complainers bon voyage because the number that they do have is straining the equipment/system to the point of repeated failures. Milkyway@home comes to mind.

So don't trot out that "if we don't get what we want we'll all quit" argument. The project admins know it's not going to happen, it's childish, and you're just embarrassing yourself.


The Rosetta scientists have said repeatedly that they want to increase the amount of work that gets done through Rosetta@home and at times have encouraged users to find ways to promote Rosetta and get new people to join. For most of the last few years Rosetta has been averaging about 100 TeraFlops, as our efforts to get new crunchers have only replaced those who leave.

Late last year there was a problem with another project and a large number of users migrated to Rosetta and pushed the work rate up to around 140 TeraFlops. Even after the other project came back online a large number of the users stayed and we were seeing around 130 TeraFlops on the home page.

When the problems occurred in late December/early January there was a panic on these boards among a large number of users - feel free to read back in the comments both here and in the Q&A pages. During the downtime there was a period of 6 days without any advice to users on whether they should abort the work they couldn't upload, detach from Rosetta and reattach later, or do something else entirely. I and a number of other volunteers tried to fill the silence left by the Project team by giving the best advice we could, but a number of people declared that they were abandoning Rosetta because the Project team wouldn't respond with a single sentence for nearly a week (and when one person declares something, you can be sure some other people who stay silent are thinking the same thing).

The result is that in November and December we were averaging about 130 to 140 TeraFlops. Right now we are only at 110 TeraFlops.

While you are correct that the project won't "die" because the Project team fail to communicate, it is also true that the Project won't grow. It is quite sad to see all the effort put into boosting Rosetta's reputation and performance squandered by 6 days of silence.
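
As a rough check of the figures quoted above, here is a minimal sketch that works out the relative drop. The TeraFlops values are the approximate ones given in this post; the function name is just an illustration.

```python
# Approximate throughput figures quoted in the post, in TeraFlops.
before_low, before_high = 130.0, 140.0  # November/December average range
after = 110.0                           # figure at the time of the post


def percent_drop(before: float, after: float) -> float:
    """Return the relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


drop_low = percent_drop(before_low, after)
drop_high = percent_drop(before_high, after)
print(f"Drop: {drop_low:.1f}% to {drop_high:.1f}%")  # Drop: 15.4% to 21.4%
```

On those numbers the loss is somewhere between roughly 15% and 21% of the work rate, although the figures alone cannot say how much of it is due to the silence rather than the outage itself.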
Kirby54925
Joined: 4 Feb 10
Posts: 4
Credit: 6,423,293
RAC: 0
Message 69560 - Posted: 1 Feb 2011, 6:53:40 UTC - in response to Message 69548.  


While you are correct that the project won't "die" because the Project team fail to communicate, it is also true that the Project won't grow. It is quite sad to see all the effort put into boosting Rosetta's reputation and performance squandered by 6 days of silence.


That's exactly what I was thinking. Twitter and Facebook are the admins' best friends during server downtimes. Have every project admin (including Dr. Baker) know the login information for the project's Twitter and Facebook accounts.
Michael Gould

Joined: 3 Feb 10
Posts: 39
Credit: 15,365,453
RAC: 7
Message 69578 - Posted: 2 Feb 2011, 6:48:16 UTC

I would think that the number of users and teraFlops would drop after an extended outage, regardless of how much communication we get from the project. It is surely a very low percentage of active users who ever visit these boards, or would look at twitter or facebook accounts.

I wonder if any good research has been done into what draws users to a particular project, and what induces them to stay long term or switch to another one. Surely there are enough graduate students out there looking for thesis material!

dcdc
Joined: 3 Nov 05
Posts: 1831
Credit: 119,526,853
RAC: 9,592
Message 69588 - Posted: 3 Feb 2011, 8:24:12 UTC - in response to Message 69578.  

I would think that the number of users and teraFlops would drop after an extended outage, regardless of how much communication we get from the project. It is surely a very low percentage of active users who ever visit these boards, or would look at twitter or facebook accounts.

I didn't visit the Twitter site. I'd signed up to receive the tweets at some point in the past, so it came through to my phone, which I'm sure is the same for a number of people on here. The Facebook 'likes' counter at the top has had 1222 hits as it stands, so there's a reasonable pool of people who do visit the site and would see such posts. That's the target audience, which is enough: it then gets relayed to the other forums quite quickly (e.g. the boinc forum - project status thread).

A few people have suggested that it's such a small minority that read these boards, but firstly, there's no need to lose the majority of them (I'm sure some will leave anyway - you can't please all of the people!), secondly, they tend to have much higher than average output, and thirdly, they're more vocal so they'll spread the word more than the average person running the project.
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 69594 - Posted: 4 Feb 2011, 21:07:32 UTC

I got started on this project years ago after reading about it in the online edition of the Seattle Times (my old home of 13 years). I found one other project not related to my other "home area" of Hanford and got started with Einstein with its LIGO connection. I would not have known about either of these projects if I had not stumbled across that article. So social media and other websites don't always paint an accurate picture of who does these projects or how many got involved in them via social media. I think a lot of it is word of mouth, and then the credit crunchers that try to outdo each other.

Those of us that have been here awhile like to know what's going on. Even if there is some sort of server failure we still would like to know what is happening. We have been saying for years there is a lack of communication from the project to us about what is happening with system failures, general project updates and the like. This outage struck a nerve with the lack of news. Some of the 'new' members here are a little more impatient than others. But as has been said before, even a short 1 or 2 line update is better than nothing. There are all the resources that this project has tapped into to recruit, but someone forgot they can be used to report progress if the main page has been wiped out.

If for some reason there was a total collapse of the system for Rosie, FB or Twitter or the Boinc projects boards would be places to post updates or what happened. I suggested long ago that the project tap into the communications department of the University and find a person who would be their "spokesperson" for the webpage and the social media sites. Then the project admins would not have to be bothered with trying to put out updates. They would just need to say something to the "spokesperson" who would take care of it for them.
l_mckeon

Joined: 5 Jun 07
Posts: 44
Credit: 180,717
RAC: 0
Message 69595 - Posted: 4 Feb 2011, 22:09:05 UTC - in response to Message 69401.  

The BOINC home page is the obvious place for official announcements of outages, with Twitter, FB etc. available as extras.
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,135,730
RAC: 4,670
Message 69602 - Posted: 5 Feb 2011, 10:58:17 UTC - in response to Message 69595.  

The BOINC home page is the obvious place for official announcements of outages, with Twitter, FB etc. available as extras.


The only problem with that is that it is usually hosted on the very hardware that is down, so they need a backup plan too.
dcdc
Joined: 3 Nov 05
Posts: 1831
Credit: 119,526,853
RAC: 9,592
Message 69632 - Posted: 13 Feb 2011, 17:09:52 UTC - in response to Message 69602.  

The BOINC home page is the obvious place for official announcements of outages, with Twitter, FB etc. available as extras.


The only problem with that is that it is usually hosted on the very hardware that is down, so they need a backup plan too.

Mikey - l_mckeon is suggesting the BOINC home page rather than the Rosetta home page. ;)
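
The point about where each page is hosted can be put as a simple filter. This is only a sketch: the channel and host labels below are made-up placeholders, not a description of how any of these sites are actually run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    name: str  # label for the announcement channel (hypothetical)
    host: str  # infrastructure the channel depends on (hypothetical)


def usable_channels(channels, failed_hosts):
    """Return the channels that do not depend on any failed host."""
    return [c for c in channels if c.host not in failed_hosts]


channels = [
    Channel("Rosetta home page", "project-servers"),
    Channel("BOINC home page", "boinc-servers"),
    Channel("Twitter", "twitter"),
    Channel("Facebook", "facebook"),
]

# If the project's own servers are down, only externally hosted channels remain.
available = usable_channels(channels, failed_hosts={"project-servers"})
print([c.name for c in available])  # ['BOINC home page', 'Twitter', 'Facebook']
```

Under that assumption, an announcement on the BOINC home page would still be reachable while the Rosetta servers are offline.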
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,135,730
RAC: 4,670
Message 69635 - Posted: 14 Feb 2011, 11:41:56 UTC - in response to Message 69632.  

The BOINC home page is the obvious place for official announcements of outages, with Twitter, FB etc. available as extras.


The only problem with that is that it is usually hosted on the very hardware that is down, so they need a backup plan too.

Mikey - l_mckeon is suggesting the BOINC home page rather than the Rosetta home page. ;)


Then that works for me!



©2024 University of Washington
https://www.bakerlab.org