Feedback, .. bandwidth usage :-(

Message boards : Rosetta@home Science : Feedback, .. bandwidth usage :-(

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 7178 - Posted: 22 Dec 2005, 14:14:09 UTC
Last modified: 22 Dec 2005, 14:15:34 UTC

Please, please, please remember that not everyone has the luxury of an always-on, fast, unlimited-bandwidth connection like yours. Fixing this would also help you, I would think.

I'm posting here because there never seem to be any replies from Rosetta in the suggestions section.

Basically, the current setup abuses bandwidth (as I would call it).

Compression of transfers
Has been mentioned before
An absolute necessity for all the large files.

Reuse of existing files
I have noticed that you remove files like
aa1ogw_09_05.200_v1_3.gz
aa1ogw_03_05.200_v1_3.gz
aa1hz6A09_05.200_v1_3.gz
('03' being ~0.5MB and '09' being 1.5 to 2MB)
and then instantly re-download them <baffled look>.
This not only costs time and money for a lot of us but must also put extra pressure on your servers.
It is pointless.

Leave the files there, either until they are no longer needed (newer versions) or until that query type has finished.
You manage to do it with the client and other files (dunbrak.. bbdep...).


Just remember: less transfer load on your servers (and less bandwidth, both total and instantaneous, if that matters).
We get the jobs quicker as well.


Many people still use dial-up at 56K or less (~half the UK, I think).
Broadband users can also have transfer caps of, say, 2GB or 10GB per month,
or the broadband user could be on a pay-as-you-go scheme where we pay per MB used.

So please, please, think at your end about the people who pay to do your work.

Team mauisun.org
ID: 7178 · Rating: 0

River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 7215 - Posted: 22 Dec 2005, 19:10:57 UTC - in response to Message 7178.  
Last modified: 22 Dec 2005, 19:16:38 UTC


Compression of transfers
Has been mentioned before
An absolute necessity for all the large files.


The .gz files are already compressed (.gz is roughly the Linux equivalent of .zip on Windows; it stands for GNU zip, or gzip, from the GNU tools that Linux uses extensively).

Compressing an already compressed file usually adds overhead rather than saving space.

Rather than applying the compression to the file transfer, where there are all sorts of things that can go wrong, the project programmers chose to zip the files before putting them on the server, and to build the unzip into the app. From memory, I think the megabyte-sized files are all gzipped.

It's not the only way to do it, but it is a perfectly reasonable approach.


Reuse of existing files
I have noticed that you remove files like
aa1ogw_09_05.200_v1_3.gz
aa1ogw_03_05.200_v1_3.gz
aa1hz6A09_05.200_v1_3.gz
('03' being ~0.5MB and '09' being 1.5 to 2MB)
and then instantly re-download them <baffled look>.
...
Leave the files there, either until they are no longer needed (newer versions) or until that query type has finished.


This sounds like a good call, FC.

These files look like they could usefully be made 'sticky', to use the BOINC jargon.

The only reason I can think of not to do this would be if they are altered by the client and returned, and then reset to the original state for the next WU.

ID: 7215 · Rating: 0

Jack Schonbrun
Joined: 1 Nov 05
Posts: 115
Credit: 5,954
RAC: 0
Message 7274 - Posted: 22 Dec 2005, 22:31:15 UTC - in response to Message 7178.  


Leave the files there, either until they are no longer needed (newer versions) or until that query type has finished.
You manage to do it with the client and other files (dunbrak.. bbdep...).


Currently these files are Work Unit specific. Even though they have the same name, they do not always have identical contents. There still may be a way for us to organize our downloads so that if they are identical, they are not redownloaded.
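
One way to avoid re-downloading a file whose contents have not changed is to compare a checksum of the local copy with a checksum published by the server. The following is only a minimal sketch of that idea, not how BOINC or Rosetta actually handle it; the function names and the idea that the server supplies a SHA-256 digest per file are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(local_path: Path, expected_sha256: str) -> bool:
    """True if the file is missing or its contents differ from the expected digest.

    expected_sha256 is assumed to come from the server alongside the work unit.
    """
    if not local_path.is_file():
        return True
    return sha256_of(local_path) != expected_sha256.lower()
```

With a check like this, two files that share a name but differ in contents are still fetched again, while identical ones are kept.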


ID: 7274 · Rating: 0

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 7382 - Posted: 23 Dec 2005, 17:30:44 UTC
Last modified: 23 Dec 2005, 17:34:37 UTC

I know .gz means it's compressed, but the files are not compressed very well (follow the link I gave), e.g.

aa1hz6a09_05.100_v1_3
.gz = 1.3 MB
.rar = 0.6 MB

rosetta_4.79.windows_intelx86
.exe = 4.6 MB
.rar = 1.3 MB

bbdep02.May.sortlib
.gz = 6.5 MB
.rar = 3.8 MB

I just checked with the newly downloaded 400_v1_3 files:

aa1hz6A09_05.400_v1_3.gz = 2.70MB
(unzip, re-zip in WinRAR)
aa1hz6A09_05.400_v1_3.rar
That is 48% of the original size

aa1mkyA09_05.400_v1_3.gz = 3.70MB
aa1mkyA09_05.400_v1_3.rar = 1.93MB
52% of the original size

So you could easily cut them by half.
Could someone test, say, bzip2 or similar?
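
For anyone who wants to run that comparison, here is a rough sketch using only Python's standard library. It compares gzip, bzip2 and LZMA (the codec family behind 7-Zip's default .7z method); RAR has no standard-library module, so it is left out. The sample data below is a synthetic stand-in, not a real fragment file, so the ratios it prints say nothing about the actual Rosetta files.

```python
import bz2
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Compress the same bytes with three stdlib codecs and return sizes in bytes."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
        "lzma": len(lzma.compress(data, preset=9)),
    }


def report(data: bytes) -> None:
    """Print each size and its percentage of the uncompressed size."""
    sizes = compressed_sizes(data)
    raw = max(sizes["raw"], 1)  # avoid dividing by zero on empty input
    for name, size in sizes.items():
        print(f"{name:>6}: {size:>10} bytes ({100 * size / raw:.0f}% of raw)")


if __name__ == "__main__":
    # Synthetic, highly repetitive stand-in for a fragment file (an assumption).
    sample = b"1f06 A 162 V L  -83.317  116.745  175.741\n" * 5000
    report(sample)
```

To test a real file, read its uncompressed contents first (for example with `gzip.open(path, "rb").read()`) and pass those bytes to `report`.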

Also, the science app 4.81 could be compressed easily:
rosetta_4.81_windows_intelx86.exe = 4.57MB
.rar = 1.34MB (~30%)



By "compression of transfers" I meant what you describe (not a Y-Modem-G sort of thing), i.e. have the files compressed as much as possible while they are transferring, then expand them on the computer if they need to be in that state.
They're compressed to about 30% of their original size at the moment (other than the exe files), but it seems you could easily hit 15%.

I see that the newer ....400_v1_3 files are huge, 2 to 4MB per file. Unfortunately I'm going to have to bow out of Rosetta for the time being until something is sorted; that is just too large for me to handle. I should still have 2 part-timers and may be able to get another one up, but my 4 biggest crunchers have to find something easier on the transfers for now, especially as you now point out* that they are work-unit specific. I think a few of my team mates may start to bow out as well once they notice. I also think that's why TSC! Russia, whose 400+ crunchers were by far the biggest team at FaD, bigger even than all the non-team people there, have not come here in abundance :(

Also you need to update your website
https://boinc.bakerlab.org/rosetta/rah_requirements.php
under internet connection.
The project is not in any way suitable for dial-up connections, and the page should carry a caution for capped/pay-as-you-go broadband connections.
You do not mention that the download is now at ~5MB for a couple of hours' work.
Say 4MB for a job and a job lasts 4hrs: over a month that is around 700MB. With caps set at 2GB on many connections in the UK (so we can afford broadband), that is a good chunk of the allocation (and probably a large cost on pay-as-you-go).
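
The arithmetic behind that estimate can be written out as a short sketch. It assumes one CPU crunching around the clock, a 30-day month, and a 2GB cap taken as 2000MB; only the 4MB and 4hr figures come from the post.

```python
def monthly_download_mb(mb_per_job, hours_per_job, days=30):
    """Download volume in MB for one CPU crunching around the clock."""
    jobs_per_day = 24 / hours_per_job
    return mb_per_job * jobs_per_day * days


usage = monthly_download_mb(mb_per_job=4, hours_per_job=4)
cap_mb = 2000  # a 2GB monthly cap, taken here as 2000MB
print(f"{usage:.0f}MB per month, {100 * usage / cap_mb:.0f}% of a 2GB cap")
# prints "720MB per month, 36% of a 2GB cap"
```

Halving the job time to 2hrs with the same file size doubles the figure, which is why shorter jobs with larger files hurt capped connections twice over.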

File sizes seem to be going up, and job times seem to be shorter than 4hrs.



* I've currently been babysitting by keeping these large files in a separate folder, then copying them back when it deletes them and tries to transfer them again. BOINC has never complained, and neither has Rosetta. But if you say they are different then I guess they must be, although you should really 'tie' them to the work unit with a designation or something and notice when the file is wrong <whistles>.
Team mauisun.org
ID: 7382 · Rating: 0

Jack Schonbrun
Joined: 1 Nov 05
Posts: 115
Credit: 5,954
RAC: 0
Message 7386 - Posted: 23 Dec 2005, 17:55:18 UTC

* I've currently been babysitting by keeping these large files in a separate folder, then copying them back when it deletes them and tries to transfer them again. BOINC has never complained, and neither has Rosetta. But if you say they are different then I guess they must be, although you should really 'tie' them to the work unit with a designation or something and notice when the file is wrong <whistles>.


They may or may not be different, which is why we had them tied to work units. What you are doing may or may not be okay. We clearly do need some kind of check for this, as it could seriously confuse our results for people to be swapping files in and out.

I will tell DK to modify the requirements webpage.

This is another element of being a new boinc project. Our code was developed for running on our own local clusters, and we are still learning about how to optimally reconfigure it for distributed computing. Your comments are very useful in this regard.

I know that some work went into writing code so that Rosetta can read and write gzipped files. I don't know how hard it would be to swap in bzip2. Is RAR open source?
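
As an illustration of what "swapping in" another codec can look like when the reading code goes through one common entry point, here is a small Python sketch. It is not Rosetta's code (Rosetta is not written in Python); `open_text` and the `OPENERS` table are hypothetical names, and the point is only that gzip, bzip2 and xz files can share a single open call chosen by file extension.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Map file extensions to stdlib openers that share the same (path, mode) interface.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_text(path, mode="rt"):
    """Open a possibly compressed text file, picking the codec from the extension.

    Files with any other extension are opened as plain text.
    """
    opener = OPENERS.get(Path(path).suffix.lower(), open)
    return opener(path, mode)
```

Code that reads fragment lines through `open_text` would not need to change if the server switched from `.gz` to `.bz2` files.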
ID: 7386 · Rating: 1

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 7415 - Posted: 23 Dec 2005, 20:55:33 UTC

No worries, and hence why I'm giving the feedback.

[Of course, it's always annoying to have spent an hour downloading some jobs on one computer (which halts any other sort of web browsing as well) just to see them all error out, like they have been for the past few days, and like some I got while typing the last message. ---> You should run a quick check on the jobs that have errored out twice and remove them all; my errored job has now gone to someone else, and I was the 3rd person https://boinc.bakerlab.org/rosetta/result.php?resultid=4842026 ]

If I can get a box to my house it'll be crunching, but these are going to run till they are empty of jobs now. (the house is a building site hence I cannot take them all around.)

But restrictions and pay-per-MB are becoming more and more common on broadband connections over here :-(
I'd now just recommend that anyone who uses dial-up should forget it.
Team mauisun.org
ID: 7415 · Rating: 0

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 7422 - Posted: 23 Dec 2005, 21:13:57 UTC

RAR: I don't think it is
http://www.rarlab.com/ but they are apparently nice people :)

I used it because it was there in front of me, just used maximum compression which was quick, everything else set to default



7-Zip / p7zip is though (GNU Library or Lesser General Public License (LGPL))
and has similar and sometimes better compression
http://www.7-zip.org/
http://p7zip.sourceforge.net/


I have no idea about bzip2 and its cross-platform support and compression ability.
Team mauisun.org
ID: 7422 · Rating: 0

Jack Schonbrun
Joined: 1 Nov 05
Posts: 115
Credit: 5,954
RAC: 0
Message 7424 - Posted: 23 Dec 2005, 21:22:52 UTC
Last modified: 23 Dec 2005, 21:50:35 UTC

If you are on a dial-up, I would definitely ask for "no new work" from Rosetta until the new year. I can see how redownloading these files every 30 seconds as jobs fail would be maddening. And gobble up your monthly download limits. Pretty ridiculous. [edit] I mean, "pretty ridiculous" how many problems we are causing.[/edit]

ID: 7424 · Rating: 0

River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 7427 - Posted: 23 Dec 2005, 21:53:56 UTC - in response to Message 7274.  


Leave the files there, either until they are no longer needed (newer versions) or until that query type has finished.
...


Currently these files are Work Unit specific. Even though they have the same name, they do not always have identical contents.


I'm worried by this.

If someone has more than one WU (and even with the standard 0.1 day cache most of my clients have one crunching and one Ready to Run) then they will have only one copy of the file.

Einstein@Home's biggest software blunder ever was when two sets of WU had file names that overlapped -- in that case the file names differed, but only between upper & lower case, so that they worked fine on Linux but Windows treated the two files as one. Embarrassingly for the devs, the new WU had been tested on Linux but not on Windows, and not in conjunction with the other series of WU!

As FC says, if the files are at all different they should have different names. If this has not already caused you problems it will one day - with thousands of boxes downloading WU sooner or later one of them will get two WU with incompatible expectations of the same filename.
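
A check for the kind of clash described above is easy to sketch: group file names by their case-folded form and flag any group with more than one spelling. This is only an illustration; `case_collisions` is a hypothetical helper, and the example names are borrowed from earlier posts in this thread, where the same file appears once as `aa1hz6A09...` and once as `aa1hz6a09...`.

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of file names that differ only by letter case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(set(group)) for group in groups.values() if len(set(group)) > 1]


names = [
    "aa1hz6A09_05.200_v1_3.gz",
    "aa1hz6a09_05.200_v1_3.gz",
    "aa1ogw_09_05.200_v1_3.gz",
]
print(case_collisions(names))
# prints [['aa1hz6A09_05.200_v1_3.gz', 'aa1hz6a09_05.200_v1_3.gz']]
```

Running such a check over every file name in a new batch of WU, together with the names already in circulation, would catch this class of problem before it reaches Windows hosts.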

River~~
ID: 7427 · Rating: 0

River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 7428 - Posted: 23 Dec 2005, 22:00:01 UTC - in response to Message 7382.  

... with caps set at 2GB on many connections in the UK so we can afford broadband ...


<off_topic/>

hey FC I hadn't realised you're in the UK too -- I'm based in Manchester whereabouts are you?

</off_topic>
R~~
ID: 7428 · Rating: 0

Jack Schonbrun
Joined: 1 Nov 05
Posts: 115
Credit: 5,954
RAC: 0
Message 7429 - Posted: 23 Dec 2005, 22:05:32 UTC - in response to Message 7427.  



Currently these files are Work Unit specific. Even though they have the same name, they do not always have identical contents.


I'm worried by this.

If someone has more than one WU (and even with the standard 0.1 day cache most of my clients have one crunching and one Ready to Run) then they will have only one copy of the file.


Okay, I wasn't very precise. For the work units that we have been sending out, to the best of my knowledge, these files are identical. But there are situations where these files could have different contents even though they have the same name. But I see that this could cause problems with the boinc directory structure. I'll make sure that we resolve this before the time comes where we might be inclined to send out different files with the same name.
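
One possible way to guarantee that different contents never share a name is to put a short digest of the contents into the file name itself. The sketch below is an assumption for illustration only, not the project's naming scheme; `content_tagged_name` and the 12-character tag length are hypothetical choices.

```python
import hashlib


def content_tagged_name(name: str, data: bytes, digest_chars: int = 12) -> str:
    """Insert a short SHA-256 tag before the extension: x.gz -> x.<tag>.gz."""
    tag = hashlib.sha256(data).hexdigest()[:digest_chars]
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension at all, so just append the tag
        return f"{name}.{tag}"
    return f"{stem}.{tag}.{ext}"
```

Identical contents always map to the same name, so an unchanged file can stay on the client, while any change in contents produces a new name that cannot collide with the old copy.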

ID: 7429 · Rating: 0

blackbird
Joined: 4 Nov 05
Posts: 15
Credit: 93,414
RAC: 0
Message 7518 - Posted: 24 Dec 2005, 13:34:46 UTC

Jack, I can repeat my suggestion. Delta encoding of integers followed by a Huffman coding stage can give very good results with coordinates. Deep knowledge of the WU structure is required for good compression, but even simple grouping reduces the file size by about 50%.
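
A toy sketch of the two stages named above: delta encoding turns slowly changing integers into many small, repeated differences, and a Huffman code then spends few bits on the frequent symbols. The input values here are made up for illustration, and the bit counts ignore the cost of storing the code table, so this shows the mechanism rather than a real saving on WU files.

```python
import heapq
from collections import Counter
from itertools import count


def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    out, prev = [], 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas):
    """Invert delta_encode by taking running sums."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over the symbols."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): 1}
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(n, next(tiebreak), {sym: 0}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, a = heapq.heappop(heap)
        n2, _, b = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def huffman_bits(symbols):
    """Total payload bits when the list of symbols is Huffman coded."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)
```

For example, `[1000, 1002, 1004, ...]` has all-distinct values, but its deltas are `[1000, 2, 2, ...]`, which Huffman coding can represent in far fewer bits.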
ID: 7518 · Rating: 0

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 7543 - Posted: 24 Dec 2005, 19:23:29 UTC - in response to Message 7428.  

... with caps set at 2GB on many connections in the UK so we can afford broadband ...


<off_topic/>

hey FC I hadn't realised you're in the UK too -- I'm based in Manchester whereabouts are you?

</off_topic>
R~~


Tother end of the M62 from you; I'm living just outside Hull, having gone back to university to do an MSc degree (physics, laser & micro-machining).
Team mauisun.org
ID: 7543 · Rating: 0

River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 7562 - Posted: 25 Dec 2005, 1:25:48 UTC - in response to Message 7543.  

Tother end of the M62 from you; I'm living just outside Hull, having gone back to university to do an MSc degree (physics, laser & micro-machining).


shame that's too far to go for a pint. Post something in the cafe if you ever head this side of the Pennines, maybe get some crunchers to gather for a Rosetta night out.

merry xmas R~~
ID: 7562 · Rating: 0

SwZ
Joined: 1 Jan 06
Posts: 37
Credit: 169,775
RAC: 0
Message 8335 - Posted: 4 Jan 2006, 7:12:16 UTC

Many thanks to user Nothing But Idle Time!
I searched for the keyword "traffic" before creating the thread "Internet traffic and necessary data" but didn't find this one.
Sorry for my stupidity!

Well, I'll now repeat some of my points in this thread.

We need to reduce the data transferred per WU. I don't mind my PC having to work continuously day and night, but the traffic bothers me a lot. About 50Mb per day is too much traffic. Usually I use about 300Mb per month, but 50Mb/day*30day=1.5Gb, which costs about $150. This is half of my salary!!!
If we want to use "rent-free" PC resources all over the world, we must understand that what is "rent-free" in DC is CPU time, not internet traffic. A reasonable amount of traffic is about 1-2Mb/day per CPU core.
Well, I want to say that we can easily reduce traffic by a factor of 2-5. I want to highlight the problem, not complain about my own hardships. So thanks for the suggestions to go to other DC projects, but I don't want to :-)

I tried simply packing the text file aa1dtjA09_05.400_v1_3 with fragment data
using RAR instead of GZIP
and obtained an archive size of 1632Kb instead of 3446Kb;
using 7zip we have 1479 Mb instead of 3446Kb.
Maybe use 7ZIP instead of GZIP?

Next, if we use discrete (quantized) numbers we can easily pack each line of
"1f06 A 162 V L -83.317 116.745 175.741
1f06 A 163 Q L -85.736 -44.654 174.056
1f06 A 164 K E -149.188 143.809 -179.095
..............
"
into six bytes, so that after packing with RAR we obtain about 800 Mb instead of 3.4Mb.
(For example: 3 bytes for the protein name, chain and starting AA number for the whole fragment, which is
about 3 bits per line; 5 bits for the AA name; 2 bits for SS; and, after moving the fragment's center of mass to the origin,
maybe 10 bits are enough per coordinate component (if the fragment size is 10 Angstrom, the discretization error is about 0.01 Angstrom, which is more than enough),
so we have 3+5+2+3*10=40bit=5byte per line.)
OK, I think nobody will apply this "easy" packing, but using 7ZIP instead of GZIP is a very simple change in software.
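
To make the bit budget above concrete, here is a rough sketch that packs one line into 5 bytes. Several things are assumptions, not facts about the Rosetta file format: the three numeric columns are treated as values in [-180, 180) and quantized to 10 bits each (a step of about 0.35), the SS column is assumed to use only H, E and L, the per-fragment header (protein name, chain, start number) is assumed to be stored once elsewhere, and the 3 leftover bits are zero padding. The packing is lossy.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 one-letter codes, index fits in 5 bits
SEC_STRUCT = "HEL"                     # assumed SS alphabet, index fits in 2 bits
STEP = 360.0 / 1024                    # 10-bit resolution over [-180, 180)


def quantize(value: float) -> int:
    """Map a value in [-180, 180) to a 10-bit integer (180 wraps to -180)."""
    return round((value + 180.0) / STEP) % 1024


def dequantize(q: int) -> float:
    """Map a 10-bit integer back to the nearest representable value."""
    return q * STEP - 180.0


def pack_line(aa: str, ss: str, a: float, b: float, c: float) -> bytes:
    """Pack 5 + 2 + 3*10 = 37 bits into 5 bytes (top 3 bits stay zero)."""
    word = AMINO_ACIDS.index(aa)
    word = (word << 2) | SEC_STRUCT.index(ss)
    for value in (a, b, c):
        word = (word << 10) | quantize(value)
    return word.to_bytes(5, "big")


def unpack_line(blob: bytes):
    """Inverse of pack_line, up to the quantization error."""
    word = int.from_bytes(blob, "big")
    values = []
    for _ in range(3):  # values come back in reverse order: c, b, a
        values.append(dequantize(word & 0x3FF))
        word >>= 10
    ss = SEC_STRUCT[word & 0x3]
    aa = AMINO_ACIDS[word >> 2]
    c, b, a = values
    return aa, ss, a, b, c
```

For instance, `pack_line("V", "L", -83.317, 116.745, 175.741)` returns 5 bytes, and unpacking them gives back the letters exactly and the three numbers to within half a quantization step. Whether that precision is acceptable for the science is a question only the project can answer.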

Next, 10 trajectories per protein is not a limit. Maybe it is possible to make this number adjustable by the user, so that the user can change it
and tune the traffic and WU frequency. For example, let a WU compute for 2 days instead of 2 hours; then traffic reduces to 2 Mb per day ;-)
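
The effect of longer work units on traffic is simple to sketch. The 4MB download per WU is an assumption chosen to roughly match the ~50MB/day figure at 2 hours per WU; the sketch also assumes the input files do not grow with the number of trajectories and ignores the upload of results.

```python
def daily_traffic_mb(mb_per_wu, hours_per_wu):
    """Download traffic per day for one CPU core crunching around the clock."""
    return mb_per_wu * 24 / hours_per_wu


for hours in (2, 8, 24, 48):
    print(f"{hours:>2}h per WU -> {daily_traffic_mb(4, hours):.1f}MB/day")
```

Under these assumptions, 2-hour work units cost 48MB per day and 2-day work units cost 2MB per day, which matches the figures in the post.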

ID: 8335 · Rating: 0

SwZ
Joined: 1 Jan 06
Posts: 37
Credit: 169,775
RAC: 0
Message 8352 - Posted: 4 Jan 2006, 14:48:56 UTC - in response to Message 8335.  

Sorry! Not Mb, I mean Kb:

using 7zip we have 1479 Kb instead of 3446 Kb.

into six bytes, so that after packing with RAR we obtain about 800 Kb instead of 3.4Mb.
ID: 8352 · Rating: 0

dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 8359 - Posted: 4 Jan 2006, 18:34:17 UTC - in response to Message 8352.  
Last modified: 4 Jan 2006, 18:36:42 UTC

Sorry! Not Mb, I mean Kb:

using 7zip we have 1479 Kb instead of 3446 Kb.

into six bytes, so that after packing with RAR we obtain about 800 Kb instead of 3.4Mb.


The main reason that programs like 7zip and rar are not used is that they are not available for all platforms that BOINC runs on.

The fact that zlib is has a lot to do with why it was chosen.
ID: 8359 · Rating: 0

SwZ
Joined: 1 Jan 06
Posts: 37
Credit: 169,775
RAC: 0
Message 8361 - Posted: 4 Jan 2006, 18:45:21 UTC - in response to Message 8359.  


The main reason that programs like 7zip and rar are not used is that they are not available for all platforms that BOINC runs on.

The fact that zlib is has a lot to do with why it was chosen.


Well, OK. So we don't change the compression, but how about control over the number of trajectories, i.e. the WU computational cost? It could acceptably be increased 100 times, which would decrease the effective traffic 100 times :-)
ID: 8361 · Rating: 0

BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 8402 - Posted: 5 Jan 2006, 5:42:54 UTC

http://www.programurl.com/software/rar1.htm
lists unrar source code for Win, Linux, and Mac.
http://www.programurl.com/unrar-sourcecode.htm

And if the Linux code can't be compiled with the Solaris compilers for that platform (and porting isn't an option) then you could maintain 2 sets of compressed files; one for Solaris, and one for the rest of us.

Better tracking of files to minimize the amount of downloads benefits even those of us with download caps that aren't being hit. The faster a download happens, the less chance something will go wrong. (I'm switching my home system from cable modem to dsl because of the twice daily disconnects.. :)


ID: 8402 · Rating: 0

BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 8403 - Posted: 5 Jan 2006, 5:51:21 UTC

http://www.7-zip.org/download.html
and the source code for 7zip is available for win/linux/mac as well.


ID: 8403 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org