Message boards : Number crunching : R@H Scientists/Coders: An analysis of the Rosetta binaries...
Author | Message |
---|---|
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,518,540 RAC: 9,764 |
I don't think so. Gpu computational power (if sw is ok) outclasses cpu This is off-topic but interesting. Isn't the benefit of GPGPU that the silicon is already there anyway, whereas AVXx might be better but isn't likely to make up much of a desktop processor's die area? |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
We'll see how Knights Landing will perform with its AVX512 support and many more cores than a regular DT CPU. Knights Landing is not a cpu.....and is not a gpu... :-) |
Dr. Merkwürdigliebe Joined: 5 Dec 10 Posts: 81 Credit: 2,657,273 RAC: 0 |
Knights Landing is not a cpu.....and is not a gpu... :-) It's heterogeneous computing vs. homogeneous computing. Knights Landing will not only be available as a coprocessor but also as a host cpu capable of running an OS and applications on its own. Nobody really wants to rewrite his or her application to utilize CUDA or OpenCL. But obey some coding rules and compile it with the right flags, and you'll get an instant speedup. Write clean code in your favorite programming language and let the compiler do the hard work for you. GPGPU suffers from all kinds of problems, e.g. latency, power consumption and the necessary rewrite of your application. David E K already showcased what is possible with minor optimizations and I figure rosetta@home is not necessarily written with speed in mind. I also think it's beneficial to the project when the developers discard the support for really ancient cpus and support the capabilities of reasonably new cpus. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
It's heterogeneous computing vs. homogeneous computing. Knights Landing will not only be available as a coprocessor but also as a host cpu capable of running an OS and applications on its own. Nobody really wants to rewrite his or her application to utilize CUDA or OpenCL. I agree with you. But you're speaking about a monster cpu/gpu/coprocessor. Xeon Phi 7120P costs over 4000 dollars (and has 1.2 Tflops of DP). Radeon 290X has 700 Gflops in DP and costs 600 dollars. P.S. Recent versions of OpenCL, for example, make it possible to write code for the cpu and "pass" it to the gpu easily with SPIR. I also think it's beneficial to the project when the developers discard the support for really ancient cpus and support the capabilities of reasonably new cpus. +1 I think we are OT :-P |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
Knights Landing is not a cpu.....and is not a gpu... :-) +1. GROMACS (used by Folding@Home) for example, is written in assembly by hand to squeeze every bit of performance. |
Dr. Merkwürdigliebe Joined: 5 Dec 10 Posts: 81 Credit: 2,657,273 RAC: 0 |
Hmmm, this is rather a -1. It's the exact opposite of what I am trying to convey here. Hands off assembly, hands off intrinsics, just heed some coding guidelines and let the compiler (designers) do the hard job. Updating the compiler infrastructure and providing us with a (whatever instruction enabled) 64-bit build is the way to go. |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
It was more of a +1 to approve your suggestion, then I added an example of how far some teams go in the name of speed. I wasn't suggesting R@H to code in assembly haha! |
Dr. Merkwürdigliebe Joined: 5 Dec 10 Posts: 81 Credit: 2,657,273 RAC: 0 |
We're on the same page here :-) Definitely nice to see some optimizations in the pipeline... |
xdarma Joined: 20 Jan 08 Posts: 5 Credit: 5,014,905 RAC: 1,196 |
If you are the developer/researcher, the question they ask is "How many systems are going to use this new feature and will it pay back the researcher effort for the port?" The Rosetta researchers have an idea about what the machine distribution looks like. I don't know if the number of AMD HSA APUs is sufficient to warrant the effort. Does this principle also apply in the case of AVX-512? IMO, there are many more APUs on the market than AVX-512 enabled cpus. Even Intel cpus have an integrated gpu. It is not HSA-capable, but it is still unused compute power. From wikipedia: The AVX instructions support both 128-bit and 256-bit SIMD. The 128-bit versions can be useful to improve old code without needing to widen the vectorization, and avoid the penalty of going from SSE to AVX, they are also faster on some early AMD implementations of AVX. This mode is sometimes known as AVX-128.[4] Maybe the best test is to use gcc with the option -mprefer-avx128. I don't know about icc. IIRC, gcc keeps an eye on portability, not on performance. So using icc may hurt the ARM version of rosetta. For sure, using icc hurts all non-Intel cpus. Just some random thoughts, indeed. |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 22,994,271 RAC: 9,725 |
If you are the developer/researcher, the question they ask is "How many systems are going to use this new feature and will it pay back the researcher effort for the port?" The Rosetta researchers have an idea about what the machine distribution looks like. I don't know if the number of AMD HSA APUs is sufficient to warrant the effort. AVX-512? AVX-512 will be in Xeon Phi to be released soon but it will not likely have many target machines running Rosetta@Home in the near future. That is why I suggested the ICC -ax option which will generate fat binaries with multiple CPU support. APU vs AVX-512? Which APU target would you use? Nvidia CUDA? AMD native? If you target OpenCL, you get both Nvidia and AMD GPU ..... and you get all Intel GPU too. OpenCL takes some substantial coding changes. ICC vs ARM version? ICC does not generate ARM code so if you want to generate an ARM Rosetta@Home target, you would use the ARM gcc compiler. gcc with the option -mprefer-avx128. Most of the developers adding to the gcc optimizations have @intel.com mail addresses. gcc is a good compiler and lags icc by (I would guess ... a year or so) in feature development. The option itself will tell the compiler to use the XMM registers AND if the compiler cannot vectorize the code, it can be just as fast as the 256 or 512 bit options since ..... you are doing 1 operation at a time. The developer must ensure that the code parallelism is recognizable to the compiler. Many times poor coding practices introduce ambiguities that prevent the generation of vector code. It is VERY tough to say that binary "B" is XX% faster than binary "A" because it depends on where the program bottlenecks are. An Intel Wolfdale ( http://ark.intel.com/products/codename/24736/Wolfdale?q=wolfdale#@Desktop ) will behave much differently than any CPU that followed it. Nehalem CPUs and beyond had dramatic improvements in the cache subsystems which many times moved the bottleneck to different areas of the program. 
David's performance increases are probably different than what I would see on my Haswell Intel Core i7 5930K with DDR4 memory. Future Intel CPUs are going to increase memory bandwidth and programs will see a different % of performance increase going from application version to version. 2011 Sandy Bridge era AVX1 compiler presentation. It talks about the icc v12 compiler and I am currently beta testing the v16. https://indico.cern.ch/event/125167/material/slides/0.pdf It is always a very fun puzzle to figure out how to structure the code so the compiler can generate vector code. |
xdarma Joined: 20 Jan 08 Posts: 5 Credit: 5,014,905 RAC: 1,196 |
AVX-512 I was wrong: I did not mean AVX-512, but AVX2. APU vs AVX-512 The gpu client is a well-known desire of the rosetta crunchers. IIRC, developers tested an OpenCL client a few years ago but it did not fit their needs. And I can't compare clients that don't exist. As a side note: I don't think nvidia can sell cpus or apus without paying royalties to intel or amd. ICC for ARM Thank you for confirming this. Most of the developers adding to the gcc optimizations have @intel.com mail addresses. gcc is a good compiler and lags icc by (I would guess ... a year or so) in feature development. For sure, if intel wants gcc to support its cpus, it must contribute ;-) I do not think gcc "lags behind", but follows a different path. For example: supporting the ARM architecture. The option itself will tell the compiler to use the XMM registers AND if the compiler cannot vectorize the code, it can be just as fast as the 256 or 512 bit options since ..... you are doing 1 operation at a time. The developer must ensure that the code parallelism is recognizable to the compiler. Many times poor coding practices introduce ambiguities that prevent the generation of vector code. So, you agree with me? Is -mprefer-avx128 worth a test? It is VERY tough to say that binary "B" is XX% faster than binary "A" because it depends on where the program bottlenecks are. Thank you for the information, but I'm no longer interested in buying intel cpus, due to unfair competition. Sorry. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
06/24 David wrote: I'll push it out to ralph soon 07/09 Any news?? |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Would anyone want to or would know someone who would want to help optimize the Rosetta software, at the compiler level or even at the code level? It is freely available through an Academic license but we can also provide it to individuals under the same license agreement. This offers a great opportunity for you all to contribute directly and have direct positive impact for all researchers who use Rosetta and the optimizations would carry over to Rosetta@home. David K |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,518,540 RAC: 9,764 |
This is great news :D I can't help with the optimisation but I have a spare PC that I'm happy to set up teamviewer on if someone wants to use it to run copies of Rosetta on to speed up testing. D |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 22,994,271 RAC: 9,725 |
Would anyone want to or would know someone who would want to help optimize the Rosetta software, at the compiler level or even at the code level? It is freely available through an Academic license but we can also provide it to individuals under the same license agreement. I have looked at the Rosetta license a couple times but I am not associated with any educational institution and did not feel I qualified to download under the Academic license. I would be interested in looking at optimizations under the direction of someone on the project, share any findings with them so they could verify and incorporate. I would require some guidance on how to build and validate the results. It would be easier for me to work in Linux Fedora21 environment but I can build a VirtualBox of any Linux .... or finally tackle a Windows VS version. |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Would anyone want to or would know someone who would want to help optimize the Rosetta software, at the compiler level or even at the code level? It is freely available through an Academic license but we can also provide it to individuals under the same license agreement. That's great! I'll get back to you about how you can get the source. We should be able to work something out regarding the license. |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
Would anyone want to or would know someone who would want to help optimize the Rosetta software, at the compiler level or even at the code level? It is freely available through an Academic license but we can also provide it to individuals under the same license agreement. As long as you don't run off making billions of dollars thru the Rosetta software... I think you'd be good regarding your qualifications to download the code. As a side note, I'm glad all of this discussion is turning out well. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
It would be easier for me to work in Linux Fedora21 environment but I can build a VirtualBox of any Linux .... or finally tackle a Windows VS version. VirtualBox 5.0 released: - Make more instruction set extensions available to the guest when running with hardware-assisted virtualization and nested paging. Among others this includes: SSE 4.1, SSE4.2, AVX, AVX-2, AES-NI, POPCNT, RDRAND and RDSEED |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 22,994,271 RAC: 9,725 |
It would be easier for me to work in Linux Fedora21 environment but I can build a VirtualBox of any Linux .... or finally tackle a Windows VS version. That VirtualBox change is very nice. Thanks for pointing it out. It will be interesting to see what you have to do to change the instruction set. I would expect you have to shut down the guest OS and change the configuration so it BOOTS again as a different CHIP. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
|
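
A recurring point in this thread is that the compiler can only generate vector code when the parallelism in a loop is recognizable, and that ambiguities in the source can prevent it. One common ambiguity is pointer aliasing. The sketch below is only an illustration and is not taken from the Rosetta source: the function names are hypothetical, and `__restrict` is a compiler extension (accepted by GCC, Clang, ICC and MSVC), not standard C++. Whether either loop is actually vectorized depends on the compiler, its version and the flags used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Version A: the compiler has to assume that `out` might overlap `a` or `b`.
// It may add runtime overlap checks or fall back to scalar code.
void axpy_may_alias(float* out, const float* a, const float* b,
                    float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + k * b[i];
    }
}

// Version B: `__restrict` (non-standard extension) is a promise from the
// programmer that the three arrays do not overlap, which removes the
// ambiguity. Breaking that promise is undefined behaviour.
void axpy_no_alias(float* __restrict out, const float* __restrict a,
                   const float* __restrict b, float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + k * b[i];
    }
}

// Hypothetical convenience wrapper: the freshly allocated result vector
// cannot overlap the inputs, so calling the no-alias version is safe here.
std::vector<float> axpy(const std::vector<float>& a,
                        const std::vector<float>& b, float k) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    axpy_no_alias(out.data(), a.data(), b.data(), k, a.size());
    return out;
}
```

With GCC, building at `-O3` with `-fopt-info-vec` asks the compiler to report which loops it vectorized, which is one way to check what a given build actually does with code like this.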
©2024 University of Washington
https://www.bakerlab.org