Message boards : Number crunching : Some minirosetta 3.65 perf data
Author | Message |
---|---|
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 22,994,271 RAC: 9,725 |
The 64-bit Linux app is dynamically linked. This is the first time I have seen that. Dynamic linking causes the libglut-type library problems, BUT it also allows the Linux system to select optimized libraries. On my Broadwell system, Rosetta picks up the AVX-optimized routines in the math library (libm.so). Rosetta is still scalar code, so Rosetta is (by necessity or by oversight) using 1/2 of the SSE registers. The main binary is still built with the standard scripts that "strip" symbols out, so it is tough to drill down into the code hot spots.

2.73% libc-2.21.so [.] _int_malloc
2.40% libc-2.21.so [.] free
1.86% libc-2.21.so [.] malloc_consolidate
1.83% libc-2.21.so [.] malloc
1.56% libstdc++.so.6.0.21 [.] std::_Rb_tree_increment
1.07% minirosetta_3.65_x86_64-pc-linux-gnu [.] 0x00000000036cca02
0.97% minirosetta_3.65_x86_64-pc-linux-gnu [.] 0x000000000019df08
0.97% minirosetta_3.65_x86_64-pc-linux-gnu [.] 0x0000000002b435e2
0.88% libm-2.21.so [.] __ieee754_log_avx
0.87% minirosetta_3.65_x86_64-pc-linux-gnu [.] 0x000000000000feb2
0.86% minirosetta_3.65_x86_64-pc-linux-gnu [.] 0x00000000036ce047
0.83% minirosetta_3.65_x86_64-pc-linux-gnu [.] 0x0000000002b58450
0.77% minirosetta_3.65_x86_64-pc-linux-gnu [.] 0x000000000312f1b9
0.76% minirosetta_3.65_x86_64-pc-linux-gnu [.] 0x0000000002bf055d
0.73% minirosetta_3.65_x86_64-pc-linux-gnu [.] 0x0000000002bf0231
0.68% minirosetta_3.65_x86_64-pc-linux-gnu [.] 0x0000000002bf022d
0.65% minirosetta_3.65_x86_64-pc-linux-gnu [.] 0x0000000002b5845e
0.61% minirosetta_3.65_x86_64-pc-linux-gnu [.] 0x0000000002bf0e66
0.61% minirosetta_3.65_x86_64-pc-linux-gnu [.] 0x0000000002b55f05
0.60% libm-2.21.so [.] __ieee754_exp_avx
0.59% libm-2.21.so [.] __sin_avx
0.58% minirosetta_3.65_x86_64-pc-linux-gnu [.] 0x0000000002b56004
0.56% minirosetta_3.65_x86_64-pc-linux-gnu [.] 0x000000000312b489
0.56% minirosetta_3.65_x86_64-pc-linux-gnu [.] 0x000000000367b144 |
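Since the main binary is stripped, one rough way to read the profile in the post above is to total the sample percentages per shared object instead of per symbol. The following is only a minimal Python sketch: `share_by_object` is a made-up helper name, and `ROWS` holds just the rows from the post that carry a library symbol, copied by hand.

```python
from collections import defaultdict

# Rows copied from the profile in the post above (library entries only).
ROWS = """\
2.73% libc-2.21.so [.] _int_malloc
2.40% libc-2.21.so [.] free
1.86% libc-2.21.so [.] malloc_consolidate
1.83% libc-2.21.so [.] malloc
1.56% libstdc++.so.6.0.21 [.] std::_Rb_tree_increment
0.88% libm-2.21.so [.] __ieee754_log_avx
0.60% libm-2.21.so [.] __ieee754_exp_avx
0.59% libm-2.21.so [.] __sin_avx
"""


def share_by_object(report):
    """Sum the leading percentage of each row, grouped by its second column."""
    totals = defaultdict(float)
    for line in report.splitlines():
        parts = line.split()
        # Expect: "<pct>% <object> [.] <symbol>"; skip anything else.
        if len(parts) < 4 or not parts[0].endswith("%"):
            continue
        totals[parts[1]] += float(parts[0].rstrip("%"))
    return dict(totals)


if __name__ == "__main__":
    ranked = sorted(share_by_object(ROWS).items(), key=lambda kv: -kv[1])
    for obj, pct in ranked:
        print(f"{pct:5.2f}%  {obj}")
```

On these rows the four allocator entries in libc add up to 8.82% of samples and the three AVX math routines in libm to 2.07%, so the allocator is the largest identifiable group even though no single symbol exceeds 3%.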
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,518,540 RAC: 9,764 |
Do you think that is likely to be low-hanging fruit for getting some easy and reliable speed gains? |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 22,994,271 RAC: 9,725 |
"Do you think that is likely to be low-hanging fruit for getting some easy and reliable speed gains?"

I suspect so. I am glad to see the dynamic linking with libm and don't view the missing GL library as a big problem. I suspect that looking at why the code does not vectorize is the change that could make the most impact while staying within the SSEx envelope.

I have never seen "malloc_consolidate" come to the top. From what I have read so far, it seems to be trying to combine adjacent memory blocks that were freed. At first glance, it appears that the code was designed to avoid static buffers and to malloc/free instead, trying to keep the total amount of memory used down. Seems like an attempt at garbage collection.

1. Any recovery of time from those functions will allow the others to work. If you remove 2%, the remaining 98% now has 100% of the time. I would suspect ~5% if the memory management could be thought through.
2. If you free and then allocate memory, there is a good chance that the physical memory locations will not be cached in that CPU cache. You also generate a string of read/write misses.

There does not seem to be any hot spot that would make much difference, but it is tough to know when much of the binary has symbols stripped. The only reason I can see libc is because I loaded the debug info. |
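The "remove 2%" remark in the post above is simple arithmetic: if a fraction f of the run time disappears and nothing else changes, the same work finishes in 1 - f of the original time, which is a speedup of 1 / (1 - f). A minimal Python sketch, assuming the removed time is eliminated outright; the function name is made up for illustration and the 2% and ~5% inputs are the figures from the post.

```python
def speedup_after_removing(fraction):
    """Overall speedup when `fraction` of the run time is eliminated outright."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return 1.0 / (1.0 - fraction)


if __name__ == "__main__":
    for pct in (2.0, 5.0):
        gain = speedup_after_removing(pct / 100)
        print(f"remove {pct:.0f}% of run time -> {gain:.3f}x")
```

This gives roughly 1.02x for 2% and 1.05x for 5%. It ignores the second-order cache effects described in point 2, which could move the real gain in either direction.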
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
"Rosetta is still scalar code so Rosetta is (by necessity or by oversight) using 1/2 of the SSE registers." David says: "I've been too busy to look into optimizations. We do have one volunteer helping us out, however." Are you the volunteer? |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 22,994,271 RAC: 9,725 |
"Rosetta is still scalar code so Rosetta is (by necessity or by oversight) using 1/2 of the SSE registers." I have not been contacted, but I did volunteer on the board and in a private message. I answer questions and am happy to consult that way too. I am still poking around looking for a project to adopt me. 8-) The execution profile is "flat" (not much execution time in any one function), which means there is probably nothing trivial to "fix", but it is hard to know without source OR a binary with symbols/debug records. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
"I have not been contacted, but I did volunteer on the board and in a private message. I answer questions and am happy to consult that way too. I am still poking around looking for a project to adopt me. 8-)" Uh, there are a lot of projects out there that need a good developer... |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 22,994,271 RAC: 9,725 |
"I have not been contacted, but I did volunteer on the board and in a private message. I answer questions and am happy to consult that way too. I am still poking around looking for a project to adopt me. 8-)"

A "lot of projects out there NEED a good developer". Far fewer projects "WANT a good developer" to look over their shoulder. 8-) That is just human nature.

I am more a prototype/breadboard/algorithm engineer. I encourage developers who get my algorithms/code to use it as a model and rewrite it in their own program vernacular. It is somewhat embarrassing that most just leave the code untouched. Ugh! My junk gets documented forever. A current example project of mine would be converting AVX2 algorithms to use AVX512 instructions.

For Rosetta, the first thing I would probably examine would be whatever is stopping the code from vectorizing (probably a 30%-40% difference). I think I am going to build a VirtualBox VM, use the isolated environment, and start with the BOINC client/manager source to understand the underlying system. I would only work under the supervision/direction of a project developer AND only for as long as they have interest. |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,518,540 RAC: 9,764 |
"I have not been contacted, but I did volunteer on the board and in a private message. I answer questions and am happy to consult that way too. I am still poking around looking for a project to adopt me. 8-)" Hopefully someone will be in touch! |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
"A 'lot of projects out there NEED a good developer'." I'm not so pessimistic. A lot of projects publish their source code for free: Denis, Poem, CSG.... |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
"A 'lot of projects out there NEED a good developer'." Rosetta is a bit different. Their code is free-to-use-sort-of, depending on whether you are using it for profit or for academic purposes. rjs5 should be able to look at the code freely if he asked, since he isn't going for profit. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
"Rosetta is a bit different. Their code is free-to-use-sort-of, depending on whether you are using it for profit or for academic purposes." I know the "policy" on Rosetta software. Indeed, I was just pointing to some projects with open software licences. I don't know whether Rosetta's admins want help, or how, or how much. |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
"Rosetta is a bit different. Their code is free-to-use-sort-of, depending on whether you are using it for profit or for academic purposes." Yeah... that's the thing. In addition, R@H seems to be an "extension" to the real Rosetta code (Rosetta Commons). At least now we have a 64-bit Linux binary. |
©2024 University of Washington
https://www.bakerlab.org