c - Is SIMD Worth It? Is there a better option?


I have some code that runs fairly well, but I would like to make it run better. The main problem is that it needs a nested for loop. The outer loop is over iterations (which must happen sequentially), and the inner loop is over each point particle under consideration. I know there is not much I can do about the outer loop, but I am wondering whether there is a way to optimize something like this:

    void collide(particle particles[], box boxes[],
                 double boxShiftX, double boxShiftY) { /*{{{*/
        int i;
        double nX;
        double nY;
        int boxnum;
        for (i = 0; i < PART_COUNT; i++) {
            boxnum = ((((int)(particles[i].sX + boxShiftX)) / BOX_SIZE) % BWIDTH
                      + BWIDTH * ((((int)(particles[i].sY + boxShiftY)) / BOX_SIZE) % BHEIGHT));
            // copied and pasted from the macro, which is why it looks a bit odd

            particles[i].vX -= boxes[boxnum].mX;
            particles[i].vY -= boxes[boxnum].mY;
            if (boxes[boxnum].rotDir == 1) {
                nX = particles[i].vX * Wxx + particles[i].vY * Wxy;
                nY = particles[i].vX * Wyx + particles[i].vY * Wyy;
            } else { // to make it randomly pick a rot. direction
                nX = particles[i].vX * Wxx - particles[i].vY * Wxy;
                nY = -particles[i].vX * Wyx + particles[i].vY * Wyy;
            }
            particles[i].vX = nX + boxes[boxnum].mX;
            particles[i].vY = nY + boxes[boxnum].mY;
        }
    } /*}}}*/

I have looked at SIMD, although I have not been able to find much information about it, and I am not entirely sure that the processing needed to extract and pack the data would be worth the gain of executing half as many instructions, because apparently only two doubles can be processed at a time.

I tried splitting the work across multiple threads with shm and pthread_barrier (to synchronize the different steps, of which the code above is one), but that only made it slower.

My current code already runs quite quickly; it is on the order of one second per 10M particle * iterations, and from what I can tell from profiling, 30% of my time is spent in that function alone (5000 calls with PART_COUNT = 8192 particles took 1.8 seconds). I am not worried about small, constant-time overheads; it is just that 512 particles * 50 iterations * 1000 experiments took more than a week last time.

I guess my question is whether there is any way of working with these long vectors that is more efficient than simply looping over them. I feel like there should be, but I cannot find it.

I'm not sure how much SIMD would help; the inner loop is very small and simple, so I would guess (just from looking at it) that you are probably more memory-bound than anything else. With that in mind, I would try rewriting the main part of the loop so that it does not touch the particles array more than necessary:

    const double temp_vX = particles[i].vX - boxes[boxnum].mX;
    const double temp_vY = particles[i].vY - boxes[boxnum].mY;

    if (boxes[boxnum].rotDir == 1) {
        nX = temp_vX * Wxx + temp_vY * Wxy;
        nY = temp_vX * Wyx + temp_vY * Wyy;
    } else {
        // to make it randomly pick a rot. direction
        nX = temp_vX * Wxx - temp_vY * Wxy;
        nY = -temp_vX * Wyx + temp_vY * Wyy;
    }
    particles[i].vX = nX;
    particles[i].vY = nY;

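This has the small potential side effect of not doing the extra addition at the end, so add boxes[boxnum].mX and boxes[boxnum].mY back to nX and nY if you need the same result as the original loop.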


Another potential speedup is to use __restrict on the particle array, so that the compiler can better optimize the writes to the velocities. In addition, if Wxx etc. are global variables, they may have to be reloaded on every pass through the loop instead of being kept in registers; using __restrict would help with that too.
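As a rough sketch of both ideas, assuming hypothetical `particle` and `box` structs that contain only the fields used here, file-scope `Wxx`...`Wyy` with made-up values, and a precomputed `boxnum` array to keep the example short (none of these details come from the question). It uses the C99 `restrict` keyword; `__restrict` is the compiler-extension spelling of the same promise.

```c
/* Hypothetical layouts: only the fields this sketch needs. */
typedef struct { double vX, vY; } particle;
typedef struct { double mX, mY; int rotDir; } box;

/* Assumed file-scope coefficients (illustrative values: a 90 degree rotation). */
double Wxx = 0.0, Wxy = -1.0, Wyx = 1.0, Wyy = 0.0;

/* restrict tells the compiler the three arrays do not overlap, so a store
   to particles[] cannot change anything read through boxes or boxnum. */
void rotate_all(particle *restrict particles, const box *restrict boxes,
                const int *restrict boxnum, int count)
{
    /* Read the globals once; the locals can stay in registers. */
    const double wxx = Wxx, wxy = Wxy, wyx = Wyx, wyy = Wyy;

    for (int i = 0; i < count; i++) {
        const box *b = &boxes[boxnum[i]];
        const double vx = particles[i].vX - b->mX;
        const double vy = particles[i].vY - b->mY;
        double nX, nY;

        if (b->rotDir == 1) {
            nX = vx * wxx + vy * wxy;
            nY = vx * wyx + vy * wyy;
        } else {
            nX = vx * wxx - vy * wxy;
            nY = -vx * wyx + vy * wyy;
        }
        particles[i].vX = nX + b->mX;
        particles[i].vY = nY + b->mY;
    }
}
```

Whether this changes the generated code depends on the compiler and flags, so it is worth comparing the assembly or timings before and after.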


Since you are accessing the particles in order, you can try prefetching (e.g. __builtin_prefetch on GCC) a few particles ahead to reduce cache misses. Prefetching the boxes is a bit harder because you are accessing them in an unpredictable order; you could try something like

    int nextBoxnum = ((((int)(particles[i + 1].sX + boxShiftX)) // etc...
    // prefetch boxes[nextBoxnum]
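A minimal sketch of how that prefetching could look, under several assumptions that are not from the question: the struct layouts, the constant values, the `PREFETCH` wrapper macro, the `box_index` and `shift_velocities` helpers, and the look-ahead distance `PREFETCH_AHEAD` are all hypothetical. Only the velocity-shift step is shown, and coordinates are assumed to be non-negative.

```c
/* Hypothetical layouts and constants, for illustration only. */
typedef struct { double sX, sY, vX, vY; } particle;
typedef struct { double mX, mY; int rotDir; } box;

enum { BOX_SIZE = 4, BWIDTH = 8, BHEIGHT = 8, PREFETCH_AHEAD = 8 };

/* __builtin_prefetch is a GCC/Clang builtin; fall back to a no-op elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr, rw) __builtin_prefetch((addr), (rw), 3)
#else
#define PREFETCH(addr, rw) ((void)0)
#endif

/* Same index formula as the loop in the question. */
static int box_index(const particle *p, double shiftX, double shiftY)
{
    return (((int)(p->sX + shiftX)) / BOX_SIZE) % BWIDTH
         + BWIDTH * ((((int)(p->sY + shiftY)) / BOX_SIZE) % BHEIGHT);
}

void shift_velocities(particle *particles, const box *boxes, int count,
                      double shiftX, double shiftY)
{
    for (int i = 0; i < count; i++) {
        /* Particles are visited in order, so fetch a few ahead (for writing). */
        if (i + PREFETCH_AHEAD < count)
            PREFETCH(&particles[i + PREFETCH_AHEAD], 1);

        /* Boxes are visited unpredictably: compute the next index early
           and request that box (for reading) before it is needed. */
        if (i + 1 < count)
            PREFETCH(&boxes[box_index(&particles[i + 1], shiftX, shiftY)], 0);

        const box *b = &boxes[box_index(&particles[i], shiftX, shiftY)];
        particles[i].vX -= b->mX;
        particles[i].vY -= b->mY;
    }
}
```

The look-ahead distance is a tuning guess; hardware prefetchers often handle the sequential particle accesses on their own, so measure with and without each prefetch.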

One last thing I just noticed: if the box's rotDir is always +/- 1.0, then you can eliminate the comparison and the branch in the inner loop, like this:

    const double rot = boxes[boxnum].rotDir; // always +/- 1.0
    nX =       particles[i].vX * Wxx + rot * particles[i].vY * Wxy;
    nY = rot * particles[i].vX * Wyx +       particles[i].vY * Wyy;
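To see that the two forms agree, here is a small sketch that puts the branching and branch-free versions side by side. The `vec2` type, the function names, and the coefficient values (a 90 degree rotation, chosen so the arithmetic is exact) are assumptions for illustration.

```c
typedef struct { double x, y; } vec2;

/* Illustrative coefficients: a 90 degree rotation. */
static const double Wxx = 0.0, Wxy = -1.0, Wyx = 1.0, Wyy = 0.0;

/* Original form: pick the sign with a comparison and a branch. */
vec2 rotate_branch(double vX, double vY, int rotDir)
{
    vec2 n;
    if (rotDir == 1) {
        n.x = vX * Wxx + vY * Wxy;
        n.y = vX * Wyx + vY * Wyy;
    } else {
        n.x = vX * Wxx - vY * Wxy;
        n.y = -vX * Wyx + vY * Wyy;
    }
    return n;
}

/* Branch-free form: rot must be exactly +1.0 or -1.0, so multiplying
   by it only flips the sign of the off-diagonal terms. */
vec2 rotate_branchless(double vX, double vY, double rot)
{
    vec2 n;
    n.x = vX * Wxx + rot * vY * Wxy;
    n.y = rot * vX * Wyx + vY * Wyy;
    return n;
}
```

With rot = +1.0 both expressions reduce to the first branch, and with rot = -1.0 to the second. Any other value of rotDir (for example 0) would fall into the else branch of the original but give a different result here, which is why the +/- 1.0 condition matters.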

Naturally, the usual caveats about profiling before and after apply. But I think all of these could help, and they can be done whether or not you switch to SIMD.

