Sunday, November 10, 2019

SPO600 SIMD Lab4 Pt2

Part2
What is SIMD?

SIMD stands for Single Instruction, Multiple Data: a single instruction applies the same operation to several data values in parallel. SIMD registers come in different sizes, from 128 bits to 2048 bits, and each register can be split into lanes of equal width.

Taking a 128-bit register as an example, these are the lane configurations available:

  • (2) 64 bit lanes
  • (4) 32 bit lanes
  • (8) 16 bit lanes
  • (16) 8 bit lanes
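To make the lane idea concrete, here is a minimal sketch in plain C that views the same 128 bits under each of the four layouts. The names vec128_t and lane_count are made up for this illustration; a real program would use the compiler's or the architecture's own vector types instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration type: one 128-bit value viewed as each lane layout. */
typedef union {
    uint64_t lanes64[2];  /* (2) 64-bit lanes  */
    uint32_t lanes32[4];  /* (4) 32-bit lanes  */
    uint16_t lanes16[8];  /* (8) 16-bit lanes  */
    uint8_t  lanes8[16];  /* (16) 8-bit lanes  */
} vec128_t;

/* Number of lanes of the given width (in bits) that fit in a 128-bit register. */
static inline int lane_count(int lane_bits) {
    assert(lane_bits > 0 && 128 % lane_bits == 0);
    return 128 / lane_bits;
}
```

Every member of the union covers the same 16 bytes, so narrower lanes mean more values per register but a smaller range per value.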

Take this for loop as an example:

int ttl = 0;
for (int x = 0; x < 100; x++) {
  ttl += 5;
}

We could speed this up by having the compiler auto-vectorize the loop. Each iteration adds the same constant 5 regardless of the current value of ttl, and additions can be done in any order, so the work can be split across SIMD lanes as partial sums that are added together at the end. The lane width limits what a lane can hold: an unsigned 8-bit lane tops out at 255, and ttl ends at 500, so 8-bit lanes are too small. The most lanes we should split a 128-bit register into is therefore 8, since 16-bit lanes can hold unsigned values up to 65535.
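The sketch below models that idea by hand in scalar C, so it is only an illustration of what a vectorizer might do, not actual compiler output. The function names sum_scalar and sum_lanes are assumptions for this example; the inner loop over lanes stands in for what a real SIMD add would do in a single instruction.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* eight 16-bit lanes in a 128-bit register */

/* Scalar loop from the post: add 5 to ttl once per iteration. */
int sum_scalar(int iterations) {
    int ttl = 0;
    for (int x = 0; x < iterations; x++) {
        ttl += 5;
    }
    return ttl;
}

/* Hand-written model of a vectorized version using per-lane partial sums. */
int sum_lanes(int iterations) {
    /* Each 16-bit lane must be able to hold its own partial sum. */
    assert(iterations >= 0 && iterations / LANES * 5 <= UINT16_MAX);

    uint16_t partial[LANES] = {0};
    int x = 0;

    /* Main body: each step covers LANES iterations of the original loop. */
    for (; x + LANES <= iterations; x += LANES) {
        for (int lane = 0; lane < LANES; lane++) {
            partial[lane] += 5;
        }
    }

    /* Horizontal add: combine the per-lane partial sums. */
    int ttl = 0;
    for (int lane = 0; lane < LANES; lane++) {
        ttl += partial[lane];
    }

    /* Remainder loop: leftover iterations that did not fill a full step. */
    for (; x < iterations; x++) {
        ttl += 5;
    }
    return ttl;
}
```

For 100 iterations the main body runs 12 steps (96 iterations) and the remainder loop handles the last 4, so both functions return 500.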

Even when the data fits in the lanes, the compiler does not always auto-vectorize a loop. Two common reasons:

  1. the overhead cost of vectorization outweighs its benefit
  2. the loop's iterations overlap in a way that vectorization would affect

The first point is pretty self-explanatory.
As for the second point: if the operation carried out in the loop depends on the result of the previous iteration, the loop will generally fail to auto-vectorize.
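As a rough illustration of the second point, compare the two hypothetical loops below (add_one and running_total are names invented for this sketch). Whether a given compiler actually vectorizes either one depends on the compiler, target, and flags, so treat this as a picture of the dependency pattern rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: out[i] depends only on in[i], so several
 * iterations can be processed side by side in SIMD lanes. */
void add_one(int *restrict out, const int *restrict in, size_t n) {
    assert(out != in);  /* restrict promises the arrays do not overlap */
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] + 1;
    }
}

/* Loop-carried dependency: a[i] needs the a[i - 1] written by the
 * previous iteration, so iterations cannot simply run in parallel. */
void running_total(int *a, size_t n) {
    for (size_t i = 1; i < n; i++) {
        a[i] += a[i - 1];
    }
}
```

In the first loop no iteration reads anything another iteration writes. In the second, each iteration has to wait for the one before it.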

As part of the SIMD lab, we had to change code so that another loop could be auto-vectorized. We were provided the following code for the third for loop of the lab:

        for (x = 0; x < SAMPLES; x+=2) {
          ttl = (ttl + data[x])%1000;
        }

In an attempt to auto-vectorize the loop for the SIMD lab, I made the following code change:

        for (x = 0; x < SAMPLES; x+=2) {
          ttl += data[x]%1000;
        }

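The loop vectorized, but the math is a little off: I wasn't able to get the same answer as the non-vectorized implementation. The change is not equivalent. The original wraps the running total with % 1000 after every addition, while my version only applies % 1000 to each data value, so ttl itself is never reduced and can grow past 999.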
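A small sketch of the difference, under some loud assumptions: the element type int16_t, the function names, and the third variant are all mine and not part of the lab code. The third function shows one possible alternative, a plain sum with a single % 1000 at the end, which only matches the original when every sampled value is non-negative and the total does not overflow. With negative values, C's % truncates toward zero and the results can differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Original loop: wrap the running total after every addition. */
int sum_mod_each_step(const int16_t *data, size_t n) {
    int ttl = 0;
    for (size_t x = 0; x < n; x += 2) {
        ttl = (ttl + data[x]) % 1000;
    }
    return ttl;
}

/* Modified loop from the post: % 1000 applies to each value only,
 * so the total is never wrapped. */
int sum_of_mods(const int16_t *data, size_t n) {
    int ttl = 0;
    for (size_t x = 0; x < n; x += 2) {
        ttl += data[x] % 1000;
    }
    return ttl;
}

/* Assumed alternative: plain sum, then one % 1000 at the end.
 * Valid only for non-negative data, which the assert checks. */
int sum_then_mod(const int16_t *data, size_t n) {
    int64_t ttl = 0;
    for (size_t x = 0; x < n; x += 2) {
        assert(data[x] >= 0);
        ttl += data[x];
    }
    return (int)(ttl % 1000);
}
```

For the data {600, 0, 700, 0, 800, 0}, the original loop and sum_then_mod both give 100, while sum_of_mods gives 2100. A plain sum is a reduction that compilers can often vectorize, but that is not guaranteed for every compiler or flag set.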
