Tuesday, October 30, 2018

How to avoid branch misprediction when dispatching a call to one of two methods based on a boolean

Let's say you have a method that calculates a value and returns it:

double calculate(const double& someArg);

You then implement another calculate method that has the same signature as the first one, but works differently:

double calculate2(const double& someArg);

You want to be able to switch from one to the other based on a boolean setting, so you rename the original implementation to calculate1 and end up with something like this:

double calculate(const double& someArg)
{
  if (useFirstVersion) // <-- this is a boolean
    return calculate1(someArg); // actual first implementation
  else
    return calculate2(someArg); // second implementation
}

The boolean may change at runtime, but only rarely.

I see a small but measurable performance hit, which I suppose is due to either branch misprediction or cache-unfriendly code.

How can I optimize this to get the best runtime performance?


My thoughts and attempts on this issue:

I tried using a pointer to member function to avoid branch mispredictions.

The idea is that when the boolean changes, I update the pointer. This way there is no if/else; the call goes through the pointer directly.

The pointer is defined like this:

double (ClassWeAreIn::*pCalculate)(const double& someArg) const;

... and the new calculate method becomes:

double calculate(const double& someArg)
{
  return (this->*pCalculate)(someArg); // forward the result of whichever implementation is selected
}
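For reference, a minimal self-contained sketch of this pointer-to-member approach might look like the following. The class name Calculator, the setter setUseFirstVersion, the helper demoPointerDispatch and both formulas are hypothetical placeholders, not the real code; only the shape of the dispatch matters.

```cpp
#include <cassert>

// Hypothetical stand-in for "ClassWeAreIn"; the formulas are placeholders.
class Calculator {
public:
    Calculator() { setUseFirstVersion(true); }

    // The only place where the boolean is inspected; expected to be called rarely.
    void setUseFirstVersion(bool useFirst) {
        pCalculate = useFirst ? &Calculator::calculate1 : &Calculator::calculate2;
    }

    // Hot path: a single indirect call, no if/else.
    double calculate(const double& someArg) const {
        return (this->*pCalculate)(someArg);
    }

private:
    double calculate1(const double& someArg) const { return someArg * 2.0; }  // placeholder
    double calculate2(const double& someArg) const { return someArg + 1.0; }  // placeholder

    double (Calculator::*pCalculate)(const double& someArg) const;
};

// Small self-check showing that the setter switches the implementation.
inline void demoPointerDispatch() {
    Calculator c;
    assert(c.calculate(3.0) == 6.0);  // first version selected by default
    c.setUseFirstVersion(false);
    assert(c.calculate(3.0) == 4.0);  // second version after the switch
}
```

Note that the call through the pointer is still an indirect branch that the CPU has to predict, and it generally prevents the compiler from inlining calculate1 or calculate2 into the caller, so this removes the if/else from the source without necessarily removing the cost.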

I tried using it in combination with __forceinline and it did make a difference (I am unsure whether that is expected, as the compiler should have inlined it already?). Without __forceinline it gave the worst performance, and with __forceinline it seemed to be much better.

I thought of making calculate a virtual method with two overrides, but I read that virtual methods are not a good way to optimize code, as the right method still has to be looked up at runtime. I did not try it, though.
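For comparison, the untried virtual-method variant could be sketched roughly as follows. All names here (CalculatorBase, FirstCalculator, SecondCalculator, makeCalculator, demoVirtualDispatch) and both formulas are assumptions made for illustration, and std::make_unique requires C++14.

```cpp
#include <cassert>
#include <memory>

// Hypothetical interface; each override holds one implementation.
struct CalculatorBase {
    virtual ~CalculatorBase() = default;
    virtual double calculate(const double& someArg) const = 0;
};

struct FirstCalculator final : CalculatorBase {
    double calculate(const double& someArg) const override { return someArg * 2.0; }  // placeholder
};

struct SecondCalculator final : CalculatorBase {
    double calculate(const double& someArg) const override { return someArg + 1.0; }  // placeholder
};

// The boolean is only inspected here, when the setting changes.
inline std::unique_ptr<CalculatorBase> makeCalculator(bool useFirstVersion) {
    if (useFirstVersion) {
        return std::make_unique<FirstCalculator>();
    }
    return std::make_unique<SecondCalculator>();
}

// Small self-check: each object dispatches to its own override.
inline void demoVirtualDispatch() {
    auto first = makeCalculator(true);
    auto second = makeCalculator(false);
    assert(first->calculate(3.0) == 6.0);
    assert(second->calculate(3.0) == 4.0);
}
```

Like the member-function pointer, a virtual call is an indirect call resolved at runtime through the vtable, so it is not obvious that it would behave differently in terms of prediction or inlining.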

However, whichever modification I made, I never managed to restore the original performance (maybe it is not possible?). Is there a design pattern that deals with this optimally (and the cleaner and easier to maintain, the better)?
