Monday, July 17, 2017

What is the best method to minimize code complexity when saving stack data to exploit parallelism?

I am attempting to accelerate some code with CUDA, under the constraint of preserving code readability/maintainability as much as possible.

I have found and parallelized a function buried within several functions/loops. This function accounts for ~98% of processing time, but on its own it doesn't expose enough parallelism to be useful (on the order of a couple of blocks). When many instances of it are executed simultaneously, the code is much faster. However, as a result I am forced to maintain a big list of stack objects that I must iterate over several times; see the code below:

void do_work(int i, ...) {
    // computationally expensive stuff...
}

void prereq_stuff(int i) {

    // lots of big divergent control structures...

    do_work(i); // maybe arrive here..

    // output and what not....
}

int main() {

    for (int i = 0; i < BIG_NUMBER; i++) {
        prereq_stuff(i);
    }

    return 0;
}

Has turned into...

// a struct that contains all the stack data..
struct Stack {
    int foo;
    double bar;
};

void do_work_on_gpu(std::vector<Stack>& contexts) {
    // launch a kernel to handle the expensive stuff..
}

void prereq_stuff(Stack* context, int i) {
    // maybe queue up data for do_work_on_gpu()...
}

void cleanup_stuff(Stack* context, int i) {
    // output and what not...
}

int main() {

    std::vector<Stack> contexts; // container of per-iteration stack data

    for (int i = 0; i < BIG_NUMBER; i++) {
        contexts.emplace_back();
        prereq_stuff(&contexts.back(), i);
    }

    do_work_on_gpu(contexts); // calls the CUDA kernel

    for (size_t i = 0; i < contexts.size(); i++) {
        cleanup_stuff(&contexts[i], i);
    }

    return 0;
}
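
For reference, the body of do_work_on_gpu() is conceptually something like the sketch below: copy the batch of contexts to the device, run one thread per Stack, and copy the results back. This is simplified, with no error checking, and assumes Stack stays trivially copyable; do_work_kernel and the launch configuration are just placeholders:

#include <cuda_runtime.h>
#include <vector>

__global__ void do_work_kernel(Stack* contexts, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // computationally expensive stuff on contexts[i]...
    }
}

void do_work_on_gpu(std::vector<Stack>& contexts) {
    Stack* d_contexts = nullptr;
    size_t bytes = contexts.size() * sizeof(Stack);

    // copy the whole batch of stack data to the device
    cudaMalloc(&d_contexts, bytes);
    cudaMemcpy(d_contexts, contexts.data(), bytes, cudaMemcpyHostToDevice);

    // one thread per Stack
    int threads = 256;
    int blocks = (int)((contexts.size() + threads - 1) / threads);
    do_work_kernel<<<blocks, threads>>>(d_contexts, (int)contexts.size());

    // bring the updated contexts back so cleanup_stuff() sees the results
    cudaMemcpy(contexts.data(), d_contexts, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_contexts);
}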

Is there some sort of design construct/pattern I can utilize here? Or is this as simple as it can get, given that all the data needed to call do_work() must be available simultaneously?

Thanks!
