Wednesday, February 24, 2021

Python: pipelining functions with multiple return/input values, or use OOP? Best Practices?

We have a data 'processing' function and a 'serializing' function. Currently the processor generates and returns four different data structures, all derived from the same large input data source, and all of the outputs are related to each other.

Trying to separate the 'data processing' step from the 'serializing' step has gotten a bit messy. I'm looking for the best practice on what to do here.

def process(input):
   ...
   return a,b,c,d

def serialize(a,b,c):
   ...
   # Different serialization patterns for each of a-c.

a,b,c,d = process(input)
serialize(a,b,c)
go_on_to_do_other_things(d)
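
One way to keep the plain-function pipeline while giving the four outputs names is a `typing.NamedTuple`. The following is only a minimal sketch: `ProcessResult`, the field types, and the placeholder computations are hypothetical stand-ins for the real processing, which is elided above.

```python
from typing import NamedTuple


class ProcessResult(NamedTuple):
    """Hypothetical container for the four related outputs."""
    a: list
    b: dict
    c: set
    d: int


def process(data):
    # Placeholder computations; the real processing is elided in the question.
    a = sorted(data)
    b = {x: x * x for x in data}
    c = set(data)
    d = sum(data)
    return ProcessResult(a, b, c, d)


def serialize(result):
    # Stand-in for the per-structure serialization: returns strings
    # instead of writing anywhere.
    return [repr(result.a), repr(result.b), repr(result.c)]


result = process([3, 1, 2])
payloads = serialize(result)
a, b, c, d = result  # positional unpacking still works if a caller wants it
```

Callers can then pass a single `result` around and read `result.d` by name, so adding a fifth output later would not silently shift the positions of the existing ones.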

That feels janky. Should I instead use a class where a,b,c,d are member variables?

class VeryImportantDataProcessor:
   def process(self,input):
      self.a = ...
      self.b = ...
      ...

   def serialize(self):
      s3.write(self.a)
      convoluted_serialize(self.b)
      ...

vipd = VeryImportantDataProcessor()
vipd.process(input)
vipd.serialize()

Keen to hear your thoughts on what is best here!

Note that after processing and serializing, the code goes on to use variable d for further unrelated shenanigans, but a, b and c are all finished once they've been saved. Not sure if that changes anything.
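
Since a, b and c share a lifetime that d does not, one possible middle ground is to bundle only those three into a small object that knows how to save itself, and to return d separately. This is a sketch under assumptions, not a definitive design: `Artifacts`, its `save` method, and the JSON files are hypothetical and merely stand in for `s3.write` and `convoluted_serialize`.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Artifacts:
    """Hypothetical bundle of the outputs that are finished once saved."""
    a: list
    b: dict
    c: list

    def save(self, out_dir):
        # JSON files stand in for s3.write / convoluted_serialize.
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        for name in ("a", "b", "c"):
            (out / f"{name}.json").write_text(json.dumps(getattr(self, name)))


def process(data):
    # Placeholder computations; the real processing is elided in the question.
    artifacts = Artifacts(a=sorted(data), b={"count": len(data)}, c=data[::-1])
    d = sum(data)
    return artifacts, d
```

The call site would then read `artifacts, d = process(data)`, followed by `artifacts.save(some_dir)` and `go_on_to_do_other_things(d)`. Processing stays a plain function with no hidden state, and the object only exists to group the things that are serialized together.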
