Friday, November 9, 2018

Hadoop MapReduce Flow: Custom InputFormat and RecordReader

I'm new to Hadoop and am currently learning MapReduce design patterns from the book MapReduce Design Patterns by Donald Miner and Adam Shook. The book presents a Cartesian Product join pattern that creates a custom InputFormat and RecordReader, and I'm confused about the flow. I have several questions about the code:

  1. Where do the InputFormat and RecordReader code actually run? Does the InputFormat run in the ApplicationMaster and the RecordReader in a NodeManager?
  2. Why is another InputFormat instance created inside CartesianRecordReader? In every MapReduce flow I have found, the InputFormat runs before the RecordReader does.
  3. Which functions does the framework call automatically? Many functions, such as CartesianInputFormat.getSplits, never seem to be called explicitly.
  4. Why does the author use ReflectionUtils.newInstance to create a new InputFormat instance? Could we just write new InputFormat instead?
  5. Is there a tutorial that teaches how to customize InputFormat, RecordReader, InputSplit, and so on? All I have found are basic MapReduce tutorials, such as WordCount, that use only a mapper and a reducer.
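
To make question 4 concrete, here is a minimal sketch of the pattern as I currently understand it. It uses plain Java reflection only, not Hadoop's own classes. The names Format, TextFormat, SequenceFormat, the configuration keys, and the Map standing in for the job configuration are all hypothetical. The idea is that the concrete class is stored in the configuration as a class-name string, so the code that needs it cannot hard-code a new expression and has to instantiate it by name at runtime.

```java
import java.util.HashMap;
import java.util.Map;

public class ReflectionSketch {

    // Hypothetical stand-in for an input format type. Like InputFormat,
    // it cannot be instantiated directly with `new`.
    public interface Format {
        String describe();
    }

    // Two hypothetical concrete formats with public no-argument constructors.
    public static class TextFormat implements Format {
        public String describe() { return "text"; }
    }

    public static class SequenceFormat implements Format {
        public String describe() { return "sequence"; }
    }

    // Looks up a class name in the configuration and instantiates it reflectively.
    // This is an assumption about the general shape of the idiom, not Hadoop's code.
    public static Format newInstance(Map<String, String> conf, String key)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(conf.get(key));
        return cls.asSubclass(Format.class).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // The driver only records class names; the concrete type is chosen by the user.
        Map<String, String> conf = new HashMap<>();
        conf.put("left.format", TextFormat.class.getName());
        conf.put("right.format", SequenceFormat.class.getName());

        // Later, possibly in another process, the objects are rebuilt from those names.
        System.out.println(newInstance(conf, "left.format").describe());
        System.out.println(newInstance(conf, "right.format").describe());
    }
}
```

If this reading is right, writing new InputFormat directly would not work for two reasons: the type itself is not concrete, and the concrete class is only known from the configuration at runtime.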

Here is the source code: https://github.com/adamjshook/mapreducepatterns/blob/master/MRDP/src/main/java/mrdp/ch5/CartesianProduct.java

That's all, thanks in advance :)
