I'm new to Hadoop and am currently learning MapReduce design patterns from Donald Miner and Adam Shook's book MapReduce Design Patterns. The book presents a Cartesian Product join pattern that creates a custom InputFormat and RecordReader, and the flow confuses me. I have several questions about the code:
- Where does the InputFormat and RecordReader code actually run? Is it the AppMaster for InputFormat and the NodeManager for RecordReader?
- Why is there another InputFormat instance inside CartesianRecordReader? In every MapReduce flow I have seen, the InputFormat runs before the RecordReader does.
- Which functions are called automatically by the framework? Many functions, such as CartesianInputFormat.getSplits, never seem to be called explicitly.
- Why does the author use ReflectionUtils.newInstance to create a new InputFormat instance? Can we just write new InputFormat instead?
- Is there a tutorial that teaches how to customize InputFormat, RecordReader, InputSplit, etc.? All I have found are basic MapReduce tutorials like word count that only use a mapper and a reducer.
Here is the source code: https://github.com/adamjshook/mapreducepatterns/blob/master/MRDP/src/main/java/mrdp/ch5/CartesianProduct.java
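To make the ReflectionUtils.newInstance question concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of creating an object from a class name that is only known at run time. The Format interface, the TextFormat and SequenceFormat classes, and the newInstance helper are hypothetical stand-ins, not Hadoop's API, and the idea that the class name arrives as a string is an assumption for illustration rather than a statement of what the book's code does.

```java
import java.lang.reflect.Constructor;

public class ReflectiveFormatDemo {

    // Hypothetical stand-in for an input format type; NOT Hadoop's InputFormat.
    interface Format {
        String describe();
    }

    // Two hypothetical concrete formats with implicit no-argument constructors.
    static class TextFormat implements Format {
        public String describe() { return "text"; }
    }

    static class SequenceFormat implements Format {
        public String describe() { return "sequence"; }
    }

    // Builds a Format from a class name that is only available as a string.
    // With "new", the concrete class would have to be fixed at compile time.
    static Format newInstance(String className) {
        try {
            Class<? extends Format> cls =
                    Class.forName(className).asSubclass(Format.class);
            Constructor<? extends Format> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            // Wrap checked reflection errors so callers need no throws clause.
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the name would normally come from configuration.
        String configured = SequenceFormat.class.getName();
        Format format = newInstance(configured);
        System.out.println(format.describe());
    }
}
```

In this sketch the caller never names TextFormat or SequenceFormat in a new expression, so the concrete class can change without recompiling the code that uses it.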
That's all, thanks in advance :)