Thursday, March 21, 2019

Spark-Scala application using the Template Design Pattern

I am developing a Spark-Scala application in which I am planning to use the Template Design Pattern.

Here is the proposed design.

ProjectTemplate.scala => This is a trait containing the functions createSession, readData, processData, and writeResult. Of these four, I am planning to provide implementations for createSession, readData, and writeResult in the trait itself, whereas the implementation for processData will be provided by the child class that extends this trait. I have tested this approach with println statements and it works.

So in all there are 3 components:

1) ProjectTemplate.scala =>

    import org.apache.spark.sql.{Dataset, Row, SparkSession}

    trait ProjectTemplate {

        def createSession(): SparkSession = {
             ... implementation is provided ...
        }

        def readData(): Dataset[Row] = {
             ... implementation is provided ...
        }

        def processData(data: Dataset[Row]): Dataset[Row]  // Implementation will be provided by the child class

        def writeResult(result: Dataset[Row], filePath: String): Boolean = {
             ... implementation is provided ...
        }

        def outputPath: String  // Output location, supplied by the concrete class

        def execute(): Unit = {
            createSession()
            val data = readData()
            val result = processData(data)
            writeResult(result, outputPath)
        }
    }
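
For context, the elided bodies in the trait could look roughly like the sketch below. The app name, input path, and file format are placeholders I have assumed, not the project's actual values.

    import org.apache.spark.sql.{Dataset, Row, SparkSession}

    // A minimal sketch of the concrete bodies (names, paths and formats are assumptions).
    def createSession(): SparkSession =
        SparkSession.builder()
            .appName("project-template")   // hypothetical app name
            .getOrCreate()

    def readData(): Dataset[Row] =
        createSession().read.parquet("<input path>")   // hypothetical path and format

    def writeResult(result: Dataset[Row], filePath: String): Boolean = {
        result.write.mode("overwrite").parquet(filePath)   // format is an assumption
        true   // signal success, matching the Boolean return type
    }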


2) Childclass.scala =>
    import org.apache.spark.sql.{Dataset, Row}

    class Childclass extends ProjectTemplate {

        override val outputPath: String = "<output path>"  // destination used by execute()

        override def processData(data: Dataset[Row]): Dataset[Row] = {
            <function implementation>
            ...
        }
    }
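
Purely for illustration, a concrete processData could be as simple as the following; the column names and filter condition are invented, not the project's actual logic.

    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.{Dataset, Row}

    // Hypothetical transformation: keep "active" rows and project two columns.
    override def processData(data: Dataset[Row]): Dataset[Row] =
        data.filter(col("status") === "active")
            .select("id", "amount")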


3) ChildObject.scala =>
    object ChildObject extends App {
        val obj = new Childclass
        obj.execute()
    }
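
One thing worth noting: the Spark quick-start guide suggests defining an explicit main() method instead of extending scala.App, since subclasses of scala.App (with their delayed initialization) may not work correctly on the cluster. An equivalent entry point would be:

    object ChildObject {
        def main(args: Array[String]): Unit = {
            val obj = new Childclass
            obj.execute()
        }
    }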

This is the command I will use to submit the Spark application:

spark-submit --class package.ChildObject --master yarn --deploy-mode cluster
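
(In a real submission, the options above would be followed by the application jar and any program arguments; the jar name below is just a hypothetical placeholder.)

    spark-submit --class package.ChildObject --master yarn --deploy-mode cluster <path-to-project-jar>.jar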

My question is: will this application successfully create a SparkSession and return it? Can we do it this way? Thanks.
