I am developing a Spark-Scala application in which I plan to use the Template Method design pattern.
Here is the proposed design.
ProjectTemplate.scala => This is a trait containing the functions createSession, readData, processData, and writeResult. Of these four, I plan to provide implementations for createSession, readData, and writeResult in the trait itself, while the implementation of processData will be provided by the child class that extends the trait. I have tested this approach with println statements and it works.
So in all there are 3 components:
1) ProjectTemplate.scala =>
import org.apache.spark.sql.{Dataset, Row, SparkSession}

trait ProjectTemplate {
  def createSession(): SparkSession = {
    // ... implementation is provided ...
  }
  def readData(spark: SparkSession): Dataset[Row] = {
    // ... implementation is provided ...
  }
  def processData(data: Dataset[Row]): Dataset[Row] // implementation will be provided by the child class
  def writeResult(result: Dataset[Row], filePath: String): Boolean = {
    // ... implementation is provided ...
  }
  // Template method: threads the output of each step into the next
  def execute(): Unit = {
    val spark = createSession()
    val data = readData(spark)
    val result = processData(data)
    writeResult(result, "...") // the output path must be supplied here
  }
}
2) Childclass.scala =>
class Childclass extends ProjectTemplate {
  override def processData(data: Dataset[Row]): Dataset[Row] = {
    // <function implementation>
    ...
  }
}
3) ChildObject.scala =>
object ChildObject extends App {
  val obj = new Childclass
  obj.execute()
}
Command that I will use to submit the Spark application (the path to the application jar goes at the end):
spark-submit --class package.ChildObject --master yarn --deploy-mode cluster <application-jar>
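For completeness, the full command might look like the following, assuming the object lives in a package named com.example and the assembled jar is my-spark-app.jar (both names are placeholders):

spark-submit \
  --class com.example.ChildObject \
  --master yarn \
  --deploy-mode cluster \
  my-spark-app.jar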
My question is: will this application successfully create a SparkSession and return it? Can we do it this way? Thanks.