I use Spark DataFrameReader to perform sql query from database. For each query performed the SparkSession is required. What I would like to do is: for each of JavaPairRDDs perform map, which would invoke sql query with parameters from this RDD. This means that I need to pass SparkSession in each lambda, which seems to be bad design. What is common approach in such problems?
It could look like:
roots.map(r -> DBLoader.getData(sparkSession, r._1));
How I load data now:
JavaRDD<Row> javaRDD = sparkSession.read().format("jdbc")
.options(options)
.load()
.javaRDD();
Aucun commentaire:
Enregistrer un commentaire