Monday, July 9, 2018

Is it right to access an external cache in Apache Spark applications?

We have many Java micro-services, and data is written to a Hazelcast cache for better performance. Now the same data needs to be made available to a Spark application for data analysis. I am not sure whether accessing an external cache from Apache Spark is the right design approach. I cannot query the database directly to get the data, because the many database hits might affect the micro-services (we currently don't have HTTP caching).

I thought about pushing the latest data into Kafka and reading it back in Spark. However, each message might be large (sometimes > 1 MB), which exceeds Kafka's default message-size limit.
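One common workaround for the message-size concern, sketched below under the assumption that payloads can be split and reassembled: chunk each large payload below Kafka's default ~1 MB limit on the producer side and join the chunks on the consumer side. `PayloadChunker` and its methods are illustrative names, not a real Kafka API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: split a large payload into chunks that each stay
// under Kafka's default 1 MB message-size limit, and reassemble them
// after consumption. In practice each chunk would carry a message key
// and sequence number so the consumer can group and order them.
public class PayloadChunker {
    static final int MAX_CHUNK = 900 * 1024; // margin below the 1 MB default

    public static List<byte[]> split(byte[] payload) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < payload.length; off += MAX_CHUNK) {
            int len = Math.min(MAX_CHUNK, payload.length - off);
            byte[] chunk = new byte[len];
            System.arraycopy(payload, off, chunk, 0, len);
            chunks.add(chunk);
        }
        return chunks;
    }

    public static byte[] join(List<byte[]> chunks) {
        int total = 0;
        for (byte[] c : chunks) total += c.length;
        byte[] out = new byte[total];
        int off = 0;
        for (byte[] c : chunks) {
            System.arraycopy(c, 0, out, off, c.length);
            off += c.length;
        }
        return out;
    }
}
```

The alternative is raising `message.max.bytes` on the broker (and `max.request.size` on the producer), but very large messages put pressure on broker memory and replication, which is why chunking or keeping only a reference (a key into the cache) in the Kafka message is often preferred.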

If it is OK to use an external cache in Apache Spark, is it better to use the Hazelcast client or to read the Hazelcast-cached data over a REST service?
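For the client option, a minimal sketch of what the access pattern could look like, assuming `hazelcast-client` and `spark-core` are on the executor classpath; the map name `"orders"` and the member address are placeholders. The key point is to open one Hazelcast client per partition (inside `mapPartitions`), never one per record and never a driver-side client captured in a closure, since the client is not serializable.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class HazelcastSparkLookup {
    // Enrich an RDD of keys with values from the Hazelcast cache.
    // One client is created per partition and shut down when the
    // partition has been consumed.
    public static JavaRDD<String> lookup(JavaRDD<String> keys, String memberAddress) {
        return keys.mapPartitions((Iterator<String> it) -> {
            ClientConfig cfg = new ClientConfig();
            cfg.getNetworkConfig().addAddress(memberAddress); // e.g. "10.0.0.5:5701"
            HazelcastInstance client = HazelcastClient.newHazelcastClient(cfg);
            IMap<String, String> cache = client.getMap("orders"); // placeholder map name
            List<String> values = new ArrayList<>();
            while (it.hasNext()) {
                values.add(cache.get(it.next()));
            }
            client.shutdown();
            return values.iterator();
        });
    }
}
```

Compared with a REST facade, the native client avoids an extra hop and HTTP serialization, at the cost of coupling the Spark job to the Hazelcast version and cluster topology; a REST service keeps the cache encapsulated behind the micro-services but adds latency per call.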

Also, please let me know if there are any other recommended ways of sharing data between Apache Spark and micro-services.

Please let me know your thoughts. Thanks in advance.
