Processing 2.1 Million Records from Solr in a Spark Cluster with BeakerX

I decided to build a repository of news headlines: I loaded all ‘New York Times’ headlines since 2000 and all business-related news from the ‘Guardian’ into the Solr search engine. I never intended to process all documents in one run. The goal was to use the search engine to find the relevant articles and then process only those headlines.

Out of curiosity, however, I investigated the performance of different ways to access all of the data. The results are documented in this blog.

In this instalment I look at the performance of a Spark cluster running as a Docker service. The details can be found in the following Gist.

Published by pschatzmann on 18. December 2018