Processing 2.1 Million Records from Solr in Scala

I decided to build a repository of news headlines: I loaded all ‘New York Times’ headlines since the year 2000 and all business-related news from the ‘Guardian’ into a Solr search engine. More details can be found in my previous blog post.

The intention was never to process all documents in one run. The goal was to use the search engine to find the relevant articles and then process only those headlines.

Out of curiosity, however, I investigated how different alternatives perform when accessing all of the data.
In the examples, I simply count all entries.

I was looking at the following alternatives:

Processing using pure Scala
Processing with Spark
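
For the pure Scala alternative, here is a minimal sketch of how counting every entry could look when paging through the index with Solr's cursorMark mechanism. It is not the code from the Gist. `Page`, `CountAll`, `countAll` and `fakeSolr` are hypothetical names invented for this illustration, and `fakeSolr` is an in-memory stand-in so that the sketch runs without a Solr server. In a real setup `fetchPage` would send a query sorted on the unique key field and pass the cursor along.

```scala
import scala.annotation.tailrec

// One page of results: the documents plus the cursor to send with the next request.
// (Hypothetical type for this sketch, not part of any Solr client library.)
final case class Page(docs: Seq[String], nextCursorMark: String)

object CountAll {

  // Counts all documents by following cursors. Solr signals the end of the
  // result set by returning the same cursor mark that was sent in the request.
  def countAll(fetchPage: String => Page): Long = {
    @tailrec
    def loop(cursor: String, total: Long): Long = {
      val page = fetchPage(cursor)
      val newTotal = total + page.docs.size
      if (page.nextCursorMark == cursor) newTotal
      else loop(page.nextCursorMark, newTotal)
    }
    loop("*", 0L) // "*" is the initial cursorMark value in Solr
  }

  // Assumption: an in-memory replacement for the Solr core. The cursor is
  // simply the offset of the next unread headline, encoded as a string.
  def fakeSolr(headlines: Vector[String], rows: Int): String => Page =
    cursor => {
      val start = if (cursor == "*") 0 else cursor.toInt
      val docs = headlines.slice(start, start + rows)
      // An empty page echoes the cursor back, like Solr does at the end.
      val next = if (docs.isEmpty) cursor else (start + docs.size).toString
      Page(docs, next)
    }
}
```

Calling `CountAll.countAll(CountAll.fakeSolr(Vector("a", "b", "c"), rows = 2))` walks through three pages (two headlines, one headline, then an empty page) and returns 3. Keeping the page fetcher as a plain function makes it easy to swap the stub for a real Solr client later.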

I don’t have a clustered environment: everything is containerised in Docker on a simple Intel NUC with an Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz, which exposes 4 logical cores.

The results can be found in this Gist.

Published by pschatzmann on 12. December 2018
