Processing 2.1 Million Records from Solr in Scala

I decided to build a repository of news headlines: I loaded all ‘New York Times’ headlines since the year 2000 and all business-related news from the ‘Guardian’ into a Solr search engine. More details can be found in my previous blog post. The intention was never to process all documents in one run; the goal was to search for the relevant articles with the help of the search engine and then process Read more…

Increasing the Solr Heap in Docker

Initially I used the Solr standard settings, but with a large amount of data I was running out of heap space. Fortunately, it is possible to define the heap size with the SOLR_HEAP environment variable. If you don’t want to risk losing your data, you should also map the volume /opt/solr/server/solr/mycores. Here is the docker-compose.yml file that I am using:
    version: '3.1'
    services:
      solr:
        image: "solr:alpine"
        ports:
          - "8983:8983"
        volumes:
          - /srv/solr:/opt/solr/server/solr/mycores
        environment:
Read more…

News-Digest: Accessing the History of News Headlines

Recently I spent some time investigating the options for accessing the history of news articles via an API. I was mainly interested in APIs which can be accessed free of charge. Here is the list of the most useful providers:
Guardian
– Easy API
– Acceptable rate limits
– Access to over 1,900,000 pieces of content
– Free for non-commercial usage
New York Times
– Provides API to search and separate API to Read more…

DL4J Doc2Vec – Sentiment Analysis using Sentiment140

I am planning to use the DL4J Doc2Vec implementation for a sentiment analysis. However, I don’t want to start with an empty network; the starting point should be a pre-trained network. The initial training should be done with the Sentiment140 dataset, which can be found at https://www.kaggle.com/kazanova/sentiment140. It contains 1,600,000 tweets extracted using the Twitter API. In this Gist I describe how to train and save a DL4J Doc2Vec model. The serialized model is available Read more…

DL4J – Sentiment Analysis with SentiWordNet

The basic goal of a ‘sentiment analysis’ is to classify a given text as positive, negative or neutral. SentiWordNet is a lexical resource for opinion mining. It assigns sentiment scores to each synset of WordNet, which makes it possible to “calculate” an overall sentiment for a text. A SentiWordNet implementation can be found in DL4J in the deeplearning4j-nlp-uima artifact. This demo has been implemented in Scala using a Jupyter BeakerX Notebook.
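The idea of “calculating” an overall sentiment from per-word scores can be sketched in a few lines. This is a minimal Python illustration only: it is not the DL4J SentiWordNet API and not the Scala notebook from the demo. The tiny `TOY_LEXICON`, its scores and the function names are invented for this example, and the real resource scores WordNet synsets rather than plain words.

```python
# Toy lexicon: word -> (positive score, negative score).
# These values are made up for illustration; they are NOT SentiWordNet data.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "great": (0.875, 0.0),
    "bad": (0.0, 0.625),
    "terrible": (0.0, 0.875),
}


def sentiment_score(text, lexicon=TOY_LEXICON):
    """Sum (positive - negative) over all known words; unknown words add nothing."""
    words = (w.strip(".,!?;:").lower() for w in text.split())
    return sum(lexicon[w][0] - lexicon[w][1] for w in words if w in lexicon)


def classify(text, lexicon=TOY_LEXICON):
    """Map the summed score to one of the three classes by its sign."""
    score = sentiment_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("A great day!"))
print(classify("Terrible and bad."))
```

A text without any known words ends up as neutral because its score stays at zero. A real implementation would also have to pick the right synset for each word.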

Predicting the Direction of Stock Market Prices using a Random Forest Classifier

In this demo I will show how to predict whether the closing prices of Apple, General Electric and Samsung Electronics are moving up or down. We do this with the help of a Random Forest Classifier. I tried to replicate the result from a research paper authored by Luckyson Khaidem, Snehanshu Saha and Sudeepa Roy Dey, which describes the following concept:
Data Preprocessing
– Exponential smoothing
Features
– Relative Strength Index
– Stochastic Oscillator
– Williams %R
– Moving Average Convergence Read more…
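To make the preprocessing step and some of the features listed above more concrete, here is a minimal Python sketch of their textbook definitions. It is not the code from the demo or from the paper. The function names, the default period of 14 and the use of simple averages in the RSI are assumptions made for this example.

```python
def exp_smooth(values, alpha):
    """Exponential smoothing: S_0 = Y_0, S_t = alpha * Y_t + (1 - alpha) * S_(t-1)."""
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


def rsi(closes, period=14):
    """Relative Strength Index over the last `period` price changes (simple averages)."""
    window = closes[-(period + 1):]
    changes = [window[i + 1] - window[i] for i in range(len(window) - 1)]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window (convention chosen for this sketch)
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def stochastic_k(highs, lows, closes, period=14):
    """Stochastic Oscillator %K: position of the last close in the recent range (0..100)."""
    highest, lowest = max(highs[-period:]), min(lows[-period:])
    # Assumes the window has a non-zero price range.
    return 100.0 * (closes[-1] - lowest) / (highest - lowest)


def williams_r(highs, lows, closes, period=14):
    """Williams %R: same range as %K, but measured from the top (-100..0)."""
    highest, lowest = max(highs[-period:]), min(lows[-period:])
    # Assumes the window has a non-zero price range.
    return -100.0 * (highest - closes[-1]) / (highest - lowest)


# Made-up prices, only to show the calls.
highs, lows, closes = [10, 12, 11], [8, 9, 9], [9, 11, 10]
print(exp_smooth([1.0, 2.0, 3.0], alpha=0.5))
print(stochastic_k(highs, lows, closes, period=3))
print(williams_r(highs, lows, closes, period=3))
```

Each function returns one value for the most recent day. To build a feature table for a classifier, they would be applied over a rolling window of the (smoothed) price history.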

Investor – Stock Forecasting with LSTM

In this blog post we show how to forecast the closing price of AAPL using an LSTM RNN. We use the open, close, high and low prices and the volume of the current day as input in order to predict the subsequent closing price. This demo has been implemented in Scala using Jupyter with the BeakerX kernel and the following libraries:
– Investor
– DL4J
The details can be found in the following Gist.
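The input/target setup described here (features of the current day, closing price of the next day) can be illustrated with a small Python sketch. It only shows how the training pairs are lined up and is not the Investor or DL4J code from the Gist. The `Bar` type, the `make_supervised` function and the sample numbers are hypothetical.

```python
from collections import namedtuple

# One trading day; field names are chosen for this sketch.
Bar = namedtuple("Bar", ["open", "close", "high", "low", "volume"])


def make_supervised(bars):
    """Pair each day's five features with the NEXT day's closing price."""
    features = [[b.open, b.close, b.high, b.low, b.volume] for b in bars[:-1]]
    targets = [b.close for b in bars[1:]]
    return features, targets


# Made-up numbers, not real AAPL quotes.
bars = [
    Bar(100.0, 101.0, 102.0, 99.0, 1000),
    Bar(101.0, 103.0, 104.0, 100.5, 1200),
    Bar(103.0, 102.0, 103.5, 101.0, 900),
]
features, targets = make_supervised(bars)
print(features[0], "->", targets[0])
```

The last day has no known next close, so it yields no training pair. For a recurrent network the feature rows would additionally be normalized and grouped into sequences.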

Will you get rich by investing in Funds?

In recent years, most banks have been pushing their customers to invest in funds, arguing that this is more profitable than any other option. As always, it is valid to question this statement. As a compelling argument, an index chart is presented which indeed shows that the stock market has been growing! I tried to run some simulations using the data from my Investor framework to validate this statement: Here is the link to my Gist Read more…