Phil Schatzmann

Smart EDGAR: Calculation of Growth Parameters

Financial KPIs can be used to drive investment decisions. My goal was therefore to create a comprehensive set of KPIs, across different dimensions, that are based on the information which can be determined from EDGAR: Profitability, Liquidity, Efficiency, Innovation, Growth, Leadership, and Surprises. In this document we demonstrate how to calculate the Growth Parameters.

By pschatzmann, 5. January 2019 (7 years ago)

Data Science

Smart EDGAR File API

So far we have seen how to use the database-related API of Smart EDGAR. In this Gist I give a quick overview of the core functionality of the file-based API, which does not require any DBMS.

By pschatzmann, 3. January 2019 (7 years ago)

Data Science

Calculating the Market Share of US Companies with the Help of EDGAR

EDGAR classifies the reporting companies by SIC (Standard Industrial Classification) code. We can use this information to calculate the total sales per sector and then the percentage share of each individual company. This helps us identify the companies with a large market share. The result (which uses Spark) can be found in the following Gist. Alternatively, here is a version which relies purely on Scala and Smart EDGAR.
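Independent of the linked Gists, the calculation described above (total sales per SIC code, then each company's percentage of that total) can be sketched in a few lines of plain Python. This is only an illustration and not the Spark or Smart EDGAR implementation: the function name `market_shares`, the tuple layout, and the sample companies and figures are all made up for the example.

```python
from collections import defaultdict


def market_shares(companies):
    """Return {company_name: share of its sector's sales in percent}.

    `companies` is a list of (name, sic_code, sales) tuples.
    The layout is a hypothetical one chosen for this sketch.
    """
    # Step 1: total sales per sector (SIC code)
    sector_totals = defaultdict(float)
    for _, sic_code, sales in companies:
        sector_totals[sic_code] += sales

    # Step 2: share of each company within its own sector
    shares = {}
    for name, sic_code, sales in companies:
        total = sector_totals[sic_code]
        shares[name] = sales * 100.0 / total if total else 0.0
    return shares


# Invented sample data, for illustration only
sample = [
    ("Company A", "7372", 300.0),
    ("Company B", "7372", 100.0),
    ("Company C", "2834", 50.0),
]

for name, share in market_shares(sample).items():
    print(f"{name}: {share:.1f}%")
```

A company that is alone in its sector gets a share of 100%, so in practice the result is probably only meaningful for sectors with several reporting companies.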

By pschatzmann, 28. December 2018 (7 years ago)

Data Science

Smile: Predicting the Direction of Stock Market Prices using a Random Forest Classifier

In this demo we show how to forecast whether the NASDAQ-100 is moving up or down. We do this with the help of a Random Forest Classifier from the Smile Machine Learning Framework. I tried to replicate the result from a research paper authored by Luckyson Khaidem, Snehanshu Saha and Sudeepa Roy Dey. The approach covers data preprocessing (exponential smoothing) and the following features: Relative Strength Index, Stochastic Oscillator, Williams %R, Moving Average Convergence Divergence, Price Rate of Change, and On Balance Volume.
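The exponential smoothing step mentioned in this excerpt can be sketched in plain Python. The block below is a minimal illustration of the standard recurrence S_0 = Y_0, S_t = alpha * Y_t + (1 - alpha) * S_(t-1). It is not the code from the demo and does not use the Smile API. The function name `exponential_smoothing`, the parameter name `alpha`, and the sample values are assumptions made for this example.

```python
def exponential_smoothing(prices, alpha):
    """Smooth a price series with simple exponential smoothing.

    alpha close to 1 follows the raw prices closely,
    alpha close to 0 gives more weight to the history.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")

    smoothed = []
    for price in prices:
        if not smoothed:
            # S_0 = Y_0: the first smoothed value is the first observation
            smoothed.append(float(price))
        else:
            # S_t = alpha * Y_t + (1 - alpha) * S_(t-1)
            smoothed.append(alpha * price + (1.0 - alpha) * smoothed[-1])
    return smoothed


print(exponential_smoothing([1.0, 2.0, 3.0], alpha=0.5))  # [1.0, 1.5, 2.25]
```

The technical indicators would then be computed on the smoothed series rather than on the raw prices.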

By pschatzmann, 27. December 2018 (7 years ago)

Data Science

Smart-EDGAR is supporting Formulas now…

In the last blog we demonstrated how we can calculate KPIs with the help of Spark. We have extended the Smart Edgar functionality so that we can implement calculated KPIs directly with the help of formulas. Here is a short demo in Scala which uses formulas, the built-in ‘coalesce’ method, the built-in ‘lag’ method and the built-in ‘percentChange’ method, and displays the result in a BeakerX Jupyter Notebook as tables and charts using the …
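To give a rough idea of what operations with these names usually do, here is a small sketch in plain Python. It is not the Smart-EDGAR formula API, whose signatures are not shown in this excerpt: the functions `coalesce`, `lag` and `percent_change` below are hypothetical stand-ins that follow the common meaning of the terms (first non-missing value, series shifted by a number of periods, change relative to the previous period in percent), and the sample numbers are invented.

```python
def coalesce(*values):
    """Return the first value that is not None (or None if all are missing)."""
    for value in values:
        if value is not None:
            return value
    return None


def lag(series, periods=1):
    """Shift a series by `periods` positions, padding the start with None."""
    shifted = [None] * periods + list(series)
    return shifted[:len(series)]


def percent_change(series):
    """Change versus the previous period in percent (None where undefined)."""
    result = []
    for current, previous in zip(series, lag(series)):
        if current is None or previous is None or previous == 0:
            result.append(None)
        else:
            result.append((current - previous) * 100.0 / previous)
    return result


# Invented yearly values, for illustration only
revenues = [100.0, 110.0, 99.0]
print(lag(revenues))             # [None, 100.0, 110.0]
print(percent_change(revenues))  # [None, 10.0, -10.0]
print(coalesce(None, 42.0))      # 42.0
```

In this sketch the first period has no predecessor, so its percentage change is left as None rather than set to zero.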

By pschatzmann, 26. December 2018 (7 years ago)

EDGAR

Calculating Financial KPIs with Scala, Spark and Smart-EDGAR

I am planning to use the EDGAR data to determine and calculate some financial KPIs and feed these into a neural network. In my prior posts I described how to use web services to request and display EDGAR information with the help of Python and Pandas. In the following Gist I show how we can directly use the ‘built in’ Java query functionality of Smart-Edgar from Scala in order to calculate some financial KPIs.

By pschatzmann, 21. December 2018 (7 years ago)

Data Science

Processing 2.1 Mio Records from Solr in a Spark Cluster with BeakerX

I decided to build a repository of news headlines: I loaded all ‘New York Times’ headlines since the year 2000 and all business-related news from the ‘Guardian’ into the Solr search engine. The intention was never to process all documents in one run; the goal was to search for the relevant articles with the help of the search engine and then process only the relevant headlines. Out of curiosity, however, I …

By pschatzmann, 18. December 2018 (7 years ago)

Data Science

OpenNLP: Predicting Stock Movements from the News

In my last blog I demonstrated how to build a model that can predict whether a stock is going up or down based on the news headlines using Spark MLLib. In this demo I will do the same, but with the help of OpenNLP. The solution consists of the following components: OpenNLP (text classification), my News-Digest functionality (which I have described in my last blogs), Investor (determination of the stock prices to calculate the labels), and confusion-matrix (to evaluate the …

By pschatzmann, 17. December 2018 (7 years ago)

Data Science

MLLib: Predicting Stock Movements from the News

In this blog we will demonstrate how we can predict whether a stock is going up or down based on the news headlines. The solution consists of the following components: Spark MLLib (machine learning), my News-Digest (which I have described in my last blogs), and Investor (we determine the stock prices to calculate the labels).

By pschatzmann, 12. December 2018 (7 years ago)

Data Science

The Decline of the New York Times – Producing Charts with Spark

Having seen that processing large amounts of data with Spark is efficient, we will now demonstrate how we can use a Spark DataFrame to generate charts with Vegas-Spark! We display the number of all New York Times articles and of the Guardian business articles over time.

By pschatzmann, 12. December 2018 (7 years ago)