I recently spent some time investigating the options for accessing historical news articles via an API. I was mainly interested in APIs that can be used free of charge.

Here is the list of the most useful providers:

  • Guardian
    – Easy API
    – Acceptable Rate Limits
    – Access to over 1,900,000 pieces of content
    – Free for non-commercial usage
  • New York Times
    – Provides a search API and a separate API for downloading monthly data
    – Rate limits are reached quickly in the search API
    – Provides data since 1851
    – Free for non-commercial usage
  • RSS
    – Many Free Sources
    – Very limited History
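To give an idea of what a request to one of these APIs can look like, here is a minimal sketch that builds a search URL for the Guardian content API using only JVM classes. The endpoint and the parameter names (`q`, `from-date`, `api-key`) reflect my understanding of the Guardian Open Platform and should be checked against its documentation; `GuardianSearch` and `searchUrl` are hypothetical names used for illustration only.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

object GuardianSearch {
  // Assumed search endpoint of the Guardian content API
  val endpoint = "https://content.guardianapis.com/search"

  // Builds the request URL; all parameter values are form-encoded
  def searchUrl(query: String, fromDate: String, apiKey: String): String = {
    def enc(s: String): String = URLEncoder.encode(s, StandardCharsets.UTF_8.name())
    val params = Seq("q" -> query, "from-date" -> fromDate, "api-key" -> apiKey)
    params.map { case (k, v) => s"$k=${enc(v)}" }.mkString(endpoint + "?", "&", "")
  }
}
```

The resulting URL could then be fetched with something like `scala.io.Source.fromURL(url).mkString`, which returns the JSON response as a string, provided a valid API key is used.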

In the end, I settled on an architecture that

  • replicates the data sources into a local search engine (Solr)
  • provides some utility classes to simplify different scenarios

In this Gist I give a quick overview of the ways to access news headlines using functionality that is available on the JVM. The examples are implemented in Scala using Jupyter with the BeakerX kernel.
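As a rough illustration of what "functionality available on the JVM" can mean here, the following sketch extracts the headlines from an RSS 2.0 document with the XML parser that ships with the JDK. It is not the code from the Gist; `RssHeadlines`, `headlines` and the sample feed are made up for this example.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

object RssHeadlines {
  // Returns the <title> text of every <item> element in the feed
  def headlines(rssXml: String): List[String] = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)))
    val items = doc.getElementsByTagName("item")
    (0 until items.getLength).toList.flatMap { i =>
      val item = items.item(i).asInstanceOf[Element]
      val titles = item.getElementsByTagName("title")
      // Skip items without a title instead of failing
      if (titles.getLength > 0) Some(titles.item(0).getTextContent.trim) else None
    }
  }

  // Hypothetical feed content, used only to demonstrate the parsing
  val sampleFeed: String = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Sample Feed</title>
    <item><title>First headline</title></item>
    <item><title>Second headline</title></item>
  </channel>
</rss>"""
}
```

Calling `RssHeadlines.headlines(RssHeadlines.sampleFeed)` returns the two item titles and ignores the channel title, because only `<item>` elements are inspected. For a live feed, the XML string would first have to be downloaded from the feed URL.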

A Docker image is also available.

 

