In the US, all companies, foreign and domestic, are required to file registration statements, periodic reports, and other forms electronically through EDGAR. Anyone can access and download this information for free.

The goal of this project is to make this information easily accessible so that it can be used by any data science application. It provides the following core functionality:

  • Automatic Download of the latest XBRL files
  • REST Services to access XBRL files
  • REST Services to access consolidated numerical information
  • Java API for XBRL files

The solution is provided as a Docker image.

The REST service provides the content of 10-K and 10-Q EDGAR XBRL filings. Below you can find the complete docker-compose.yml file for the solution. Start the application with docker-compose up.

When you start the application for the first time, a complete initial load is started. Because of the large amount of data, this takes a very long time. After the initial load has completed, only the latest changes are downloaded every 60 minutes (see the timer environment variable).

    version: '3.0'
    services:
      edgar-db:
        image: postgres:alpine
        container_name: db-edgar
        restart: always
        environment:
          - TZ=Europe/Zurich
          - POSTGRES_USER=tbd
          - POSTGRES_PASSWORD=tbd
          - POSTGRES_DB=edgar  # creates the database referenced in jdbcURL
        volumes:
          - /srv/postgres-edgar:/var/lib/postgresql/data
        ports:
          - 5432

      edgar-service:
        image: pschatzmann/smart-edgar
        container_name: edgar-service
        environment:
          - TZ=Europe/Zurich
          - jdbcDriver=org.postgresql.Driver
          - jdbcURL=jdbc:postgresql://edgar-db:5432/edgar
          - jdbcUser=tbd
          - jdbcPassword=tbd
        links:
          - edgar-db
        volumes:
          - /data/SmartEdgar:/usr/local/bin/SmartEdgar/data/  # same folder as edgar-load
        ports:
          - "9997:9997"

      edgar-load:
        image: pschatzmann/smart-edgar
        container_name: edgar-file
        environment:
          - TZ=Europe/Zurich
          - timer=60
          - formsRegex=10-K.*|10-Q.*
          - jdbcDriver=org.postgresql.Driver
          - jdbcURL=jdbc:postgresql://edgar-db:5432/edgar
          - jdbcUser=tbd
          - jdbcPassword=tbd
        links:
          - edgar-db
        command:
          - ./start.sh
          - ch.pschatzmann.edgar.dataload.DownloadProcessorJDBC
        volumes:
          - /data/SmartEdgar:/usr/local/bin/SmartEdgar/data/

You can access the REST functionality at http://localhost:9997 in your web browser; replace localhost with the host name when you access the solution from a different machine. This takes you to the Swagger UI, which you can use to try out the web services.
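To call the service from a program instead of the browser, a minimal Java 11+ sketch could look like the following. The class EdgarRestCheck is hypothetical and not part of the project; only the host and port 9997 come from the text above, and the individual REST paths are deliberately not assumed, since the Swagger UI documents them.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper (not part of the project): checks that the service answers.
// Only the host and port 9997 come from the documentation; concrete REST paths
// are listed in the Swagger UI and are not assumed here.
class EdgarRestCheck {

    /** Base URL of the service; pass the host name when calling from another machine. */
    static URI baseUri(String host) {
        return URI.create("http://" + host + ":9997/");
    }

    /** Sends a GET request and returns the HTTP status code (redirects are followed). */
    static int status(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

For example, EdgarRestCheck.status(EdgarRestCheck.baseUri("localhost")) returns the HTTP status of the start page once the containers are running.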

Initial Setup (XBRL Download)

We provide functionality to automatically download the latest relevant XBRL files. Each filing is stored as a zip file, regardless of whether EDGAR provides it as a zip file (newer filings) or as individual XML and XSD files (older filings). This section provides the necessary information if you do not want to rely on the default logic or if you want to force a reload.
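Storing every filing as one zip, even when EDGAR serves it as separate files, can be sketched with java.util.zip. This is a minimal illustration and not the project's implementation; the class FilingZipper and the flat entry layout (file names only, no directories) are assumptions.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: packs the individual files of one (older) filing into a
// single zip so that every filing ends up stored the same way on disk.
class FilingZipper {

    /** Writes all given files as flat entries into zipFile and returns its path. */
    static Path zipFiling(List<Path> files, Path zipFile) throws IOException {
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Use only the file name so the archive has no directory structure
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip); // stream the file content into the current entry
                zip.closeEntry();
            }
        }
        return zipFile;
    }
}
```

Newer filings that EDGAR already delivers as zip files would simply be saved as they are; only the older multi-file filings need this packing step.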

We support the following data load scenarios:

1. Download the XBRL files and load them into a Postgres database (DownloadProcessorJDBC)
2. Download the XBRL files only (DownloadProcessorXbrlFile)

Environment Variables

We recommend downloading all information once and subsequently retrieving only the changes. This can be achieved with the following environment variables:

• history
  • true: determine all available filings in EDGAR
  • false: determine only the latest EDGAR filings
  • <empty value>: treated as false once a complete load into the database has finished, and as true until then
• timer
  • time interval in minutes after which the download is repeated
  • if the value is empty, the download is executed only once
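The default rules for history and timer can be expressed as a small sketch. The class LoadSettings and its methods are hypothetical, not the project's code, and the scheduling assumes the interval is measured from the end of one run to the start of the next (a fixed delay rather than a fixed rate).

```java
import java.util.OptionalInt;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of how the history and timer variables could be interpreted.
class LoadSettings {

    /** An explicit true/false wins; an empty value means "history until a complete load has finished". */
    static boolean resolveHistory(String history, boolean completeLoadFinished) {
        if (history == null || history.isBlank()) {
            return !completeLoadFinished;
        }
        return Boolean.parseBoolean(history.trim()); // case-insensitive "true"
    }

    /** timer in minutes; empty or missing means the load runs only once. */
    static OptionalInt resolveTimer(String timer) {
        if (timer == null || timer.isBlank()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(timer.trim()));
    }

    /** Runs the load once, or repeatedly with the given number of minutes between runs. */
    static void start(Runnable load, OptionalInt timerMinutes) {
        if (timerMinutes.isEmpty()) {
            load.run(); // no timer: execute only once
            return;
        }
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Fixed delay: wait timerMinutes after one load finishes before starting the next
        scheduler.scheduleWithFixedDelay(load, 0, timerMinutes.getAsInt(), TimeUnit.MINUTES);
    }
}
```

Passing the raw values (for example System.getenv("history")) into these methods keeps the decision logic independent of the container environment.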

Delta Logic

• We download a filing from EDGAR only if it does not yet exist in our file system.
• We load a filing into the database only if it has not been loaded yet.
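The two delta rules can be illustrated with a minimal sketch. The class DeltaLogic, the "<accession number>.zip" file naming and the in-memory set of loaded filings are assumptions for illustration only; the real tool keeps this state in its file system and database.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch of the two delta rules (file naming and state are assumptions).
class DeltaLogic {
    private final Path dataFolder;
    private final Set<String> loadedIntoDb = new HashSet<>();

    DeltaLogic(Path dataFolder) {
        this.dataFolder = dataFolder;
    }

    /** Rule 1: download only if the zip for this filing is not on disk yet. */
    boolean needsDownload(String accessionNumber) {
        return !Files.exists(dataFolder.resolve(accessionNumber + ".zip"));
    }

    /** Rule 2: load into the database only once; returns true if this call loaded it. */
    boolean loadIfNew(String accessionNumber, Consumer<String> loader) {
        if (loadedIntoDb.contains(accessionNumber)) {
            return false; // already loaded earlier: skip
        }
        loader.accept(accessionNumber);
        loadedIntoDb.add(accessionNumber); // record only after the load succeeded
        return true;
    }
}
```

Recording a filing only after the loader returns means a failed load is retried on the next run instead of being silently skipped.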

Download of Files into the Database

The following example forces a complete initial load of all filings into the database. Because no timer is set, the load is executed only once.

    version: '3.0'
    services:
      edgar-db:
        image: postgres:alpine
        container_name: db-edgar
        restart: always
        environment:
          - TZ=Europe/Zurich
          - POSTGRES_USER=tbd
          - POSTGRES_PASSWORD=tbd
          - POSTGRES_DB=edgar  # creates the database referenced in jdbcURL
        volumes:
          - /srv/postgres-edgar:/var/lib/postgresql/data
        ports:
          - 5432:5432

      edgar-load:
        image: pschatzmann/smart-edgar
        container_name: edgar-file
        environment:
          - TZ=Europe/Zurich
          - history=true
          - formsRegex=10-K.*|10-Q.*
          - jdbcDriver=org.postgresql.Driver
          - jdbcURL=jdbc:postgresql://edgar-db:5432/edgar
          - jdbcUser=tbd
          - jdbcPassword=tbd
        links:
          - edgar-db
        command:
          - ./start.sh
          - ch.pschatzmann.edgar.dataload.DownloadProcessorJDBC
        volumes:
          - /data/SmartEdgar:/usr/local/bin/SmartEdgar/data/

Download of Files Only

The following example performs the initial download of all XBRL zip files without loading them into the database. Because no timer is set, the download is executed only once.

    version: '3.0'
    services:
      edgar-files:
        image: pschatzmann/smart-edgar
        container_name: edgar-file
        environment:
          - history=true
          - formsRegex=10-Q.*|10-K.*
        command:
          - ./start.sh
          - ch.pschatzmann.edgar.dataload.DownloadProcessorXbrlFile
        volumes:
          - /data/SmartEdgar:/usr/local/bin/SmartEdgar/data/

Docker Environment Variables

Here is the list of all supported environment variables with their default values and descriptions:

• destinationFolder (default: /usr/local/bin/SmartEdgar/data/): data directory used to store and access the XBRL zip files
• timer (no default): number of minutes to wait before repeating the data load; if empty, the load is executed only once
• history (default: true if the initial load has never completed, false if it has completed): load historic data from EDGAR; set a value to override the default logic
• formsRegex (default: 10-Q.*|10-K.*): regular expression that selects the forms to be loaded
• jdbcDriver (default: org.postgresql.Driver): Postgres JDBC driver
• jdbcUser (no default): user ID to access the database
• jdbcPassword (no default): password to access the database
• jdbcURL (default: jdbc:postgresql://nuc.local:5432/edgar): JDBC URL to access the database
• typeString (default: VARCHAR(1000)): default SQL data type for strings
• typeNumber (default: DECIMAL(20,2)): default SQL data type for numbers
• typeDate (default: DATE): default SQL data type for dates
• minPeriod (default: 2005-04): starting period for the data load
• xmx (default: 3000m): maximum Java heap size (Xmx setting)
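How formsRegex and minPeriod act on their values can be sketched in Java. The class EdgarEnv is hypothetical; it reads from a Map (such as System.getenv()) so the defaults listed above apply when a variable is missing. Treating minPeriod as the first month of a month-by-month walk is an assumption about how the tool uses it.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: applies the documented defaults for formsRegex and minPeriod.
// The month-by-month interpretation of minPeriod is an assumption.
class EdgarEnv {
    final Pattern forms;
    final YearMonth minPeriod;

    EdgarEnv(Map<String, String> env) {
        forms = Pattern.compile(env.getOrDefault("formsRegex", "10-Q.*|10-K.*"));
        minPeriod = YearMonth.parse(env.getOrDefault("minPeriod", "2005-04"));
    }

    /** True if the whole form type (e.g. "10-K/A") matches formsRegex. */
    boolean selects(String formType) {
        return forms.matcher(formType).matches();
    }

    /** All months from minPeriod up to and including 'until'. */
    List<YearMonth> periodsUntil(YearMonth until) {
        List<YearMonth> periods = new ArrayList<>();
        for (YearMonth p = minPeriod; !p.isAfter(until); p = p.plusMonths(1)) {
            periods.add(p);
        }
        return periods;
    }
}
```

Because matches() requires the whole form type to match, the default regex also selects amendments such as 10-K/A and 10-Q/A, but not forms like 8-K or NT 10-K.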

Further Information

Further information can be found in my posts.