The full solution for Smart EDGAR is provided as a Docker image.

The REST service provides the content of 10-K and 10-Q EDGAR XBRL filings. The complete docker-compose.yml file for the solution is shown below. Start the application with docker-compose up.

    When you start the application for the first time, a complete initial load is started. Because of the large volume of data, this takes a very long time. After the initial load has completed, only the latest changes are downloaded, every 60 minutes (see the timer setting).

    version: '3.0'
    services:
      edgar-db:
        image: postgres:alpine
        container_name: db-edgar
        restart: always
        environment:
          - TZ=Europe/Zurich
          - POSTGRES_USER=edgar
          - POSTGRES_PASSWORD=tbd
        volumes:
          - /data/SmartEdgar/postgresql/data:/var/lib/postgresql/data
        ports:
          - 5432:5432
    
      edgar-service:
        image: pschatzmann/smart-edgar
        container_name: edgar-service
        environment:
          - xmx=2000m
          - TZ=Europe/Zurich
          - jdbcDriver=org.postgresql.Driver
          - jdbcURL=jdbc:postgresql://edgar-db:5432/edgar
          - jdbcUser=edgar
          - jdbcPassword=tbd
          - destinationFolder=/usr/local/bin/SmartEdgar/data
        links:
          - edgar-db
        volumes:
          - /data/SmartEdgar:/usr/local/bin/SmartEdgar/data/
        ports:
          - "9997:9997"
    
      edgar-load:
        image: pschatzmann/smart-edgar
        environment:
          - xmx=500m
          - TZ=Europe/Zurich
          - formsRegex=10-K.*|10-Q.*
          - timer=60
          - history=false
          - jdbcDriver=org.postgresql.Driver
          - jdbcURL=jdbc:postgresql://edgar-db:5432/edgar
          - jdbcUser=edgar
          - jdbcPassword=tbd
        links:
          - edgar-db
        command:
          - ./start.sh
          - ch.pschatzmann.edgar.dataload.DownloadProcessorJDBC
        volumes:
          - /data/SmartEdgar:/usr/local/bin/SmartEdgar/data/
    

    You can access the REST functionality at http://localhost:9997 in your web browser. Replace localhost with the hostname if you want to access the solution from a different machine. This takes you to the Swagger UI, which you can use to try out the web services.

    Initial Setup (XBRL Download)

    The solution can automatically download the latest relevant XBRL files. Each filing is stored in a zip file, regardless of whether EDGAR provides a zip file (new filings) or individual xml and xsl files (old filings). This section contains the information you need if you do not want to rely on the default logic or if you want to force a reload.

    We support the following data load scenarios:

    1. Download the XBRL files and load them into a Postgres database (DownloadProcessorJDBC)
    2. Download the XBRL files only (DownloadProcessorXbrlFile)

    Environment Variables

    We recommend downloading all information once and subsequently retrieving only the changes. This can be achieved with the following environment variables:

    • history
      • true: determine all available filings in EDGAR
      • false: determine only the latest EDGAR filings
      • <empty value>: the value is treated as false only once a complete load into the database has finished; until then it is treated as true
    • timer
      • time interval in minutes at which the download is repeated
      • if the value is empty, the download is executed only once
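
    As a minimal sketch of the default behaviour described above, the following fragment omits history altogether and sets only timer. Going by the descriptions in this list, the loader would then be expected to fetch the full history until a complete load has finished, and only the latest filings on the repeated runs afterwards. The service name and connection values are copied from the examples in this post, and the fragment assumes a compose file that also defines the edgar-db service. It is an untested illustration, not a verified configuration.

```yaml
# Sketch: rely on the default history logic (history is deliberately not set).
services:
  edgar-load:
    image: pschatzmann/smart-edgar
    environment:
      - formsRegex=10-K.*|10-Q.*
      # Repeat the download every 60 minutes; leave timer out to run only once.
      - timer=60
      - jdbcDriver=org.postgresql.Driver
      - jdbcURL=jdbc:postgresql://edgar-db:5432/edgar
      - jdbcUser=edgar
      - jdbcPassword=tbd
    command:
      - ./start.sh
      - ch.pschatzmann.edgar.dataload.DownloadProcessorJDBC
    volumes:
      - /data/SmartEdgar:/usr/local/bin/SmartEdgar/data/
```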

    Delta Logic

    • We load a filing from EDGAR only if it does not exist in our file system
    • We load a filing into the Database only if it has not been loaded yet

    Download of Files into Database

    The following example forces a complete initial load of all files into the database.

    version: '3.0'
    services:
      edgar-db:
        image: postgres:alpine
        container_name: db-edgar
        restart: always
        environment:
          - TZ=Europe/Zurich
          - POSTGRES_USER=edgar
          - POSTGRES_PASSWORD=tbd
        volumes:
          - /data/SmartEdgar/postgresql/data:/var/lib/postgresql/data
        ports:
          - 5432:5432
    
      edgar-load:
        image: pschatzmann/smart-edgar
        environment:
          - xmx=500m
          - TZ=Europe/Zurich
          - formsRegex=10-K.*|10-Q.*
          - timer=60
          - history=true
          - jdbcDriver=org.postgresql.Driver
          - jdbcURL=jdbc:postgresql://edgar-db:5432/edgar
          - jdbcUser=edgar
          - jdbcPassword=tbd
        links:
          - edgar-db
        command:
          - ./start.sh
          - ch.pschatzmann.edgar.dataload.DownloadProcessorJDBC
        volumes:
          - /data/SmartEdgar:/usr/local/bin/SmartEdgar/data/
    

    Download of Files (only)

    The following example performs the initial load of all XBRL zip files without loading them into the database.

    version: '3.0'
    services:
      edgar-files:
        image: pschatzmann/smart-edgar
        environment:
          - xmx=500m
          - TZ=Europe/Zurich
          - formsRegex=10-K.*|10-Q.*
          - history=true
        command:
          - ./start.sh
          - ch.pschatzmann.edgar.dataload.DownloadProcessorXbrlFile
        volumes:
          - /data/SmartEdgar:/usr/local/bin/SmartEdgar/data/
    

    Docker Environment Variables

    Here is the list of all supported environment variables:

    Environment Variable  Default Value                           Description
    --------------------  --------------------------------------  -----------
    destinationFolder     /usr/local/bin/SmartEdgar/data/         Data directory used to store and access the XBRL zip files
    timer                 (none)                                  Number of minutes to wait before repeating the data load
    history               true if the initial load has never      Load historic data from EDGAR. Set the value
                          completed; false if it has completed    explicitly to override the default logic
    formsRegex            10-Q.*|10-K.*                           Regex which selects the forms to be loaded
    jdbcDriver            org.postgresql.Driver                   Postgres JDBC driver
    jdbcUser              (none)                                  User ID to access the database
    jdbcPassword          (none)                                  Password to access the database
    jdbcURL               jdbc:postgresql://nuc.local:5432/edgar  JDBC URL to access the database
    typeString            VARCHAR(1000)                           Default SQL data type for strings
    typeNumber            DECIMAL(20,2)                           Default SQL data type for numbers
    typeDate              DATE                                    Default SQL data type for dates
    minPeriod             2005-04                                 Starting period for the data load
    xmx                   3000m                                   Java Xmx (maximum heap) memory setting
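
    To illustrate how several of these variables can be combined, here is a hedged sketch of an edgar-load service that restricts the load to annual reports from 2015 onwards. The specific values are assumptions chosen for illustration: 2015-01 assumes that minPeriod accepts the same YYYY-MM format as its default, and 10-K.* is simply a narrower variant of the default regex. The fragment has not been tested against the image and assumes a compose file that also defines the edgar-db service.

```yaml
# Sketch: narrow the data load with formsRegex and minPeriod (illustrative values).
services:
  edgar-load:
    image: pschatzmann/smart-edgar
    environment:
      # Annual reports only; the default is 10-Q.*|10-K.*
      - formsRegex=10-K.*
      # Assumed to use the same YYYY-MM format as the default 2005-04
      - minPeriod=2015-01
      - xmx=1000m
      - jdbcDriver=org.postgresql.Driver
      - jdbcURL=jdbc:postgresql://edgar-db:5432/edgar
      - jdbcUser=edgar
      - jdbcPassword=tbd
    command:
      - ./start.sh
      - ch.pschatzmann.edgar.dataload.DownloadProcessorJDBC
    volumes:
      - /data/SmartEdgar:/usr/local/bin/SmartEdgar/data/
```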

    Further Information

    Further information can be found in my posts


    2 Comments

    Daniel · 16. January 2019 at 5:56

    Phil, Thanks for sharing your project. When I run Smart Docker Image using the YAML file at the top of this post, I get the following error. Any ideas what’s going wrong?

    ERROR: Service ‘edgar-service’ has a link to service ‘smart-edgar-db’ which is undefined.

    Thanks!

      pschatzmann · 16. January 2019 at 7:56

      Hello,
      I am not sure what exactly the issue is: there were issues with the indentation, and the user needs to be set to edgar.
      I have updated the document and confirmed that a copy-pasted version of docker-compose.yml is working now.

      I recommend changing the password and setting the volume mappings to a directory that makes sense for you.
      Please let me know if you still have issues.

      Kind regards
      Phil
