Ark Agent

Distributed Agent Used To Collect Current and Historical End Of Day Stock Data

An application that leverages Celery and MongoDB to provide end-of-day and historical market data for stocks.

Use Cases

Ark Agent can be used for the following and much more:


Ark Agent automatically obtains and stores historical market data, using Yahoo as the main source.

  • Obtains current and historical data for the entire S&P 500 in under 15 minutes.


Ark Agent requires only minor configuration to get started:


  • Ensure mongo is in your PATH, or update the scripts to point to it
  • Set up additional indices depending on how you query the data
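As an illustration of the second point, a compound index on symbol and date would suit queries that fetch one ticker's history in date order. This is only a sketch: it assumes the eod_data collection in the stocks database described under Document Structure, and the helper name `ensure_eod_indexes` is hypothetical, not part of Ark Agent.

```python
def ensure_eod_indexes(collection):
    """Create a compound index for per-symbol, date-ordered lookups.

    `collection` is expected to be a pymongo Collection, for example
    MongoClient()["stocks"]["eod_data"]. The helper name is hypothetical.
    """
    # 1 = ascending, -1 = descending (the values of pymongo.ASCENDING
    # and pymongo.DESCENDING), so the newest dates sort first per symbol.
    keys = [("symbol", 1), ("date", -1)]
    return collection.create_index(keys)
```

If most of your queries filter on date alone (for example, "all symbols on a given day"), a single-field index on date would likely serve better than this one.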

1. Download and Configure MongoDB

Follow MongoDB's documentation to download and configure MongoDB.

2. Install Python dependencies

pip install celery pymongo ystockquote finsymbols pyyaml

3. Start MongoDB and Configure Indices

  1. Start MongoDB
  2. Run script to configure indices ./

4. Update settings in configs/mongo_settings.yaml

5. Start Worker(s)

Celery selects the number of worker processes automatically; by default it creates one per CPU core.

celery worker --app ark_agent -l info -E -B
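In the command above, -E enables task events for monitoring and -B runs Celery's beat scheduler inside the worker, which is what triggers periodic tasks. As a rough sketch of what a daily schedule could look like in Celery's settings, assuming the Celery 3.x setting name `CELERYBEAT_SCHEDULE` (newer releases call it `beat_schedule`); the entry name and task path are hypothetical and may not match Ark Agent's actual tasks:

```python
from datetime import timedelta

# Hypothetical beat schedule: run one collection task every 24 hours.
# "ark_agent.tasks.collect_eod_data" is an assumed task path used only
# for illustration; check the project source for the real task names.
CELERYBEAT_SCHEDULE = {
    "collect-eod-data": {
        "task": "ark_agent.tasks.collect_eod_data",
        "schedule": timedelta(hours=24),
    },
}
```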

6. Sit back, relax, and collect end-of-day data

Document Structure

eod_data collection

MongoDB shell version: 2.6.0
connecting to: test
> use stocks
switched to db stocks
> db.eod_data.find({symbol:'GOOG'}).pretty()
{
    "_id" : ObjectId("539c776b3973b0db2be18265"),
    "price_data" : {
        "Volume" : 1900300,
        "High" : 532.93,
        "AdjClose" : 531.35,
        "Low" : 523.88,
        "Close" : 531.35,
        "Open" : 527.11
    },
    "date" : ISODate("2014-05-01T00:00:00Z"),
    "symbol" : "GOOG"
}
{
    "_id" : ObjectId("539c776b3973b0db2be18266"),
    "price_data" : {
        "Volume" : 1348000,
        "High" : 563.6,
        "AdjClose" : 560.55,
        "Low" : 557.9,
        "Close" : 560.55,
        "Open" : 560.51
    },
    "date" : ISODate("2014-06-10T00:00:00Z"),
    "symbol" : "GOOG"
}
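To make the document shape above concrete, here is a minimal sketch of how one raw daily quote could be converted into an eod_data document. It assumes the raw row is a dict of strings keyed by "Open", "High", "Low", "Close", "Adj Close" and "Volume" (the style returned by ystockquote's historical-price lookup); the function name `build_eod_document` is hypothetical and is not taken from Ark Agent's source.

```python
from datetime import datetime


def build_eod_document(symbol, date_str, row):
    """Convert one raw daily quote into the eod_data document shape.

    `row` is assumed to map field names to strings, e.g.
    {"Open": "527.11", ..., "Adj Close": "531.35", "Volume": "1900300"}.
    """
    return {
        "symbol": symbol,
        # Stored as a datetime so MongoDB saves it as an ISODate.
        "date": datetime.strptime(date_str, "%Y-%m-%d"),
        "price_data": {
            "Open": float(row["Open"]),
            "High": float(row["High"]),
            "Low": float(row["Low"]),
            "Close": float(row["Close"]),
            "AdjClose": float(row["Adj Close"]),
            "Volume": int(row["Volume"]),
        },
    }
```

Storing numbers and dates in their native types, rather than as strings, is what allows range queries and sorting on date and price fields.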

load_status collection

Type "it" for more
> db.load_status.find({symbol:'GOOG'}).pretty()
{
    "_id" : ObjectId("539c776b3973b0db2be18264"),
    "symbol" : "GOOG",
    "load_status" : {
        "last_run_success_date" : "2014-06-14",
        "last_run_status" : "success",
        "initial" : false,
        "last_run_date" : "2014-06-14"
    }
}
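One plausible use of a load_status document is deciding which dates still need to be fetched for a symbol. The sketch below is an assumption about that logic, not Ark Agent's actual implementation: it treats `initial: true` as "the full historical load has not completed yet", and both `HISTORY_START` and `next_fetch_range` are hypothetical names.

```python
from datetime import date, datetime, timedelta

# Hypothetical earliest date to request on a first (initial) load.
HISTORY_START = date(2000, 1, 1)


def next_fetch_range(status_doc, today):
    """Return the (start, end) dates to fetch for one symbol."""
    status = status_doc["load_status"]
    last_ok = status.get("last_run_success_date")
    if status["initial"] or not last_ok:
        # No successful load recorded yet: request the full history.
        return HISTORY_START, today
    # Otherwise resume on the day after the last successful run.
    resume = datetime.strptime(last_ok, "%Y-%m-%d").date() + timedelta(days=1)
    return resume, today
```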


By default, Ark Agent uses MongoDB as both the messaging system and the datastore. Ideally, you would separate these two roles, for example by using a broker such as RabbitMQ to decouple the data collection and cleaning components from the database. Ark Agent can easily be modified to do this. The current setup was chosen so that others can get started quickly.
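The following sketch shows the idea of swapping only the broker while leaving the datastore on MongoDB. It assumes the Celery 3.x setting name `BROKER_URL`; the hosts, credentials, the `DATASTORE_URI` key, and the `celery_settings` helper are all placeholders for illustration and do not come from Ark Agent's configuration files.

```python
def celery_settings(use_rabbitmq=False):
    """Build illustrative settings with a combined or separate broker."""
    # Placeholder MongoDB location; adjust host, port and database name.
    mongo_url = "mongodb://localhost:27017/stocks"
    if use_rabbitmq:
        # RabbitMQ carries the task messages; MongoDB only stores data.
        broker_url = "amqp://guest:guest@localhost:5672//"
    else:
        # Single-service setup: MongoDB is both broker and datastore.
        broker_url = mongo_url
    return {
        "BROKER_URL": broker_url,    # Celery 3.x broker setting
        "DATASTORE_URI": mongo_url,  # hypothetical key, not a Celery setting
    }
```

With `use_rabbitmq=True`, only the broker URL changes, so the workers and the stored documents would be unaffected by the switch.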

Base Architecture


Feel free to open a pull request or send me an email.