*TLDR? If you just want the script, I’ll be building a repository on GitHub here.*

In the few weeks since my last post, there has been an incredible amount of interest in Prep + Python – so much so that it took me a little by surprise… and yet it really shouldn’t. Python is a very powerful language, combining that with Prep brings automation and new possibilities to both.

As a quick update to my earlier post – I thought I’d be interesting to share the alternative take on the input step. Earlier, I essentially disregarded the input step as early as I could, it became fairly redundant very early on.Β 

This time, I decided to put it to use. Creating a simple one column CSV file, this becomes the list of stocks I’d like my flow to pull in from the web.

The script then looks for “Yahoo Tickers” and runs through each of the ticker values in the column. Worth noting, as I’m not processing these in parallel, the more you have in this list, the longer the script will take to process. For comparison, my list of 903 took ~15 minutes to complete… but does give me ~700k rows of glorious stock data! You also run the risk of the API blocking your calls. Less fun.

If you’re unsure on the format Yahoo expects these to be in, head over to Yahoo Finance and search for the stocks you’re after. You’ll notice the ticker value you need in the URL:

Once you have your list, use that as your input node, add the script, and away you go!

The script in question…

#################################
# Tableau Prep | TabPy          #
# Scenario 2: Bring in web data #
#################################

import time
import pandas_datareader as dr
from datetime import datetime, timedelta

def getstock(df):

    dataframe = pd.DataFrame()

    for stock in df['Yahoo Ticker']:
        try:
            yesterday = (datetime.now() - timedelta(1)).strftime('%Y-%m-%d')
            stockdf = dr.DataReader(stock, 'yahoo', '2016-12-01', yesterday)

            del stockdf['Close']
            # rename Adjusted Close to Close
            stockdf.rename(columns={'Adj Close': 'Close'}, inplace=True)

            start_date = stockdf.index.min()
            end_date = stockdf.index.max()
            all_weekdays = pd.date_range(start=start_date, end=end_date, freq='B')

            # Filling missing weekdays
            stockdf = stockdf.reindex(all_weekdays)
            stockdf = stockdf.fillna(method='ffill')
            stockdf['Stock'] = stock

            stockdf['% Change'] = stockdf['Close'].pct_change(1)
            stockdf['Change'] = stockdf['Close'].diff()
            stockdf['High-Low Diff'] = stockdf['High'] - stockdf['Low']
            stockdf['14ATR'] = stockdf['High-Low Diff'].abs().rolling(window=14).mean()
            stockdf['20ATR'] = stockdf['High-Low Diff'].abs().rolling(window=20).mean()

            stockdf = stockdf.reset_index()
            stockdf.rename(columns={'index': 'Date'}, inplace=True)
            stockdf['Date'] = stockdf['Date'].astype(str)

            dataframe = dataframe.append(stockdf)

        except Exception as e:
            print (e)
            pass

    df = dataframe
    return df

def get_output_schema():
    return pd.DataFrame({
        'Stock' : prep_string(),
        'Date' : prep_string(),
        'High' : prep_decimal(),
        'Low' : prep_decimal(),
        'Open': prep_decimal(),
        'Volume': prep_decimal(),
        'Close': prep_decimal(),
        '% Change': prep_decimal(),
        'Change' : prep_decimal(),
        'High-Low Diff': prep_decimal(),
        '14ATR': prep_decimal(),
        '20ATR': prep_decimal(),
        })
Categories: Tableau Prep

0 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *