## Getting Stock Prices

https://pythonprogramming.net/getting-stock-prices-python-programming-for-finance/

Hello and welcome to a Python for Finance tutorial series. In this series, we're going to run through the basics of importing financial (stock) data into Python using the Pandas framework. From here, we'll manipulate the data and attempt to come up with some sort of system for investing in companies, apply some machine learning, even some deep learning, and then learn how to back-test a strategy. I assume you know the [fundamentals](https://pythonprogramming.net/python-fundamental-tutorials/) of Python. If you're not sure if that's you, click the fundamentals link, look at some of the topics in the series, and make a judgement call. If at any point you are stuck in this series or confused on a topic or concept, feel free to ask for help and I will do my best to help.

A common question I am asked is whether I make a profit investing or trading with these techniques. I mostly play with finance data for fun and to practice my data analysis skills, but it does still influence my investment decisions to this day. At the time of writing, I do not do active algorithmic trading with programming. I have done it in the past, and I did make a profit, but algorithmic trading is a lot more work than you might think. Finally, knowing how to manipulate and analyze financial data, and how to backtest trading strategies, has **saved** me a ton of money.

None of the strategies presented here will make you an ultra wealthy person. If they would, I'd probably keep them to myself! The knowledge itself, however, can save you money, and even make you money.

Alright, let's get started. I am using Python 3.5, but you should be able to get by with later versions. I will assume you already have Python installed. If you have a 64-bit operating system but not 64-bit Python, install 64-bit Python; it will help a bit later. If you're on a 32-bit operating system, I am sorry for your situation, but you should still be able to follow most of this.

Required Modules to start:
1. **NumPy**
2. **Matplotlib**
3. **Pandas**
4. **Pandas-datareader**
5. **BeautifulSoup4**
6. **scikit-learn / sklearn**

That'll do for now; we'll deal with other modules as they come up. To begin, let's cover how we might go about working with stock data using pandas, matplotlib and Python.

If you'd like to learn more on Matplotlib, check out the [Data Visualization with Matplotlib tutorial series](https://pythonprogramming.net/matplotlib-intro-tutorial/).

If you'd like to learn more on Pandas, check out the [Data Analysis with Pandas tutorial series](https://pythonprogramming.net/data-analysis-python-pandas-tutorial-introduction/).

To begin, we're going to make the following imports:

```python
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib import style
import pandas as pd
import pandas_datareader.data as web
```

`datetime` makes it easy to work with dates, matplotlib lets us graph things, pandas lets us manipulate data, and `pandas_datareader` was the newest pandas I/O library at the time of writing.

Now for some starting setup:

```python
style.use('ggplot')

start = dt.datetime(2015, 1, 1)
end = dt.datetime.now()
```

We're setting a style so our graphs don't look horrendous. In finance, it's of the utmost importance that your graphs are pretty, even if you're losing money. Next, we set `start` and `end` datetime objects; these define the range of dates we're going to grab stock pricing information for.

Now, we can make a dataframe from this data:

Note: This has changed since the video was filmed. Both Yahoo and Google have shut down their APIs, so we'll use Morningstar this time:

```python
# df = web.DataReader("TSLA", 'morningstar', start, end)
```

***Note: The `morningstar` source no longer seems to work either, so we will use `yfinance` instead:***

```python
import yfinance as yf
df = yf.download('TSLA', start=start, end=end)
```


If you're not familiar with what a DataFrame object is, you can check out the tutorial on [Pandas](https://pythonprogramming.net/data-analysis-python-pandas-tutorial-introduction/), or just be content to think of it like a spreadsheet, or a database table that lives in your memory/RAM. It's just a table of rows and columns, with an index and column names. In our case, the index is the date. The index should be something that relates to all of the columns.
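To make the spreadsheet analogy concrete, here is a small sketch that builds a DataFrame by hand from the first three TSLA rows that `df.head()` prints further down (a subset of the columns, values copied from that output, so no download is needed). The variable name `sample` is just for illustration; the `Date` column becomes the index and each price field becomes a column.

```python
import pandas as pd

# First three rows of the TSLA data printed by df.head() in this tutorial
rows = {
    "Date": ["2015-01-02", "2015-01-05", "2015-01-06"],
    "Open": [14.858, 14.303333, 14.004],
    "High": [14.883333, 14.433333, 14.28],
    "Low": [14.217333, 13.810667, 13.614],
    "Close": [14.620667, 14.006, 14.085333],
    "Volume": [71466000, 80527500, 93928500],
}

sample = pd.DataFrame(rows)
# Convert the Date strings into real datetimes and use them as the index
sample["Date"] = pd.to_datetime(sample["Date"])
sample = sample.set_index("Date")

print(sample.columns.tolist())  # the column names
print(type(sample.index).__name__)  # the index is a DatetimeIndex
# Look up a single cell by index label (a date) and column name
print(sample.loc[pd.Timestamp("2015-01-05"), "Close"])
```

Every column shares the same date index, which is what lets pandas line up rows when you combine or compare columns later.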

The line `yf.download('TSLA', start=start, end=end)` uses the yfinance package to look up the stock ticker TSLA (Tesla) and download its price history from Yahoo Finance, from `start` up to `end`. (In the original version of this tutorial, the same job was done by `web.DataReader('TSLA', "yahoo", start, end)` from pandas_datareader.) Just in case you don't know, a stock is a share of ownership in a company, and the ticker is the "symbol" used to reference the company on the stock exchange where it's listed. Most tickers are 1-4 letters.

So now we've got a pandas DataFrame object that contains stock pricing information for Tesla. Let's see what we have here:

```python
df.head()
```

| Date       | Open      | High      | Low       | Close     | Adj Close | Volume   |
|------------|-----------|-----------|-----------|-----------|-----------|----------|
| 2015-01-02 | 14.858    | 14.883333 | 14.217333 | 14.620667 | 14.620667 | 71466000 |
| 2015-01-05 | 14.303333 | 14.433333 | 13.810667 | 14.006    | 14.006    | 80527500 |
| 2015-01-06 | 14.004    | 14.28     | 13.614    | 14.085333 | 14.085333 | 93928500 |
| 2015-01-07 | 14.223333 | 14.318667 | 13.985333 | 14.063333 | 14.063333 | 44526000 |
| 2015-01-08 | 14.187333 | 14.253333 | 14.000667 | 14.041333 | 14.041333 | 51637500 |


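With the Morningstar data, the next step was to simplify this dataframe slightly. The yfinance data already uses `Date` as its index and has no `Symbol` column, so these lines are no longer needed and are left commented out: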

```python
# df.reset_index(inplace=True)
# df.set_index("Date", inplace=True)
# df = df.drop("Symbol", axis=1)
```
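The goal of that simplification step is a plain table with `Date` as the index and one column per price field. Newer yfinance releases can instead return two-level columns (price field plus ticker), even for a single ticker. If that happens on your version, a sketch like the one below flattens them with pandas' `droplevel`. The DataFrame `raw` here is a hand-built stand-in (an assumption about the shape, not real yfinance output) that reuses two rows of the TSLA values printed earlier.

```python
import pandas as pd

# Hypothetical stand-in for a yf.download() result with two-level columns:
# level "Price" holds the field name, level "Ticker" holds the symbol
columns = pd.MultiIndex.from_product(
    [["Open", "Close"], ["TSLA"]], names=["Price", "Ticker"]
)
raw = pd.DataFrame(
    [[14.858, 14.620667], [14.303333, 14.006]],
    index=pd.DatetimeIndex(["2015-01-02", "2015-01-05"], name="Date"),
    columns=columns,
)

# Drop the ticker level so the columns become plain names like "Close"
flat = raw.copy()
if isinstance(flat.columns, pd.MultiIndex):
    flat.columns = flat.columns.droplevel("Ticker")

print(flat.columns.tolist())
print(flat["Close"])
```

The `isinstance` check keeps the snippet harmless when the columns are already flat, so it can be run on either shape of download.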

Now, the full code is:

```python
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib import style
import pandas as pd
import yfinance as yf

style.use('ggplot')

start = dt.datetime(2015, 1, 1)
end = dt.datetime.now()
df = yf.download('TSLA', start=start, end=end)

df.head()
```



Now we have a Python object made of rows and columns, like a spreadsheet.

`.head()` is a method on pandas DataFrames that outputs the first n rows, where n is an optional parameter you pass. If you don't pass one, the default is 5. We will mostly use `.head()` to get a quick glimpse of our data and make sure we're on the right track. Looks great to me!
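A quick sketch of `head()` with and without `n`, plus its mirror `tail()`, on a small hypothetical price table (made-up values, so no download is needed):

```python
import pandas as pd

# Hypothetical closing prices for seven business days (made-up values)
prices = pd.DataFrame(
    {"Close": [10.0, 10.5, 10.2, 10.8, 11.0, 11.3, 11.1]},
    index=pd.bdate_range("2015-01-02", periods=7, name="Date"),
)

print(prices.head())   # no argument: first 5 rows
print(prices.head(2))  # first 2 rows
print(prices.tail(3))  # last 3 rows, handy for checking the most recent data
```

Since our TSLA data runs up to today, `df.tail()` is a handy way to confirm the download actually reached recent dates.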

In case you do not know:

* Open - When the stock market opens in the morning for trading, what was the price of one share?
* High - Over the course of the trading day, what was the highest price?
* Low - Over the course of the trading day, what was the lowest price?
* Close - When the trading day was over, what was the final price?
* Volume - How many shares were traded that day?
* Adj Close - This one is slightly more complicated. Over time, companies may decide to do something called a stock split. For example, Apple has split its stock several times, including a 7-for-1 split in 2014 and a 4-for-1 split in 2020. Since in most cases people cannot buy fractions of shares, a stock price of \$1,000 is fairly limiting to investors. Companies can do a stock split where they say every share is now 2 shares, and the price is half. Anyone who held 1 share worth \$1,000 before such a 2-for-1 split would afterwards hold 2 shares worth \$500 each. Adj Close is helpful because it adjusts historical prices for splits (and, in Yahoo's data, dividends) that happened later, so prices from before and after a split are comparable. In the TSLA output above, Close and Adj Close are identical because Yahoo's Close column is already split-adjusted and Tesla has not paid dividends. For this reason, the adjusted prices are the prices you're most likely to be dealing with.
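The split arithmetic can be sketched in a few lines. This is a simplified illustration assuming a single hypothetical 2-for-1 split and ignoring dividends; it is not how Yahoo or yfinance compute their adjusted column. Prices recorded before the split date are divided by the split ratio, so the whole series is expressed in post-split shares.

```python
import pandas as pd

# Hypothetical closing prices around a 2-for-1 split that takes effect on 2020-01-06
close = pd.Series(
    [1000.0, 1010.0, 505.0, 510.0],
    index=pd.DatetimeIndex(
        ["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"], name="Date"
    ),
    name="Close",
)
split_date = pd.Timestamp("2020-01-06")
split_ratio = 2  # each old share became 2 new shares

# Divide every price recorded before the split date by the split ratio,
# so old prices are expressed in post-split share terms
adjusted = close.copy()
adjusted.loc[adjusted.index < split_date] /= split_ratio

print(adjusted)
```

After the adjustment, the last pre-split day (505.0) and the split day (505.0) line up instead of showing a misleading 50% crash in a chart. Real adjusted-close data also folds in dividends, which this sketch leaves out.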