Downloading Stock Data Using Python Code:¶
In this document, we apply what we learned in the last lecture by downloading data for a list of US stocks using the yfinance package. We then compute some preliminary statistics on this data.
Loading the necessary packages¶
Always make it a habit to load the packages you need in the first cell:
# This package is used for downloading stock data from Yahoo Finance
# we will import it in our project and give it the nickname (yf)
import yfinance as yf
# This package is important so we can work with tables and datasets
# we will nickname it (pd)
import pandas as pd
# load the package (datetime) and refer to it by the nickname (dt)
import datetime as dt
# The following packages allow us to draw figures and charts
import matplotlib as mpl
import matplotlib.pyplot as plt
# NumPy is a package that provides fast numerical and array operations
import numpy as np
# define two variables, which are dates, so we can pass them to the `yf.download` command later
start_date = dt.date(1999,12,31)
end_date = dt.date(2023,12,31)
# Note that if you want to add days or weeks to a date, you can use the command dt.timedelta().
# for example:
add_days = dt.timedelta(days = 100)
new_start_date = start_date + add_days
new_end_date = end_date + add_days
print(new_start_date, new_end_date)
2000-04-09 2024-04-09
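For instance, a quick sketch (illustrative only, not used later) of shifting a date by weeks instead of days:
# shift the start date forward by two weeks
two_weeks = dt.timedelta(weeks = 2)
print(start_date + two_weeks)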
Obtaining stock data for multiple companies in one command¶
By passing a list of tickers, the .download function can obtain data for multiple companies at once, instead of being called separately for each ticker.
# The following is a list of tickers
ticker_list = ["BA", "AMZN", "CVX", "DIS", "GOLD", "HSY", "INTC", "K", "KO", "M", "MMM", "MSFT", "NVDA"]
# Instead of a single ticker, we now pass the whole list as our input
main_data = yf.download(ticker_list, start = start_date, end = end_date, interval = "1mo")
[*********************100%***********************] 13 of 13 completed
# Let's have a look at the dataset
main_data.head()
(Two-level column index: the first 10 data columns shown are under "Adj Close" and the last 10 are under "Volume"; the middle columns are omitted from the display.)
Date | AMZN | BA | CVX | DIS | GOLD | HSY | INTC | K | KO | M | ... | DIS | GOLD | HSY | INTC | K | KO | M | MMM | MSFT | NVDA
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2000-01-01 | 3.228125 | 28.723930 | 17.139805 | 28.004887 | 11.627259 | 12.283645 | 28.394407 | 10.309633 | 14.823807 | 11.775007 | ... | 247072276 | 30307800 | 27437600 | 1406932200 | 20080045 | 224495200 | 54676200 | 68507000 | 1274875200 | 494145600 |
2000-02-01 | 3.443750 | 23.842466 | 15.307978 | 26.221441 | 11.582878 | 12.699118 | 32.430264 | 10.761341 | 12.549431 | 10.378275 | ... | 144199796 | 41421300 | 26855600 | 965312200 | 23470157 | 189933800 | 54626000 | 66870400 | 1334487600 | 916084800 |
2000-03-01 | 3.350000 | 24.490906 | 19.106281 | 31.812773 | 11.139091 | 14.168968 | 37.876537 | 11.076875 | 12.113902 | 11.951815 | ... | 185073291 | 34786700 | 30930800 | 1131350200 | 22992184 | 303736800 | 50651600 | 87122200 | 2028187600 | 1755216000 |
2000-04-01 | 2.759375 | 25.705317 | 17.594824 | 33.644440 | 11.937909 | 13.224373 | 36.405254 | 10.431618 | 12.240291 | 9.618027 | ... | 111137455 | 35090400 | 19653800 | 1209578800 | 18671158 | 240129800 | 44807000 | 68960000 | 2258146600 | 648172800 |
2000-05-01 | 2.415625 | 25.300518 | 19.106281 | 32.535797 | 12.869864 | 15.077240 | 35.795193 | 13.066406 | 13.826995 | 10.891005 | ... | 110547998 | 39834900 | 17981400 | 1145907800 | 23087501 | 229033600 | 53502600 | 51990800 | 1344430800 | 869558400 |
5 rows × 78 columns
# Mostly we are interested in the big column "Adj Close"
# So I will create a new dataset, which is a slice of "main_data"
stock_prices = main_data['Adj Close']
stock_prices.head(n = 7)
Date | AMZN | BA | CVX | DIS | GOLD | HSY | INTC | K | KO | M | MMM | MSFT | NVDA
---|---|---|---|---|---|---|---|---|---|---|---|---|---
2000-01-01 | 3.228125 | 28.723930 | 17.139805 | 28.004887 | 11.627259 | 12.283645 | 28.394407 | 10.309633 | 14.823807 | 11.775007 | 23.799902 | 30.283478 | 0.708314 |
2000-02-01 | 3.443750 | 23.842466 | 15.307978 | 26.221441 | 11.582878 | 12.699118 | 32.430264 | 10.761341 | 12.549431 | 10.378275 | 22.417660 | 27.653498 | 1.223127 |
2000-03-01 | 3.350000 | 24.490906 | 19.106281 | 31.812773 | 11.139091 | 14.168968 | 37.876537 | 11.076875 | 12.113902 | 11.951815 | 22.658825 | 32.874763 | 1.614611 |
2000-04-01 | 2.759375 | 25.705317 | 17.594824 | 33.644440 | 11.937909 | 13.224373 | 36.405254 | 10.431618 | 12.240291 | 9.618027 | 22.163111 | 21.581335 | 1.703300 |
2000-05-01 | 2.415625 | 25.300518 | 19.106281 | 32.535797 | 12.869864 | 15.077240 | 35.795193 | 13.066406 | 13.826995 | 10.891005 | 21.939230 | 19.357439 | 2.181084 |
2000-06-01 | 1.815625 | 27.180988 | 17.653696 | 29.932936 | 12.981661 | 14.168630 | 38.388420 | 12.911996 | 14.879401 | 9.547308 | 21.373287 | 24.752775 | 2.429532 |
2000-07-01 | 1.506250 | 31.731459 | 16.440578 | 29.740133 | 11.385468 | 13.511317 | 38.334579 | 11.257315 | 15.934709 | 6.806873 | 23.191954 | 21.600664 | 2.293363 |
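Before moving on, since we loaded matplotlib earlier, here is a minimal sketch (an extra illustration, not one of the original steps) of how we could chart a couple of these price series:
# plot the adjusted close prices of two tickers over time
stock_prices[["MSFT", "KO"]].plot(figsize = (10, 5))
plt.title("Monthly adjusted close prices")
plt.ylabel("Price (USD)")
plt.show()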
Descriptive Statistics for Stock Returns¶
In order to find the descriptive statistics for the list of stocks we picked, we need to calculate each period's return. Luckily, pandas has a useful function pct_change. We can apply it to our whole dataset stock_prices. Because we are applying it to the whole table, the function automatically measures the % change for each column:
# Take % changes of each row for the table "stock_prices"
# Save the results in a new variable "stock_returns"
stock_returns = stock_prices.pct_change()
# Let's see the new table
stock_returns.head()
Date | AMZN | BA | CVX | DIS | GOLD | HSY | INTC | K | KO | M | MMM | MSFT | NVDA
---|---|---|---|---|---|---|---|---|---|---|---|---|---
2000-01-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2000-02-01 | 0.066796 | -0.169944 | -0.106876 | -0.063683 | -0.003817 | 0.033823 | 0.142136 | 0.043814 | -0.153427 | -0.118618 | -0.058078 | -0.086845 | 0.726813 |
2000-03-01 | -0.027223 | 0.027197 | 0.248126 | 0.213235 | -0.038314 | 0.115744 | 0.167938 | 0.029321 | -0.034705 | 0.151619 | 0.010758 | 0.188810 | 0.320068 |
2000-04-01 | -0.176306 | 0.049586 | -0.079108 | 0.057576 | 0.071713 | -0.066666 | -0.038844 | -0.058253 | 0.010433 | -0.195266 | -0.021877 | -0.343529 | 0.054929 |
2000-05-01 | -0.124575 | -0.015748 | 0.085904 | -0.032952 | 0.078067 | 0.140110 | -0.016758 | 0.252577 | 0.129630 | 0.132353 | -0.010102 | -0.103047 | 0.280505 |
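Notice that the first row is all NaN because there is no earlier month to compute a change from. The describe function used below simply ignores NaN values, but if we wanted to drop that row explicitly, a minimal sketch is:
# drop the first row, whose returns are NaN (no previous month to compare against)
stock_returns_clean = stock_returns.dropna()
stock_returns_clean.head()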
# Now we can find the descriptive stats for the returns by using pandas' handy tool "describe"
return_stats = stock_returns.describe()
return_stats
  | AMZN | BA | CVX | DIS | GOLD | HSY | INTC | K | KO | M | MMM | MSFT | NVDA
---|---|---|---|---|---|---|---|---|---|---|---|---|---
count | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 |
mean | 0.021791 | 0.012425 | 0.009857 | 0.006993 | 0.008019 | 0.011105 | 0.006905 | 0.007126 | 0.006123 | 0.010586 | 0.007103 | 0.012066 | 0.037451 |
std | 0.129926 | 0.095598 | 0.069003 | 0.076331 | 0.114357 | 0.057571 | 0.097305 | 0.049604 | 0.050752 | 0.129830 | 0.060703 | 0.081098 | 0.174042 |
min | -0.411552 | -0.454664 | -0.214625 | -0.267794 | -0.381056 | -0.177236 | -0.444733 | -0.138635 | -0.172743 | -0.628874 | -0.157037 | -0.343529 | -0.486551 |
25% | -0.048718 | -0.042805 | -0.029320 | -0.034121 | -0.065293 | -0.025150 | -0.047196 | -0.020944 | -0.024000 | -0.061762 | -0.028618 | -0.037006 | -0.054362 |
50% | 0.021677 | 0.015899 | 0.008623 | 0.005748 | 0.007592 | 0.008660 | 0.007672 | 0.009468 | 0.008546 | 0.011221 | 0.010560 | 0.017244 | 0.030835 |
75% | 0.087617 | 0.070994 | 0.046279 | 0.050940 | 0.072783 | 0.044363 | 0.062263 | 0.035984 | 0.038312 | 0.077279 | 0.039402 | 0.055028 | 0.122156 |
max | 0.621776 | 0.459312 | 0.269666 | 0.248734 | 0.426362 | 0.275953 | 0.337428 | 0.252577 | 0.141928 | 0.644122 | 0.213949 | 0.407781 | 0.826239 |
The new table return_stats is like any other pandas table: I can add, subtract, or delete any row or column. So, I would like to measure the Sharpe Ratio for each stock. Recall that the Sharpe Ratio is:
$$Sharpe_{x} = \frac{\text{Expected Return}_{x} - \text{Risk-free}}{\text{Stock Risk}} = \frac{E(R_{x}) - R_{f}}{\sigma_{x}}$$
Let's apply this by first assuming the risk-free rate is 0.5%.
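As a quick sanity check, plugging AMZN's mean and standard deviation from the return_stats table above into the formula gives:
$$Sharpe_{AMZN} = \frac{0.021791 - 0.005}{0.129926} \approx 0.129$$
which is the value we should see in the Sharpe column computed below.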
# first, let me flip this table (turn the rows into columns, and the columns into rows)
return_stats_flipped = return_stats.transpose()
return_stats_flipped
  | count | mean | std | min | 25% | 50% | 75% | max
---|---|---|---|---|---|---|---|---
AMZN | 287.0 | 0.021791 | 0.129926 | -0.411552 | -0.048718 | 0.021677 | 0.087617 | 0.621776 |
BA | 287.0 | 0.012425 | 0.095598 | -0.454664 | -0.042805 | 0.015899 | 0.070994 | 0.459312 |
CVX | 287.0 | 0.009857 | 0.069003 | -0.214625 | -0.029320 | 0.008623 | 0.046279 | 0.269666 |
DIS | 287.0 | 0.006993 | 0.076331 | -0.267794 | -0.034121 | 0.005748 | 0.050940 | 0.248734 |
GOLD | 287.0 | 0.008019 | 0.114357 | -0.381056 | -0.065293 | 0.007592 | 0.072783 | 0.426362 |
HSY | 287.0 | 0.011105 | 0.057571 | -0.177236 | -0.025150 | 0.008660 | 0.044363 | 0.275953 |
INTC | 287.0 | 0.006905 | 0.097305 | -0.444733 | -0.047196 | 0.007672 | 0.062263 | 0.337428 |
K | 287.0 | 0.007126 | 0.049604 | -0.138635 | -0.020944 | 0.009468 | 0.035984 | 0.252577 |
KO | 287.0 | 0.006123 | 0.050752 | -0.172743 | -0.024000 | 0.008546 | 0.038312 | 0.141928 |
M | 287.0 | 0.010586 | 0.129830 | -0.628874 | -0.061762 | 0.011221 | 0.077279 | 0.644122 |
MMM | 287.0 | 0.007103 | 0.060703 | -0.157037 | -0.028618 | 0.010560 | 0.039402 | 0.213949 |
MSFT | 287.0 | 0.012066 | 0.081098 | -0.343529 | -0.037006 | 0.017244 | 0.055028 | 0.407781 |
NVDA | 287.0 | 0.037451 | 0.174042 | -0.486551 | -0.054362 | 0.030835 | 0.122156 | 0.826239 |
# I will assume the risk-free rate is 0.005
r_free = 0.005
return_stats_flipped['Sharpe'] = (return_stats_flipped['mean'] - r_free)/return_stats_flipped['std']
return_stats_flipped
  | count | mean | std | min | 25% | 50% | 75% | max | Sharpe
---|---|---|---|---|---|---|---|---|---
AMZN | 287.0 | 0.021791 | 0.129926 | -0.411552 | -0.048718 | 0.021677 | 0.087617 | 0.621776 | 0.129233 |
BA | 287.0 | 0.012425 | 0.095598 | -0.454664 | -0.042805 | 0.015899 | 0.070994 | 0.459312 | 0.077671 |
CVX | 287.0 | 0.009857 | 0.069003 | -0.214625 | -0.029320 | 0.008623 | 0.046279 | 0.269666 | 0.070391 |
DIS | 287.0 | 0.006993 | 0.076331 | -0.267794 | -0.034121 | 0.005748 | 0.050940 | 0.248734 | 0.026107 |
GOLD | 287.0 | 0.008019 | 0.114357 | -0.381056 | -0.065293 | 0.007592 | 0.072783 | 0.426362 | 0.026397 |
HSY | 287.0 | 0.011105 | 0.057571 | -0.177236 | -0.025150 | 0.008660 | 0.044363 | 0.275953 | 0.106048 |
INTC | 287.0 | 0.006905 | 0.097305 | -0.444733 | -0.047196 | 0.007672 | 0.062263 | 0.337428 | 0.019574 |
K | 287.0 | 0.007126 | 0.049604 | -0.138635 | -0.020944 | 0.009468 | 0.035984 | 0.252577 | 0.042864 |
KO | 287.0 | 0.006123 | 0.050752 | -0.172743 | -0.024000 | 0.008546 | 0.038312 | 0.141928 | 0.022124 |
M | 287.0 | 0.010586 | 0.129830 | -0.628874 | -0.061762 | 0.011221 | 0.077279 | 0.644122 | 0.043027 |
MMM | 287.0 | 0.007103 | 0.060703 | -0.157037 | -0.028618 | 0.010560 | 0.039402 | 0.213949 | 0.034641 |
MSFT | 287.0 | 0.012066 | 0.081098 | -0.343529 | -0.037006 | 0.017244 | 0.055028 | 0.407781 | 0.087124 |
NVDA | 287.0 | 0.037451 | 0.174042 | -0.486551 | -0.054362 | 0.030835 | 0.122156 | 0.826239 | 0.186455 |
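Since Sharpe is now just another column, we could, for example, rank the stocks by it (a small sketch, not one of the original steps):
# sort the stocks from the highest to the lowest Sharpe Ratio
return_stats_flipped.sort_values(by = "Sharpe", ascending = False)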
Exporting Datasets¶
The pandas package offers its users a way to export their tables and datasets to many common file formats. It also offers options to import files into a new dataset. In the following code I will export the two tables created earlier (return_stats_flipped and stock_returns) to Excel files. The produced Excel files will be located in the same folder as this Jupyter Notebook.
stock_returns.to_excel("all_stock_returns.xlsx")
return_stats_flipped.to_excel("return_stats.xlsx")
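To go the other way and import one of these files back into a dataset, a minimal sketch (assuming the file created by the cell above sits in the same folder) is:
# read the exported returns back into a table, using the first column (the dates) as the index
reloaded_returns = pd.read_excel("all_stock_returns.xlsx", index_col = 0)
reloaded_returns.head()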