Portfolio Construction & Measuring Risk ¶
This notebook will shed some light on the importance of diversification in investments.
First, using the pandas package, I will review the basic steps for collecting and measuring stock returns, as well as how to measure a portfolio return using specific weights. After that, I will show an example of how one can assess a portfolio's risk level by introducing the concept of return correlations among securities. By the end of this notebook, you will know how to estimate the risk of a portfolio of stocks.
Loading the necessary packages¶
When using Python, I recommend you make it a habit to load the needed packages for your session at the beginning of your code.
This step is similar to the steps taken in my previous Jupyter Notebook series. However, I will add another important package here: fredapi. This package is a Python wrapper around the FRED API provided by the Federal Reserve Bank of St. Louis in the United States. Using the package allows the user to download economic data directly from the Federal Reserve website; it is basically a portal one can use to obtain any public data provided by the Fed. To be able to use it, the user is required to open an account on their website and request an *API Key*. From my experience, the API-key request is approved immediately. Check this link for more info: https://fred.stlouisfed.org/docs/api/api_key.html
# If it is your first time using fredapi, you need to install it.
# You can do it through the Anaconda interface (as you did with the package yfinance),
# or alternatively by running the following code here only once on your computer
import sys
!{sys.executable} -m pip install fredapi
Collecting fredapi
  Obtaining dependency information for fredapi from https://files.pythonhosted.org/packages/96/d4/f81fa9f67775a6a4b9e2cd8487239d61a9698cb2b9c02a5a2897d310f7a4/fredapi-0.5.1-py3-none-any.whl.metadata
  Downloading fredapi-0.5.1-py3-none-any.whl.metadata (5.0 kB)
Requirement already satisfied: pandas in /Users/eyad/anaconda3/lib/python3.11/site-packages (from fredapi) (1.5.3)
Requirement already satisfied: python-dateutil>=2.8.1 in /Users/eyad/anaconda3/lib/python3.11/site-packages (from pandas->fredapi) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /Users/eyad/anaconda3/lib/python3.11/site-packages (from pandas->fredapi) (2022.7)
Requirement already satisfied: numpy>=1.21.0 in /Users/eyad/anaconda3/lib/python3.11/site-packages (from pandas->fredapi) (1.24.3)
Requirement already satisfied: six>=1.5 in /Users/eyad/anaconda3/lib/python3.11/site-packages (from python-dateutil>=2.8.1->pandas->fredapi) (1.16.0)
Downloading fredapi-0.5.1-py3-none-any.whl (11 kB)
Installing collected packages: fredapi
Successfully installed fredapi-0.5.1
# Use the package fredapi if you have installed it
from fredapi import Fred
# Don't forget to use your API key after opening an account with FRED,
# so that you can download any data from their server
fred = Fred(api_key='YOUR_API_KEY')  # paste your own key here
# yfinance is the package for downloading US stock data
import yfinance as yf
import pandas as pd
import datetime as dt
import matplotlib as mpl
import matplotlib.pyplot as plt
# This package is helpful to use for matrix multiplication.
# It is also helpful for creating random numbers
import numpy as np
Collecting Monthly Returns¶
Let us use the package yfinance to download monthly prices for the same list of stocks:
["BA", "AMZN", "CVX", "DIS", "GOLD", "HSY", "INTC", "K", "KO", "M", "MMM", "MSFT", "NVDA"].
Further, I would like to obtain historical monthly T-bill data to measure the risk-free rate ($R_{f}$), as well as monthly returns for a reliable index that represents the performance of the market for a given month.
T-bill monthly returns will be obtained from FRED. The code (or "ticker," if you wish) for the 1-month-maturity Treasury bill in FRED's datasets is TB4WK. If I were working with annual or daily stock returns, for example, I would need to download a different series from FRED, because the holding-period risk-free return ($R_{f}$) has to match the return of the investment horizon (monthly, in my case).
Note: If you need daily yields on 1-month T-bills use: DTB4WK, for annual $R_{f}$ rates use: RIFSGFSY01NA
Keep in mind the monthly T-bill rates reported by FRED are *Annualized*; they are standardized to an annual return. Thus, I need to convert them back to a monthly rate: $$R_{f, t} = (1+{Reported \,\,\, Rate_{t}})^{\frac{1}{12}}-1$$
Note: The 1-month-maturity T-bill series offered by FRED starts in 2001. If we want monthly rates for earlier dates, there is an alternative: we can collect rates for the three-month-maturity bills (available from late 1981), approximate the monthly rate with the average $\bar{R_{f,t}} = \frac{\left( R_{f, t-2} + R_{f, t-1} + R_{f, t}\right)}{3}$, and then apply the main formula above to convert it to a monthly rate.
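A minimal sketch of this workaround, using made-up annualized rates; the FRED series name for the 3-month bill used in the comment (TB3MS) is my assumption, so verify it before relying on it:

```python
import pandas as pd

# Hypothetical annualized 3-month T-bill rates in percent
# (on FRED this would be a series such as TB3MS -- an assumption to verify)
tb3 = pd.Series([5.2, 5.1, 5.3, 5.0],
                index=pd.date_range('1982-01-01', periods=4, freq='MS'))

# Step 1: average the last three months' annualized rates (the formula above)
avg_rate = tb3.rolling(window=3).mean()

# Step 2: de-annualize to a monthly holding-period rate, as in the main formula
r_f_monthly = (1 + avg_rate / 100) ** (1 / 12) - 1
```

The first two entries are NaN because a 3-month trailing average needs three observations.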
The appropriate stock market index is one that represents the performance of the overall U.S. stock market. One appropriate option is the Wilshire 5000 Total Market Index; its ticker in yfinance is ^W5000.
# The following are the list of tickers we would like to download from yfinance
ticker_list = ["^W5000", "BA", "AMZN", "CVX", "DIS", "GOLD", "HSY", "INTC", "K", "KO", "M", "MMM", "MSFT", "NVDA"]
# Set the start and end date
start_date = dt.date(1999,12,31)
end_date = dt.date(2023,12,31)
# Instead of using a ticker, now we will pass the name of the list
stock_data = yf.download(ticker_list, start=start_date, end=end_date, interval='1mo')
# We only need the adj. close prices, so I "slice" the table and replace the original table name
stock_data = stock_data['Adj Close']
# The following code obtains the risk-free rate data from FRED
# and saves it in a variable called (r_f)
r_f = fred.get_series('TB4WK')
# Recall I want monthly returns, not prices,
# so I take the % change from the previous period over the whole dataset (stock_data).
# That way one line of code measures the return for every column!
stock_return = stock_data.pct_change()
# Let's see it
stock_return.head()
AMZN | BA | CVX | DIS | GOLD | HSY | INTC | K | KO | M | MMM | MSFT | NVDA | ^W5000 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Date | ||||||||||||||
2000-01-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2000-02-01 | 0.066796 | -0.169944 | -0.106876 | -0.063683 | -0.003817 | 0.033823 | 0.142136 | 0.043815 | -0.153428 | -0.118619 | -0.058078 | -0.086845 | 0.726813 | 0.021192 |
2000-03-01 | -0.027223 | 0.027196 | 0.248125 | 0.213236 | -0.038314 | 0.115744 | 0.167938 | 0.028577 | -0.034704 | 0.151618 | 0.010757 | 0.188811 | 0.320068 | 0.058114 |
2000-04-01 | -0.176306 | 0.049587 | -0.079108 | 0.057576 | 0.071714 | -0.066667 | -0.038844 | -0.058252 | 0.010433 | -0.195266 | -0.021877 | -0.343529 | 0.054929 | -0.052775 |
2000-05-01 | -0.124575 | -0.015748 | 0.085904 | -0.032951 | 0.078067 | 0.140109 | -0.016757 | 0.252577 | 0.129630 | 0.132353 | -0.010101 | -0.103047 | 0.280505 | -0.036091 |
# I do not need the first row (it is empty),
# so I apply the function (dropna) on my table, using the option (inplace),
# which puts the new table in place of the original one after dropping the missing values
stock_return.dropna(inplace=True)
# Let's see it
stock_return.head()
AMZN | BA | CVX | DIS | GOLD | HSY | INTC | K | KO | M | MMM | MSFT | NVDA | ^W5000 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Date | ||||||||||||||
2000-02-01 | 0.066796 | -0.169944 | -0.106876 | -0.063683 | -0.003817 | 0.033823 | 0.142136 | 0.043815 | -0.153428 | -0.118619 | -0.058078 | -0.086845 | 0.726813 | 0.021192 |
2000-03-01 | -0.027223 | 0.027196 | 0.248125 | 0.213236 | -0.038314 | 0.115744 | 0.167938 | 0.028577 | -0.034704 | 0.151618 | 0.010757 | 0.188811 | 0.320068 | 0.058114 |
2000-04-01 | -0.176306 | 0.049587 | -0.079108 | 0.057576 | 0.071714 | -0.066667 | -0.038844 | -0.058252 | 0.010433 | -0.195266 | -0.021877 | -0.343529 | 0.054929 | -0.052775 |
2000-05-01 | -0.124575 | -0.015748 | 0.085904 | -0.032951 | 0.078067 | 0.140109 | -0.016757 | 0.252577 | 0.129630 | 0.132353 | -0.010101 | -0.103047 | 0.280505 | -0.036091 |
2000-06-01 | -0.248383 | 0.074325 | -0.076026 | -0.080000 | 0.008687 | -0.060264 | 0.072445 | -0.012356 | 0.076112 | -0.123377 | -0.025796 | 0.278721 | 0.113911 | 0.043327 |
Notice in the table above that the return for a given month is reported at the beginning of the month. For example, stock returns for February 2000 are reported on February 1, 2000. To fix this, I can ask pandas to push the date index to the next month and then subtract one day. It will apply this to all rows...
# Notice that the Date row in our table is the index
# So what we really want to do is modify our index by pushing it to the next month
# then reducing one day
# This way we will have the last day of the current month :)
stock_return.index = stock_return.index + pd.DateOffset(months=1) + pd.DateOffset(days=-1)
stock_return.head()
AMZN | BA | CVX | DIS | GOLD | HSY | INTC | K | KO | M | MMM | MSFT | NVDA | ^W5000 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Date | ||||||||||||||
2000-02-29 | 0.066796 | -0.169944 | -0.106876 | -0.063683 | -0.003817 | 0.033823 | 0.142136 | 0.043815 | -0.153428 | -0.118619 | -0.058078 | -0.086845 | 0.726813 | 0.021192 |
2000-03-31 | -0.027223 | 0.027196 | 0.248125 | 0.213236 | -0.038314 | 0.115744 | 0.167938 | 0.028577 | -0.034704 | 0.151618 | 0.010757 | 0.188811 | 0.320068 | 0.058114 |
2000-04-30 | -0.176306 | 0.049587 | -0.079108 | 0.057576 | 0.071714 | -0.066667 | -0.038844 | -0.058252 | 0.010433 | -0.195266 | -0.021877 | -0.343529 | 0.054929 | -0.052775 |
2000-05-31 | -0.124575 | -0.015748 | 0.085904 | -0.032951 | 0.078067 | 0.140109 | -0.016757 | 0.252577 | 0.129630 | 0.132353 | -0.010101 | -0.103047 | 0.280505 | -0.036091 |
2000-06-30 | -0.248383 | 0.074325 | -0.076026 | -0.080000 | 0.008687 | -0.060264 | 0.072445 | -0.012356 | 0.076112 | -0.123377 | -0.025796 | 0.278721 | 0.113911 | 0.043327 |
# The risk-free rates obtained from FRED come as a single column, called a "Series" in pandas
# We need to make it a pandas table (DataFrame)
r_f = r_f.to_frame()
r_f.head()
0 | |
---|---|
2001-07-01 | 3.61 |
2001-08-01 | 3.48 |
2001-09-01 | 2.63 |
2001-10-01 | 2.24 |
2001-11-01 | 1.96 |
# Rename the index and call it "Date"
r_f.index.rename('Date', inplace = True)
# Note the rates are in whole numbers, not fractions. Let's adjust that:
r_f[0] = r_f[0] / 100
# Let's see the table now
r_f.head()
0 | |
---|---|
Date | |
2001-07-01 | 0.0361 |
2001-08-01 | 0.0348 |
2001-09-01 | 0.0263 |
2001-10-01 | 0.0224 |
2001-11-01 | 0.0196 |
# Remember the rates are annualized, so I will create a new column in r_f to measure
# the monthly Rf rate
# I will use the formula at the top of this document
r_f['Risk Free'] = (r_f[0] + 1)**(1/12) - 1
# Now I do not need the column named (0). I will drop it from the table
r_f.drop(columns=[0], inplace= True)
# Let's see the table now
r_f.head()
Risk Free | |
---|---|
Date | |
2001-07-01 | 0.002960 |
2001-08-01 | 0.002855 |
2001-09-01 | 0.002166 |
2001-10-01 | 0.001848 |
2001-11-01 | 0.001619 |
# Again, the dates here are reported at the beginning of the month
# I will apply the same method I did for the table (stock_return)
# by pushing the date index to the end of the month
r_f.index = r_f.index + pd.DateOffset(months=1) + pd.DateOffset(days=-1)
r_f.head()
Risk Free | |
---|---|
Date | |
2001-07-31 | 0.002960 |
2001-08-31 | 0.002855 |
2001-09-30 | 0.002166 |
2001-10-31 | 0.001848 |
2001-11-30 | 0.001619 |
Notice I have two datasets: stock_return, which includes monthly stock returns, and r_f, which includes the monthly risk-free rate. I would like to join them in one table...
Merging Two Datasets Using Python¶
When we have two tables and want to combine them, we can use a pandas function called "merge". This function has many options for how to join two tables. But since both of our tables (stock_return, r_f) have the same index name (Date), the match can be done quite easily.
When doing a merge, you can choose "left", "right", "inner", or "outer". See the figure below, and imagine SP01-SP04 are dates. For our case, the r_f table starts in July 2001, while stock_return starts in February 2000. So when we do a "left" merge, we keep all the original rows of stock_return plus whatever matches in r_f, and the rows with no matching rate from r_f will be set as missing.
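The behavior of "left" versus "inner" can be seen on a toy pair of tables shaped like stock_return and r_f:

```python
import pandas as pd

# Two tiny tables sharing a "Date" index, mimicking stock_return and r_f
left = pd.DataFrame({'ret': [0.01, 0.02, -0.01]},
                    index=pd.Index(['2001-05', '2001-06', '2001-07'], name='Date'))
right = pd.DataFrame({'Risk Free': [0.003]},
                     index=pd.Index(['2001-07'], name='Date'))

# "left" keeps every row of the left table; unmatched rows get NaN
merged = left.merge(right, how='left', on='Date')

# "inner" keeps only the dates present in BOTH tables
inner = left.merge(right, how='inner', on='Date')
```

Here the left merge keeps all three months with two missing rates, while the inner merge keeps only July 2001.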
# I will create a table (main_data) which will have all the stock returns + R_f
main_data = stock_return.merge(r_f, how= 'left', on='Date')
main_data.head(n=20)
AMZN | BA | CVX | DIS | GOLD | HSY | INTC | K | KO | M | MMM | MSFT | NVDA | ^W5000 | Risk Free | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Date | |||||||||||||||
2000-02-29 | 0.066796 | -0.169944 | -0.106876 | -0.063683 | -0.003817 | 0.033823 | 0.142136 | 0.043815 | -0.153428 | -0.118619 | -0.058078 | -0.086845 | 0.726813 | 0.021192 | NaN |
2000-03-31 | -0.027223 | 0.027196 | 0.248125 | 0.213236 | -0.038314 | 0.115744 | 0.167938 | 0.028577 | -0.034704 | 0.151618 | 0.010757 | 0.188811 | 0.320068 | 0.058114 | NaN |
2000-04-30 | -0.176306 | 0.049587 | -0.079108 | 0.057576 | 0.071714 | -0.066667 | -0.038844 | -0.058252 | 0.010433 | -0.195266 | -0.021877 | -0.343529 | 0.054929 | -0.052775 | NaN |
2000-05-31 | -0.124575 | -0.015748 | 0.085904 | -0.032951 | 0.078067 | 0.140109 | -0.016757 | 0.252577 | 0.129630 | 0.132353 | -0.010101 | -0.103047 | 0.280505 | -0.036091 | NaN |
2000-06-30 | -0.248383 | 0.074325 | -0.076026 | -0.080000 | 0.008687 | -0.060264 | 0.072445 | -0.012356 | 0.076112 | -0.123377 | -0.025796 | 0.278721 | 0.113911 | 0.043327 | NaN |
2000-07-31 | -0.170396 | 0.167414 | -0.068718 | -0.006441 | -0.122958 | -0.046391 | -0.001402 | -0.128151 | 0.070924 | -0.287037 | 0.085091 | -0.127344 | -0.056047 | -0.021162 | NaN |
2000-08-31 | 0.377593 | 0.098913 | 0.070030 | 0.010129 | 0.000980 | -0.077027 | 0.121722 | -0.106024 | -0.141438 | 0.148701 | 0.032789 | 0.000000 | 0.322917 | 0.071246 | NaN |
2000-09-30 | -0.073795 | 0.205923 | 0.016603 | -0.018050 | -0.044074 | 0.275952 | -0.444732 | 0.053983 | 0.047195 | -0.054834 | -0.014391 | -0.136079 | 0.031496 | -0.046688 | NaN |
2000-10-31 | -0.047154 | 0.051356 | -0.036657 | -0.063726 | -0.122951 | 0.003464 | 0.082706 | 0.049096 | 0.098729 | 0.246412 | 0.060357 | 0.141969 | -0.241031 | -0.021938 | NaN |
2000-11-30 | -0.325939 | 0.018433 | -0.003044 | -0.191972 | 0.121496 | 0.164557 | -0.154167 | -0.029557 | 0.037266 | -0.063340 | 0.033635 | -0.166969 | -0.348252 | -0.100524 | NaN |
2000-12-31 | -0.369620 | -0.042287 | 0.039316 | 0.000000 | 0.100101 | 0.022558 | -0.209843 | 0.076168 | -0.024201 | 0.147541 | 0.213949 | -0.244008 | -0.190972 | 0.016670 | NaN |
2001-01-31 | 0.112450 | -0.113637 | -0.013708 | 0.059519 | -0.055555 | -0.075728 | 0.230769 | -0.001905 | -0.048205 | 0.273143 | -0.081743 | 0.407781 | 0.575584 | 0.037408 | NaN |
2001-02-28 | -0.411552 | 0.063248 | 0.028578 | 0.016420 | 0.047188 | 0.077143 | -0.228040 | 0.014886 | -0.085689 | 0.085054 | 0.018979 | -0.033777 | -0.134383 | -0.095482 | NaN |
2001-03-31 | 0.004172 | -0.101649 | 0.032789 | -0.075929 | -0.117901 | 0.086283 | -0.078258 | 0.026145 | -0.148406 | -0.140642 | -0.073610 | -0.073093 | 0.452797 | -0.068221 | NaN |
2001-04-30 | 0.542522 | 0.109316 | 0.099772 | 0.057692 | 0.150455 | -0.128534 | 0.174726 | -0.056604 | 0.026495 | 0.034416 | 0.145428 | 0.238857 | 0.283081 | 0.081404 | NaN |
2001-05-31 | 0.057668 | 0.017638 | -0.005282 | 0.045289 | 0.003649 | 0.003807 | -0.126172 | 0.047843 | 0.026196 | 0.042345 | -0.003613 | 0.021107 | 0.027731 | 0.008492 | NaN |
2001-06-30 | -0.152187 | -0.113585 | -0.051318 | -0.086338 | -0.076033 | 0.022514 | 0.083610 | 0.095974 | -0.050632 | -0.051339 | -0.032869 | 0.055218 | 0.083401 | -0.017491 | NaN |
2001-07-31 | -0.117314 | 0.052697 | 0.009834 | -0.087920 | -0.017161 | -0.021877 | 0.019145 | 0.036897 | -0.004988 | -0.091765 | -0.019456 | -0.093287 | -0.127763 | -0.017414 | 0.002960 |
2001-08-31 | -0.284227 | -0.125235 | -0.007002 | -0.034914 | 0.075889 | 0.068258 | -0.062059 | 0.063851 | 0.091256 | -0.059326 | -0.069539 | -0.138087 | 0.047096 | -0.061865 | 0.002855 |
2001-09-30 | -0.332215 | -0.343707 | -0.059488 | -0.267794 | 0.083021 | 0.018718 | -0.268500 | -0.054849 | -0.037395 | -0.223355 | -0.049462 | -0.103068 | -0.351434 | -0.090550 | 0.002166 |
# Let's change the format of the Date index to not include the day
# Only year (Y) and month (m)
main_data.index = main_data.index.strftime('%Y-%m')
main_data.head()
AMZN | BA | CVX | DIS | GOLD | HSY | INTC | K | KO | M | MMM | MSFT | NVDA | ^W5000 | Risk Free | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Date | |||||||||||||||
2000-02 | 0.066796 | -0.169944 | -0.106876 | -0.063683 | -0.003817 | 0.033823 | 0.142136 | 0.043815 | -0.153428 | -0.118619 | -0.058078 | -0.086845 | 0.726813 | 0.021192 | NaN |
2000-03 | -0.027223 | 0.027196 | 0.248125 | 0.213236 | -0.038314 | 0.115744 | 0.167938 | 0.028577 | -0.034704 | 0.151618 | 0.010757 | 0.188811 | 0.320068 | 0.058114 | NaN |
2000-04 | -0.176306 | 0.049587 | -0.079108 | 0.057576 | 0.071714 | -0.066667 | -0.038844 | -0.058252 | 0.010433 | -0.195266 | -0.021877 | -0.343529 | 0.054929 | -0.052775 | NaN |
2000-05 | -0.124575 | -0.015748 | 0.085904 | -0.032951 | 0.078067 | 0.140109 | -0.016757 | 0.252577 | 0.129630 | 0.132353 | -0.010101 | -0.103047 | 0.280505 | -0.036091 | NaN |
2000-06 | -0.248383 | 0.074325 | -0.076026 | -0.080000 | 0.008687 | -0.060264 | 0.072445 | -0.012356 | 0.076112 | -0.123377 | -0.025796 | 0.278721 | 0.113911 | 0.043327 | NaN |
Returns over time¶
Let us look at how stock returns were behaving over our time frame.
# I will use a style called "bmh" to make a plot of how monthly returns change over time
with plt.style.context('bmh'):
    main_data[["HSY", "NVDA", "^W5000", "Risk Free"]].plot(title='Monthly Returns',
                                                           xlabel="Years",
                                                           ylabel="Return in %",
                                                           grid=True,
                                                           figsize=(10, 7),
                                                           lw=1.5)
    plt.show()
Notice some stocks have a lot of jumps, meaning that in some months they have very high or very low returns. The risk-free return is the opposite: it is basically stable over time and has not changed much.
Descriptive Statistics¶
Let's look at the monthly stock returns in a bit more depth.
# Descriptive Stats
main_data.describe()
AMZN | BA | CVX | DIS | GOLD | HSY | INTC | K | KO | M | MMM | MSFT | NVDA | ^W5000 | Risk Free | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 287.000000 | 270.000000 |
mean | 0.021791 | 0.012425 | 0.009857 | 0.006993 | 0.008019 | 0.011105 | 0.006905 | 0.006957 | 0.006123 | 0.010586 | 0.007103 | 0.012066 | 0.037451 | 0.005589 | 0.001119 |
std | 0.129926 | 0.095598 | 0.069003 | 0.076331 | 0.114357 | 0.057571 | 0.097305 | 0.049571 | 0.050752 | 0.129830 | 0.060703 | 0.081098 | 0.174042 | 0.045973 | 0.001305 |
min | -0.411552 | -0.454664 | -0.214625 | -0.267794 | -0.381056 | -0.177236 | -0.444732 | -0.138636 | -0.172743 | -0.628874 | -0.157037 | -0.343529 | -0.486551 | -0.177406 | 0.000000 |
25% | -0.048718 | -0.042805 | -0.029320 | -0.034121 | -0.065293 | -0.025150 | -0.047196 | -0.021066 | -0.024001 | -0.061762 | -0.028618 | -0.037006 | -0.054362 | -0.021149 | 0.000058 |
50% | 0.021677 | 0.015899 | 0.008623 | 0.005748 | 0.007592 | 0.008660 | 0.007672 | 0.009468 | 0.008546 | 0.011221 | 0.010560 | 0.017244 | 0.030835 | 0.011240 | 0.000726 |
75% | 0.087617 | 0.070993 | 0.046279 | 0.050941 | 0.072783 | 0.044363 | 0.062263 | 0.035843 | 0.038312 | 0.077280 | 0.039402 | 0.055028 | 0.122156 | 0.034368 | 0.001725 |
max | 0.621776 | 0.459312 | 0.269666 | 0.248734 | 0.426362 | 0.275952 | 0.337429 | 0.252577 | 0.141928 | 0.644122 | 0.213949 | 0.407781 | 0.826237 | 0.133806 | 0.004321 |
We can see NVDA has the highest average monthly return at 3.75%, and the market index ^W5000 has the lowest average monthly return at 0.56%.
Notice also that NVDA has the highest standard deviation $\sigma$, meaning the highest risk among our list of stocks.
Now with 14 tickers plus the T-bill, it is quite difficult to know which would be a good investment: some have high returns accompanied by high risk (such as NVDA), while others have much lower average monthly returns but lower risk as well (such as the market index).
One approach to evaluate these stocks against a common standard is to measure the Sharpe ratio: $$Sharpe = \frac{E(R_{i}) - R_{f}}{\sigma_{i}}$$
The Sharpe ratio measures the return you receive for each unit of risk you take on. An investment with a higher Sharpe ratio (the highest return per unit of risk) offers you a better deal overall.
# Sharpe ratio for DIS
sharpe = (main_data['DIS'].mean() - main_data['Risk Free'].mean())/(main_data['DIS'].std())
print(sharpe)
0.07695402859679135
# Sharpe ratio for BA
sharpe = (main_data['BA'].mean() - main_data['Risk Free'].mean())/(main_data['BA'].std())
print(sharpe)
0.11827015584105748
# Let us go through all the stocks and print their Sharpe ratios.
# Instead of writing the code above 14 times, I will make a (loop).
# A loop goes through each column (except the "Risk Free" column, because it
# holds the return on T-bills) and applies the same set of commands:
# for each column (x) in the table main_data
for x in main_data.columns:
    # if the column is not named "Risk Free" then do the following (not equal is "!=")
    if x != "Risk Free":
        # Measure the Sharpe ratio for the column x
        sharpe = (main_data[x].mean() - main_data['Risk Free'].mean()) / (main_data[x].std())
        # Print "Sharpe Ratio for", then the column name (x), then the variable sharpe
        print("Sharpe Ratio for " + x + ":", sharpe)
Sharpe Ratio for AMZN: 0.1591046792742003
Sharpe Ratio for BA: 0.11827015584105748
Sharpe Ratio for CVX: 0.1266368107683178
Sharpe Ratio for DIS: 0.07695402859679135
Sharpe Ratio for GOLD: 0.060335956535745645
Sharpe Ratio for HSY: 0.17346395641798584
Sharpe Ratio for INTC: 0.05946030385105466
Sharpe Ratio for K: 0.1177715887576876
Sharpe Ratio for KO: 0.09859777276789806
Sharpe Ratio for M: 0.0729214628804487
Sharpe Ratio for MMM: 0.09857869309736234
Sharpe Ratio for MSFT: 0.13498231770596905
Sharpe Ratio for NVDA: 0.20875536329487002
Sharpe Ratio for ^W5000: 0.09723889901728328
Let me return to the descriptive statistics table. I would like first to save the table so I can manipulate it by adding columns or rows:
# First, save the results in a dataframe table
descriptive = main_data.describe()
# Second, transpose the table (i.e., flip it) & save it using the same name
descriptive = descriptive.transpose()
descriptive
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
AMZN | 287.0 | 0.021791 | 0.129926 | -0.411552 | -0.048718 | 0.021677 | 0.087617 | 0.621776 |
BA | 287.0 | 0.012425 | 0.095598 | -0.454664 | -0.042805 | 0.015899 | 0.070993 | 0.459312 |
CVX | 287.0 | 0.009857 | 0.069003 | -0.214625 | -0.029320 | 0.008623 | 0.046279 | 0.269666 |
DIS | 287.0 | 0.006993 | 0.076331 | -0.267794 | -0.034121 | 0.005748 | 0.050941 | 0.248734 |
GOLD | 287.0 | 0.008019 | 0.114357 | -0.381056 | -0.065293 | 0.007592 | 0.072783 | 0.426362 |
HSY | 287.0 | 0.011105 | 0.057571 | -0.177236 | -0.025150 | 0.008660 | 0.044363 | 0.275952 |
INTC | 287.0 | 0.006905 | 0.097305 | -0.444732 | -0.047196 | 0.007672 | 0.062263 | 0.337429 |
K | 287.0 | 0.006957 | 0.049571 | -0.138636 | -0.021066 | 0.009468 | 0.035843 | 0.252577 |
KO | 287.0 | 0.006123 | 0.050752 | -0.172743 | -0.024001 | 0.008546 | 0.038312 | 0.141928 |
M | 287.0 | 0.010586 | 0.129830 | -0.628874 | -0.061762 | 0.011221 | 0.077280 | 0.644122 |
MMM | 287.0 | 0.007103 | 0.060703 | -0.157037 | -0.028618 | 0.010560 | 0.039402 | 0.213949 |
MSFT | 287.0 | 0.012066 | 0.081098 | -0.343529 | -0.037006 | 0.017244 | 0.055028 | 0.407781 |
NVDA | 287.0 | 0.037451 | 0.174042 | -0.486551 | -0.054362 | 0.030835 | 0.122156 | 0.826237 |
^W5000 | 287.0 | 0.005589 | 0.045973 | -0.177406 | -0.021149 | 0.011240 | 0.034368 | 0.133806 |
Risk Free | 270.0 | 0.001119 | 0.001305 | 0.000000 | 0.000058 | 0.000726 | 0.001725 | 0.004321 |
# I will use the same risk free rate assumption as my Excel example
risk_free = 0.004
# Add a column to this newly created table and call it "sharpe"
descriptive['sharpe'] = (descriptive["mean"] - risk_free) / descriptive["std"]
# I would like to drop the last row; which is indexed as "Risk Free"
descriptive.drop(index = 'Risk Free', inplace= True)
descriptive
count | mean | std | min | 25% | 50% | 75% | max | sharpe | |
---|---|---|---|---|---|---|---|---|---|
AMZN | 287.0 | 0.021791 | 0.129926 | -0.411552 | -0.048718 | 0.021677 | 0.087617 | 0.621776 | 0.136929 |
BA | 287.0 | 0.012425 | 0.095598 | -0.454664 | -0.042805 | 0.015899 | 0.070993 | 0.459312 | 0.088132 |
CVX | 287.0 | 0.009857 | 0.069003 | -0.214625 | -0.029320 | 0.008623 | 0.046279 | 0.269666 | 0.084883 |
DIS | 287.0 | 0.006993 | 0.076331 | -0.267794 | -0.034121 | 0.005748 | 0.050941 | 0.248734 | 0.039208 |
GOLD | 287.0 | 0.008019 | 0.114357 | -0.381056 | -0.065293 | 0.007592 | 0.072783 | 0.426362 | 0.035141 |
HSY | 287.0 | 0.011105 | 0.057571 | -0.177236 | -0.025150 | 0.008660 | 0.044363 | 0.275952 | 0.123418 |
INTC | 287.0 | 0.006905 | 0.097305 | -0.444732 | -0.047196 | 0.007672 | 0.062263 | 0.337429 | 0.029851 |
K | 287.0 | 0.006957 | 0.049571 | -0.138636 | -0.021066 | 0.009468 | 0.035843 | 0.252577 | 0.059650 |
KO | 287.0 | 0.006123 | 0.050752 | -0.172743 | -0.024001 | 0.008546 | 0.038312 | 0.141928 | 0.041828 |
M | 287.0 | 0.010586 | 0.129830 | -0.628874 | -0.061762 | 0.011221 | 0.077280 | 0.644122 | 0.050730 |
MMM | 287.0 | 0.007103 | 0.060703 | -0.157037 | -0.028618 | 0.010560 | 0.039402 | 0.213949 | 0.051115 |
MSFT | 287.0 | 0.012066 | 0.081098 | -0.343529 | -0.037006 | 0.017244 | 0.055028 | 0.407781 | 0.099455 |
NVDA | 287.0 | 0.037451 | 0.174042 | -0.486551 | -0.054362 | 0.030835 | 0.122156 | 0.826237 | 0.192201 |
^W5000 | 287.0 | 0.005589 | 0.045973 | -0.177406 | -0.021149 | 0.011240 | 0.034368 | 0.133806 | 0.034567 |
# Example to show the usefulness of this table:
# Finding the stock with the highest sharpe ratio:
print('The ticker with the highest Sharpe ratio: \n', descriptive['sharpe'].idxmax(), '\t', descriptive['sharpe'].max())
# Finding the stock with the highest average return:
print('The ticker with the highest expected return: \n', descriptive['mean'].idxmax(),'\t', descriptive['mean'].max())
# Finding the stock with the highest risk:
print('The ticker with the highest estimated risk: \n', descriptive['std'].idxmax(), '\t', descriptive['std'].max())
The ticker with the highest Sharpe ratio:
 NVDA 	 0.19220082774983946
The ticker with the highest expected return:
 NVDA 	 0.037450927565498095
The ticker with the highest estimated risk:
 NVDA 	 0.1740415374747315
As can be seen from the results, NVDA has the highest Sharpe ratio, which suggests it is a good stock to invest in.
QUESTION: Can I have a higher Sharpe ratio than the ones offered by these 14 tickers? Let me examine this question by creating a portfolio that invests 30% in HSY and 70% in NVDA:
Measuring Portfolio Returns¶
The return for any portfolio consisting of $n$ stocks: $$R_{p} = \sum^{n}_{i=1} w_{i}\times R_{i}$$
where $w$ is the weight for any stock $i$.
Also remember that weights have to sum up to 1: $$\sum^{n}_{i=1} w_{i} = 1$$
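The weighted sum above maps directly onto a dot product, which scales to any number of stocks. A minimal standalone sketch with hypothetical weights and returns:

```python
import numpy as np

# Hypothetical one-month returns for three stocks and their portfolio weights
returns = np.array([0.02, -0.01, 0.03])
weights = np.array([0.5, 0.3, 0.2])

# The weights must sum to 1, per the constraint above
assert np.isclose(weights.sum(), 1.0)

# Portfolio return: the sum of w_i * R_i, written as a dot product
r_p = weights @ returns
```

With real data, the same dot product applies row by row across a returns table.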
main_data['Port_1'] = (0.3 * main_data['HSY']) + (0.7 * main_data['NVDA'])
main_data[['Port_1']].describe()
Port_1 | |
---|---|
count | 287.000000 |
mean | 0.029547 |
std | 0.122414 |
min | -0.358342 |
25% | -0.038771 |
50% | 0.024024 |
75% | 0.091932 |
max | 0.605263 |
Note Port_1 has an estimated return of 2.95% and a risk of 12.24%.
I would like to see the histogram of returns for this portfolio and compare it with its components (HSY & NVDA). The steps for creating a histogram are similar to the steps for creating a plot:
# We will use a theme called "classic"
with plt.style.context('classic'):
    # Note how I can customize the histogram by choosing the number of bins and the color of the edges
    hist = main_data['Port_1'].plot.hist(bins=15, ec="black")
    hist.set_xlabel("Monthly returns (%)")
    hist.set_ylabel("Count")
    hist.set_title("Portfolio 1 Monthly Returns Histogram")
    # plt.savefig('Port_1_histogram.png', dpi=400)
    plt.show()
with plt.style.context('ggplot'):
    hist = main_data['HSY'].plot.hist(bins=15, ec="black")
    hist.set_xlabel("Monthly returns (%)")
    hist.set_ylabel("Count")
    hist.set_title("HSY Monthly Returns Histogram")
    # plt.savefig('HSY_histogram.png', dpi=400)
    plt.show()
with plt.style.context('ggplot'):
    hist = main_data['NVDA'].plot.hist(bins=15, ec="black")
    hist.set_xlabel("Monthly returns (%)")
    hist.set_ylabel("Count")
    hist.set_title("NVDA Monthly Returns Histogram")
    # plt.savefig('NVDA_histogram.png', dpi=400)
    plt.show()
# I will plot the two histograms in one figure to compare the distributions of returns
with plt.style.context('ggplot'):
    # The first histogram is the NVDA stock; I choose the color blue for it
    hist = main_data['NVDA'].plot.hist(color='blue', bins=20, ec="black")
    # The second histogram is Port_1; I choose the color red for it.
    # I also set its alpha (opacity) to 0.7 so the bars behind it remain visible
    hist = main_data['Port_1'].plot.hist(color='red', bins=20, ec="black", alpha=0.7)
    hist.set_xlabel("Monthly returns (%)")
    hist.set_ylabel("Count")
    hist.set_title("Portfolio 1 vs. NVDA Monthly Returns Histogram")
    # Ask for a legend showing what each color represents
    hist.legend()
    # plt.savefig('Port_1_histogram.png', dpi=400)
    plt.show()
From the graph above, I can conclude that the distribution of returns for Port_1 is more concentrated around the mean (centered in the middle) than the returns of NVDA. This means lower risk!
# Let us examine its Sharpe ratio
sharpe = (main_data['Port_1'].mean() - risk_free)/(main_data['Port_1'].std())
print(sharpe)
0.20869601153997158
As can be seen from this result, I can have a higher Sharpe ratio than HSY or NVDA alone if I made a portfolio investing 30% in HSY and 70% in NVDA; 0.21 vs. 0.19 or 0.12.
Because of diversification, I can create an investment with lower risk (meaning lower $\sigma$), making the denominator in the Sharpe ratio formula smaller, which in turn makes the Sharpe ratio larger.
But why should I stop with two stocks? Maybe I should try to include more stocks in the portfolio, and also try different weights instead of 30% and 70%.
However, with my current setting, this would require me to make a column each time I change weights or add additional stocks. It basically becomes a guessing game...
Is there a way to ask the computer to try different weights on the 14 tickers until it finds the portfolio with the highest Sharpe ratio? Yes, there is. But before I jump into it, I need to review how portfolio risk is actually estimated, and also examine how stock returns behave relative to each other.
Calculating Portfolio Risk¶
The risk of any portfolio can be measured as the standard deviation of the portfolio return, $\sigma_{p}$. However, if the portfolio consists of 2 securities, the standard deviation of the portfolio will not simply be the weighted average of the two securities' standard deviations.
We need to consider the covariance/correlation between the securities. The covariance reflects the co-movement of returns between the two stocks. Unless the two assets are perfectly positively correlated, diversification reduces the portfolio's risk below the weighted average of the individual risks. $$ \rho_{i,j} \,\, or \,\, Corr(R_{i}, R_{j}) = \frac{Cov(R_{i}, R_{j}) \, or\, \sigma_{i,j}}{\sigma_{i}\sigma_{j}} $$
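These quantities are one call away in pandas. A standalone sketch with made-up returns (with the real table you would instead call main_data[['HSY', 'NVDA']].cov() and .corr()):

```python
import pandas as pd

# Toy monthly returns for two stocks (made-up numbers)
rets = pd.DataFrame({'HSY':  [0.03, -0.02, 0.01, 0.04],
                     'NVDA': [0.10, -0.08, 0.05, 0.12]})

cov = rets.cov()     # covariance matrix; the diagonal holds the variances
corr = rets.corr()   # correlation matrix; the diagonal is exactly 1

# The identity above: rho = sigma_ij / (sigma_i * sigma_j)
rho = cov.loc['HSY', 'NVDA'] / (rets['HSY'].std() * rets['NVDA'].std())
```

Checking rho against corr.loc['HSY', 'NVDA'] confirms the formula numerically.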
The Variance for any portfolio consisting of $N$ stocks: $$ \sigma^{2}_{p} = \sum^{N}_{j=1} w_{j} \sum^{N}_{i=1} w_{i}\sigma_{ij} $$
Then the portfolio *risk* becomes: $$ \sigma_{p} = \sqrt{\sigma^{2}_{p}} $$
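The double sum above can be written out directly in Python. A minimal sketch with a hypothetical two-stock covariance matrix (illustrative numbers only, not estimated from the data):

```python
# Hypothetical weights and covariance matrix (sigma_ij), illustrative only
w = [0.3, 0.7]
cov = [[0.0036, 0.0009],
       [0.0009, 0.0144]]

# sigma_p^2 = sum_j w_j * sum_i w_i * sigma_ij
port_var = sum(w[j] * sum(w[i] * cov[i][j] for i in range(len(w)))
               for j in range(len(w)))

# sigma_p = sqrt(sigma_p^2)
port_risk = port_var ** 0.5
print(port_var, port_risk)
```

The same double sum works for any number of stocks; only the lengths of `w` and `cov` change.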
So for our example of Port_1, we have 2 stocks: stock 1 (HSY) and stock 2 (NVDA). Applying the formula:
$$\sigma^{2}_{Port(1)} = w_{1}w_{1}\sigma_{1,1} + w_{1}w_{2}\sigma_{1,2} + w_{2}w_{2}\sigma_{2,2} + w_{2}w_{1}\sigma_{2,1}$$

$$\sigma^{2}_{Port(1)} = [0.3 \times 0.3 \times \sigma_{HSY,HSY}] + [0.3 \times 0.7 \times \sigma_{HSY,NVDA}] + [0.7 \times 0.7 \times \sigma_{NVDA,NVDA}] + [0.7 \times 0.3 \times \sigma_{NVDA,HSY}] $$

Since:
The covariance of stock 1 with itself is the variance ($\sigma_{1,1} = \sigma^{2}_{1}$)
According to the correlation equation above: $\sigma_{2,1} = \sigma_{1,2} = \rho_{1,2} \sigma_{1} \sigma_{2}$
The equation becomes: $$ \sigma^{2}_{Port(1)} = w^{2}_{1}\sigma^{2}_{1} + w^{2}_{2}\sigma^{2}_{2} + 2 \times (w_{1}w_{2}\sigma_{1}\sigma_{2} \rho_{1,2}) $$
And Port_1's risk is:
$$ \sigma_{Port(1)} = \sqrt{\sigma^{2}_{Port(1)}} $$

Using Matrices¶
Now for portfolio 1, the above calculation is relatively easy to do by hand. But with 3 stocks, or 10 stocks, measuring portfolio risk becomes a very long and tedious equation. Fortunately, I can use matrices in Python to make these calculations fast, easy, and consistent.
If I "translate" the above portfolio variance equation into matrices, it takes the following form:
$$ \sigma^{2}_{p} = \begin{bmatrix} w_{1} & w_{2} & \dots & w_{n}\end{bmatrix} \begin{bmatrix} \sigma^{2}_{1} & \sigma_{1,2} & \dots & \sigma_{1,n} \\ \sigma_{2,1} & \sigma^{2}_{2} & \dots & \sigma_{2,n} \\ \dots & &\ddots & \vdots\\ \sigma_{n,1} & \sigma_{n,2} & \dots & \sigma^{2}_{n}\end{bmatrix} \begin{bmatrix} w_{1} \\ w_{2} \\ \vdots \\ w_{n}\end{bmatrix} $$

Now let me use matrices to measure the standard deviation of a new portfolio (call it "Portfolio 2") where I invest 20% in Nvidia (NVDA), 30% in Microsoft (MSFT), and 50% in Coca-Cola (KO):
# list for the portfolio weights
w_list = [0.2, 0.3, 0.5]
# The package (Numpy) will be very helpful here
# (notice I gave it a nickname "np" at the beginning of this notebook)
# It can deal with matrix multiplication and create random numbers if I want
# put the list of weights in a matrix
w = np.array([w_list])
print(w)
[[0.2 0.3 0.5]]
# Make sure the weights always sum up to 1.
# Here I write a statement that the sum of all the elements in a matrix is equal to 1.
# The computer will tell me if that statement is true or false
np.sum(w) == 1
True
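A caveat worth flagging: exact floating-point comparisons like `== 1` can fail once the weights carry rounding error. `np.isclose` is the safer check, as this quick sketch shows:

```python
import numpy as np

# Ten weights of 0.1 should sum to 1, but binary floating point disagrees:
print(sum([0.1] * 10) == 1)             # False: the sum is 0.9999999999999999

# np.isclose compares within a small tolerance instead:
w_tricky = np.array([0.1] * 10)
print(np.isclose(np.sum(w_tricky), 1))  # True
```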
I need a 3-by-3 covariance matrix for the stocks I want in this portfolio ('NVDA', 'MSFT', 'KO'). To do that, I first take a slice of main_data which includes only the stocks I care about now, then save it under a new name (slice).
# Note: the name "slice" shadows a Python built-in; fine for a demo, but best avoided in larger programs
slice = main_data[['NVDA', 'MSFT', 'KO']]
# Let's see the first rows of this table
slice.head()
| Date | NVDA | MSFT | KO |
|---|---|---|---|
| 2000-02 | 0.726813 | -0.086845 | -0.153428 |
| 2000-03 | 0.320068 | 0.188811 | -0.034704 |
| 2000-04 | 0.054929 | -0.343529 | 0.010433 |
| 2000-05 | 0.280505 | -0.103047 | 0.129630 |
| 2000-06 | 0.113911 | 0.278721 | 0.076112 |
# The covariance table for the dataset (slice)
covariance = slice.cov()
print(covariance)
          NVDA      MSFT        KO
NVDA  0.030290  0.005527  0.000217
MSFT  0.005527  0.006577  0.001010
KO    0.000217  0.001010  0.002576
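This covariance table ties back to the correlation equation from earlier: $\rho_{i,j} = \sigma_{i,j} / (\sigma_{i}\sigma_{j})$. A quick sketch verifying that identity on synthetic data (random numbers stand in for returns; 'A' and 'B' are hypothetical tickers):

```python
import numpy as np
import pandas as pd

# Synthetic "returns" for two hypothetical tickers
rng = np.random.default_rng(0)
fake = pd.DataFrame(rng.normal(size=(100, 2)), columns=['A', 'B'])

cov = fake.cov()     # covariance matrix
corr = fake.corr()   # correlation matrix

# rho_AB = sigma_AB / (sigma_A * sigma_B)
rho_manual = cov.loc['A', 'B'] / (fake['A'].std() * fake['B'].std())
print(np.isclose(rho_manual, corr.loc['A', 'B']))  # True
```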
# Take the covariance table and put it in a matrix
# I will reuse the same variable name (it will overwrite)
covariance = np.array(covariance)
covariance
array([[0.03029046, 0.00552746, 0.00021734],
       [0.00552746, 0.00657686, 0.00100997],
       [0.00021734, 0.00100997, 0.00257576]])
# Now I have matrices 1 and 2; what's left is matrix 3
# Matrix 3 is basically the "transpose" of matrix 1
# Let me create it
w_vertical= w.T
print(w_vertical)
[[0.2]
 [0.3]
 [0.5]]
Now I apply the above matrix formula. It will look like this:
$$\sigma^{2}_{p} = \begin{bmatrix} w_{NVDA} & w_{MSFT} & w_{KO}\end{bmatrix} \begin{bmatrix} \sigma^{2}_{NVDA} & \sigma_{NVDA,MSFT} & \sigma_{NVDA,KO} \\ \sigma_{MSFT,NVDA} & \sigma^{2}_{MSFT} & \sigma_{MSFT,KO} \\ \sigma_{KO,NVDA} & \sigma_{KO,MSFT} & \sigma^{2}_{KO}\end{bmatrix} \begin{bmatrix} w_{NVDA} \\ w_{MSFT} \\ w_{KO}\end{bmatrix} $$

# Matrix multiplication in Python is done with NumPy's .dot() method
# First matrix times the 2nd matrix
first = w.dot(covariance)
# The answer is then multiplied by the 3rd matrix
second = first.dot(w_vertical)
# Alternatively, chain the two multiplications in one line
# The code is more concise that way ;)
port_cov = w.dot(covariance).dot(w_vertical)
print(port_cov)
[[0.00345723]]
# Extract the scalar answer from the 1x1 matrix:
port_cov[0,0]
0.003457231185821661
# Thus, the risk for this portfolio is the square root of the calculated variance:
port2_risk = port_cov[0,0] **0.5
port2_risk
0.05879822434242093
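One way to double-check the matrix arithmetic: the same risk number should come out of the portfolio's return series directly. A sketch using synthetic returns (random numbers stand in for the real data; the tickers are just labels here):

```python
import numpy as np
import pandas as pd

# Synthetic monthly returns (random numbers stand in for the real data)
rng = np.random.default_rng(1)
returns = pd.DataFrame(rng.normal(0.01, 0.05, size=(120, 3)),
                       columns=['NVDA', 'MSFT', 'KO'])
w = np.array([0.2, 0.3, 0.5])

# Risk via the matrix formula: sqrt(w' * Cov * w)
risk_matrix = np.sqrt(w @ returns.cov().values @ w)

# Risk via the portfolio's weighted return series directly
risk_series = (returns * w).sum(axis=1).std()

print(np.isclose(risk_matrix, risk_series))  # True
```

The two numbers agree because the sample variance of a weighted sum of columns is exactly the quadratic form $w^{T} \Sigma w$ when both use the same covariance estimate.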
Thus, the risk of Port_2 is about 5.88%, and its return (using the main_data[column name].mean() method) is:
port2_return = (main_data['NVDA'].mean()* 0.2)+(main_data['MSFT'].mean()*0.3)+(main_data['KO'].mean()* 0.5)
port2_return
0.014171290926480557
Measuring the Sharpe Ratio:
port2_sharpe = (port2_return - risk_free)/port2_risk
port2_sharpe
0.1729863620922701
So, with a Sharpe ratio of about 0.17 versus 0.21, portfolio 2 is not an improvement over portfolio 1...
Exporting Datasets¶
The pandas package offers its users ways to export their tables and datasets to many common file types, and likewise to import files into a new dataset. In the following code, I will export the dataset I created earlier (main_data) to an Excel file. The produced Excel file will be located in the same folder as this Jupyter Notebook.
# Note: to_excel() requires an Excel engine such as openpyxl to be installed
main_data.to_excel("returns_cleaned.xlsx")
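pandas can also round-trip tables through CSV, which needs no extra dependency. A small sketch using a hypothetical toy table (not the real main_data) and a hypothetical file name:

```python
import pandas as pd

# A tiny stand-in table with the same monthly-return layout (hypothetical values)
df = pd.DataFrame({'NVDA': [0.01, 0.02], 'KO': [0.005, -0.01]},
                  index=pd.Index(['2000-02', '2000-03'], name='Date'))

df.to_csv("returns_demo.csv")                             # export
back = pd.read_csv("returns_demo.csv", index_col='Date')  # import it back
print(back.equals(df))  # True: the round trip preserves the table
```

The index_col argument tells read_csv to rebuild the Date index instead of treating it as an ordinary column.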