Portfolio Construction & Measuring Risk Using the Markowitz Method¶

This notebook will shed some light on the importance of diversification in investments.

First, using the pandas package, I will review the basic steps for collecting & measuring stock returns, as well as how to measure a portfolio return using specific weights. After that, I will show an example of how one can assess a portfolio's risk level using the Markowitz method, by introducing the concept of return correlations among securities. At the end of this notebook, you will learn how to estimate the risk of a portfolio of stocks.


Loading the necessary packages¶

When using Python, I recommend you make it a habit to load the needed packages for your session at the beginning of your code.

This step is similar to the steps taken in my previous Jupyter Notebook series. However, I will add another important package here: (fredapi). This package is made and managed by the Federal Reserve Bank of St. Louis in the United States. Using the package allows the user to download economic data directly from their server. It is basically a portal one can use to obtain any public data provided by the Fed. In order to use this feature, you are required to first open an account at their website, and then request an *API Key*. From my experience, the API-Key request is approved immediately. Check this link for more info: https://fred.stlouisfed.org/docs/api/api_key.html

In [ ]:
# If it is your first time using fredapi, you need to download it to your anaconda library. 
# You can do it through the anaconda program (like what you did with the package yfinance) 
# Or alternatively, run the following code here only once on your computer

%pip install fredapi
Collecting fredapi
  Obtaining dependency information for fredapi from https://files.pythonhosted.org/packages/96/d4/f81fa9f67775a6a4b9e2cd8487239d61a9698cb2b9c02a5a2897d310f7a4/fredapi-0.5.1-py3-none-any.whl.metadata
  Downloading fredapi-0.5.1-py3-none-any.whl.metadata (5.0 kB)
Requirement already satisfied: pandas in /Users/eyad/anaconda3/lib/python3.11/site-packages (from fredapi) (1.5.3)
Requirement already satisfied: python-dateutil>=2.8.1 in /Users/eyad/anaconda3/lib/python3.11/site-packages (from pandas->fredapi) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /Users/eyad/anaconda3/lib/python3.11/site-packages (from pandas->fredapi) (2022.7)
Requirement already satisfied: numpy>=1.21.0 in /Users/eyad/anaconda3/lib/python3.11/site-packages (from pandas->fredapi) (1.24.3)
Requirement already satisfied: six>=1.5 in /Users/eyad/anaconda3/lib/python3.11/site-packages (from python-dateutil>=2.8.1->pandas->fredapi) (1.16.0)
Downloading fredapi-0.5.1-py3-none-any.whl (11 kB)
Installing collected packages: fredapi
Successfully installed fredapi-0.5.1
In [1]:
# Use the package fredapi if you have installed it
from fredapi import Fred
# Don't forget to use your api-key after opening account with FRED
# So then you can download any data from their server
fred = Fred(api_key = '44303cd2e2752fc88d6080b1e0d9d1e9')

# Download the same libraries we used in the past
import yfinance as yf, pandas as pd, datetime as dt, matplotlib as mpl, matplotlib.pyplot as plt

# This library is helpful to use for many math applications, such as matrix multiplication and 
# creating random numbers
import numpy as np

Collecting Monthly Returns¶

Let us use the package yfinance to download monthly prices (so I can measure monthly returns) for the same list of stocks:

["BA", "AMZN", "CVX", "DIS", "GOLD", "HSY", "INTC", "K", "KO", "M", "MMM", "MSFT", "NVDA"].

For a later comparison, I would like to download historical data on an index that hopefully represents the performance of the overall market during the same period. One appropriate option is the Wilshire 5000 Total Market Index, it's ticker in yfinance is ^W5000.

Furthermore, I would like to obtain data on historical T-bills so I can measure the risk-free rate ($R_{f}$). T-bill monthly returns data will be obtained from FRED. The code used by FRED for the 4-week-maturity treasury bill is TB4WK. If I am working with annual or daily stock returns for example, I probably will need to download something different from FRED. This is because the risk-free rate has to be for the same investment horizon I am analyzing. (In my case it is monthly)

Recall the U.S. Federal Reserve Bank sells T-bill securities that have different maturities during the same time; 4, 13, 26, and 52 weeks. Ideally, I would like to measure the return on a T-bill investment that has a maturity matching my investment horizon (Again, for my analysis its 1 month). So for example if I am examining the return of AMZN during the month of April 2016, I would like to compare the stock's performance with a T-bill security that was issued by the fed at the beginning of April 2016 and has a maturity of 1 month.

But what if there are no T-bills sold by the Federal Bank that has a maturity matching my investment horizon? Say if my investment horizon is 1 week, what can I do?

Well, to answer this question, remember that T-bills are money market instruments, which means they are financial assets sold and bought in the market on a daily basis. Thus, one can find approximate the return on a T-bill investment of 1 week by looking at the price of a T-bill today then checking its price after 1 week. The holding period return (HPR) for this investment can represent the 1-week risk-free rate: $$R_{f, t} = \frac{P_{1} - P_{0}}{P_{0}}$$

Because T-bills have no risk, we know for certainty that at maturity the investor receives face value. If a 4-week T-bill was issued by the fed and sold today for $950, this means the asset will pay the holder of the security $1000 in 4 weeks ($50 profit in 4 weeks). If the investor sells this T-bill after 3 weeks, and assuming interest rates on new T-bills have not changed, then logically speaking (and ignoring compounding for a moment), the seller should be entitled for only 3 weeks worth of profits. That is 3/4 of the $50 which is $37.5. Anyone buying this T-bill and holds it for an additional week should receive 1/4 of the $50 ($12.5). Thus, a buyer should be willing to pay ($1000 - $12.5 = $987.5) today, and receives the face value after one week with certainty.

To put it in another way, if a 4-week T-bill has a return of 4%, then holding this security for 3 weeks should provide the investor (3/4 of the 4% = 3%) return. More importantly, any two short-term T-bills having different maturities should at least give the investor a similar return when buying and holding them for the same period. For example, investor A buys a T-bill today having a maturity of 4 weeks and sells it after 1 week. Investor B buys a T-bill today having a maturity of 3 months and sells it after 1 week. Because both investors are investing in an asset that has the same level of risk (risk = 0) and are held for the same duration (1 week), they should have the same return. This is assuming there are no changes in the interest rates, inflation, or any other sources of risk. The conclusion we have just reached can be helpful to approximate the risk-free rate for any investment horizon we are analyzing.

Remember, The quotes we see on T-bills are annualized using the bank-discount method, which has two main flaws: A) Assumes the year has 360 days, and B) Computes the yield as a fraction of face value instead of the current price. Thus, to find the actual risk-free rate, our first task is to transform the annual yield we get from a bank-discount method to an annual bond-equivalent yield (BD to BEY). This is done by first finding the price of the T-bill: $$Price = Face \,\, Value \times \left[ 1- (R_{BD} \times \frac{n}{360}) \right] $$ Where n is the number of days until the bill matures. Once we find the price, the BEY is basically: $$R_{BEY} = \frac{Face \,\, Value - Price}{Price} \times \frac{365}{n}$$

Think of BEY as a T-bill version of the popular APR. Meaning the calculated return is, again, in an annualized form and it ignores any compounding. Now, the simplest form of finding the risk-free return matching my investment horizon is to divide the BEY by the number of investment periods I have during the year. For example, for monthly analysis, I divide the BEY by 12, for weekly 52, for daily 365.

If I would like to b e precise and assume compounding, then I apply the following formula for finding the effective risk-free rate: $$R_{eff.} = \left[ 1 + R_{BEY} \right]^{\frac{1}{m}} -1$$ where $m$ is the divisor (monthly = 12, weekly = 52, yearly = 365 and so on...)

Again, because my investment horizon is monthly, I will use the convenient source: TB4WK. However, to check my conclusions above, I will also download the three-month T-bill data and transform them to monthly.

Note 1: If you need daily data on 1-month T-bills use: DTB4WK, for annual $R_{f}$ reported annually use: RIFSGFSY01NA, for monthly data on 3-month T-bills use: TB3MS

Note 2: The 4-week maturity T-bill data offered by FRED starts in mid 2001. If you want to collect monthly rates for older periods, you need to use an alternative way. You can collect T-bill prices for the three-month maturity bills* (TB3MS) (starts at the end of 1938) and find the monthly return using the approach I discussed above.

In [2]:
# The following are the list of tickers we would like to download from yfinance
ticker_list = ["^W5000", "BA", "AMZN", "CVX", "DIS", "GOLD", "HSY", "INTC", "K", "KO", "M", "MMM", "MSFT", "NVDA"]

# Set the start and end date
start_date = dt.date(1999,12,31)
end_date = dt.date(2023,12,31)

# Instead of using a ticker, now we will pass the name of the list
stock_data = yf.download(ticker_list, start=start_date, end=end_date, interval='1mo')

# We only need the adj. close prices, so I "slice" the table and replace the original table name
stock_data = stock_data['Adj Close']

# I would like to adjust the index so I only see dates not date-time
# I will use a function to normalize the date-time data to dates only
stock_data.index = stock_data.index.normalize()


# the following code obtains the discount rate on 4-week maturity t-bills from FRED (reported monthly)
# we will save it in a variable called (r_f_4wk)
r_f_4wk = fred.get_series('TB4WK')

# the following code obtains the discount rates of 3-month maturity t-bills from FRED (reported monthly)
# we will save it in a variable called (r_f_3m)
r_f_3m= fred.get_series('TB3MS')
[*********************100%%**********************]  14 of 14 completed
In [3]:
# Recall I want monthly returns, not prices
# So I will take % change from last period of the whole dataset (stock_data). 
# That way it measures the return for all the columns in one line of code!

stock_return= stock_data.pct_change()

# Lets see it
stock_return.head()
Out[3]:
AMZN BA CVX DIS GOLD HSY INTC K KO M MMM MSFT NVDA ^W5000
Date
2000-01-01 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2000-02-01 0.066796 -0.169944 -0.106876 -0.063683 -0.003817 0.033823 0.142135 0.043814 -0.153428 -0.118618 -0.058077 -0.086847 0.726798 0.021192
2000-03-01 -0.027223 0.027196 0.248126 0.213235 -0.038314 0.115744 0.167939 0.028578 -0.034704 0.151618 0.010757 0.188813 0.320071 0.058114
2000-04-01 -0.176306 0.049587 -0.079108 0.057576 0.071713 -0.066666 -0.038845 -0.058253 0.010433 -0.195267 -0.021877 -0.343529 0.054929 -0.052775
2000-05-01 -0.124575 -0.015748 0.085903 -0.032951 0.078067 0.140110 -0.016757 0.252578 0.129629 0.132353 -0.010102 -0.103047 0.280503 -0.036091
In [4]:
# I do not need the first row (it is empty)
# I apply a function (dropna) on my table. Also I use the feature (inplace)
# Meaning to put this new table inplace the original one after dropping the missing values 
stock_return.dropna(inplace=True)

# Lets see it
stock_return.head()
Out[4]:
AMZN BA CVX DIS GOLD HSY INTC K KO M MMM MSFT NVDA ^W5000
Date
2000-02-01 0.066796 -0.169944 -0.106876 -0.063683 -0.003817 0.033823 0.142135 0.043814 -0.153428 -0.118618 -0.058077 -0.086847 0.726798 0.021192
2000-03-01 -0.027223 0.027196 0.248126 0.213235 -0.038314 0.115744 0.167939 0.028578 -0.034704 0.151618 0.010757 0.188813 0.320071 0.058114
2000-04-01 -0.176306 0.049587 -0.079108 0.057576 0.071713 -0.066666 -0.038845 -0.058253 0.010433 -0.195267 -0.021877 -0.343529 0.054929 -0.052775
2000-05-01 -0.124575 -0.015748 0.085903 -0.032951 0.078067 0.140110 -0.016757 0.252578 0.129629 0.132353 -0.010102 -0.103047 0.280503 -0.036091
2000-06-01 -0.248383 0.074325 -0.076026 -0.080000 0.008687 -0.060264 0.072446 -0.012356 0.076113 -0.123376 -0.025795 0.278721 0.113913 0.043327

Recall we used "Adj Close" prices to calculate our returns, and adjusted closing prices are the prices of securities at the end of the period (in our case, its the end of the month). Thus, the measured return for any given month should be the return at the end of the period not at the beginning of the period. Notice returns are reported at the beginning of the month. For example, stock returns for February 2000 is reported on (February 1, 2001). To fix this, I can use a command from pandas to push the date index to next month, then minus one day.

In [5]:
# Notice that the Date row in our table is the index
# So what we really want to do is modify our index by pushing it to the next month
# then reducing one day 
# This way we will have the last day of the current month
stock_return.index = stock_return.index + pd.DateOffset(months=1) + pd.DateOffset(days=-1) 
stock_return.head()
Out[5]:
AMZN BA CVX DIS GOLD HSY INTC K KO M MMM MSFT NVDA ^W5000
Date
2000-02-29 0.066796 -0.169944 -0.106876 -0.063683 -0.003817 0.033823 0.142135 0.043814 -0.153428 -0.118618 -0.058077 -0.086847 0.726798 0.021192
2000-03-31 -0.027223 0.027196 0.248126 0.213235 -0.038314 0.115744 0.167939 0.028578 -0.034704 0.151618 0.010757 0.188813 0.320071 0.058114
2000-04-30 -0.176306 0.049587 -0.079108 0.057576 0.071713 -0.066666 -0.038845 -0.058253 0.010433 -0.195267 -0.021877 -0.343529 0.054929 -0.052775
2000-05-31 -0.124575 -0.015748 0.085903 -0.032951 0.078067 0.140110 -0.016757 0.252578 0.129629 0.132353 -0.010102 -0.103047 0.280503 -0.036091
2000-06-30 -0.248383 0.074325 -0.076026 -0.080000 0.008687 -0.060264 0.072446 -0.012356 0.076113 -0.123376 -0.025795 0.278721 0.113913 0.043327
In [6]:
# The risk-free rates obtained from FRED are in a (column) form, or we call it "series"
# We need to make it a pandas table (DataFrame)
r_f_4wk = r_f_4wk.to_frame()

r_f_3m = r_f_3m.to_frame()



r_f_3m.head()
Out[6]:
0
1934-01-01 0.72
1934-02-01 0.62
1934-03-01 0.24
1934-04-01 0.15
1934-05-01 0.16
In [7]:
# normalize the format of the dates (like we did with the stock_data table)
r_f_4wk.index = r_f_4wk.index.normalize()

# Rename the index and call it "Date"
r_f_4wk.index.rename('Date', inplace = True)

# Note the rates are in whole numbers, not fractions. Let's adjust that:
r_f_4wk[0] = r_f_4wk[0]  / 100


# Do the same process for the yearly rates:
r_f_3m.index = r_f_3m.index.normalize()
r_f_3m.index.rename('Date', inplace = True)
r_f_3m[0] = r_f_3m[0]  / 100

# Lets see the table now
r_f_3m.head()
Out[7]:
0
Date
1934-01-01 0.0072
1934-02-01 0.0062
1934-03-01 0.0024
1934-04-01 0.0015
1934-05-01 0.0016
In [9]:
# I will measure the price using the formulas presented above
# Remember the maturity for this T-bill is 3-months. which means n = 90
r_f_3m['Price'] = 1000*(1-  (r_f_3m[0] * (90/360))  )
# Then I create a column measuring the BEY
r_f_3m['BEY'] = ((1000 - r_f_3m['Price'])/ r_f_3m['Price']) * (365/90)


# My monthly risk-free rate is dividing the BEY by 12 (I have 12 1-month periods during a year)
r_f_3m['r_f_simple'] = r_f_3m['BEY'] / 12


# For more accurate rate, I can assume compounding and use the effective monthly return
r_f_3m['r_f_eff'] = (1 + r_f_3m['BEY'])**(1/12) - 1

#lets see the table now
r_f_3m.head()
Out[9]:
0 Price BEY r_f_simple r_f_eff
Date
1934-01-01 0.0072 998.200 0.007313 0.000609 0.000607
1934-02-01 0.0062 998.450 0.006296 0.000525 0.000523
1934-03-01 0.0024 999.400 0.002435 0.000203 0.000203
1934-04-01 0.0015 999.625 0.001521 0.000127 0.000127
1934-05-01 0.0016 999.600 0.001623 0.000135 0.000135
In [10]:
# Now similar steps for the monthly rates:

# The maturity for these bills is 1 month, so n = 30
r_f_4wk['Price'] = 1000*(1-  (r_f_4wk[0] * (30/360))  )
# Then I create a column measuring the BEY
r_f_4wk['BEY'] = ((1000 - r_f_4wk['Price'])/ r_f_4wk['Price']) * (365/30)

# My monthly risk-free rate is dividing the BEY by 12
r_f_4wk['r_f_simple'] = r_f_4wk['BEY'] / 12
# For more accuracy, I can assume compounding and use the effective monthly return
r_f_4wk['r_f_eff'] = (1 + r_f_4wk['BEY'])**(1/12) - 1

#lets see the table now
r_f_4wk.head()
Out[10]:
0 Price BEY r_f_simple r_f_eff
Date
2001-07-01 0.0361 996.991667 0.036712 0.003059 0.003009
2001-08-01 0.0348 997.100000 0.035386 0.002949 0.002902
2001-09-01 0.0263 997.808333 0.026724 0.002227 0.002200
2001-10-01 0.0224 998.133333 0.022754 0.001896 0.001877
2001-11-01 0.0196 998.366667 0.019905 0.001659 0.001644
In [17]:
# Similar to the stock data, the dates here are reported at the beginning of the month
# that is because returns of fixed-income securities are known in advance 
# (rf rate for the next month is known in advance)
# I will apply the same method I did for the table (stock_return)
# by pushing the date index to the end of the month
# That way, I can say this was the return from the beginning to the end of the month

r_f_4wk.index = r_f_4wk.index + pd.DateOffset(months=1) + pd.DateOffset(days=-1) 

r_f_3m.index = r_f_3m.index + pd.DateOffset(months=1) + pd.DateOffset(days=-1) 

r_f_4wk.head()
Out[17]:
0 Price BEY r_f_simple r_f_eff
Date
2001-07-31 0.0361 996.991667 0.036712 0.003059 0.003009
2001-08-31 0.0348 997.100000 0.035386 0.002949 0.002902
2001-09-30 0.0263 997.808333 0.026724 0.002227 0.002200
2001-10-31 0.0224 998.133333 0.022754 0.001896 0.001877
2001-11-30 0.0196 998.366667 0.019905 0.001659 0.001644
In [29]:
# Let us check if there are differences between 
# the 4-week and the 3-month T-bills after 
# I adjusted them to my investment horizon

# I will obtain 1 row from the dataset by using a function from pandas 
# # named (loc). This allows me to specify the index number for the row I want

# Remember my index here dates, so I want the row matching the date: 1/31/2017

print(r_f_4wk.loc['2017-01-31'])

# I would like to compare it with the row from the 3-month T-bills
print(r_f_3m.loc['2017-01-31'])
0               0.004900
Price         999.591667
BEY             0.004970
r_f_simple      0.000414
r_f_eff         0.000413
Name: 2017-01-31 00:00:00, dtype: float64
0               0.005100
Price         998.725000
BEY             0.005177
r_f_simple      0.000431
r_f_eff         0.000430
Name: 2017-01-31 00:00:00, dtype: float64

Notice how the difference between the two rates are minimal (The simple monthly R-f for the 4-week maturity is 0.0414% and the 3-month maturity is 0.0431%), which means the 3-month is a good approximation for monthly risk-free rate after transforming it! You can check other periods to see whether the difference is big or small...


Merging Two Datasets Using Python¶

Notice I have two datasets: stock_return which includes monthly* stock returns and r_f_4wk which includes the monthly risk-free rate. I would like to join them in one table...*

When we have two tables and want to combine them, we can use a function in pandas called "merge". This function has so many configurations on how to join DataFrames. But since both of our tables (stock_return, r_f_4wk) have the same index name (date), the match can be done quite easily.

When doing a merge, you can do "left", "right", "inner", and "outer". See the figure below. Imagine SP01-SP04 are dates. For our case, the r_f_4wk table starts in July, 2001. However, the stock_return starts in February, 2000. So when we do a "left" merge, we will have the original rows in stock_return plus whatever wa can match in r_f_4wk, and the rest of the rows that cannot be matched with any row from r_f_4wk will be set as missing.

matching
In [35]:
# I will create a table (main_data) which will have all the stock returns + the columns I want from r_f_4wk
# In my case, I only want the column "r_f_eff", so I specify it as: r_f_4wk['r_f_eff']

main_data = stock_return.merge(r_f_4wk['r_f_eff'], how= 'left', on='Date')

# Here I am renaming my new column that I added to something readable 

main_data.rename(columns = {"r_f_eff": "Risk-free"}, inplace= True)


main_data.head(n=20)
Out[35]:
AMZN BA CVX DIS GOLD HSY INTC K KO M MMM MSFT NVDA ^W5000 Risk-free
Date
2000-02-29 0.066796 -0.169944 -0.106876 -0.063683 -0.003817 0.033823 0.142135 0.043814 -0.153428 -0.118618 -0.058077 -0.086847 0.726798 0.021192 NaN
2000-03-31 -0.027223 0.027196 0.248126 0.213235 -0.038314 0.115744 0.167939 0.028578 -0.034704 0.151618 0.010757 0.188813 0.320071 0.058114 NaN
2000-04-30 -0.176306 0.049587 -0.079108 0.057576 0.071713 -0.066666 -0.038845 -0.058253 0.010433 -0.195267 -0.021877 -0.343529 0.054929 -0.052775 NaN
2000-05-31 -0.124575 -0.015748 0.085903 -0.032951 0.078067 0.140110 -0.016757 0.252578 0.129629 0.132353 -0.010102 -0.103047 0.280503 -0.036091 NaN
2000-06-30 -0.248383 0.074325 -0.076026 -0.080000 0.008687 -0.060264 0.072446 -0.012356 0.076113 -0.123376 -0.025795 0.278721 0.113913 0.043327 NaN
2000-07-31 -0.170396 0.167414 -0.068718 -0.006441 -0.122958 -0.046392 -0.001403 -0.128152 0.070924 -0.287037 0.085090 -0.127343 -0.056048 -0.021162 NaN
2000-08-31 0.377593 0.098912 0.070030 0.010130 0.000981 -0.077026 0.121723 -0.106024 -0.141437 0.148702 0.032790 0.000000 0.322916 0.071246 NaN
2000-09-30 -0.073795 0.205923 0.016603 -0.018050 -0.044075 0.275952 -0.444732 0.053983 0.047195 -0.054834 -0.014390 -0.136079 0.031497 -0.046688 NaN
2000-10-31 -0.047154 0.051357 -0.036657 -0.063726 -0.122951 0.003464 0.082706 0.049095 0.098729 0.246411 0.060356 0.141969 -0.241031 -0.021938 NaN
2000-11-30 -0.325939 0.018433 -0.003044 -0.191972 0.121495 0.164557 -0.154166 -0.029557 0.037267 -0.063339 0.033636 -0.166969 -0.348252 -0.100524 NaN
2000-12-31 -0.369620 -0.042287 0.039316 0.000000 0.100101 0.022559 -0.209843 0.076168 -0.024202 0.147541 0.213949 -0.244009 -0.190975 0.016670 NaN
2001-01-31 0.112450 -0.113636 -0.013708 0.059518 -0.055555 -0.075728 0.230770 -0.001905 -0.048205 0.273143 -0.081742 0.407781 0.575589 0.037408 NaN
2001-02-28 -0.411552 0.063248 0.028578 0.016420 0.047188 0.077142 -0.228041 0.014886 -0.085690 0.085054 0.018978 -0.033777 -0.134382 -0.095482 NaN
2001-03-31 0.004172 -0.101649 0.032789 -0.075929 -0.117901 0.086283 -0.078258 0.026145 -0.148407 -0.140641 -0.073610 -0.073093 0.452797 -0.068221 NaN
2001-04-30 0.542522 0.109316 0.099772 0.057692 0.150454 -0.128534 0.174727 -0.056603 0.026496 0.034416 0.145428 0.238857 0.283078 0.081404 NaN
2001-05-31 0.057668 0.017638 -0.005282 0.045289 0.003650 0.003807 -0.126173 0.047843 0.026196 0.042345 -0.003613 0.021107 0.027732 0.008492 NaN
2001-06-30 -0.152187 -0.113585 -0.051319 -0.086338 -0.076034 0.022515 0.083611 0.095974 -0.050633 -0.051339 -0.032869 0.055219 0.083401 -0.017491 NaN
2001-07-31 -0.117314 0.052698 0.009834 -0.087920 -0.017162 -0.021877 0.019145 0.036896 -0.004988 -0.091765 -0.019457 -0.093288 -0.127763 -0.017414 0.003009
2001-08-31 -0.284227 -0.125235 -0.007003 -0.034915 0.075890 0.068258 -0.062059 0.063851 0.091255 -0.059326 -0.069539 -0.138087 0.047095 -0.061865 0.002902
2001-09-30 -0.332215 -0.343707 -0.059488 -0.267794 0.083022 0.018718 -0.268500 -0.054849 -0.037395 -0.223354 -0.049462 -0.103068 -0.351433 -0.090550 0.002200
In [36]:
# Let's change the format of the Date index to not include the day
# Only year (Y) and month (m)
# I can use a function called strftime, it changes a date to any format I like
# I want it to be year with 4 digits then month, and (-) between the year and month:

main_data.index = main_data.index.strftime('%Y-%m')

main_data.head()
Out[36]:
AMZN BA CVX DIS GOLD HSY INTC K KO M MMM MSFT NVDA ^W5000 Risk-free
Date
2000-02 0.066796 -0.169944 -0.106876 -0.063683 -0.003817 0.033823 0.142135 0.043814 -0.153428 -0.118618 -0.058077 -0.086847 0.726798 0.021192 NaN
2000-03 -0.027223 0.027196 0.248126 0.213235 -0.038314 0.115744 0.167939 0.028578 -0.034704 0.151618 0.010757 0.188813 0.320071 0.058114 NaN
2000-04 -0.176306 0.049587 -0.079108 0.057576 0.071713 -0.066666 -0.038845 -0.058253 0.010433 -0.195267 -0.021877 -0.343529 0.054929 -0.052775 NaN
2000-05 -0.124575 -0.015748 0.085903 -0.032951 0.078067 0.140110 -0.016757 0.252578 0.129629 0.132353 -0.010102 -0.103047 0.280503 -0.036091 NaN
2000-06 -0.248383 0.074325 -0.076026 -0.080000 0.008687 -0.060264 0.072446 -0.012356 0.076113 -0.123376 -0.025795 0.278721 0.113913 0.043327 NaN

Returns over time¶

Let us look at how the historical stock returns were behaving over the time of our analysis:

In [39]:
# I will use a style called "bmh" to make a plot for the monthly returns over time
with plt.style.context('bmh'):
    main_data[["HSY", "NVDA", "^W5000", "Risk-free"]].plot(title = 'Monthly Returns',
                                                # adding a label for the X axis
                                                 xlabel = "Years",
                                                 # adding a label for the Y axis
                                                 ylabel = "Returns",
                                                 # Show grids
                                                 grid=True,
                                                 # specify the size of the chart
                                                 figsize=(10,7),
                                                 # specify the width of the line
                                                 lw=1.5)



plt.show()
No description has been provided for this image

Notice some stocks have a lot of jumps; meaning that in some months they have very high returns or very low returns. On the opposite side is the Risk-free return. It is basically stable over time and has not changed a lot.


Descriptive Statistics¶

Lets have a deeper look at the monthly stock returns.

In [40]:
# Descriptive Stats
main_data.describe()
Out[40]:
AMZN BA CVX DIS GOLD HSY INTC K KO M MMM MSFT NVDA ^W5000 Risk-free
count 287.000000 287.000000 287.000000 287.000000 287.000000 287.000000 287.000000 287.000000 287.000000 287.000000 287.000000 287.000000 287.000000 287.000000 270.000000
mean 0.021791 0.012425 0.009857 0.006993 0.008019 0.011105 0.006905 0.006957 0.006149 0.010586 0.007103 0.012066 0.037451 0.005589 0.001137
std 0.129926 0.095598 0.069003 0.076331 0.114357 0.057571 0.097305 0.049571 0.050764 0.129830 0.060703 0.081098 0.174041 0.045973 0.001328
min -0.411552 -0.454664 -0.214625 -0.267794 -0.381056 -0.177236 -0.444732 -0.138635 -0.172743 -0.628874 -0.157036 -0.343529 -0.486549 -0.177406 0.000000
25% -0.048718 -0.042805 -0.029320 -0.034121 -0.065294 -0.025150 -0.047195 -0.021066 -0.024000 -0.061762 -0.028618 -0.037006 -0.054362 -0.021149 0.000059
50% 0.021677 0.015899 0.008623 0.005748 0.007592 0.008661 0.007671 0.009467 0.008547 0.011221 0.010560 0.017244 0.030835 0.011240 0.000737
75% 0.087617 0.070993 0.046279 0.050940 0.072783 0.044363 0.062263 0.035843 0.038312 0.077279 0.039402 0.055028 0.122157 0.034368 0.001752
max 0.621776 0.459312 0.269666 0.248734 0.426362 0.275952 0.337428 0.252578 0.141928 0.644123 0.213949 0.407781 0.826231 0.133806 0.004398

We can see NVDA has the highest average monthly return at 3.75%. And the Market index ^W5000 has the lowest average monthly return at 0.56%.

Notice also that NVDA has the highest standard deviation $\sigma$, meaning the highest risk among our list of stocks.

Now with 14 securities + the T-bill, it is quite difficult to know which one would be a good stock to invest in. Because some have high returns but also are accompanied with high risk (such as NVDA), and some have a lot lower average monthly return but also a lower risk as well (such as the market index).

One approach to evaluate these stocks to a standard is to measure the Sharpe Ratio:

$$Sharpe = \frac{E(R_{i}) - R_{f}}{\sigma_{i}}$$

The Sharpe ratio basically measures what is the return you receive for each unit of risk you are taking. An investment that gives you a higher Sharpe ratio (Meaning the highest return per unit of risk) is an investment that gives you a better deal overall.

In [41]:
# Sharpe ratio for DIS
sharpe = (main_data['DIS'].mean() - main_data['Risk-free'].mean())/(main_data['DIS'].std())

print(sharpe)
0.07671412759994857
In [42]:
# Sharpe ratio for BA

sharpe = (main_data['BA'].mean() - main_data['Risk-free'].mean())/(main_data['BA'].std())

print(sharpe)
0.11807858561241205
In [43]:
# Let us go through all the stocks and print their sharpe ratios, 
# but instead of writing the code above 14 times, I will make a (loop)

# A loop means to go through each column (except the last one because it 
# is the return on T-bill "risk-free return") and apply the same set of commands:

# for each column (x) in the table main_data
for x in main_data.columns:
    # if the column is not named "Risk Free" then do the following (not equal is "!=")
    if x != "Risk-free":
        # Measure the sharpe ratio for the column x
        sharpe = (main_data[x].mean() - main_data['Risk-free'].mean())/(main_data[x].std())
        # Print the word "Sharpe Ratio for", then print the column name (x), then the variable sharpe 
        print("Sharpe Ratio for " + x + ":", sharpe)
Sharpe Ratio for AMZN: 0.1589637365765653
Sharpe Ratio for BA: 0.11807858561241205
Sharpe Ratio for CVX: 0.12637140972188973
Sharpe Ratio for DIS: 0.07671412759994857
Sharpe Ratio for GOLD: 0.06017584469299582
Sharpe Ratio for HSY: 0.17314587189384326
Sharpe Ratio for INTC: 0.0592721326989779
Sharpe Ratio for K: 0.11740215408141376
Sharpe Ratio for KO: 0.09872493337254798
Sharpe Ratio for M: 0.07278042594214636
Sharpe Ratio for MMM: 0.0982770773683789
Sharpe Ratio for MSFT: 0.13475647952193534
Sharpe Ratio for NVDA: 0.20865013582209396
Sharpe Ratio for ^W5000: 0.09684057148910538

Let me return to the descriptive statistics table. I would like first to save the table so I can manipulate it by adding columns or rows:

In [44]:
# First, save the results in a dataframe table
desc = main_data.describe()

# Second, transpose the table (i.e., flip it by making the rows column and columns rows)
# save it using the same name (basically overwriting the DataFrame)
desc = desc.transpose()

desc
Out[44]:
count mean std min 25% 50% 75% max
AMZN 287.0 0.021791 0.129926 -0.411552 -0.048718 0.021677 0.087617 0.621776
BA 287.0 0.012425 0.095598 -0.454664 -0.042805 0.015899 0.070993 0.459312
CVX 287.0 0.009857 0.069003 -0.214625 -0.029320 0.008623 0.046279 0.269666
DIS 287.0 0.006993 0.076331 -0.267794 -0.034121 0.005748 0.050940 0.248734
GOLD 287.0 0.008019 0.114357 -0.381056 -0.065294 0.007592 0.072783 0.426362
HSY 287.0 0.011105 0.057571 -0.177236 -0.025150 0.008661 0.044363 0.275952
INTC 287.0 0.006905 0.097305 -0.444732 -0.047195 0.007671 0.062263 0.337428
K 287.0 0.006957 0.049571 -0.138635 -0.021066 0.009467 0.035843 0.252578
KO 287.0 0.006149 0.050764 -0.172743 -0.024000 0.008547 0.038312 0.141928
M 287.0 0.010586 0.129830 -0.628874 -0.061762 0.011221 0.077279 0.644123
MMM 287.0 0.007103 0.060703 -0.157036 -0.028618 0.010560 0.039402 0.213949
MSFT 287.0 0.012066 0.081098 -0.343529 -0.037006 0.017244 0.055028 0.407781
NVDA 287.0 0.037451 0.174041 -0.486549 -0.054362 0.030835 0.122157 0.826231
^W5000 287.0 0.005589 0.045973 -0.177406 -0.021149 0.011240 0.034368 0.133806
Risk-free 270.0 0.001137 0.001328 0.000000 0.000059 0.000737 0.001752 0.004398
In [45]:
# I will use the same risk free rate assumption as my Excel example
risk_free = 0.004
# Add a column to this newly created table and call it "sharpe"
desc['sharpe'] = (desc["mean"] - risk_free) / desc["std"]

# I would like to drop the last row; which is indexed as "Risk Free"
desc.drop(index = 'Risk-free', inplace= True)

desc
Out[45]:
count mean std min 25% 50% 75% max sharpe
AMZN 287.0 0.021791 0.129926 -0.411552 -0.048718 0.021677 0.087617 0.621776 0.136929
BA 287.0 0.012425 0.095598 -0.454664 -0.042805 0.015899 0.070993 0.459312 0.088132
CVX 287.0 0.009857 0.069003 -0.214625 -0.029320 0.008623 0.046279 0.269666 0.084883
DIS 287.0 0.006993 0.076331 -0.267794 -0.034121 0.005748 0.050940 0.248734 0.039208
GOLD 287.0 0.008019 0.114357 -0.381056 -0.065294 0.007592 0.072783 0.426362 0.035141
HSY 287.0 0.011105 0.057571 -0.177236 -0.025150 0.008661 0.044363 0.275952 0.123418
INTC 287.0 0.006905 0.097305 -0.444732 -0.047195 0.007671 0.062263 0.337428 0.029851
K 287.0 0.006957 0.049571 -0.138635 -0.021066 0.009467 0.035843 0.252578 0.059650
KO 287.0 0.006149 0.050764 -0.172743 -0.024000 0.008547 0.038312 0.141928 0.042329
M 287.0 0.010586 0.129830 -0.628874 -0.061762 0.011221 0.077279 0.644123 0.050730
MMM 287.0 0.007103 0.060703 -0.157036 -0.028618 0.010560 0.039402 0.213949 0.051115
MSFT 287.0 0.012066 0.081098 -0.343529 -0.037006 0.017244 0.055028 0.407781 0.099455
NVDA 287.0 0.037451 0.174041 -0.486549 -0.054362 0.030835 0.122157 0.826231 0.192201
^W5000 287.0 0.005589 0.045973 -0.177406 -0.021149 0.011240 0.034368 0.133806 0.034567
In [46]:
# Example to show how useful this table:



# Finding the stock with the highest sharpe ratio:
print('The ticker with the highest Sharpe ratio: \n', desc['sharpe'].idxmax(), '\t', desc['sharpe'].max())

# Finding the stock with the highest average return:
print('The ticker with the highest expected return: \n', desc['mean'].idxmax(),'\t', desc['mean'].max())

# Finding the stock with the highest average return:
print('The ticker with the lowest expected return: \n', desc['mean'].idxmin(),'\t', desc['mean'].min())


# Finding the stock with the highest risk:
print('The ticker with the highest estimated risk: \n', desc['std'].idxmax(), '\t', desc['std'].max())

# Finding the stock with the lowest risk:
print('The ticker with the lowest estimated risk: \n', desc['std'].idxmin(), '\t', desc['std'].min())
The ticker with the highest Sharpe ratio: 
 NVDA 	 0.19220080202651832
The ticker with the highest expected return: 
 NVDA 	 0.03745089165405311
The ticker with the lowest expected return: 
 ^W5000 	 0.005589151215983193
The ticker with the highest estimated risk: 
 NVDA 	 0.17404137392432853
The ticker with the lowest estimated risk: 
 ^W5000 	 0.04597263111366421

As can be seen from the results, the highest Sharpe ratio is found in NVDA. Which means that it is a good stock to invest in.

Notice also that the security with the lowest expected return is also the same security with the lowest estimated risk (^W5000). We can conclude that generally speaking the market offers different securities to match every investor's taste and yet it is competitive and efficient. You want high risk you generally are rewarded with higher expected returns, you want lower risk you will be rewarded with lower expected returns.

QUESTION: Can I have a higher Sharpe ratio than the ones offered by these 14 tickers? Let me examine this question by creating a portfolio that invests say 30% in HSY and 70% in NVDA:


Measuring Portfolio Returns¶

The return for any portfolio consisting of $n$ stocks: $$R_{p} = \sum^{n}_{i=1} w_{i}\times R_{i}$$

where $w$ is the weight for any stock $i$.

Also remember that weights have to sum up to 1: $$\sum^{n}_{i=1} w_{i} = 1$$

In [47]:
main_data['Port_1'] = (0.3 * main_data['HSY']) + (0.7 * main_data['NVDA'])

main_data[['Port_1']].describe()
Out[47]:
Port_1
count 287.000000
mean 0.029547
std 0.122414
min -0.358341
25% -0.038771
50% 0.024024
75% 0.091934
max 0.605258

Note Port_1 has an estimated return of 2.95% and a risk of 12.24%.

I would like to see the histogram of returns for this portfolio, and compare it with its components (HSY & NVDA). The steps for creating a histogram is similar to the steps for creating a plot:

In [50]:
# We will use a theme called "classic" 
with plt.style.context('classic'):
    # Note I can customize the histogram by choosing the number of bins and the color of the edges
    hist = main_data['Port_1'].plot.hist(bins = 15, ec="black")
    hist.set_xlabel("Monthly returns")
    hist.set_ylabel("Count")
    hist.set_title("Portfolio 1 Monthly Returns Histogram")

#plt.savefig('Port_1_histogram.png', dpi = 400)
plt.show()
No description has been provided for this image
In [51]:
with plt.style.context('ggplot'):
    hist = main_data['HSY'].plot.hist(bins = 15, ec="black")
    hist.set_xlabel("Monthly returns")
    hist.set_ylabel("Count")
    hist.set_title("HSY Monthly Returns Histogram")

#plt.savefig('HSY_histogram.png', dpi = 400)
plt.show()
No description has been provided for this image
In [52]:
with plt.style.context('ggplot'):
    hist = main_data['NVDA'].plot.hist(bins = 15, ec="black")
    hist.set_xlabel("Monthly returns")
    hist.set_ylabel("Count")
    hist.set_title("NVDA Monthly Returns Histogram")

#plt.savefig('NVDA_histogram.png', dpi = 400)
plt.show()
No description has been provided for this image
In [54]:
# I will plot to histogram in one picture to compare the distribution of returns


with plt.style.context('ggplot'):
    # first histogram is the NVDA stock, I choose the color blue for it
    hist = main_data['NVDA'].plot.hist(color = 'blue', bins = 20, ec="black")
    # first histogram is the Port_1, I choose the color red for it. 


    # Notice I also made this histogram's transparency set to 70%. That way I can see any information behind the bars 
    hist = main_data['Port_1'].plot.hist(color = 'red', bins = 20, ec="black", alpha = 0.7)


    hist.set_xlabel("Monthly returns")
    hist.set_ylabel("Count")
    hist.set_title("Portfolio 1 vs. NVDA Monthly Returns Histogram")

    # I asked to show the definition for each color 
    hist.legend()

#plt.savefig('Port_1_histogram.png', dpi = 400)
plt.show()
No description has been provided for this image

From the graph above, I can conclude that the distribution of returns for Port_1 is closer to the mean (centered in the middle) when compared to the returns of NVDA. This means a lower risk!

In [55]:
# Let us examine its Sharpe ratio
sharpe = (main_data['Port_1'].mean() - risk_free)/(main_data['Port_1'].std())
print(sharpe)
0.20869598932150724

As can be seen from this result, I can have a higher Sharpe ratio than HSY or NVDA alone if I made a portfolio investing 30% in HSY and 70% in NVDA; 0.21 vs. 0.19 or 0.12.

Because of diversification, I can create an investment with lower risk (meaning lower $\sigma$), making the denominator in the Sharpe ratio formula smaller, which inturn makes the Sharpe ratio larger.

But why should I stop with two stocks? Maybe I should try to include more stocks in the portfolio, and also try different weights instead of 30% and 70%.

However, with my current setting, this would require me to make a column each time I change weights or add additional stocks. It basically becomes a guessing game...

Is there a way to ask the computer to try different weights on the 14 tickers until it finds a portfolio that has the highest Sharpe ratio? Yes there is a way. But before I jump into it, I need to review how portfolio risk is actually estimated, and also *examine how stock returns behave when they are compared with each other.*


Calculating Portfolio Risk (Markowitz Method)¶

The risk of any portfolio can be measured as the standard deviation of the portfolio return $\sigma_{p}$. However, if the portfolio consists of 2 securities, the standard deviation of the portfolio will not be simply the weighted average of the standard deviation for the two securities.

We need to consider the covariance/correlation between the securities. The covariance reflects the co-movement of returns between the two stocks. Unless the two assets are perfectly positively correlated, the covariance will reduce the overall risk of the portfolio. $$ \rho_{i,j} \,\, or \,\, Corr(R_{i}, R_{j}) = \frac{Cov(R_{i}, R_{j}) \, or\, \sigma_{i,j}}{\sigma_{i}\sigma_{j}} $$

The Variance for any portfolio consisting of $N$ stocks: $$ \sigma^{2}_{p} = \sum^{N}_{j=1} w_{j} \sum^{N}_{i=1} w_{i}\sigma_{ij} $$

Then the portfolio risk* becomes*: $$ \sigma_{p} = \sqrt{\sigma^{2}_{p}} $$

So for our example of Port_1, we have 2 stocks; stock 1 (HSY), and stock 2 (NVDA). Applying the formula:

$$\sigma^{2}_{Port(1)} = w_{1}w_{1}\sigma_{1,1} + w_{1}w_{2}\sigma_{1,2} + w_{2}w_{2}\sigma_{2,2} + w_{2}w_{1}\sigma_{2,1}$$$$\sigma^{2}_{Port(1)} = [0.3 \times 0.3 \times \sigma_{HSY,HSY}] + [0.3 \times 0.7 \times \sigma_{HSY,NVDA}] + [0.7 \times 0.7 \times \sigma_{NVDA,NVDA}] + [0.7 \times 0.3 \times \sigma_{NVDA,HSY}] $$

Since:

  • The covariance of stock 1 with itself is the variance ($\sigma_{1,1} = \sigma^{2}_{1}$)

  • According to the correlation equation above: $\sigma_{2,1} = \sigma_{1,2} = \rho_{1,2} \sigma_{1} \sigma_{2}$

The equation becomes: $$ \sigma^{2}_{Port(1)} = w^{2}_{1}\sigma^{2}_{1} + w^{2}_{2}\sigma^{2}_{2} + 2 \times (w_{1}w_{2}\sigma_{1}\sigma_{2} \rho_{1,2}) $$

And Port_1's risk is:

$$ \sigma_{Port(1)} = \sqrt{\sigma^{2}_{Port(1)}} $$

Using Matrices¶

Now for portfolio 1 the above calculations can be relativity easy to calculate. But when I have 3 stocks, or 10 stocks, measuring a portfolio risk can be a very long and tiring equation. Fortunately, I can use matrices in python to make these calculations fast, easy, and consistent.

If I "translate" the above portfolio variance equation into matrices. It will take the following form:

$$ \sigma^{2}_{p} = \begin{bmatrix} w_{1} & w_{2} & \dots & w_{n}\end{bmatrix} \begin{bmatrix} \sigma^{2}_{1} & \sigma_{1,2} & \dots & \sigma_{1,n} \\ \sigma_{2,1} & \sigma^{2}_{2} & \dots & \sigma_{2,n} \\ \dots & &\ddots & \vdots\\ \sigma_{n,1} & \sigma_{n,2} & \dots & \sigma^{2}_{n}\end{bmatrix} \begin{bmatrix} w_{1} \\ w_{2} \\ \vdots \\ w_{n}\end{bmatrix} $$

Now let me use matrix here to measure the standard deviation of a new portfolio (call it "Portfolio 2") where I invest 20% in Nvidia (NVDA), 30% in Microsoft (MSFT), and 50% in CocaCola (KO):

In [56]:
# I will create a list of portfolio weights
w_list =[0.2, 0.3, 0.5]

# The package (Numpy) will be very helpful here 
# (notice I gave it a nickname "np" at the beginning of this notebook)
# It can deal with matrix multiplication and create random numbers if I want
# put the list of weights in a matrix
w = np.array([w_list])

print(w)
[[0.2 0.3 0.5]]
In [57]:
# Make sure the weights always sum up to 1.
# Here I write a statement that the sum of all the elements in a matrix is equal to 1. 
# The computer will tell me if that statement is true or false
np.sum(w) == 1
Out[57]:
True

I need a 3 by 3 matrix of covariance for the stocks I want to make a portfolio from ('NVDA','MSFT','KO'). To do that, I first take a slice of the (main_data) which includes only the stocks I care about now. Then save it using a new name (slice)

In [60]:
slice = main_data[['NVDA', 'MSFT', 'KO']]
# Lets see this DataFrame
slice.head()
Out[60]:
NVDA MSFT KO
Date
2000-02 0.726798 -0.086847 -0.153428
2000-03 0.320071 0.188813 -0.034704
2000-04 0.054929 -0.343529 0.010433
2000-05 0.280503 -0.103047 0.129629
2000-06 0.113913 0.278721 0.076113
In [61]:
# The covariance table for the dataset (slice)
covariance = slice.cov()

print(covariance)
          NVDA      MSFT        KO
NVDA  0.030290  0.005527  0.000231
MSFT  0.005527  0.006577  0.001013
KO    0.000231  0.001013  0.002577
In [62]:
# Take the covariance table and put it in a matrix
# I will reuse the same variable name (it will overwrite)
covariance_matrix = np.array(covariance)

covariance_matrix
Out[62]:
array([[0.0302904 , 0.00552749, 0.00023095],
       [0.00552749, 0.00657686, 0.00101322],
       [0.00023095, 0.00101322, 0.00257697]])
In [63]:
# Notice I can do all the three steps in one line like this

covariance_matrix = np.array(main_data[['NVDA', 'MSFT', 'KO']].cov())

covariance_matrix
Out[63]:
array([[0.0302904 , 0.00552749, 0.00023095],
       [0.00552749, 0.00657686, 0.00101322],
       [0.00023095, 0.00101322, 0.00257697]])
In [65]:
# Now I have matrix 1 and 2, whats left is matrix 3
# matrix 3 is basically the "transpose" of matrix 1
# Let me create it
w_vertical= w.T

print(w_vertical)
[[0.2]
 [0.3]
 [0.5]]

Now I apply the above matrix formula. It will look like this:

$$\sigma^{2}_{p} = \begin{bmatrix} w_{NVDA} & w_{MSFT} & w_{KO}\end{bmatrix} \begin{bmatrix} \sigma^{2}_{NVDA} & \sigma_{NVDA,MSFT} & \sigma_{NVDA,KO} \\ \sigma_{MSFT,NVDA} & \sigma^{2}_{MSFT} & \sigma_{MSFT,KO} \\ \sigma_{KO,NVDA} & \sigma_{KO,MSFT} & \sigma^{2}_{KO}\end{bmatrix} \begin{bmatrix} w_{NVDA} \\ w_{MSFT} \\ w_{KO}\end{bmatrix} $$
In [66]:
# The way matrix multiplication is done in Python is through the use of the function (.dot) in Numpy

# First matrix times the 2nd matrix
first = w.dot(covariance_matrix) 

# The answer is then multiplied by the 3rd matrix
second = first.dot(w_vertical)


# Alternatively, embed the first onto the second
# The code is more efficient that way ;)

port_cov = w.dot(covariance_matrix).dot(w_vertical)


print(port_cov)
[[0.00346123]]
In [67]:
# Take out the answer from the matrix:
port_cov[0,0]
Out[67]:
0.0034612305695005154
In [68]:
# Thus, the risk for this portfolio is the square root of the calculated variance:

port2_risk = port_cov[0,0] **0.5

port2_risk
Out[68]:
0.05883222390408606

Thus, the risk of Port_2 is about 5.87% and it's return (using the data_main[column name].mean() method) is:

In [69]:
port2_return = (main_data['NVDA'].mean()* 0.2)+(main_data['MSFT'].mean()*0.3)+(main_data['KO'].mean()* 0.5)

port2_return
Out[69]:
0.014184255809107784

Measuring the Sharpe Ratio:

In [70]:
port2_sharpe = (port2_return - risk_free)/port2_risk

port2_sharpe
Out[70]:
0.17310676247274173

So portfolio 2 is not an improvement over portfolio 1...


Exporting Datasets¶

The pandas package offers its users a way to export their tables and datasets to any type of file. It also offers options to import files into a new dataset. In the following code I will export the dataset I have created earlier (main_data) to an Excel file. The produced Excel file will be located in the same folder where this Jupyter Notebook exists.

In [52]:
main_data.to_excel("returns_cleaned.xlsx")