Portfolio Mean-Variance Optimization¶
In this document, you will learn how to find the optimal portfolio using the Markowitz Model (Mean-Variance Optimization Method) for a fixed list of securities. This notebook is a sequel to previous notes where I am using a dataset of roughly 14 securities. Thus I suggest you read the previous note to fully understand what is going on here.
As always, I start this notebook with the packages I will need for my analysis:
Installation:¶
If you do not have the package pypfopt
installed on your computer, you need to run this code (Only once on your device):
import sys
!{sys.executable} -m pip install PyPortfolioOpt
Now I will import the libraries I need for my session:
from pypfopt import EfficientFrontier
from pypfopt import risk_models
from pypfopt import expected_returns
# This is the package for downloading stock data
import yfinance as yf, pandas as pd, numpy as np, datetime as dt, matplotlib as mpl, matplotlib.pyplot as plt
# This package provides a tool for optimizing any objective
# It is similar to the Excel tool (Solver)
# I will use it as an alternative method to the package (pypfopt)
import scipy.optimize as sco
Note I am adding new packages:
scipy
: This package has a series of functions. One of them (optimize
) allows the user to instruct Python to optimize any particular function; meaning to try different scenarios or changing my inputs until to reach either the maximum value or the minimum value possible. It is similar to the tool (Solver) in Excel.pypfopt
: This package called PyPortfolioOpt is very helpful for any quantitative financial analyst, it allows the user to solve most of the problems we try to solve in finance in a faster and more efficient way. The only issue (from my experience) is that it is not customizable to the user's particular need. I will explain that later when I start using the package at the end of this note. You can read more about it on their official page: [https://pyportfolioopt.readthedocs.io/en/latest/]
In the following steps, I will upload the excel dataset I saved in the previous notebook, which includes the monthly returns for all the tickers + the monthly risk-free rate. Then I will drop some columns and rows that I do not need from this table.
main_data = pd.read_excel('returns_cleaned.xlsx', index_col='Date')
main_data.head()
AMZN | BA | CVX | DIS | GOLD | HSY | INTC | K | KO | M | MMM | MSFT | NVDA | ^W5000 | Risk Free | Port_1 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Date | ||||||||||||||||
2000-02 | 0.066796 | -0.169944 | -0.106876 | -0.063683 | -0.003817 | 0.033823 | 0.142136 | 0.043815 | -0.153428 | -0.118619 | -0.058078 | -0.086845 | 0.726813 | 0.021192 | NaN | 0.518916 |
2000-03 | -0.027223 | 0.027196 | 0.248125 | 0.213236 | -0.038314 | 0.115744 | 0.167938 | 0.028577 | -0.034704 | 0.151618 | 0.010757 | 0.188811 | 0.320068 | 0.058114 | NaN | 0.258771 |
2000-04 | -0.176306 | 0.049587 | -0.079108 | 0.057576 | 0.071714 | -0.066667 | -0.038844 | -0.058252 | 0.010433 | -0.195266 | -0.021877 | -0.343529 | 0.054929 | -0.052775 | NaN | 0.018450 |
2000-05 | -0.124575 | -0.015748 | 0.085904 | -0.032951 | 0.078067 | 0.140109 | -0.016757 | 0.252577 | 0.129630 | 0.132353 | -0.010101 | -0.103047 | 0.280505 | -0.036091 | NaN | 0.238386 |
2000-06 | -0.248383 | 0.074325 | -0.076026 | -0.080000 | 0.008687 | -0.060264 | 0.072445 | -0.012356 | 0.076112 | -0.123377 | -0.025796 | 0.278721 | 0.113911 | 0.043327 | NaN | 0.061658 |
# I will drop the the columns (Port_1 & Risk Fee) since I do not need them in my task here
# I am using "Inplace" so I replace the existing table "main_data" with this new one
main_data.drop(columns=['Port_1', 'Risk Free'], inplace = True)
main_data.head()
AMZN | BA | CVX | DIS | GOLD | HSY | INTC | K | KO | M | MMM | MSFT | NVDA | ^W5000 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Date | ||||||||||||||
2000-02 | 0.066796 | -0.169944 | -0.106876 | -0.063683 | -0.003817 | 0.033823 | 0.142136 | 0.043815 | -0.153428 | -0.118619 | -0.058078 | -0.086845 | 0.726813 | 0.021192 |
2000-03 | -0.027223 | 0.027196 | 0.248125 | 0.213236 | -0.038314 | 0.115744 | 0.167938 | 0.028577 | -0.034704 | 0.151618 | 0.010757 | 0.188811 | 0.320068 | 0.058114 |
2000-04 | -0.176306 | 0.049587 | -0.079108 | 0.057576 | 0.071714 | -0.066667 | -0.038844 | -0.058252 | 0.010433 | -0.195266 | -0.021877 | -0.343529 | 0.054929 | -0.052775 |
2000-05 | -0.124575 | -0.015748 | 0.085904 | -0.032951 | 0.078067 | 0.140109 | -0.016757 | 0.252577 | 0.129630 | 0.132353 | -0.010101 | -0.103047 | 0.280505 | -0.036091 |
2000-06 | -0.248383 | 0.074325 | -0.076026 | -0.080000 | 0.008687 | -0.060264 | 0.072445 | -0.012356 | 0.076112 | -0.123377 | -0.025796 | 0.278721 | 0.113911 | 0.043327 |
*Remember, my objective is to find the best portfolio I can create with these tickers. Thus, my decision variable is to choose the set of weights that gives me the best portfolio.*
The package scipy
has a function called optimize
(recall that I gave it a nickname "sco"). This function will be very helpful in solving this problem for me. optimize
takes several arguments; for example what is the variable you would like to optimize? If there are any constraints (conditions), what are they? What method to use for finding the optimal solution? And so on...
Setting up the optimization problem¶
The objective here is to set up the following mathematical optimization problem in Python:
$$\max{\left[Sharpe = \frac{E(R_{p}) - R_{f}}{\sigma_{p}}\right]}$$s.t. $$\sum^{N}_{i=1} w_{i} = 1$$ $$0 \leq w_{i} \leq 1$$ $$\sigma_{p} = \sqrt{\sum^{N}_{j=1} w_{j} \sum^{N}_{i=1} w_{i}\sigma_{ij}}$$
In our example, I would like to find the optimal weights that will yield the highest Sharpe ratio possible for the stocks we have information about. Subject to several constraints:
- The sum of these weights needs to equal to 1
- The weights for each ticker should be between 0 and 1
- The risk of the portfolio will be calculated using the suggested weights
So let me show you how it can be set in scipy
:
First, define some functions:¶
Functions are helpful tools to use in Python. Think of them as a series of Python instructions and calculations that are done in one line, and can be done to any variable. In essence, the user provides the inputs and the function returns the output.
The following functions will be useful for our task. The first one measures the portfolio return, the other one measures the portfolio risk, and the third one measures the negative of the Sharpe ratio for a given portfolio:
# Define a function that measures a portfolio return
def port_return(weights, asset_exp_returns):
'''This function takes two arguments:
1) A Numpy vector of weights (called it 'weights' )
2) A list of the stock expected returns (called it 'asset_exp_returns')
The function will multiply each weight with the
expected return to find the portfolio expected return.
(Similar to SUMPRODUCT in Excel)'''
# Here is the calculation
the_return = np.sum(asset_exp_returns * weights)
# I then ask the function to return this variable back when it is called
return the_return
# Making sure it works by setting the weight to 100% in BA (the second stock in my list):
w_list = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
test = port_return(w_list, main_data.mean())
print(test)
0.012425234983117038
# Define a function that measures a portfolio's risk
def port_risk(weights, cov):
'''This function takes two arguments:
1) A Numpy vector of weights (called it 'weights' )
2) A covariance table (called it 'cov' )
This function will do a matrix multiplication to measure the
portfolio volatility. Then takes the square root of the
calculated volatility and return to the user the portfolio risk.'''
volatility = weights.dot(cov).dot(weights.T)
std = np.sqrt(volatility)
return std
# Making sure it works by setting the weight to 100% in BA:
w_list = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
test = port_risk(w_list, main_data.cov())
print(test)
0.09559818130914506
# Define a function that measures the negative value of the Sharpe ratio
def neg_s_ratio(weights, exp_returns, cov, r_f):
'''This function takes 4 arguments:
1) A Numpy vector of weights
2) A list of the stock expected returns
3) A covariance table
4) The risk free rate (Represented as a fraction)
This function will measure the negative value of the
Sharpe ratio for any portfolio.'''
port_return = np.sum(exp_returns * weights)
port_risk = np.sqrt(weights.dot(cov).dot(weights.T))
port_sharpe = (port_return - r_f)/port_risk
return -1 * port_sharpe
# Making sure it works by setting the weight to 100% in BA:
w_list = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
neg_s_ratio(w_list, main_data.mean(), main_data.cov(), r_f= 0.005)
-0.07767129961505584
# Another example:
w_list = np.array([0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, \
0.05, 0.05, 0.1, 0.1, 0.1, 0.2])
neg_s_ratio(w_list, main_data.mean(), main_data.cov(), r_f= 0.00107 )
-0.21083368940973155
Running the optimization problem:¶
Unfortunately, the scipy
package does not offer a maximization function, only a *minimization function. Thus, if we would like to maximize the Sharpe ratio, we could express our problem as minimizing the negative value of the Sharpe ratio*.
# I will create some variables so I can reference them later
# list of expected returns for each ticker
exp_returns = main_data.mean()
# the covariance table for my dataset
cov = main_data.cov()
# number of stocks I have
n_stocks = len(exp_returns)
# the risk-free rate next month
r_f = 0.005
# I am defining the arguments I will enter in the optimization problem
# They are constants (will not change) and specific for my setup:
arguments = (exp_returns, cov, r_f)
# I have one constraint (Or condition): the sum of weights should equal 1
# This is expressed as sum(w) - 1 = 0:
# the function requires me to put the constraints in a dictionary variable:
constraints = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1})
# The bounds for my variable (weight) should be between 0 and 1
# the function requires me to put the bounds in a tuple variable:
bounds = tuple( (0,1) for stock in range(n_stocks) )
# I will set the weights to initially be equal
initial_guess = n_stocks * [1 / n_stocks]
# Run the minimize function and supply it with our specifications:
optimal_sharpe = sco.minimize(neg_s_ratio,
x0 = initial_guess,
args = arguments,
method = 'SLSQP',
bounds = bounds,
constraints = constraints)
# Let us see our results:
# Save the optimal weights 'x' in a variable called "optimal_sharpe_w"
optimal_sharpe_w = optimal_sharpe['x']
# Create a dictionary that contains the information of our portfolio
optimal_port = {'Return': port_return(optimal_sharpe_w, exp_returns),
'Risk': port_risk(optimal_sharpe_w, cov),
'Sharpe Ratio': -1 * optimal_sharpe['fun']}
# Print the results
print('Portfolio with Highest Sharpe Ratio ---')
print('Performance')
print ("-"*100)
for index, value in optimal_port.items():
print(f'{index}: {100 * value:.2f}%', end = "\n", flush=True)
print('\n')
print('Weights:')
print ("-"*100)
for x,y in zip(main_data.columns, optimal_sharpe_w):
print(f'{x}: \t {100 * y:.2f}%', end = " \n", flush=True)
Portfolio with Highest Sharpe Ratio --- Performance ---------------------------------------------------------------------------------------------------- Return: 1.87% Risk: 5.95% Sharpe Ratio: 23.00% Weights: ---------------------------------------------------------------------------------------------------- AMZN: 14.55% BA: 3.35% CVX: 0.19% DIS: 0.00% GOLD: 3.58% HSY: 53.51% INTC: 0.00% K: 0.00% KO: 0.00% M: 0.00% Risk: 5.95% Sharpe Ratio: 23.00% Weights: ---------------------------------------------------------------------------------------------------- AMZN: 14.55% BA: 3.35% CVX: 0.19% DIS: 0.00% GOLD: 3.58% HSY: 53.51% INTC: 0.00% K: 0.00% KO: 0.00% M: 0.00% MMM: 0.00% MSFT: 1.73% NVDA: 23.10% ^W5000: 0.00%
Using an alternative Python package: (PyPortfolioOpt)¶
The package pypfopt
can find the weights for the optimal portfolio. To use this package, you need to give it the adjusted closing price, not returns. In addition, the results are annualized. That is, it will find the optimal portfolio with the highest annual return and lowest annual risk.
Since I do not have the stock prices anymore, I will download them again using yfinance
ticker_list = ["^W5000", "BA", "AMZN", "CVX", "DIS", "GOLD", "HSY",\
"INTC", "K", "KO", "M", "MMM", "MSFT", "NVDA"]
# Set the start and end date
start_date = dt.date(1999,12,31)
end_date = dt.date(2023,12,31)
# Instead of using a ticker, now we will pass the name of the list
stock_data = yf.download(ticker_list, start=start_date, end=end_date, interval='1mo')
# We only need the adj. close prices, so I "slice" the table and replace the original table name
stock_data = stock_data['Adj Close']
[*********************100%%**********************] 14 of 14 completed
Running the optimization problem:
The package has some "extensions" that can calculate expected returns and covariances. The only thing you need to do is supply the table that has the prices.
# I will use the extension from the package to measure the expected return for each ticker
# and save in a variable called "mu"
mu = expected_returns.mean_historical_return(stock_data, frequency=12)
# The same story here; use the extension from the package to measure the covariance table
S = risk_models.sample_cov(stock_data)
# the variable that has all the specifications
# of the optimization problem will be named 'ef'
ef = EfficientFrontier(mu, S)
# Save the optimal weights
raw_weights = ef.max_sharpe(risk_free_rate=0.005)
# clean it
cleaned_weights = ef.clean_weights()
# saves to file:
#ef.save_weights_to_file("weights.csv")
print(cleaned_weights)
ef.portfolio_performance(risk_free_rate=0.005, verbose=True)
OrderedDict([('AMZN', 0.07954), ('BA', 0.0), ('CVX', 0.09198), ('DIS', 0.0), ('GOLD', 0.0), ('HSY', 0.52918), ('INTC', 0.0), ('K', 0.05322), ('KO', 0.03218), ('M', 0.0), ('MMM', 0.0), ('MSFT', 0.10896), ('NVDA', 0.10493), ('^W5000', 0.0)]) Expected annual return: 13.7% Annual volatility: 70.5% Sharpe Ratio: 0.19
(0.1369643274514806, 0.7048916314044404, 0.1872122203927321)