Renexe Project

Backtesting for the period 2007-2008, Sharpe ratio = 2.48. Long/short equity portfolio optimization.

Updated: Dec 11, 2019

We test the performance of the software over the 2007-2008 financial crisis period in the basic setup. We collected open-source data on stock returns for the period 2002-2009. Data from the beginning of 2002 to April 2007 serve as the training set used to calibrate the model parameters; data from April 2007 onwards are used for the backtesting. We restrict the shortlist to companies with a market capitalization between USD 300mn and USD 3bn and a PEG ratio of at most 2. This leaves 423 US, EU, and Asian companies with stock price history available from 2002, in total over 2,000 daily return observations.
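The screening step above can be sketched in a few lines of pandas. This is a hypothetical illustration: the DataFrame, column names, and ticker values are invented placeholders, not the actual dataset or code behind the article.

```python
# Hypothetical screening step, assuming a DataFrame `universe` with
# per-company fundamentals; all column names and values are illustrative.
import pandas as pd

universe = pd.DataFrame({
    "ticker":     ["AAA", "BBB", "CCC", "DDD"],
    "market_cap": [0.5e9, 4.0e9, 1.2e9, 0.2e9],  # USD
    "peg_ratio":  [1.4, 0.9, 2.5, 1.1],
})

# Keep companies with market cap in USD 300mn - 3bn and PEG ratio <= 2.
shortlist = universe[
    universe["market_cap"].between(300e6, 3e9)
    & (universe["peg_ratio"] <= 2)
]
```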


Figure 1. Outlier analysis of stock log returns with z-score

At the first stage, we process the data. Trading days differ across European, US, and Asian exchanges due to different holiday calendars; returns missing on those days are replaced with NaNs. We calculate log returns, align the dates, remove outliers, and apply Gaussian kernel density estimation (KDE). Gaussian KDE smooths the empirical return distribution into a probability density function (PDF). The KDE bandwidth determines how closely the fitted density follows the original empirical distribution (Figure 2): with a smaller bandwidth, the fitted PDF reflects the shape of the empirical distribution, including its tails, kurtosis, and skewness, more accurately.
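A minimal sketch of this preprocessing stage, assuming daily close prices arrive in a pandas DataFrame indexed by date. The function names, the z-score threshold, and the synthetic price data are illustrative assumptions, not the article's actual implementation.

```python
# Sketch of the preprocessing stage: log returns, z-score outlier removal,
# and Gaussian KDE smoothing. Names and thresholds are illustrative.
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

def preprocess(prices: pd.DataFrame, z_thresh: float = 5.0) -> pd.DataFrame:
    # Log returns; the first row of diff() is NaN and is dropped.
    rets = np.log(prices).diff().iloc[1:]
    # Flag outliers by z-score per stock and replace them with NaN.
    z = (rets - rets.mean()) / rets.std()
    return rets.mask(z.abs() > z_thresh)

def fit_kde(returns: pd.Series, bandwidth=None) -> gaussian_kde:
    # Gaussian KDE smooths the empirical return distribution into a PDF.
    # A smaller bandwidth tracks tails, kurtosis and skewness more closely.
    data = returns.dropna().values
    return gaussian_kde(data, bw_method=bandwidth)  # None -> Scott's rule

# Toy usage with synthetic prices for two stocks.
idx = pd.date_range("2002-01-01", periods=500, freq="B")
noise = np.random.default_rng(1).normal(0, 0.02, (500, 2))
prices = pd.DataFrame(100 * np.exp(np.cumsum(noise, axis=0)),
                      index=idx, columns=["AAA", "BBB"])
rets = preprocess(prices)
kde = fit_kde(rets["AAA"])
```

Missing returns on exchange-specific holidays would appear as NaNs after reindexing the price series on the union of all trading calendars; `mask` then adds NaNs where outliers are detected.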


Figure 2. Kernel density estimate with a default bandwidth for smoothing the stock return probability density function

The processing and KDE smoothing are applied to all 400+ stocks in the shortlist. As the next step, we estimate the dispersion matrix for the Monte Carlo simulation model. In our basic setup, we assume that the market will on average move as it did in the preceding years 2002-2005; we do not forecast or assume that the stock market will actually crash in 2007-2008, and no macroeconomic scenario modeling is included in this backtest. The cumulative stock returns are generated with a Markov chain model, without any short-term effects such as autocorrelation or cyclical volatility. The Monte Carlo simulation runs for 90 days with 10,000 scenario realizations, producing a three-dimensional data array of size 90 (days) x 424 (stocks) x 10,000 (scenarios). The scenarios with cumulative stock returns on day 90 are then used in the optimization models.
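The memoryless simulation can be sketched as follows. For brevity the array sizes are toy values rather than 424 stocks x 10,000 scenarios, and the daily draws are multivariate Gaussian; the article's pipeline uses KDE-smoothed marginals with modeled tail dependence, which this sketch does not reproduce.

```python
# Sketch of the Markov-chain Monte Carlo return simulation: each day is an
# independent draw from one multivariate distribution (no autocorrelation,
# no volatility cycles). Sizes and parameters are toy placeholders.
import numpy as np

rng = np.random.default_rng(7)
n_days, n_stocks, n_scen = 90, 5, 1000

# Hypothetical daily mean vector and dispersion (covariance) matrix, as if
# estimated from the training-period returns.
mu = rng.normal(0.0005, 0.0002, n_stocks)
A = rng.normal(0, 0.01, (n_stocks, n_stocks))
cov = A @ A.T + 1e-6 * np.eye(n_stocks)   # positive-definite dispersion matrix

# Draws come back as (days, scenarios, stocks); reorder to (days, stocks, scenarios).
daily = rng.multivariate_normal(mu, cov, size=(n_days, n_scen))
daily = daily.transpose(0, 2, 1)

# Cumulative log returns along the time axis; the day-90 slice is the
# scenario matrix that feeds the optimization models.
cum = np.cumsum(daily, axis=0)            # shape (90, n_stocks, n_scen)
day90 = cum[-1]                           # shape (n_stocks, n_scen)
```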


Figure 3. Correlated stock returns for three stocks (fat-tailed t copula)

The 3D scatter plot in Figure 3 demonstrates the output of the simulation model: cumulative 90-day return scenarios for three strongly correlated stocks. We can clearly see how closely the stock returns are related. Low returns for all three stocks are concentrated in the lower near-right corner of the plot, and higher returns in the upper far-left corner. Our solutions allow for modeling various degrees of tail dependence, which is important for shock scenarios such as a market crash. In such a situation, the correlation values for most stocks increase, and many stocks fall together with the S&P 500 index, similar to what is shown in the lower tail of the plot. Figure 4 demonstrates, by contrast, how uncorrelated returns for the three stocks would look: the rectangular region would be filled uniformly with return coordinates.
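The tail-dependent joint behavior can be illustrated with a t copula, which Figure 3 names. Below is a minimal sketch of sampling from one: the correlation matrix, degrees of freedom, and standard-normal marginals are illustrative assumptions; in the full pipeline the uniforms would be mapped through KDE-based inverse CDFs instead.

```python
# Sketch of t-copula sampling: multivariate t draws mapped through the
# univariate t CDF give uniforms with fat-tailed dependence.
import numpy as np
from scipy import stats

def t_copula_uniforms(corr, df, n, rng):
    d = corr.shape[0]
    z = rng.multivariate_normal(np.zeros(d), corr, size=n)
    chi = rng.chisquare(df, size=n) / df
    t = z / np.sqrt(chi)[:, None]        # multivariate t draws
    return stats.t.cdf(t, df)            # uniform margins, t dependence

rng = np.random.default_rng(3)
corr = np.array([[1.0, 0.80, 0.70],
                 [0.80, 1.0, 0.75],
                 [0.70, 0.75, 1.0]])     # hypothetical stock correlations
u = t_copula_uniforms(corr, df=4, n=20_000, rng=rng)

# Map uniforms through any marginal quantile function; standard normal here.
returns = stats.norm.ppf(u)

# Joint lower-tail events (all three stocks in their worst 5%) occur far
# more often than the 0.05**3 expected under independence.
joint_tail = np.mean((u < 0.05).all(axis=1))
```

A low degrees-of-freedom parameter strengthens the tail dependence: in a crash-like draw, all three margins tend to land in their lower tails together, exactly the clustering visible in the lower corner of Figure 3.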


Figure 4. Uncorrelated stock returns

Two grave mistakes are made in many existing portfolio optimization models and software packages. First, a normal probability distribution is fitted to the empirical data. The tails of a normal distribution superimposed on log returns are much “thinner” than the empirical ones; in fact, the absolute and relative risk of most stocks is much higher, as demonstrated in Figure 5 below. Second, tail dependence is modeled incorrectly, so hedging in the equity portfolio is effectively not achieved in high-risk scenarios. Such models may function well in a normal macroeconomic environment but fail badly during market crashes and recessions.

Figure 5. Fat-tailed return distribution: tail risk comparison of KDE (Silverman bandwidth smoothing) vs normal PDF for the same stock. Stock log returns have fat tails and a much higher CVaR than a normal probability density function fitted on top of the empirical distribution.

We randomly choose one stock from the data and estimate three probability distributions for it. The first two are built with the kernel density estimator using different bandwidths; the third is a fitted normal PDF. We can immediately observe that the tail risk of the fitted normal PDF (dark filled area) is about two times lower than that of the KDE PDF for a chosen value-at-risk (VaR) level. The actual risk for this stock is much higher, as represented by the area filled with both the yellow and the dark color. Our proposed models address these shortcomings by carefully constructing the return distributions and tail dependencies in the Monte Carlo simulation.
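The understatement of tail risk by a normal fit can be reproduced numerically. The sketch below uses a synthetic fat-tailed (Student-t) return sample rather than the article's stock, and a 99% confidence level as an illustrative choice; it compares the empirical VaR/CVaR with those implied by a normal distribution fitted to the same sample.

```python
# Compare empirical tail risk of a fat-tailed sample with the tail risk
# implied by a normal fit to the same data. Sample and level are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
r = stats.t.rvs(df=3, scale=0.012, size=50_000, random_state=rng)

alpha = 0.99
# Empirical tail risk straight from the sample:
var_emp = -np.quantile(r, 1 - alpha)       # value-at-risk (loss, positive)
cvar_emp = -r[r <= -var_emp].mean()        # expected shortfall beyond VaR

# The same measures if a normal PDF is fitted to the data:
mu, sigma = r.mean(), r.std()
z = stats.norm.ppf(1 - alpha)
var_norm = -(mu + sigma * z)
cvar_norm = -(mu - sigma * stats.norm.pdf(z) / (1 - alpha))
```

For a heavy-tailed sample like this, both `var_norm` and `cvar_norm` come out below their empirical counterparts: the normal fit spends its (inflated) standard deviation on the body of the distribution and still misses the mass in the tails.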


The output of the Monte Carlo simulation model serves as input to the tail risk minimization problem. The tail risk is the expected tail loss of the resulting portfolio return distribution at a chosen confidence level. The model minimizes tail risk while stepping through a series of lower bounds on the portfolio return; this way we obtain an efficient frontier of feasible risk-return combinations.


We use long/short models, so the model can take a long or short position in any of the stocks. Net exposure is constrained to zero, which means that the sum of long positions equals the sum of short positions. Leverage is set to 3: 1.5 investment units to “sell” and 1.5 investment units to “buy”. The position size in any one stock cannot exceed 0.1 investment units, so the portfolio will contain at least 30 stocks.


The output of the model is the efficient frontier and the corresponding solutions, which contain investments in between 30 and 75 stocks depending on the risk level. We choose two solutions from the frontier, Solution 1 and Solution 2, highlighted in red (Figure 8). These solutions represent the trade-off between increasing risk and return. Solution 1 has a 90-day return of 4.3%, an expected shortfall of 5.3% at the 95% confidence level, and investments in 54 stocks; Solution 2 has a return of 6.9%, an expected shortfall of 5.9%, and 44 stocks.


Figure 8. Efficient frontier of risk-return combinations from CVaR minimization. Modern portfolio optimization implemented in Python with GUROBI, CVXPY, PuLP, and the CBC solver.

In our setup, the investment is made only once, in April 2007, and held without rebalancing until the end of 2008. We can see in Figure 9 that the portfolio would yield an impressive return of 72.3% while the market crashed below -30%. The optimization effect is very strong in the first months and slowly fades over the following months. These results are achieved without future knowledge, hindsight stock selection, or pessimistic macro-scenario modeling, with a moderate leverage of 3 (1.5 units to buy and 1.5 units to sell).


Figure 9. Financial portfolio backtesting 2007-2008: 72.3% return achieved with long/short CVaR minimization, zero net exposure, leverage 3, and broad diversification.

This report briefly outlines the procedure used in the models and confirms the significant potential of modern portfolio optimization and Monte Carlo simulation models for stock markets. All underlying procedures are implemented with open-source Python modules.


In the next report, we will publish backtesting results for the year 2018, in particular for the market crash of late 2018. We will introduce (negative) macroeconomic scenario modeling, treasury yield spread analysis, and a graphical demonstration of optimization efficiency.
