If not mean-variance finance, then what?

I want to apologize to my small audience for taking so long to post.  My goal had been one post per week, but life intervened.  While I have tried to be productive, the blog fell by the wayside.


I did write a couple of short stories, I must confess.  I wrote How to Engage in Counterespionage Operations Against Ghosts because I grew up watching Sherlock Holmes movies and the Twilight Zone.  I wrote a small set of short stories about the first day of the second American civil war.  Over the last decade, people have carelessly bandied about war as a solution to America’s structural problems.  I wanted them to really get what that would mean.  I also wrote a short story called The Exfiltration of a President.  After all, if one of the most watched and guarded persons on Earth had to get to a jurisdiction without extradition, how would that be accomplished?


Today’s post is about none of those, although national stability should surely be included in the risk premiums of equity securities, shouldn’t it?  I know I promised a sex edition, but it will have to wait until the next blog post; I am going to provide something quite a bit more valuable.  I have a paper in peer review for publication and a second to follow it upon acceptance.  I also have two others waiting in the wings, but the implication is that the public will see everything a year or so from now.


The post here is an attempt to thread the needle between respecting copyright and the peer review process by not duplicating content but also to permit people to build on the content while mine is standing in line.  It also solves a validation problem for data science.


Most data scientists are not aware of the controversies in finance or economics.  There is no reason for them to be.  They need the tools because data scientists are builders; they do not need to be mired down in the Allais paradox, the equity premium paradox, or the many other issues economists are paid to deal with.

I say this because of the types of questions data scientists and others in the area of finance ask on Stack Exchange.  It is obvious what is in the pedagogy of data scientists and what is not.

In 1963, Benoit Mandelbrot published a paper called The Variation of Certain Speculative Prices.  The synopsis of the article would be, “if that is your theory, then this cannot be your data, and this is your data.”  By 1973, Fama and MacBeth should have put mean-variance finance to bed as a construction of the world with a falsification sufficient to close out the field.


The difficulty is that there was nothing to replace it with.  Economics has been in the place that physics was in following the Michelson and Morley experiments.  Physics knew the classical theory had a severe problem somewhere, but it had to wait a generation, until quantum mechanics and relativity came around, for it to work again.

The mistake in using models such as the Capital Asset Pricing Model, Black-Scholes, or Fama-French is that we know they do not work.  The error in thinking is to conclude that nothing works and that we know nothing at all about what does.  The rest of this post shows why mean-variance finance cannot work and then presents a tool that works consistently, even though there isn’t a lot of theory at this point as to why.

Why worry about a controversy in economics if someone is paying you to build something that does not work?  Because if you can create something that can work, you have a customer for life.  Although I have an idea of how I would move this process forward as I have performed and have taught securities analysis for over two decades, given the size of the professional population, many people will have better ideas than mine that may never have been considered if the broader group of professionals were not focused on what works empirically.

The goal of this post is to get people thinking, talking and building. 

The proof regarding the excludability of mean-variance methods and the argument for their general exclusion was first put forth in their protoform by Poisson and Augustin Cauchy in articles in the first half of the nineteenth century.  A similar argument was made by R.A. Fisher in the 1930s as an example of how the statistical methods of Pearson and Neyman could go wrong.  

Because of how much work is required to include mathematical symbols in a blog post, I will reference the work of others at times rather than re-derive their well-known work.

I am conscious that formalisms are weird in blog posts, but they permit serious rebuttal, and they permit a sober analysis by data scientists working in finance.  This also isn’t being submitted as a paper for two reasons.  First, the proof has been well known in statistics since the 1940s; second, I have a similar paper already out there.


For the lemmas and the theorem, there will be no dividends, no mergers, and no firm can go bankrupt.  That is in line with the assumptions of the standard models, and those items are not necessary to exclude mean-variance finance.  Dividends, bankruptcy, liquidity costs, and merger risks will be essential for the portion discussing how to move forward.


Assumptions and Definitions


  1. There are very many potential buyers and sellers.
  2. The market is in equilibrium.
  3. The securities are equity securities.  This excludes various forms of bonds and other assets such as antiques which have different distributions.
  4. The securities are exchanged in a double auction.
  5. Buyers purchase and sellers sell q securities at price p.
  6. All securities are purchased at time t and sold at time t+1.
  7. It is known with certainty that none of these firms will go bankrupt or merge out of existence.
  8. The parameters are estimated from information.
  9. Errors at time t and t+1 are independent.


  1. The reward for investing resources at time t is defined as $r_t = \frac{p_{t+1} q_{t+1}}{p_t q_t}$.  The return is defined as $r_t - 1$.
  2. A statistic is any function of the data.
  3. Equilibrium price is defined as 
  4. Equilibrium reward is defined as 
  5. The reward for investing is also defined as $r_t = r^* + \eta$, where $r^*$ is the equilibrium reward and $\eta$ is a random variable.

Lemma The distribution of the reward for investing, or the return, is approximately the distribution of the errors for securities near their equilibrium prices.

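By Wold's Decomposition theorem and given the assumption of an equilibrium price, prices can be written as $p_t = p_t^* + \epsilon_t$, where $p_t^*$ is the equilibrium price and $\epsilon_t$ is the error at time t.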

Since assumption 7 requires that $q_{t+1} = q_t$, it follows that definition 1 can be reduced to the ratio of prices.

Definitions 1 and 5 provide two definitions of the reward which could be written as  which leads to  which for small errors around the equilibrium set of prices is 


Lemma If the price errors from the first lemma are normally distributed around zero then as prices go to the equilibrium the distribution of the errors to reward is the same as the distribution of the ratio of errors about the prices.

However, if  is normally distributed around zero, then after several normalizations, it is known from Marsaglia, that the distribution of  where a and b are constants and x and y are normal random variables is proportionate to the standard Cauchy multiplied by a function  which will go to one as the price goes to equilibrium, leaving only the Cauchy distribution portion.  Note that this does not hold as prices go far from the equilibrium as would be the case in a bubble or market collapse. Nonetheless 

if the errors are normally distributed.


Theorem  Given the assumptions and the first two lemmas, the distribution of returns of the reward function is the truncated Cauchy distribution.

Assumption 3 has an important consequence conditional on definition 1.  There is no requirement in definition 1 that either the numerator or the denominator have a stochastic component.  Had assumption 7 not excluded mergers and bankruptcy and assumption 3 not required an asset to have the properties of an equity security, then a wide variety of possible stochastic processes could be built into returns.

If an asset had been a zero coupon bond, then the numerator would be known with certainty, and the distribution would reflect on the error of pricing at purchase, excluding bankruptcy risk.  Likewise, if it were certain that a cash-for-stock merger would happen, then the numerator would also have been cash and certain.  In addition, because firms should merge with undervalued firms, the assumption of an equilibrium should have been violated.  Also, if the firm were to go bankrupt, then the distribution of prices would not matter, only the probability from the Bernoulli process that the future quantity was equal to zero.

Do note that such certainty is not a real-world problem if the probability of a reward is decomposed into the reward given the firm remains going concern multiplied by the probability it will remain a going concern, plus the probability of a reward given a merger multiplied by the probability of a merger, plus zero times the probability of a bankruptcy. 
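The decomposition just described can be written compactly.  This is only a sketch: the labels $GC$, $M$, and $B$ for the going-concern, merger, and bankruptcy states are notation introduced here for illustration and are not taken from the original derivation.

```latex
\begin{align*}
\Pr(r) &= \Pr(r \mid GC)\,\Pr(GC) + \Pr(r \mid M)\,\Pr(M) + 0 \cdot \Pr(B), \\
1 &= \Pr(GC) + \Pr(M) + \Pr(B).
\end{align*}
```

Each conditional term can carry its own distribution, so the certainty imposed by the assumptions only applies within a state, not across them.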

From assumption 4, it follows that there cannot be a winner's curse in equilibrium.  The overlap in the limit book would prevent the possibility of a cursed price, so the rational behavior is for each bidder to bid their estimation of the expected price.  From assumption 1 and the central limit theorem, the distribution of the limit book must converge to the Gaussian distribution as the number of bidders becomes large.

From Curtiss, Gurland, and Marsaglia, it is well known that the distribution of the ratio of normal variates centered on zero, or in this case the equilibrium, must be the Cauchy distribution.  Alternatively, if one converted the prices into polar coordinates, it follows that the solution is also the solution to Gull's Lighthouse Problem and again converges to a Cauchy distribution.
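The ratio result is easy to check by simulation.  The sketch below is not part of the proof; it assumes two independent standard normal variates centered on zero and compares the sample quartiles of their ratio with those of the standard Cauchy distribution, which are -1, 0, and 1.  The function name and sample size are choices made here for illustration.

```python
import random
import statistics


def ratio_of_normals(n, seed=0):
    """Draw n ratios x / y of independent standard normal variates."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)
        ratios.append(x / y)  # y == 0.0 has probability zero for a continuous draw
    return ratios


ratios = ratio_of_normals(200_000)
# statistics.quantiles with n=4 returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(ratios, n=4)
print("quartiles of the ratio:", round(q1, 2), round(q2, 2), round(q3, 2))
print("standard Cauchy quartiles: -1, 0, 1")
```

Quartiles are used for the comparison because, unlike the sample mean and variance, they remain well behaved for a distribution with no moments.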

From assumption 7, there are other states of nature not included in this proof, including bankruptcy, which limits losses to the original investment, truncating the distribution in reward space at zero and in return space at negative one hundred percent.  As such, integrating the kernel from zero to infinity rather than negative infinity to positive infinity produces a density of

$$f(r) = \left[\frac{\pi}{2} + \tan^{-1}\left(\frac{\mu}{\sigma}\right)\right]^{-1} \frac{\sigma}{\sigma^2 + (r - \mu)^2}, \quad r \ge 0.$$

Here $\mu$ is the center of location and $\sigma$ is the scale parameter of returns; $\sigma$ is the ratio of the standard deviations of prices, making it also a measure of price heteroskedasticity.  It is important to note that the scale parameter is not a variance as neither the population mean nor variance is defined.

The reason the distribution lacks a defined mean or variance depends, in part, on how the integrals are defined, but in either circumstance, the expectation is 

which clearly diverges.  Although the arctangent goes to unity as the reward goes to infinity, it is obvious that the product goes to infinity as reward goes to infinity, implying an undefined expectation.
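The divergence can be worked out directly.  The sketch below assumes the truncated Cauchy density has the form $f(r) = c\,\sigma / (\sigma^2 + (r-\mu)^2)$ on $r \ge 0$ with $c = [\pi/2 + \tan^{-1}(\mu/\sigma)]^{-1}$; the symbols $\mu$, $\sigma$, and $c$ are notation assumed here.

```latex
\begin{align*}
\int_0^R r f(r)\,dr
  &= c\,\sigma \int_0^R \frac{(r-\mu) + \mu}{\sigma^2 + (r-\mu)^2}\,dr \\
  &= c\left[\frac{\sigma}{2}\log\!\left(\sigma^2 + (r-\mu)^2\right)
     + \mu \tan^{-1}\!\left(\frac{r-\mu}{\sigma}\right)\right]_0^R .
\end{align*}
```

The arctangent term stays bounded as $R \to \infty$, but the logarithmic term grows without bound, so the integral has no finite limit and the mean does not exist.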

This absence of a mean has an unexpected, but well known, result in statistics.  The sampling distribution of an estimator of a mean or of a least squares estimator will map to the distribution of the data.  The implication is that one randomly chosen element of a sample, if used as the estimator of the center of location, has the same informational value as the sample mean of a billion points of data.  If a squares minimizing process or an arithmetic average is used, then no meaningful solution can be found.
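A small simulation makes the point concrete.  It is a sketch under simplifying assumptions: the data are standard Cauchy draws with no truncation, and the sample sizes are arbitrary choices made here.  It compares the interquartile range of single observations, of sample means, and of sample medians across many repeated samples.

```python
import math
import random
import statistics


def cauchy_draw(rng):
    """Standard Cauchy variate via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def iqr(values):
    """Interquartile range, a spread measure that exists without moments."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


rng = random.Random(1)
n_samples, sample_size = 4000, 200

means, medians, singles = [], [], []
for _ in range(n_samples):
    sample = [cauchy_draw(rng) for _ in range(sample_size)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))
    singles.append(sample[0])  # one arbitrary element of the sample

print("IQR of single observations:", round(iqr(singles), 2))
print("IQR of sample means:       ", round(iqr(means), 2))
print("IQR of sample medians:     ", round(iqr(medians), 2))
```

The spread of the sample means should be about the same as the spread of single observations, roughly 2 for the standard Cauchy, while the spread of the sample medians is far smaller.  Averaging two hundred points buys nothing; taking the median does.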

With no mean, the models collapse.

Apple as an Example

Using Apple as an example, consider the daily returns.  Rewards were normalized to daily rewards to allow for weekends and market closings.  As such, a value of 1 is the same thing as a zero percent return.


Summary statistics for Apple from R are:

Min 0.4813
1 Qtr 0.9896
Median 1
3 Qtr 1.0113
Max 1.33323
Mean 1.00078


The lifetime range is almost eighty-six percent.  The difference between the mean and the median seems small, but these are daily returns.  The annualized difference is almost thirty-three percent.  The Cauchy distribution, ignoring the truncation at zero, uses the median as the center of location.  The normal distribution’s most efficient estimator is the mean.  Which to use?
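The annualized gap between the mean and the median can be reproduced from the summary statistics.  The sketch assumes compounding over 365 calendar days, which is consistent with the rewards having been normalized across weekends and closings, but the day count is an assumption made here.

```python
mean_daily = 1.00078   # mean daily reward from the summary table
median_daily = 1.0     # median daily reward from the summary table
days_per_year = 365    # assumption: calendar-day compounding

annual_mean = mean_daily ** days_per_year
annual_median = median_daily ** days_per_year
gap = annual_mean - annual_median

print(f"annualized reward at the mean:   {annual_mean:.4f}")
print(f"annualized reward at the median: {annual_median:.4f}")
print(f"annualized difference:           {gap:.1%}")
```

Under that assumption the difference comes out just under thirty-three percent, which is why the choice of center of location matters so much.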


A kernel density estimate of Apple’s daily return using the bi-weight method is shown below.


Now the implicit model using the normal distribution is used below.  The normal is in red.  The maximum likelihood estimator was used.  The systematic effects of liquidity costs, dividends, truncation, and uncertainty regarding the estimator were ignored.  The same is true for the Cauchy model below.  It is possible to improve the modeling for both by proper accounting for other effects.


The implicit model using the Cauchy distribution is a substantial improvement but creates a problem.  If it is a distribution without a mean, then least squares methods should not be used.


For many models, the log difference is used rather than the raw data.  The log model does have a mean and variance, but no covariance.  The log distribution is the hyperbolic secant distribution and an improvement in the sense that a mean and variance exist, but not a gain concerning least squares as there is still no covariance structure about which to discuss systematic and idiosyncratic risk.

Path Forward

The news on the path forward is both good and bad.  The good news is that the path forward has yet to be built in an automated format, and so there is a small fortune to be made in creating the design the market ends up adopting.  Someone reading this may get rich.  The bad news is that the path forward has yet to be built in an automated format, and it won't look like a regression of the style traditional in existing models.  There will be many failures.

Dividends cannot be ignored.  Bankruptcy and mergers cannot be ignored.  Liquidity costs cannot be ignored.  It also requires building across data sets.  If you observe firm X alone in a time series, how will you capture its probability of going out of existence prior to it going out of existence?  If you observe a firm that has never paid a dividend, how will you predict likely future dividends?  The idea of observing a single stationary time series is inadequate.

I am hoping to create a push toward new activity and end discussions of older ideas such as volatility surfaces or WACC as they won't matter anymore.  Many things will vanish.  Alpha and beta will go away.  Factors will likely come back, but without the good fortune of having a covariance structure to work with.

So how to move forward?  By beginning with things that are known to work.  It is time to unshackle our minds from the straitjacket of fixating on the elegant.  One of the tools that work is value investing.

Value Investing

I am going to begin this exposition on value investing with a financial story set in Montana.  The story begins decades ago with two brothers meeting, falling in love with, and marrying two sisters.  The two new families purchased homes diagonal to one another on a street corner in Great Falls. 

Great Falls was a planned city.  Founded in 1883 by Paris Gibson and built on the advantages hydroelectric power could provide to an industrial location, it is a study in the history of American architecture.  A drive from downtown shows the slow expansion of the city and the periods where growth happened can be identified by looking at the design of homes on a block.

The two homes the couples moved into were started and finished on the exact same day.  The construction was identical, and the exteriors were identical.  To save money, the two families made bulk purchases when repairs or changes were needed, and the two homes remained identical all through the years.  Both families had one child.  The children, Charlie and Sam, grew up.  Sam moved to New York while cousin Charlie moved to Los Angeles.  They were building successful careers when tragedy struck.

The two couples loved to do things together and decided to go to Glacier National Park.  While traveling up one of the mountains, their car went out of control and fell hundreds of feet off the road killing everyone instantly.  The cousins returned to Great Falls to bury their parents and settle their estates.

Charlie’s parents had built up an illiquid real estate empire in Cascade County and around Montana.  Sam’s parents were of modest means and except for the home only held highly liquid assets.  The estates settled on the same day, and both cousins listed their homes for sale on the same day.  Both had immediate offers for $200,000, and they immediately accepted them.


The couple that made the offer on Charlie’s home decided sometime later to take a camping trip in Glacier and traveled there for a weekend of fun.  Sadly, the couple came upon the same curve in the road, and they too fell hundreds of feet to their deaths.  Charlie was notified of the deaths by the realtor and was told the couple’s estates were empty and that the sale was off.


Charlie returned to Great Falls to see what could be done as the estate was bleeding cash and decided that it would best be handled in person.  Incidentally, Sam was there for the upcoming closing on the house.  They both went to the old neighborhood to see how things had changed.

Afterward, Charlie went to a bar to find as many pints of Dam Fog from the Mighty Mo Brewing Company as possible.  While sitting at the bar, Charlie talked about the failed sale of the home when someone interrupted and said, “Would you take $140,000 for it?  I can’t do more; it’s what I can get.”  Charlie, ecstatic, cheerfully accepted the offer.  His parents’ estate was asset rich and cash poor.

Sam and Charlie had the closings on the sales at the same attorney’s office at the same time, just down the hall from each other.  They went out for a celebratory drink and promised not to let so much time pass before seeing each other again.

The new homeowners, Alex and Jessie, were friends and worked at the same industrial concern.  Their homes were identical, and the only difference between the two houses was that Alex paid $140,000 and Jessie paid $200,000, both in cash.

Eleven months passed, and the industrial concern announced a planned expansion.  Real estate prices in the city rose, and the two friends decided to have the homes appraised just to see if they could turn a quick buck.  The appraiser set the value of the houses at $220,000.  Neither was satisfied with the price improvement, but both discussed waiting until the expansion happened and they could downsize if they could get enough money.


Unfortunately for both of them, embezzlement happened at their place of work and the firm was suddenly shuttered.  Unemployed, without immediate prospects, both sold their homes and moved away.  Incidentally, they sold them two years from the date of purchase for $180,000.

Now the question is, did one of them take more risk than the other, and if so, which one?

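Because the homes were fundamentally identical and located at approximately the same place, the risk of loss from fire, meteor strike, civil commotion, and so forth should be equal.  The fundamental chance of damage to the structures is the same.

                          Alex        Jessie
Purchase price            $140,000    $200,000
Appraisal at 11 months    $220,000    $220,000
Sale price at 2 years     $180,000    $180,000
Standard Deviation        $40,000     $20,000
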
The sample standard deviation of prices for Jessie’s home was $20,000 while it was $40,000 for Alex’s home over the period.  As measured by variance, Jessie’s house was the less risky investment.  Was it less risky? 
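The two standard deviations follow from the three prices each owner saw in the story: the purchase price, the appraisal, and the sale price.  A minimal check, using those three observations per home:

```python
import statistics

# Purchase price, appraisal at eleven months, sale price at two years.
alex_prices = [140_000, 220_000, 180_000]
jessie_prices = [200_000, 220_000, 180_000]

alex_sd = statistics.stdev(alex_prices)      # sample standard deviation
jessie_sd = statistics.stdev(jessie_prices)

print("Alex:  ", alex_sd)
print("Jessie:", jessie_sd)
```

The only difference between the two series is the purchase price, so the entire gap in measured "risk" comes from Alex having paid less.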


Consider the following three definitions of risk.  One definition is exposure to loss, the second is exposure to uncertainty, the third is exposure to goal failure.

To make things slightly more comparable, let us add the stipulation that Alex lied to Charlie and actually had another $60,000 in savings so that both have equal assets at the beginning.  Those funds are still in savings.


Imagine that instead of being either Alex or Jessie, we are nature, and we know the true probability distribution of prices at the end of the second year.  Let us assume it follows the following, somewhat strange, ad hoc cumulative mass process.

[Table: Market Price in Thousands of Dollars versus Probability a Value is Less Than or Equal to the Market Price]

Based on the first definition, being exposed to the risk of loss, Alex exposed less money and only had a twenty percent chance of experiencing a loss.  In addition, thirty percent of Alex's portfolio is in a federally insured savings account, and so the variance could be considered zero.  Because the variance in a Bernoulli trial is greatest at the fifty percent mark, Jessie took the greatest risk in terms of the uncertainty of outcome.  Alex's variance is $22,400, while Jessie's is $50,000 when measured as the uncertainty of loss.
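The two figures for the uncertainty of loss can be reproduced by scaling the Bernoulli variance p(1 - p) by the amount exposed.  In the sketch below, Alex's twenty percent chance of loss is stated in the text; Jessie's fifty percent is inferred from the stated $50,000 figure rather than read from the table, and the function name is hypothetical.

```python
def loss_uncertainty(p_loss, amount_exposed):
    """Bernoulli variance p(1 - p) scaled by the dollars exposed to loss."""
    return p_loss * (1.0 - p_loss) * amount_exposed


alex = loss_uncertainty(0.20, 140_000)
jessie = loss_uncertainty(0.50, 200_000)   # 0.50 is inferred, see lead-in

print(f"Alex:   ${alex:,.0f}")
print(f"Jessie: ${jessie:,.0f}")
```

Note that these are dollar-scaled Bernoulli variances, not variances of price, so they are not directly comparable with the standard deviations quoted earlier.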

What about when measured as exposure to raw uncertainty?

Looking forward, once the homes’ values had gone back to equilibrium and stabilized, both houses had the same exposure to uncertainty at that price level, but Alex exposed fewer resources and still took less risk.


Now consider exposure to goal failure.  Let us imagine the goal was to make a ten percent simple interest rate of return on all investments over the two years.  For Alex, the profit needs to be at least $28,000 plus returns on the cash, while Jessie must make $40,000.  If we assume that the above mass function is piecewise linear, then Jessie has a 26.67% chance of succeeding.  Conversely, Alex has a 70.67% chance of succeeding.

Now consider the case where Alex, a great-great-grandchild of Rip van Winkle, fell asleep at the moment of purchase and woke up just in time to sell.  Alex couldn’t allocate the other $60,000 in cash in other investments, and so the home must make all $40,000 to reach the goal.  Alex still has a 65% chance of succeeding while still holding fewer risky assets.
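The profit targets behind these probabilities are simple interest arithmetic.  The sketch below computes only the sale prices each owner needs; it does not reproduce the success probabilities, which depend on the cumulative table.  The function and variable names are hypothetical.

```python
def required_profit(principal, simple_rate=0.10, years=2):
    """Profit needed to earn a simple annual rate on the principal."""
    return principal * simple_rate * years


total_assets = 200_000                       # both investors start equal
alex_house, jessie_house = 140_000, 200_000

jessie_target_price = jessie_house + required_profit(jessie_house)
alex_target_price = alex_house + required_profit(alex_house)
# Rip van Winkle case: the idle $60,000 earns nothing, so the house alone
# must produce the full profit owed on all $200,000.
alex_asleep_target_price = alex_house + required_profit(total_assets)

print("Jessie needs a sale price of at least", jessie_target_price)
print("Alex needs a sale price of at least  ", alex_target_price)
print("Alex, asleep, needs at least         ", alex_asleep_target_price)
```

Even in the worst case for Alex, the required sale price sits well below Jessie's, which is why Alex's chance of reaching the goal stays higher.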


When does Alex’s risk catch up to Jessie’s risk?  If Alex would make a catastrophic purchase and lose 100% of the investment in the complementary set of assets, then they would have the same risk of goal failure.

Part of the Math Behind Value Investing

Let us go back to the above concepts of present value and future value.



We are still going to ignore liquidity costs, dividends, merger risk, and bankruptcy risk to simplify the discussion, which we clearly should not do in the real world.


Let $E_t$ be earnings and $p_t$ be price, and note that $p_t = \frac{p_t}{E_t} E_t$.  Also, note that these are, for our purposes, economic operating earnings and not accounting earnings.  The difference is that the accounting principles are a tool with a purpose.  The tool is a mixture of meeting business needs and the political needs of management, shareholders, and legislatures.  Each nation has its own standard set of accounting rules.


It is common for legislatures to pass laws deferring the payment of taxes for powerful interests.  Consider a firm that entered into a transaction that resulted in a one hundred million dollar tax liability, but where the law allows for the deferral of the payment by ten years.  Real taxes are rarely that simple, but in the US, MACRS is such a set of rules.


If a ten-year eight percent zero coupon bond were available, then, ignoring the tax on the gain, the tax can be canceled by investing forty-six million, three hundred and twenty thousand dollars in the bond and simply waiting for it to mature.  What happened to the other fifty-four million dollars?  It is really equity.  If you can cancel a hundred million dollar accounting liability for forty-six million dollars, then the other fifty-four million is a fiction in economic terms.
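The bond arithmetic is a one-line present value calculation.  A minimal sketch, assuming annual compounding at eight percent and ignoring the tax on the gain as the text does:

```python
def present_value(future_value, rate, years):
    """Price today of a zero coupon bond paying future_value at maturity."""
    return future_value / (1.0 + rate) ** years


liability = 100_000_000
cost_today = present_value(liability, rate=0.08, years=10)
economic_equity = liability - cost_today

print(f"cost to cancel the liability today: ${cost_today:,.0f}")
print(f"portion that is really equity:      ${economic_equity:,.0f}")
```

The cost comes to roughly $46.32 million, leaving about $53.68 million of the stated liability as equity in economic terms.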


Operating accounting earnings are a rules-based specification of how to divvy up operating cash flows between stakeholders such as customers, employees, creditors, and shareholders.  They tend to be less volatile than cash flows, but are still rules driven.  For our purposes, there will be no stochastic component, and earnings will be perfectly representative.


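The PE ratio, commonly used in value investing, will be denoted as $\pi_t = \frac{p_t}{E_t}$, where $p_t$ is price and $E_t$ is earnings.


The reward on investing can be rewritten as $r_t = \frac{p_{t+1}}{p_t} = \frac{\pi_{t+1} E_{t+1}}{\pi_t E_t}$.


Future earnings can be restated as prior earnings multiplied by a growth factor, $E_{t+1} = g E_t$.



So the reward for investing can be stated as $r_t = g \frac{\pi_{t+1}}{\pi_t}$.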


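In this simplified world, the relationship $r_t = g \frac{\pi_{t+1}}{\pi_t}$, where $\pi_t$ is the PE ratio at purchase and $g$ is the growth factor in earnings, makes the sole controllable variable $\pi_t$.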
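A quick numerical check shows the two ways of writing the reward agree, and that the purchase PE is the lever.  This is a sketch with hypothetical numbers; it assumes the PE ratio is price divided by earnings and that next-period earnings equal current earnings times a growth factor.

```python
def reward_from_prices(price_now, price_next):
    """Reward as the ratio of sale price to purchase price."""
    return price_next / price_now


def reward_from_pe(pe_now, pe_next, growth):
    """The same reward written as growth times the ratio of PE multiples."""
    return growth * pe_next / pe_now


earnings_now, growth = 5.0, 1.06      # hypothetical earnings and growth factor
pe_now, pe_next = 12.0, 15.0          # hypothetical PE at purchase and at sale

price_now = pe_now * earnings_now
price_next = pe_next * growth * earnings_now

via_prices = reward_from_prices(price_now, price_next)
via_pe = reward_from_pe(pe_now, pe_next, growth)
cheaper_entry = reward_from_pe(10.0, pe_next, growth)  # buy at a lower PE

print(via_prices, via_pe, cheaper_entry)
```

Growth and the exit multiple are outcomes the investor can only predict; the entry multiple is the one term the investor chooses, and lowering it raises the reward for any given outcome.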


Note this is not modeled like an economic theory.  In standard economics,   Standard economics either describes a system at rest, or a system moving toward rest.  There is no system here.  In equilibrium, value investing is called the value trap.  Because prices are properly ordered, no excess gain exists in the system.  However, a system driven by equilibria is a system that will seek to return over time to its equilibrium.  That fact can be seen as an advantage.  Additionally, I have improperly set aside dividends to make life simpler, but the econometrics of dividends will need to be a future blog posting.


The goal of value investing is to purchase assets with the smallest measures of price to value.  It is sometimes mistaken that this would imply the lowest price to earnings, price to sales, or price to book, but those are only markers of value.  A more sophisticated view is concerned with economic value and not accounting measures which can be skewed.


If dividends were added, then return would be a discounted sum of the parts.  This posting has no direct concept of time.  I ignored time by making the growth factor  instead of standardizing it as   If dividends were present, then a more complicated sum would be used, but the lessons would be no different.


Likewise, if real rules of accountancy were used, then this would probably run sixteen hundred pages long.  It would need to contain the content of the 5th and 6th editions, from 1987 and 1943 respectively, of Graham and Dodd's Security Analysis.  Yes, the sixth edition is a reprint of the 1943 edition.  The fifth edition describes how things should be done, the 6th edition describes why it is done.  They really are inseparable.


So, in this nearly perfect world, how to apply the above story?  First ask, “what role does the data scientist play here?”  Is the data scientist the appraiser, the market maker, the trader, the portfolio manager, or several of the above?


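The data scientist would move over the set of securities, favoring none.  The Bayesian predictive distribution would need to be constructed of future prices and earnings.  Such a thing includes risk in the distribution, inherently.  The Bayesian predictive distribution is $\Pr(\tilde{x} \mid X) = \int_{\theta \in \Theta} \Pr(\tilde{x} \mid \theta)\Pr(\theta \mid X)\,d\theta$, where $\tilde{x}$ is a future observation, $X$ is the observed data, and $\Theta$ is the parameter space.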


Predictions need to form on all cash flows in the holding period.  If something increases bankruptcy risk, it decreases the probability of getting a standard return or a return from a merger happening.  In a world with dividends, anything that would make a dividend uncertain makes the return uncertain.  The underlying firm operating risk is a return risk and included in the prediction.  Actual economic and accounting values would have to rear their ugly heads and be included in the analysis.  Given the above good news and bad news, that is the bad news.


Still, we are not at a model; all we have done is speak about the nature of return and current price.  If we are not currently in equilibrium, that is, we are not about to be in a value trap, then prices are not properly ordered.  Most likely, most prices are properly ordered, but some will not be.  Some will be overpriced, and some will be underpriced.


What happens with  is that it can be used to index the predictive distribution as a cumulative density.  Consider an investor that required an eight percent rate of return.  Imagine that if  then there is a fifty percent chance of reaching the goal, but if  then there is an eighty percent chance of reaching the goal.


It converts this problem from one without a mean or a variance, to a multinomial problem, or a problem of minimizing the expected loss from goal failure.
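The conversion to a goal-failure problem can be illustrated with draws from a predictive distribution.  Everything numerical below is a hypothetical stand-in: the lognormal draws for the growth factor and the exit PE are placeholders for a real Bayesian predictive distribution, and the function name and candidate PE values are invented for illustration.  Only the eight percent target comes from the text.

```python
import random


def goal_probability(pe_buy, draws, target_reward=1.08):
    """Share of predictive draws where growth * exit_pe / pe_buy meets the target."""
    hits = sum(1 for growth, exit_pe in draws
               if growth * exit_pe / pe_buy >= target_reward)
    return hits / len(draws)


rng = random.Random(7)
# Hypothetical stand-in for joint predictive draws of (growth factor, exit PE).
draws = [(rng.lognormvariate(0.05, 0.10), rng.lognormvariate(2.6, 0.25))
         for _ in range(50_000)]

candidate_pes = (10, 12, 14, 16)
probabilities = [goal_probability(pe, draws) for pe in candidate_pes]

for pe, prob in zip(candidate_pes, probabilities):
    print(f"purchase PE {pe:>2}: probability of reaching the goal {prob:.3f}")
```

For a fixed set of predictive draws, the probability of reaching the goal can only fall as the purchase PE rises, so the purchase PE indexes the chance of success without any appeal to a mean or a variance.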


The viewpoint of value investing differs radically from models such as the CAPM.  Each cash flow holds a potential value.  The less one pays for a cash flow, the less risk and the higher return one would expect to receive.  Because of this, misvalued securities should be rare, and they are.


Value investors are looking for errors in the ordering of securities and other investments in terms of price to value.  Earnings were used above, but profits are not always a reliable index both for economic reasons and accounting reasons.  Accounting statements include both stated values and notes to explain the stated values.  Real accounting happens in the notes.  It is also where the valuation process begins.

From the view of data science, it would be a massive undertaking.  However, since the first portion of this blog post points out that factor models, beta based models, and Ito models are intrinsically invalid, it makes sense to use things that have been observed to work.


The first thing I tell students in introductory economics courses is that empiricism says that if something is always observed to work, then do that.  If something always fails, then do not do that.  If something works, contingent on other things, discover what those other things are.  No matter how much you love an idea, model, or thing, if it is not supported in the empirical literature, then do not do that.  Value investing does not span the set of all things that are known to work, but it is attractive because it inverts the risk and reward trade-off compared to that faced in equilibrium.


The data scientist constructing software to find and invest in the disparities that exist between value and price has to begin with accounting and economic data.  The scientist would break up the problem into many small parts, such as accounting for valuation issues created by inventory methods, tax adjustments, goodwill adjustments, and so forth.  Then the data scientist can use these variables to filter the set down to those most undervalued and adjust them for liquidity costs.  I would still recommend having a person read the financials and make the decisions, but that is because I believe that machine learning, when combined with highly skilled humans, can produce results far superior to either alone.



Curtiss, J. H. (1941). On the distribution of the quotient of two chance variables. Annals of Mathematical Statistics, 12:409-421.

Fama, E. F. and MacBeth, J. D. (1973). Risk, return, and equilibrium: Empirical tests. The Journal of Political Economy, 81(3):607-636.

Fisher, R. (1934). Two new properties of mathematical likelihood. Proceedings of the Royal Society of London, Series A, 144:285-307.

Graham, B. and Dodd, D. L. (1934). Security Analysis. Whittlesey House, McGraw-Hill Book Company, New York.

Graham, B., et al. (1988). Graham and Dodd's Security Analysis. McGraw-Hill,

Graham, B. and Dodd, D. L. (2009). Security Analysis: Principles and Technique. McGraw-Hill, New York.

Gull, S. F. (1988). Bayesian inductive inference and maximum entropy. In Erickson, G. J. and Smith, C. R., editors, Maximum-Entropy and Bayesian Methods in Science and Engineering: Foundations, volume 1 of Fundamental Theories of Physics, pages 53-74. Springer.

Gurland, J. (1948). Inversion formulae for the distribution of ratios. The Annals of Mathematical Statistics, 19(2):228-237.

Mandelbrot, B. (1963). The variation of certain speculative prices. The Journal of Business,

Marsaglia, G. (1965). Ratios of normal variables and ratios of sums of uniform variables. Journal of the American Statistical Association, 60(309):193-204.

Stigler, S. M. (1974). Studies in the history of probability and statistics. XXXIII: Cauchy and the witch of Agnesi: An historical note on the Cauchy distribution. Biometrika, 61(2):375-



