The full code and documentation is available on Github.

What is a DCF?

If you were offered $1 today or $1 a year from now, which would you take? A smart businessman would take the dollar today. Why? Because the present value of the future $1, is actually only around 90¢ after adjusting for inflation and other economic factors. The idea that money in the future is worth less in the present is often referred to as the time value of money, and it is the fundamental principle behind discounted cash flow analysis.

The share price of a company is equal to its present value divided by its total number of outstanding shares. If it were possible to know both how much a company will be worth in future and how much that future value is worth in the present, it would be possible to determine what the share price of the company should be right now (fair value). In reality, there are only two ways of knowing either of these for certain: you are God or you are cheating. The purpose of discounted cash flow analysis is to estimate the fair value of a company by approximating its future free cash flows and discounting that amount by some reasonable rate to compensate for inflation, risk, and the passage of time. If this estimated fair value of the company is less than its current share price, the stock is likely overvalued and represents a sell opportunity and if the fair value is greater than the current share price, the stock is likely undervalued and represents a buy opportunity.

Determining Free Cash Flow

It turns out that free cash flow is a very well defined concept in financial circles. Free cash flow measures how much cash a business has after subtracting out expenditures, taxes, and investments. Therefore, determining future free cash flow simply requires forecasting future revenues and subtracting out future expenditures, taxes, and investments. I developed an automated forecasting engine that utilizes a series of neural networks to predict the future free cash flows of any publicly traded company.

Give me data or give me death

Neural networks are useless without training data. Luckily, the Securities and Exchange Commission (SEC) publishes massive datasets containing every financial document from every publicly traded company. I developed a SQL script that parses these datasets, extracts out useful information, and stores it inside a relational database from which I can efficiently store and effectively query the data. Once I had scraped and stored the data, I utilized the Hibernate ORM framework to map my Java models to database tables and to gain access to the powerful Hibernate Criteria API.

Too much of a good thing

My SEC parser ended up working much, much better than I expected. I ended up with 18,105,547 data points for 9,303 unique companies spread across 450 industries. In order to deal with the massive volume of data, I developed an algorithm to filter out the less important data points. This algorithm first categorizes companies by industry (using their sic codes) and then determines which data points appear most often in financial statements for companies within a particular industrial sector (e.g., “Oil and Gas Revenues” appears very often in financial statements for Crude Petroleum & Natural Gas companies, so it is considered to be an “important” data point for companies in that sector).

Designing the neural networks

The automated forecasting engine is composed of a collection of neural networks, each of which outputs the predicted quarter-over-quarter growth rate of a different component of the free cash flow formula using the “most important” data points as inputs. This quarter-over-quarter growth rate is then multiplied by current actuals to predict future values. These networks were then trained via backpropagation on historical SEC financial data.

Preliminary results

The forecasting engine worked surprisingly well! I trained the engine on sic code 1311 (Crude Petroleum & Natural Gas) and then used the engine to forecast the future free cash flow of Apco Oil and Gas International, Inc. By applying some of the same optimization techniques that I used when computing revenues and operating costs, I believe I can substantially reduce the errors in some of the other measurements.

Term	Predicted (MUSD)	Actual (MUSD)	Error
Revenues	45.39	46.34	-2.05%
Operating Costs	35.85	35.05	2.28%
Taxes	4.14	4.48	-7.59%
Investments	4.76	5.81	18.07%
Assets	87.43	76.25	14.66%
Liabilities	47.25	38.69	22.12%

Determining the Discount Rate

Work in progress! Check back for updates.

Cover photograph from Nepse Forum.

Automated DCF Analysis

Predicting future cash flows