God’s plan No. 2.
Please read Part I for the background to this piece: Empirical Modelling: Reading the mind of God: Part I.
So we were talking about God's plan No. 2. As a reminder, here is the plan:
Observe the data as a truly typical realization of God tossing his coin of fate. The coin could have turned up Head or Tail, but we observe one particular outcome: either Head or Tail. In a time series, we observe one and only one outcome, although we know about all the possible outcomes that could have been the case. Using this observation, we draw conclusions about God's mind. For example, if we observe 49% Heads, we may conclude that God is actually tossing a fair coin.
The HT Universe
God is interested in creating a random universe of H (Heads) and T (Tails) instead of electrons and quarks. He tosses, 20 times, a coin he has already created for the purpose of creating a universe of H and T. Suppose the universe thus created looks like this (we call it the HT universe):
The HT Universe: H H T H T T T H T H T T H T T H H T H T
We can always transform H to 1 and T to 0 so that our digital computers can chew on the data we observe. If we do that, our universe looks like this to a computer:
The HT Universe for a computer : 1 1 0 1 0 0 0 1 0 1 0 0 1 0 0 1 1 0 1 0
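For the curious, here is a minimal sketch of this transformation, assuming Python (the sequence is the one above):

```python
# Map the observed HT universe to 1s and 0s so a computer can crunch it.
ht_universe = "H H T H T T T H T H T T H T T H H T H T".split()
digital_universe = [1 if toss == "H" else 0 for toss in ht_universe]
print(digital_universe)  # [1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0]
```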
Either God created this single universe with one run of 20 tosses, or he repeated the same experiment almost infinitely many times and we happen to be living within this particular HT universe. Either way, the only way we can learn about the coin and/or the creator of the coin is from this observed data. Logical deduction towards truth is an insurmountable, unreliable and treacherous path.
The GDP Universe
Soon, God gets bored with the HT universe and decides to create another one, which we call the GDP universe. To create this universe, he has already created a coin with infinitely many sides, each carrying a positive real number. The outcome of each toss represents the per-capita real GDP of Nepal in one year since 1960. The outcome of this experiment looks like Figure 1.
Figure 1: Real GDP per capita of Nepal, 1964-2018. Source: World Bank, 2019. The blue solid line stands for actual GDP per capita, the bold black line stands for the historical average of the same, and the dotted green line represents a third-degree polynomial fit of the actual data ($latex \text{GDP per capita} = 276.58 - 0.8235\,\text{year} + 0.0614\,\text{year}^2 + 0.0015\,\text{year}^3$).
Just like the HT universe, we can view this GDP universe either as a unique universe created by God or as one of infinitely many that were created, with us ending up in this particular one.
The most interesting ontological question that the inhabitants of this universe can ask is
“What kind of coin was used by the god to create the universe we observe?”
OR
“What kind of god could have created the universe we observe?”
There is a slight ontological difference between these two questions. Knowing the coin may not coincide with knowing God. But if we assume that God is the one who can create coins and puts all he can into the coin he creates, then learning about the coin would shed light on almost the true intentions of God.
Epistemology
Can we learn about the intention of God behind each and every observed coin toss, be it in the HT universe or the GDP universe? How would we know? If God does not change his intentions from one toss to another and the differences in outcomes are by sheer chance, then going behind each outcome involves a lot of redundant effort. The best strategy is to find a common structure (probabilistic or deterministic), if present, in the observable universe we are studying. Any such structure has to pass the following two criteria:
- All the raw information in the observation set must be utilized in finding the common structure.
- The deterministic component of the common structure (call it the trend) should be adequate to explain the variation in the observations. In other words, the part of the observation not explained by the trend (call it the residual) must be a truly typical realization of a random process. The residual must not contain any systematic information that remains to be explained; the trend should capture all the systematic information, making the residuals totally unpredictable. This means we decompose the observation (or data) into a systematic component (trend) and a random component (residual):
observation => common structure = trend + residuals
If the observation does not have one of these components, the phenomenon will be deemed not amenable to learning. See more arguments on this at When can we rely on empirical modeling?
To repeat the second criterion: make sure the residuals retain no systematic component of the observations; the trend should capture all of it.
Let us try to summarize observations in both of the universes.
Learning about the HT Universe
Just counting the heads and tails would give us
9 H and 11 T
From this we can infer that our observation comes from a fair coin and a fair god, one not biased against a particular outcome, T or H. Of course, we might hesitate to conclude this unless we had exactly 10 H and 10 T. But our intuition immediately tells us otherwise if we had observed 2 H and 18 T: no one would easily claim a fair coin and a fair god when the observed universe is skewed on such a scale.
Once we prepare our observations so that our computers can crunch them, we can do many creative things to find the common structure.
Using computers, we could have added all the 0's and 1's and divided by 20. We call this the mean of the observations. The mean is 0.45 for this example universe with 9 H and 11 T. We expect the mean to be 0.5 if the coin and the god are perfectly fair. If there were just 2 H, the mean would have been 0.1. The table below shows the mean for some of the possible universes:
Table 1: Some of the possible HT universes

| Universe | Mean |
|------------|---------------|
| 0 H ; 20 T | 0/20 = 0.00 |
| 1 H ; 19 T | 1/20 = 0.05 |
| 2 H ; 18 T | 2/20 = 0.10 |
| 5 H ; 15 T | 5/20 = 0.25 |
| 10 H ; 10 T | 10/20 = 0.50 |
| 20 H ; 0 T | 20/20 = 1.00 |
By looking at the mean, we can guess whether the coin is fair (mean = 0.5) or biased, and if biased, in which direction (mean < 0.5 or mean > 0.5). If the mean > 0.5, the coin is biased towards turning up H; if the mean < 0.5, it is biased towards T. The severity of the bias can be formally tested using a standard Fisher-style testing procedure.
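As a hedged sketch of the whole procedure in Python, the exact binomial test from scipy is one concrete stand-in for the Fisher-style test mentioned above:

```python
from scipy.stats import binomtest

# The digital HT universe from above: 9 H (1) and 11 T (0) in 20 tosses.
digital_universe = [1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# The mean of the 0/1 observations.
mean = sum(digital_universe) / len(digital_universe)
print(mean)  # 0.45, slightly below the 0.5 of a perfectly fair coin

# Exact binomial test of the null hypothesis P(H) = 0.5.
result = binomtest(k=sum(digital_universe), n=len(digital_universe), p=0.5)
print(result.pvalue)  # a large p-value: 9 H in 20 is consistent with a fair coin
```

With 2 H and 18 T, the same test would return a tiny p-value, matching the intuition above that no one would call such a universe fair.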
Learning about the GDP Universe
Learning about the GDP universe won't be as easy as learning about the HT universe. First, the set of all possible outcomes in this universe is infinitely bigger: GDP per capita could have been anything from 0 to infinity. Second, the consecutive trials in this experimental universe do not seem to be independent and identical. Not independent, in the sense that this year's GDP per capita seems to depend on the previous year's. Not identical, in the sense that the data has a detectable upward trend, indicating that the GDP per capita of each year is being created under changing economic, demographic and technological conditions.

Now, how can we learn about observations from such a "complicated" universe? Let me try something using the epistemological criteria summarized above: let me use this whole data set on GDP per capita to dig out the common structure in the observations.
I will first find a trend and then subtract it from the observations to obtain the residuals. One easy candidate is the mean of the GDP per capita: we can check whether it comes close to being the trend, and whether the resulting residuals come close to being devoid of any systematic information contained in the observations.

Computing a mean value requires assumptions about the data generating process (the coin) of per capita GDP. I will try two methods here to demonstrate the idea.
- Assuming that the exact same coin is used each time (mean = constant). Call this trend mean $latex c$. Its value is the arithmetic mean of all available values: add them up and divide the sum by the number of observations.
- Assuming that each successive coin is printed with a slightly higher value (mean = f(year)). Call this trend mean $latex t$, and further assume that mean $latex t$ is a polynomial function of year of degree 3, that is, mean $latex t = a_3\,\text{year}^3 + a_2\,\text{year}^2 + a_1\,\text{year} + a_0$. Using a computer, we choose the values of $latex a_3, a_2, a_1, a_0$ (called parameters) so that the distance between actual GDP per capita and mean $latex t$ is minimized. A code sketch of both computations follows this list.
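Here is a minimal sketch of both computations in Python. Note the assumptions: the actual World Bank series is not reproduced in this post, so a simulated upward-trending stand-in series is used, and "year" is taken to be an index starting at 0 in 1964 (the magnitudes of the quoted coefficients suggest something similar):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data, purely for illustration: the real series is shown in Figure 1.
year = np.arange(1964, 2019)
t = year - year[0]                 # year as an index: 0, 1, ..., 54
gdp = 250.0 + 0.05 * t**2 + rng.normal(0.0, 10.0, size=t.size)

# Model 1 (mean c): the same coin every toss, so the trend is one constant.
mean_c = gdp.mean()

# Model 2 (mean t): a degree-3 polynomial trend fit by least squares.
# np.polyfit returns coefficients from the highest degree down: a3, a2, a1, a0.
a3, a2, a1, a0 = np.polyfit(t, gdp, deg=3)
mean_t = np.polyval([a3, a2, a1, a0], t)

print(f"mean c = {mean_c:.2f}")
print(f"mean t = {a3:.4f} t^3 + {a2:.4f} t^2 + {a1:.4f} t + {a0:.2f}")
```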
The mean $latex c$ value of per capita income turns out to be 397.445 USD. What does this mean tell us about the coin used to toss out the values of GDP since the 1960s? Actually, nothing. It contains no useful information about the coin. This value systematically either underestimates or overestimates the true value of per capita GDP (see the bold black horizontal line in Figure 1, which represents the mean value of the data).
The mean $latex t$ value of per capita income turns out to be $latex 276.58 - 0.8235\,\text{year} + 0.0614\,\text{year}^2 + 0.0015\,\text{year}^3$. What does this tell us about the coin used? A lot more than mean $latex c$ can.
Mean $latex c$ and mean $latex t$ can also be called models of our universe. Models are our perspectives on, or explanations of, the true universe/reality. Models (our perspectives) can differ from observer to observer, but the true universe is one. Our epistemological pathway should lead us towards an ever more accurate model (explanation) of the one true universe. Among our two models, one must be at least slightly better than the other at explaining it.
We can judge the adequacy of mean $latex c$ or mean $latex t$ as representations of the information contained in the actual data by checking whether the residuals contain any leftover systematic information. The residual is given by

residual = observation - trend

The residual contains the net information: if the trend captures all the systematic information in the observations, the systematic part cancels out and the residuals are left with only the random information. If this condition is satisfied, the trend is an adequate explanation of the observation, that is, the common structure. Let us plot both residuals:
$latex \text{residual}(c) = \text{observation} - c = \text{GDP per capita} - 397.445$
$latex \text{residual}(t) = \text{observation} - t = \text{GDP per capita} - (276.58 - 0.8235\,\text{year} + 0.0614\,\text{year}^2 + 0.0015\,\text{year}^3)$
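Continuing the sketch above (same stand-in data and fitted trends), the residuals and one crude check for leftover systematic information, the lag-1 autocorrelation, might look like this:

```python
import numpy as np

# Residuals: whatever part of the observations the trend failed to explain.
# gdp, mean_c and mean_t come from the sketch after the two assumptions above.
residual_c = gdp - mean_c
residual_t = gdp - mean_t

def lag1_autocorr(x):
    """Correlation of a series with itself shifted by one year.
    A value near 0 is what a truly random residual should show."""
    x = np.asarray(x) - np.mean(x)
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])

print(lag1_autocorr(residual_c))  # near 1: the constant trend missed the structure
print(lag1_autocorr(residual_t))  # much nearer 0: the cubic captured most of it
```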
Figure 2: Residuals of the mean c model and the mean t model, in blue and black respectively.
As we can see, the residual of the mean c model, the stuff of the universe that our model could not explain, is an exact replica of the original GDP universe, merely shifted down by the constant mean. What does this model tell us about the universe we want to learn about? Actually, nothing. All the systematic information we could detect in the original GDP universe is left unexplained by the mean c model and appears as leftover in the residuals. Our epistemological conclusion is that the mean c model is inadequate to explain our GDP universe.
On the other hand, the mean t model is significantly more adequate at explaining the GDP universe as we observe it. The residuals of this model look totally different from the data: chaotic and random compared to the original GDP series of Nepal. That means the seemingly systematic pattern of the original series is captured by the mean t model. In this sense the mean t model is significantly more adequate than the mean c model. This can also be seen in Figure 1, where the explained (systematic) component of the observed GDP universe is plotted against the observed values.
Does this mean that the mean t model is the best model of the GDP universe we observe? Absolutely not. We can surely find better models that yield even more random residuals. Therein lies the beauty of the journey towards knowing our one true universe.
Conclusion
The one true universe can have many observers. Not all observers have equally adequate explanations/models of the reality they observe. Some models are outright wrong, others merely inadequate; some are better, and no model is 100% accurate, even as observation and model building bring us closer and closer to the reality we observe. The notion that all observers are correct in their own way is an epistemological disaster that we must avoid. The cost of not avoiding it is high, measured in the distance between what we observe as reality and the stories we develop to account for our observations. Being able to predict accurately is not as important as being able to scoop up the systematic information in our observations so that the leftover residuals are truly random.
(I know this conclusion is very short for our journey, but I cannot wait to share what I have so far.)