God’s creation as we observe it.
Objective
In this article, I review the methodological pathway from data (what we observe) to the data generating process (the truth). I use the narrative of “God” as the creator of everything we are interested in, with us trying to read the mind of God through what we observe and measure. I will discuss the issue in two parts: Part I and Part II.
In Part I, I formulate the problem of learning from observed (measured) reality in the form of data. In other words, I try to simulate God’s plan using mathematics and a computer program. In Part II, I will summarize the methodology for learning about God’s plan from the data, and the errors we are likely to make while doing so.
Context
Suppose we have two observable phenomena of interest, A and B, and we measure them simultaneously on some scale (say, a ratio scale) at constant intervals (at t = 1, 2, 3, …, T) for a finite number of times (T).
We can say that the outcomes of these two measurements, x and y, are particular outcomes of two random variables X and Y.
We can say that {$latex (X_t, Y_t)$, t = 1, 2, 3, …} is a vector stochastic process, and a finite segment of this process, {$latex (X_t, Y_t)$, t = 1, 2, 3, …, T}, is called a sample.
We observe the phenomena A and B as one particular outcome of this random vector. We call it data and denote it by {$latex (x_t, y_t)$, t = 1, 2, 3, …, T}.
To give an example, A and B could be a coin and a die. Tossing and rolling them amounts to a trial of the random vector $latex (X, Y)$, and a particular outcome of such a trial, say (Head, Six), is the data we have. If we repeat this trial T times, we will have data with sample size T.
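The coin-and-die example can be sketched in a few lines of Python. This is only an illustration: the function name `toss_and_roll`, the seed and the sample size T = 10 are my own assumptions, and the coin and die are assumed to be fair and independent of each other.

```python
import random

def toss_and_roll(T, seed=0):
    """Return T realizations (x_t, y_t) of the random vector (X, Y)."""
    rng = random.Random(seed)  # the seed is arbitrary; it only makes the run repeatable
    sample = []
    for _ in range(T):
        x = rng.choice(["Head", "Tail"])  # phenomenon A: a fair coin (assumed)
        y = rng.randint(1, 6)             # phenomenon B: a fair die; randint includes both ends
        sample.append((x, y))
    return sample

# One particular outcome of the sample: this list is the "data".
data = toss_and_roll(T=10)
print(data)
```

Running it again with a different seed gives a different data set from the same underlying process, which is exactly the distinction between the random vector and its realization.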
To give another example, A and B could be the GDP and population of a particular country in year t = 10. Before we observe them through measurement, they are random variables; once we observe them for that year, we have data on GDP and population. Hypothetically, we can have an infinite number of such ordered trials, ordered across t = 1, 2, 3, … (this is what is called a stochastic process). When the ordering is across time, it is unique and natural. Changing the ordering would distort the information contained in the data, so the ordering has to be taken as given. The GDP of year 2019 must come after 2018 and before 2020.
Our goal: Learning from the data
The unobservable stochastic process in these examples amounts to the underlying process that could have given rise to the outcomes of the coin, the die, GDP and population. Now the interesting question to ask would be:
- What kind of stochastic process (that I cannot observe directly) could have generated the data I observe through measurement? What kind of “God” could have created a world in which, when I measure GDP and population to learn about A and B, I get these particular values, say GDP = 20 billion and population = 26 million?
Reading the mind of the God
We have no idea what kind of God, or God’s plan, could have generated an economy with these particular values for GDP and population. What could God have thought before creating this world? What strategy could we devise to read the mind of God? We have only two weapons with which to attack this problem.
- God’s plan No. 1: We can make certain assumptions about God’s mind, with or without looking at the data, and draw conclusions from these assumptions using logic. Using the conclusions, we can make predictions to be tested against the data we observe. If the predictions turn out to be fairly reliable, we conclude that the assumptions we made about God’s mind were true enough.
- God’s plan No. 2: Treat the data as a truly typical realization of God tossing his coin of fate. The coin could have turned up Head or Tail, but we observe one particular outcome. In time series observation, we observe one and only one outcome, although we know about all the outcomes that could have occurred. Using this observation, we draw conclusions about God’s mind. For example, if we observe 49% Heads over many tosses, we may conclude that God is actually tossing a fair coin.
These two pathways towards understanding God are dealt with one by one.
God’s plan No. 1:
Suppose God created a country whose GDP in a particular year, when and as measured, is a realization of a random variable $latex X_t$ that God tosses every year. Of course, we cannot observe the outcome of this experiment instantaneously. We need to do a tremendous amount of hard work within the four walls of our statistical heavyweights, such as the Central Bureau of Statistics and central banks.
Under this strategy we assume that we do not observe the data as soon as we would like to, so we make assumptions. According to God’s plan, $latex X_t$ (GDP) and $latex Y_t$ (population) are supposed to grow every year in the following manner.
$latex X_t = a_0 + a_1 t + a_2 X_{t-1} + u_{1t}$
$latex Y_t = b_0 + b_1 t + b_2 Y_{t-1} + u_{2t}$
where $latex X_{t-1}$ and $latex Y_{t-1}$ are the lagged values of $latex X_t$ and $latex Y_t$. These terms are included on the assumption that last year’s GDP and population affect this year’s. Even God cannot completely ignore the past he himself created while creating the present and the future.
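One way to see how the lag term carries the past forward is to substitute the equation into itself repeatedly, all the way back to the starting value $latex X_0$. This is only a sketch, and it assumes the trend coefficient is a constant $latex a_1$, as in the parameter values assumed for this example:

$latex X_t = a_2^t X_0 + \sum_{j=0}^{t-1} a_2^j \left( a_0 + a_1 (t-j) + u_{1,t-j} \right)$

Every past shock $latex u_{1,t-j}$ still shows up in today’s value, but with weight $latex a_2^j$, so when $latex |a_2| < 1$ older shocks matter less and less. The same substitution applies to $latex Y_t$ with the b’s.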
The term t reflects the assumption that GDP and population have a natural trend and grow or shrink linearly irrespective of what happened last year or anywhere else in the economy, just as our hair grows, children grow, seasons come, and days follow nights.
$latex u_{1t}$ and $latex u_{2t}$ are the error terms that the other terms on the right-hand side of each equation (the deterministic part) will never be able to capture, however complete our assumptions about the determinants of $latex X_t$ and $latex Y_t$. Such is the nature of reality. No matter how much we know about nature, there will always be some random and chaotic part that we will never fully comprehend.
The a’s and b’s are the parameters that define the time paths of $latex X_t$ and $latex Y_t$. If we knew everything there is to know about $latex X_t$ and $latex Y_t$, we could get the exact values of these parameters and we would have God’s plan on our monitors.
We can be as inaccurate as we like at this stage. We will later have a mechanism to tell us whether the assumptions we made were wrong, before we conclude that they were in fact true enough.
Now, from God’s perspective, what else does he need to know to toss the coin? The initial conditions, $latex X_0$ and $latex Y_0$ (even God has to start somewhere in a finite universe!), the values of the a’s and b’s, and the nature of the random terms $latex u_{1t}$ and $latex u_{2t}$. Let us make the following assumptions about these initial conditions, the values of the parameters and the random terms:
$latex a_0 = 2, a_1 = 1, a_2 = 0.7 $
$latex b_0 = 3, b_1 = 1.5, b_2 = 0.8 $
$latex u_{1t} \sim N(0,1), u_{2t} \sim N(0,1)$, read as “$latex u_{1t}$ is Normally distributed with mean 0 and variance 1”. This means that the u’s are zero on average and hover symmetrically around zero with a standard deviation of 1. It also means that about 95% of the u’s will fall between -2 and 2. A typical realization of u looks like this when tossed 200 times.
Figure 1: A truly typical realization of the random error term $latex u_t$. Count the number of dots outside the (-2, 2) interval and express it as a fraction of 200. It should come close to 5%. The value of u is on the vertical axis. Each bubble represents an outcome of the toss.
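The counting exercise in the caption can be tried directly. The figure in this article was produced in R; the sketch below uses Python’s standard library instead, and the seed and variable names are my own assumptions, so it will not reproduce the exact dots in the figure.

```python
import random

rng = random.Random(42)  # arbitrary seed, assumed here only for repeatability
T = 200                  # number of tosses, as in Figure 1

# Toss the error term T times: u_t ~ N(0, 1)
u = [rng.gauss(0.0, 1.0) for _ in range(T)]

# Count the draws that fall outside the (-2, 2) interval
outside = sum(1 for value in u if abs(value) > 2)
share_outside = outside / T
mean_u = sum(u) / T

print("draws outside (-2, 2):", outside)
print("share outside:", share_outside)
print("sample mean of u:", mean_u)
```

The share will not be exactly 5% in any single run; it moves around from seed to seed, which is itself a small reminder that we only ever see one realization.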
Now plugging in the values of the a’s, b’s and $latex u_t$’s tosses the “coins” $latex X_t$ and $latex Y_t$, as shown in the following figures, to give $latex x_t$ and $latex y_t$.
This is data simulated using R. God would be able to simulate as many sets of x’s and y’s as he wishes. But in real life we observe the data only once. By looking at this single outcome of God’s trial, we have to guess what sort of God, or God’s plan, could have created the data as we observe it. In other words, what sort of equations, and what values of the parameters of those equations, could have given rise to God’s simulated world that we encounter every day?
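For readers who want to play God themselves, here is a minimal sketch of such a simulation. It is written in Python rather than R and is not the code behind the figures. The parameter values are the ones assumed in this article; the starting values $latex X_0 = Y_0 = 0$, the sample size T = 200, the seed, the independence of the two error terms and the helper name `simulate` are my own assumptions.

```python
import random

def simulate(T, c0, c1, c2, start, rng):
    """Simulate z_t = c0 + c1*t + c2*z_{t-1} + u_t for t = 1..T, with u_t ~ N(0, 1)."""
    path = []
    previous = start  # the initial condition z_0
    for t in range(1, T + 1):
        u = rng.gauss(0.0, 1.0)               # this year's random shock
        z = c0 + c1 * t + c2 * previous + u   # trend + last year's value + shock
        path.append(z)
        previous = z
    return path

rng = random.Random(2024)  # arbitrary seed: one particular "world"
T = 200

# GDP: a_0 = 2, a_1 = 1, a_2 = 0.7; starting value X_0 = 0 is assumed
x = simulate(T, 2.0, 1.0, 0.7, 0.0, rng)
# Population: b_0 = 3, b_1 = 1.5, b_2 = 0.8; starting value Y_0 = 0 is assumed
y = simulate(T, 3.0, 1.5, 0.8, 0.0, rng)

print("first five x:", [round(v, 2) for v in x[:5]])
print("first five y:", [round(v, 2) for v in y[:5]])
```

Changing the seed produces another world that the same plan could have generated. We, unlike God, get to see only one of them.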
In Part II, I will talk about the methodology of learning about God’s plan from the observed data, using the same example of the GDP and population of a country. That will be God’s plan No. 2.
Will be back!!
It was way too complex for my understanding. I’ll be sure to read Part II as well. Maybe both, but twice.
Yes. It’s a complicated topic even for people with a modeling/statistics background. At the same time, my writing is not very persuasive or transparent, and mixing God with mathematical concepts is a challenging task. I just wanted to write down what I was thinking when teaching empirical modeling.