When can we rely on empirical modeling?

In this post, I will discuss the eligibility of a phenomenon for empirical modeling. In other words, I will try to answer the question

 “What kind of phenomenon can be studied through empirical modeling?”

Determinism

Not every imaginable phenomenon is amenable to empirical modeling. For example, if a phenomenon is (totally) deterministic in nature, we cannot (and need not) do empirical modeling. What do we mean when we say that the phenomenon of our interest is deterministic? It means we can explain, know, or predict the phenomenon using observable data with 100% certainty (i.e., with zero error).

How would such a phenomenon be explained? We can just use a deterministic function like

$latex Y_t = f(t), \quad t \in T&s=2$

(1)

where, $latex Y_t&s=1$  is an observable variable under study representing the phenomenon of our interest and t stands for time. The function f can take any form such as

$latex Y_t = \beta_0 + \beta_1 t&s=1$

(2)

$latex Y_t = A e^{\alpha t}&s=1$

(3)

$latex Y_t = \gamma_1 \sin(\delta_1 t) + \gamma_2 \cos(\delta_2 t)&s=1$

(4)

When $latex Y_t&s=1$ is observed as shown in Figures 1-3, one would not need more than a few lines of algebra to summarize the phenomenon by a straight line using only two numbers: the y-intercept $latex \beta_0&s=1$ and the slope $latex \beta_1&s=1$ [equation (2)].

Figures 1-3

For Figure 1,  $latex \beta_1 = 0.5&s=1$ and $latex \beta_0 = 1&s=1$

For Figure 2,  $latex \beta_1 = -0.8&s=1$ and $latex \beta_0 = 8&s=1$

For Figure 3,  $latex \beta_1 = 0&s=1$ and $latex \beta_0 = 5&s=1$

With $latex \beta_0&s=1$ and $latex \beta_1&s=1$ known, we can explain all the observations on $latex Y_t&s=1$ with 100% certainty. Moreover, we can also predict $latex Y_t&s=1$ for $latex t=11,12,13,\ldots&s=1$ with the same level of certainty. If we know that the relationship between $latex Y_t&s=1$ and $latex t&s=1$ is linear, we need only two observations on $latex Y_t&s=1$, not 100 or even 10.

Figures 4-5
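The two-observation claim is easy to verify with a few lines of Python. This is a minimal sketch; the two sample points are hypothetical but consistent with Figure 1's line (intercept 1, slope 0.5):

```python
# Two hypothetical observations on the deterministic line of Figure 1 (Y_t = 1 + 0.5 t)
t1, y1 = 0, 1.0
t2, y2 = 10, 6.0

# Two points pin down the line exactly -- no statistics needed
beta1 = (y2 - y1) / (t2 - t1)  # slope
beta0 = y1 - beta1 * t1        # y-intercept

print(beta0, beta1)  # 1.0 0.5

# Every other observation is now predictable with zero error
predict = lambda t: beta0 + beta1 * t
print(predict(11))  # 6.5
```

With a genuinely deterministic law, adding more observations would tell us nothing new.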

Similarly, if $latex Y_t&s=1$ is observed as in Figure 4, equation (3) would be enough to learn everything about $latex Y_t&s=1$ that can be known through observation. With some algebra, we can get the values of $latex A&s=1$ and $latex \alpha&s=1$.

For Figure 4, $latex A=2&s=1$ and $latex \alpha=2&s=1$.
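As a sketch of that algebra, two points on the curve pin down $latex A&s=1$ and $latex \alpha&s=1$; the observations below are hypothetical but consistent with Figure 4's parameters:

```python
import math

# Two hypothetical observations on the deterministic curve Y_t = 2 e^{2t} (Figure 4)
t1, y1 = 0.0, 2.0
t2, y2 = 1.0, 2.0 * math.exp(2.0)

# Taking logs turns the exponential law into a linear one in t
alpha = math.log(y2 / y1) / (t2 - t1)  # growth rate
A = y1 / math.exp(alpha * t1)          # level at t = 0

print(A, alpha)  # 2.0 2.0
```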

Likewise, if $latex Y_t&s=1$ is observed as in Figure 5, equation (4) would be enough to learn everything about $latex Y_t&s=1$ that can be known through observation. With some algebra, we can get the values of $latex \gamma_1,\gamma_2,\delta_1,\delta_2&s=1$.

For Figure 5, $latex \gamma_1 =\gamma_2=2,\delta_1=\delta_2=0.5&s=1$.

Hence, phenomena such as those shown in Figures 1-5 cannot, and need not, be understood using the techniques of empirical modeling. Mere deterministic algebra is enough.

But the pertinent question at this moment is

“Do we have any natural or social phenomena that show a 100% deterministic pattern?”

The answer to this question is: I have not found a single such phenomenon in any field (if readers can find and demonstrate one such example, I would be really grateful). Even the laws of physics, when observed, show non-deterministic patterns. Be it the motion of planets, the laws of thermodynamics, or $latex E = mc^2&s=1$, they all show some (if not much) unpredictability or non-deterministic patterns when observed. The non-deterministic pattern emerges because

  1. we do not know all the factors affecting $latex Y_t&s=1$,
  2. we cannot measure $latex Y_t&s=1$ accurately enough with existing technology, or
  3. $latex Y_t&s=1$, by nature, is partially non-deterministic.

In social science and economics, the non-deterministic feature is undeniable, obvious, and present everywhere. So, what does this suggest?

Almost every (natural or social) phenomenon (I am not sure if I can say all) is amenable to empirical modeling.

Is total chaos a possibility then? 

Let us try to think about the other extreme. Can we imagine a phenomenon which, when observed as $latex Y_t&s=1$, is totally non-deterministic or chaotic, one that we cannot explain or predict at all?

Frankly, I have not been able to imagine or draw one, let alone express it algebraically!

Let us conduct a thought experiment. Close your eyes. Try to draw something as chaotic as possible on a sheet of paper. Open your eyes and check if there are any patterns there. Most probably you will find patterns all over the place. Being 100% chaotic is more difficult than being 100% deterministic! Deterministic patterns are at least imaginable and can be expressed with algebra, even if they cannot be observed. But 100% chaotic patterns are impossible even to imagine (at least for me). Our imagination is built on regularities.

Moreover, if a phenomenon lacks any regular (deterministic) component, there remains nothing for us to learn from observations (data). If you disagree, please make comments below.

Chance-regularity patterns

What is the lesson to be learned here? No phenomenon (natural or social) is either 100% deterministic or 100% chaotic. Instead, every phenomenon contains both features. Such phenomena, when observed, are said to show chance-regularity patterns.

Hence, any phenomenon that shows a chance-regularity pattern is amenable to empirical modeling. But how is it possible that a pattern has both chance and regularity components at the same time? Don’t they contradict each other? I will try to make the answer as easy as possible.

A chance-regularity pattern means that the pattern shows chance behavior (unpredictability or chaos) at the level of individual observations. On the other hand, we can detect (sometimes with great difficulty) some regularity at the aggregate level (finding it is one of the most important challenges of empirical modeling).

We sketched 100% deterministic patterns, but I have no way to draw a 100% chaotic pattern. Fortunately, we can easily draw a chance-regularity pattern. Figures 6-9 are examples of chance-regularity patterns. In all of them, both the chance and the regularity components are visible.

See Figure 6. Can we predict, with 100% certainty, the 101st value of $latex Y_t&s=1$? Or, let us cover the last 50 observations and try to predict the 51st observation by looking at the first 50. Can we explain these observations using deterministic functions like (2)-(4) with 100% accuracy? Obviously not.

Figures 6-9

Let us take the example of a coin toss experiment. Can we predict the outcome of the next coin toss with 100% accuracy by observing the last 10,000 or even a million tosses? No. This is the chance component inherent in all these phenomena, including the coin toss experiment.

Similarly, think about Figure 7 or Figures 8-9: there is no way we can predict the next observation by looking at the last ones, and we cannot explain whatever is observed by deterministic functions like equations (2)-(4).

Now, what about the regularity aspect in these patterns (Figures 6-9)? Try to find the regularities in these data yourself first.

Ok, now let me try Figure 6. As can be seen, the observations move up and down around a constant value, 0. In other words, the observations seem to come from a distribution whose mean is constant at 0 (I will explain what we mean by “come from a distribution” in later posts). Similarly, the volatility (which can be measured by the variance) seems to remain the same across t.

In Figure 7, the mean of the observations seems to be trending linearly upward, and the variance around this mean looks constant across t. Compare this with the observations in Figure 1.

In Figure 8, we can clearly see the regular cyclical mean of the data and again the constant variance around the mean. Compare this with observations in Figure 5.

Figures 6-8 were all simulated data.
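Readers who want to reproduce such chance-regularity patterns can do so in a few lines of Python. This is a minimal sketch; the parameters are illustrative, not necessarily the exact ones behind Figures 6-8:

```python
import math
import random

random.seed(0)
T = range(100)

# Figure 6 style: chance fluctuations around a constant mean of 0, constant variance
y6 = [random.gauss(0, 1) for t in T]

# Figure 7 style: a linear trend, as in equation (2), plus noise
y7 = [1 + 0.5 * t + random.gauss(0, 1) for t in T]

# Figure 8 style: a cyclical mean, as in equation (4), plus noise
y8 = [2 * math.sin(0.5 * t) + 2 * math.cos(0.5 * t) + random.gauss(0, 1) for t in T]

# Chance: no single value is predictable with certainty.
# Regularity: aggregate features are stable, e.g. the sample mean of y6 stays near 0.
mean6 = sum(y6) / len(y6)
print(round(mean6, 2))
```

Plotting `y6`, `y7`, and `y8` against `t` produces patterns of the same type as Figures 6-8: unpredictable observation by observation, yet regular in the aggregate.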

Figure 9 depicts the quarterly three-month US Treasury bill rate from 1948 to 2014. Do you see a chance-regularity pattern in it? Is there any chance component? To check, try to explain all the observations using a deterministic function, or try to predict the next observation by looking at the last observations. If you cannot, that means the series has a chaotic or chance component.

Now, what about the regularity component? Obviously, there is an upward trend until 1980 and a downward trend since then. There are also cyclical patterns: when things are moving up, they tend to persist for some period, and vice versa.

Finally, in the coin toss example, even though we have no clue what the next outcome is going to be, we can always guess roughly how many heads will appear in 100 tosses. If we increase the number of tosses to 100,000, our guess is going to be much more precise (this feature of our guessing strategy is called consistency in statistics). These predictability features in the data indicate the existence of regularity in the outcomes of the coin toss experiment.
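This consistency property is easy to demonstrate with a simulated coin. A minimal sketch (the seed and toss counts are arbitrary choices):

```python
import random

random.seed(42)

def head_share(n_tosses):
    """Proportion of heads in n fair coin tosses."""
    return sum(random.random() < 0.5 for _ in range(n_tosses)) / n_tosses

# No single toss is predictable, but the aggregate share of heads
# settles near 0.5, and more tosses give a tighter estimate (consistency).
print(head_share(100))      # roughly 0.5, but noisy
print(head_share(100_000))  # much closer to 0.5
```

Rerunning with different seeds changes the individual outcomes (chance) but not the tendency of the share of heads to concentrate around 0.5 (regularity).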

To conclude, one can check whether any available data set satisfies the chance-regularity criterion. If the data show both components, we can rely on empirical modeling to learn from them. If the data lack either one, we cannot (or need not) use empirical modeling. In coming posts, I will show how to model all these data series, simulated or real.

Practice: Think about the following phenomena and determine whether they might show chance-regularity patterns.

  1. Number of tigers poached every year in Chitwan National Park, Nepal.
  2. Real GDP of Nepal in the last 50 years.
  3. Percentage of atheist population in the countries in year 2011.
  4. Age and height of students in my class.

(In the next post, I will talk about Random Variables and Stochastic Processes with some really good examples.)

Niraj Poudyal, PhD

Reference

Spanos, A. (1999), Probability Theory and Statistical Inference: Econometric Modeling with Observational Data. Cambridge University Press. pp: 1-30.
