Posted: Tue Aug 02, 2005 12:15 am Post subject: Verhulst, GBM, et al - Model Runs
Sometimes there's nothing better than sitting down with a data set and pushing it around until it starts to yield its truths. I've been playing with the oil production history data for a while and thought I'd share some insights, and a forecast. The whole thing has a fairly high level of dodginess about it, but I've noticed a couple of things that I haven't seen acknowledged anywhere else, so, if nothing else, I hope other people can get some new modelling inspiration from the new ideas.
The methodology will take a bit to explain, so bear with me (or just scroll down to the bottom for the 'meat'). Also, I'm just going to post data, so be sure you have your spreadsheet program open if you want little pictures to go with it.
Have you ever plotted the log of production history? I think you should. What does it look like? Interestingly, a rather simple yet striking behaviour is immediately apparent. This forecast proceeds by interpreting the Ln(Production) chart as a series of straight lines separated by short periods of discontinuity. I fitted straight lines to each subset of the data that appeared to be approximately a straight line. The data, and straight-line fits, follow.
Ok, this is probably quite obvious to some, but let's take a brief moment to consider what this means. A straight line on a log plot is exponential growth (or decay). Thus, interpreting the historical production data this way is equivalent to saying that oil production has tended to grow according to a stable exponential trend until something disrupts that trend, at which time, after a short period of adjustment, production resumes growth along some other exponential trend. The first thing that may jump to mind is that short-term forecasts by the IEA, EIA, etc. basically always follow this principle: exponential growth in consumption, with production fixed to match. Well, this methodology suggests that they will be correct, at least until a significant discontinuity arrives. For those of us who have spent inordinate amounts of time trying to find a stable methodology for fitting Verhulst curves, this suggests why we have had so much trouble. If the local behaviour of the data is exponential, while the local behaviour of the fitted Verhulst curve is relatively flat (concave), it is unsurprising that large variations in the parameters yield such small variations in goodness of fit. Anyway, let's continue.
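The line-fitting step described above can be sketched in Python. This is a minimal illustration, assuming the segment boundaries are picked by eye from the log plot and passed in as (start_year, end_year) ranges; `fit_segments` and the synthetic two-regime series are hypothetical stand-ins, not the spreadsheet's actual layout.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def fit_segments(years, production, segments):
    """Fit one straight line to ln(production) within each (start, end) year range.

    The ranges are picked by eye so that the discontinuity years fall between them.
    """
    fits = []
    for start, end in segments:
        pts = [(t, math.log(p)) for t, p in zip(years, production) if start <= t <= end]
        xs, ys = zip(*pts)
        fits.append(fit_line(xs, ys))
    return fits


# Synthetic check: two exponential regimes with a break between years 19 and 20.
years = list(range(40))
production = [math.exp(0.07 * t - 2.0) if t < 20 else math.exp(0.03 * t - 0.5) for t in years]
fits = fit_segments(years, production, [(0, 19), (20, 39)])
print(fits)  # close to [(0.07, -2.0), (0.03, -0.5)]
```

Each fitted (m, b) pair is one straight line on the Ln(Production) chart, i.e. one exponential regime.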
We now have 6 straight lines and 5 periods of discontinuity. Each straight line can be defined by m and b such that y = mx + b, where y is Ln(Production) and x is time. The lines are:
Code:
m 0.066 0.079 0.084 0.074 0.038 0.015
b -1.831 -1.894 -2.429 -2.376 0.159 1.774
If you want, you can plot the exponential of these lines and see how it maps onto ordinary production.
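Because ln(P) rises by m per unit of x, each slope can be read directly as a growth rate. A quick Python sketch, assuming x is measured in years (the post doesn't say where x starts, which shifts b but not m): for example, m = 0.066 works out to about 6.8% a year, doubling roughly every 10.5 years, while m = 0.015 is about 1.5% a year.

```python
import math

# Slopes of the six straight-line fits to Ln(Production), as posted above.
slopes = [0.066, 0.079, 0.084, 0.074, 0.038, 0.015]

rates = []
for m in slopes:
    annual_growth = math.exp(m) - 1   # P is multiplied by e^m each year on this trend
    doubling_time = math.log(2) / m   # years for production to double at this rate
    rates.append((m, annual_growth, doubling_time))
    print(f"m = {m:.3f}: {annual_growth:.1%} per year, doubles in {doubling_time:.1f} years")
```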
The five periods of discontinuity are:
circa 1919-192? (beginning of "Roaring twenties")
circa 1929-1932 (Great Depression)
circa 1937-1943 (WWII related)
circa 1973-1975 (First "Oil Crisis")
circa 1979-1983 (Second "Oil Crisis")
I think these periods of discontinuity are pretty apparent, and find it remarkable that they each map well to a significant geopolitical event. The exception is perhaps the first, which also happens to be the only positive shock. In separate analyses I took the first shock to be 1919-1920 or 1919-1923. For the forecast below I actually used 1919-1923. Either way, it seems strange to me that WWI produces no obvious negative shock, and that the aftermath should produce such a positive shock, but I don't know my history that well. Oil production certainly seems to behave differently between the two world wars. Maybe someone knows why.
Anyway, we now have a framework from which we would like to generate a forecast. We suspect that oil production will continue along an exponential trend until, at some point in the future, there is some discontinuity (probably big enough to be called a geopolitical shock), after which we may forecast that production will continue along some new exponential trend (until some further discontinuity). Several questions arise. How long until the next discontinuity? How large will the discontinuity be? What new exponential trend is likely to follow the shock? If we can answer these questions, we have a forecast that proceeds indefinitely into the future.
The method I have used to resolve these questions is rather shaky; believe me, I know how many long bows I'm balancing atop one another. If anyone has better ideas for how to resolve these questions, I'd love to hear them. Anyway, here is the method I used.
Whatever method is used to generate future exponential trends, I suggest that it intrinsically make use of the fact that oil is a non-replenishable resource. This is the core logic behind conventional use of the Verhulst curve, and it seems we may be able to make use of that now. I generated a Verhulst curve to serve as a comparison function, whereby the difference between the comparison function and actual production is taken to be related to the probability of a shock. Follow? Production grows exponentially until it travels too far from a baseline Verhulst curve. As it gets further away, the likelihood of a shock increases, and the shock will bring current production more into line with the comparison Verhulst curve. Ok, continuing.
I fitted a Verhulst curve to historic data. Actually, I prefer to fit the log of a Verhulst curve to the log of historic data, because the latter exhibits relative homoskedasticity. This just means that the variance is pretty similar as we travel along the curve. It's not perfect in this case, but it's better than fitting to the raw curve. I mostly use least squares error. When I did the fit, the following parameters were resolved.
U=1583
(1/k)=14.68
n=0.96
T1/2=1995
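The post doesn't spell out which form of the Verhulst curve it uses, so here is a Python sketch under a clearly labelled assumption: a generalized (Richards-type) logistic for cumulative production, Q(t) = U / (1 + c*exp(-k(t - T1/2)))^(1/n) with c = 2^n - 1, so that half of U has been produced at T1/2; annual production is dQ/dt, and the objective is the sum of squared differences in log space. With n = 1 this reduces to the ordinary logistic, which peaks at U*k/4.

```python
import math


def verhulst_production(t, U, inv_k, n, t_half):
    """Annual production under an ASSUMED generalized Verhulst (Richards) curve.

    Cumulative output Q(t) = U / (1 + c*exp(-k*(t - t_half)))**(1/n), c = 2**n - 1,
    so Q(t_half) = U/2. Production is dQ/dt. n = 1 gives the ordinary logistic.
    """
    k = 1.0 / inv_k
    c = 2.0 ** n - 1.0
    e = math.exp(-k * (t - t_half))
    return U * k * c * e / (n * (1.0 + c * e) ** (1.0 / n + 1.0))


def log_sse(params, years, production):
    """Sum of squared differences between ln(production) and ln(Verhulst production).

    params is (U, 1/k, n, T1/2). Working in logs keeps the residual variance
    roughly constant across the decades (the homoskedasticity argument above).
    """
    return sum((math.log(p) - math.log(verhulst_production(t, *params))) ** 2
               for t, p in zip(years, production))
```

With the historical series loaded as `years` and `production`, `log_sse((1583, 14.68, 0.96, 1995), years, production)` would score the free fit above, under this assumed form.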
Looking at this fit, I believe we can easily see that it is inappropriate. If we look at the difference between the data and the fit, we can see that it is at its greatest in 2004. Logically, this is an artifact of fitting locally exponential data with a locally concave curve; it is the same artifact discussed earlier. Thus, I rejected this fit as heavily biased.
Unsure of how to proceed, I chose to take U as an exogenous variable. Following Campbell and others, I used the ballpark figure of 2000 Gb for U. The choice of U will significantly alter the outcome of the model. I then fitted the remaining parameters by least squares on log production minus log Verhulst, as before. The parameters were:
U=2000
(1/k)=14.53
n=1.84
T1/2=2003
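One simple way to do the fixed-U fit is a coarse grid search over the other three parameters. The post doesn't say which optimizer was used (in Excel it would more likely be Solver), so this Python sketch is only an illustration; it uses the same assumed Richards-type form as the log of annual production, and the grids and synthetic series are hypothetical.

```python
import itertools
import math


def log_verhulst(t, U, inv_k, n, t_half):
    """ln(annual production) for the ASSUMED curve Q = U / (1 + c*e)**(1/n), c = 2**n - 1."""
    k, c = 1.0 / inv_k, 2.0 ** n - 1.0
    e = math.exp(-k * (t - t_half))
    return math.log(U * k * c / n) - k * (t - t_half) - (1.0 / n + 1.0) * math.log1p(c * e)


def fit_fixed_U(years, production, U, inv_k_grid, n_grid, t_half_grid):
    """Hold U fixed (exogenous) and grid-search 1/k, n and T1/2 for the smallest
    sum of squared log residuals. Returns (sse, inv_k, n, t_half)."""
    log_p = [math.log(p) for p in production]
    best = None
    for inv_k, n, t_half in itertools.product(inv_k_grid, n_grid, t_half_grid):
        sse = sum((lp - log_verhulst(t, U, inv_k, n, t_half)) ** 2
                  for t, lp in zip(years, log_p))
        if best is None or sse < best[0]:
            best = (sse, inv_k, n, t_half)
    return best


# Synthetic check: data generated from known parameters should be recovered.
years = list(range(1950, 2005))
production = [math.exp(log_verhulst(t, 2000, 14.5, 1.8, 2003)) for t in years]
best = fit_fixed_U(years, production, 2000,
                   inv_k_grid=[13.5, 14.0, 14.5, 15.0],
                   n_grid=[1.6, 1.8, 2.0],
                   t_half_grid=range(1998, 2009))
print(best)
```

A finer grid (or a proper optimizer started from the grid's best point) would be needed to resolve the parameters to the precision quoted above.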
Before continuing, note that U will no longer actually equal ultimate production as in the standard Verhulst model. Rather, it is merely a parameter of the comparison curve. The actual ultimate will almost always be a fair amount greater than U, because the exponential growth trend will mostly trigger negative shocks by being greater than the comparison function (at least in the model that follows). Actual ultimate seems to be about 5-10% greater than U. I usually take the liberty of assuming that Campbell et al are 10-20% too pessimistic, so a U of 2000 is fine with me (it implies an actual ultimate of 2100-2200 Gb). I can also run this model with any other numbers, so make suggestions. What would be better is an unbiased method for estimating U. Any ideas? ("I know what we need, a magic bullet!") Continuing.
Ok. Now let's take a look at the Z score for the difference between the log of production and the log of the fitted Verhulst curve. Data are:
I thought this was worth a special look. If you plot it, you can see the discontinuities very clearly; the five shocks at the times mentioned earlier stand out. We can take the Z value at the start of a shock, subtract the Z value at the end of the shock, and take the absolute value to get a shock magnitude. Following this procedure, we get shock magnitudes of delta Z =:
2.51
2.45
2.26
0.99
2.10
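The Z score and shock-magnitude steps can be sketched in Python as below. It assumes Z is a plain standardization of the residual ln(production) - ln(Verhulst) over the whole history, using the sample standard deviation; the post doesn't give the exact convention. The shock windows are the five listed earlier, with 1919-1923 for the first. The residual series itself isn't posted, so the check uses a made-up toy series.

```python
import statistics

# Shock windows from the post (start year, end year); 1919-1923 used for the first.
SHOCKS = [(1919, 1923), (1929, 1932), (1937, 1943), (1973, 1975), (1979, 1983)]


def z_scores(log_residuals):
    """Standardize the residuals ln(production) - ln(Verhulst) over the whole history."""
    mu = statistics.mean(log_residuals)
    sd = statistics.stdev(log_residuals)
    return [(r - mu) / sd for r in log_residuals]


def shock_magnitudes(years, z, shocks=SHOCKS):
    """|Z(start) - Z(end)| for each (start_year, end_year) shock window."""
    z_by_year = dict(zip(years, z))
    return [abs(z_by_year[start] - z_by_year[end]) for start, end in shocks]


# Toy check with made-up residuals (not the historical data).
toy_years = [0, 1, 2, 3, 4]
toy_z = z_scores([0.0, 1.0, 2.0, 3.0, 4.0])
print(shock_magnitudes(toy_years, toy_z, shocks=[(0, 4)]))
```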
Do you remember that one of the things we needed to proceed with the forecast was the size of the shock when it arrived? Well, I used these numbers to approximate the size of future shocks. I know this is a very dodgy procedure with only five data points, but hey, it's the best I could think of. Better suggestions welcome. In practice, what I actually did was use maximum likelihood estimation to generate parameters for a two-parameter Weibull distribution. I chose Weibull because I needed a one-directional (non-negative) distribution, and Weibull had the highest likelihood amongst the bunch I tried. With five data points, it's going to be impossible to distinguish between distributions anyway. Parameters were resolved as:
alpha (shape) 5.21
beta (scale) 2.26
So later on, I'm going to use that distribution to generate magnitudes of shocks. Check that one off.
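The Weibull step is easy to reproduce outside Excel. A minimal Python sketch of the standard two-parameter maximum likelihood fit, solving the shape equation by bisection; run on the five delta Z values above, it should come out close to the posted alpha = 5.21 and beta = 2.26 (alpha is the shape and beta the scale, matching Excel's WEIBULL argument order). One trap when drawing shocks in Python: `random.weibullvariate(alpha, beta)` takes the scale first and the shape second, the opposite way round.

```python
import math
import random


def weibull_mle(data, lo=0.01, hi=50.0, tol=1e-10):
    """Two-parameter Weibull maximum likelihood fit; returns (shape, scale).

    The shape k solves sum(x**k * ln x) / sum(x**k) - 1/k - mean(ln x) = 0.
    The left side increases with k, so bisection finds the root, assuming it
    lies inside [lo, hi].
    """
    logs = [math.log(x) for x in data]
    mean_log = sum(logs) / len(data)

    def score(k):
        w = [x ** k for x in data]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    scale = (sum(x ** k for x in data) / len(data)) ** (1.0 / k)
    return k, scale


shock_sizes = [2.51, 2.45, 2.26, 0.99, 2.10]  # delta Z values from the post
shape, scale = weibull_mle(shock_sizes)
print(f"alpha (shape) = {shape:.2f}, beta (scale) = {scale:.2f}")

# Drawing one shock size: Python's argument order is (scale, shape).
draw = random.Random(0).weibullvariate(scale, shape)
```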
Ok. Let's work out when shocks should occur. Those of you who are on the ball probably suspected, from the fact that I generated a distribution for shock magnitudes, that I was going to use a Monte Carlo procedure later on. Well, you are right. Consistent with this, I wanted a probabilistic method of determining whether or not a shock occurs. I wanted to use historic Z values as the baseline assumption for what Z values trigger shocks, but have not been able to make this work yet. I am still working on variations to the model which may make this possible in the future. In any case, I simply did what MC modellers sometimes have to do, and rigged up a loose approximation to what kind of Z value will trigger a shock. I compared the current Z value with a normally distributed random variable with mean 2 and SD 1/2. A shock was triggered when the current Z value exceeded the random variable.
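The trigger rule amounts to a per-year coin flip whose probability rises with Z. A Python sketch, assuming a fresh threshold is drawn each simulated year; `shock_probability` is just the closed form of the same rule (the normal CDF), useful for checking the simulation. Under this rule a Z of 1 gives about a 2% chance of a shock per year, a Z of 2 gives 50%, and a Z of 3 about 98%.

```python
import random
from statistics import NormalDist

TRIGGER = NormalDist(mu=2.0, sigma=0.5)  # trigger level from the post: mean 2, SD 1/2


def shock_occurs(z, rng=random):
    """One Monte Carlo step: a shock fires when the current Z exceeds a random threshold."""
    return z > rng.gauss(TRIGGER.mean, TRIGGER.stdev)


def shock_probability(z):
    """Per-year chance that the rule fires, i.e. P(threshold < z)."""
    return TRIGGER.cdf(z)


for z in (1.0, 2.0, 3.0):
    print(f"Z = {z}: shock probability {shock_probability(z):.3f}")
```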
Let's look at what we have so far. We have the exponential trend, which we expect to continue until a shock arrives. We have a measure of the likelihood that a shock will arrive in any given time. Also, we have a tentative measure of the magnitude of that shock, were it to arrive. After the shock, we expect oil production to resume an exponential trend. However, we currently have no method of deciding what that trend should be. If you have read this far, then you may suspect that I'll use some dodgy method of approximating this, and you'd be right.
Remember the m and b parameters from the set of straight lines mapped to historical log of production? Well, if you plot these m vs b parameters, you'll see they vaguely allude to a straight line. I obtained the regression equation:
b= -61.93 * m + 2.58
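The regression itself is ordinary least squares of b on m. A short Python sketch; fed the six rounded (m, b) pairs posted above, it gives a slope of about -61.6 and an intercept of about 2.55, close to but not exactly the quoted equation, presumably because (as mentioned later in the thread) the posted line fits come from an older version of the model.

```python
# (m, b) pairs of the six straight-line fits, as posted above.
m = [0.066, 0.079, 0.084, 0.074, 0.038, 0.015]
b = [-1.831, -1.894, -2.429, -2.376, 0.159, 1.774]

mean_m = sum(m) / len(m)
mean_b = sum(b) / len(b)
# Least-squares slope = covariance(m, b) / variance(m); intercept through the means.
slope = (sum((mi - mean_m) * (bi - mean_b) for mi, bi in zip(m, b))
         / sum((mi - mean_m) ** 2 for mi in m))
intercept = mean_b - slope * mean_m
print(f"b = {slope:.2f} * m + {intercept:.2f}")
```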
Ok then. Now we have the timing of the shock and the magnitude of the shock, which together yield the first data point after the shock (all shocks were assumed to be 1 year long for simplicity). And we can now find the new exponential trend, because we know the relationship between m and b.
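Substituting the regression into the line equation gives y = m(x - 61.93) + 2.58, so every post-shock trend is a line through roughly the same pivot point (x about 61.93, y about 2.58), and the first post-shock point fixes its slope: m = (y_s - 2.58) / (x_s - 61.93). A Python sketch, assuming (i) the new line is forced through the first post-shock point, (ii) that point is found by lowering Z by the drawn shock size and mapping back through a plain Z score, and (iii) x uses the same origin as the original line fits, which the post doesn't state. `post_shock_log_production`, `new_trend` and the example numbers are hypothetical.

```python
# Regression from the post: b = A*m + B.
A, B = -61.93, 2.58


def post_shock_log_production(log_verhulst_t, z_before, delta_z, mu, sd):
    """ASSUMED mapping: Z = (residual - mu) / sd with residual = ln P - ln Verhulst.

    A downward shock of size delta_z lowers Z; convert the new Z back to ln(P).
    """
    z_after = z_before - delta_z
    return log_verhulst_t + mu + sd * z_after


def new_trend(x_s, y_s, a=A, b0=B):
    """Line y = m*x + b through the post-shock point (x_s, y_s) with b = a*m + b0.

    Solving y_s = m*x_s + a*m + b0 gives m = (y_s - b0) / (x_s + a);
    undefined when x_s == -a, i.e. at the pivot itself.
    """
    m = (y_s - b0) / (x_s + a)
    return m, a * m + b0


# Hypothetical numbers, for illustration only.
y_s = post_shock_log_production(log_verhulst_t=3.2, z_before=2.3, delta_z=2.1, mu=0.0, sd=0.1)
m, b = new_trend(x_s=106, y_s=y_s)
print(f"new trend: m = {m:.4f}, b = {b:.3f}")
```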
I'm sure I've forgotten something, but this has gone on long enough already, so now we'll just take 10000 sets of the above-mentioned random numbers and obtain the forecast below. From left to right, the columns refer to -
If you're still with me, thanks for journeying. I look forward to hearing all the easy ways I can eliminate the incessant and inherent dodginess from the model. If someone wants me to run other numbers, I can do that too. Alternatively, PM me for the spreadsheet (Excel). Be warned, however, that the spreadsheet is what I consider to be one of the finest examples of "Spaghetti spreadsheeting" that one will ever witness. A dying art in modern times. I will not be held liable for psychiatric bills incurred in trying to decipher it.
The thing that is hard to predict is future "hoarding" of reserves when the reality of depletion is understood. In the past, oil has been pumped as fast as it could be sold. Post peak, there will be greater motivation to hold back production (while claiming maximum production). Don't know how this could be modeled. _________________ "The world is changed... I feel it in the water... I feel it in the earth... I smell it in the air... Much that once was, is lost..." - Galadriel
Ah, a wonderful model to find this morning. Thanks, Shiraz, for your thoughtful work.
I have one question, though: Why doesn't the p-95 estimate behave like the others? It seems like this one should be below the p75. Also, it looks like there is a six-year delay in the peak.
Firstly, thanks Khebab for your chart. Plenty of info, very clear. Your effort is appreciated.
Pup55, I believe you are referring to the curve of the median, which is the far right column of data. The median behaves far less stably than the other curves because the discontinuities, when they occur, are somewhat severe. It should be noted that this is not a consequence of the choice of U. It is a consequence of the fact that, according to this model, the exponential growth curve will butt up against a concave Verhulst curve. This means that the most strenuous shock will be the first shock after the year T1/2 of the comparison curve. This feature is independent of the chosen U, or of the set of parameters in general. Peak oil will always be the point of most turbulence, when the exponential butts up against the concave Verhulst curve.
Regarding the Model T data: I'm sure you are onto something. Good spot. I still wonder what caused the wealth that allowed such rapid proliferation of the motor vehicle. Something to do with the war? Or is it just that the motor vehicle is a remarkably good product, with plenty of utility and little chance of substitution? (where have we heard that mantra before???)
PS Pup55, compare the mean curve with the curve in your recent discussion about post-peak countries. I can think of no reason why their similarity should be anything but coincidence, but is the similarity not striking nonetheless?
PPS With regard to the last comment, scrap that. I just plotted the "experience-based" decline against the mean "post-decline" of this model. Although both drop off heavily and recover to a plateau, the similarity appears to be quite superficial, as the scale of the comparative effects is quite different.
Last edited by Shiraz on Tue Aug 02, 2005 8:28 am; edited 1 time in total
Shiraz, good job! I think a local approach where the exponential fit is stable between discontinuities is the way to go. A global approach implicitly assumes that the underlying random variables are stationary, which is obviously not the case.
I will post more comments later. I have to go now. _________________ ______________________________________
http://GraphOilogy.blogspot.com
Quote:
Pup55, I believe you are referring to the curve of the median, which is the far right column of data. The median behaves far less stably than the other curves, because the discontinuities, when they occur, are somewhat severe.
I see that now. Thanks for the clarification.
Quote:
PS Pup55, compare the mean curve with the curve present in your recent discussion about post peak countries. I can think of no reason why their similarity should not be mere coincidence, but is that similarity not striking none-the-less?
Quote:
Either way, it seems strange to me that WWI produces no obvious negative shock, and that the aftermath should produce such a positive shock, but I don't know my history that well. Oil production certainly seems to behave differently between the two world wars. Maybe someone knows why.
US production during WWII was around 1.5 Gb/year, which was almost half of world production. Therefore, world production was shielded from the effect of WWII.
Quote:
Z =:
2.51
2.45
2.26
0.99
2.10
Can you give more details about these estimates? In particular, which years did you use?
I wonder if a local exponential fit on a sliding window, and an analysis of its residuals, could be a better indicator of a shock than the residuals from a global model (Verhulst).
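The sliding-window idea in the post above could be sketched like this in Python. The window length, and the choice to score each year by its one-step-ahead residual from a straight-line fit of ln(production) over the preceding years, are assumptions for illustration; a large negative residual would flag a candidate shock.

```python
import math


def sliding_window_residuals(years, production, window=10):
    """Fit ln(production) with a straight line over the previous `window` years and
    return (year, residual) for each following year, where residual is the actual
    ln(P) minus the extrapolated trend value."""
    logs = [math.log(p) for p in production]
    out = []
    for i in range(window, len(years)):
        xs, ys = years[i - window:i], logs[i - window:i]
        mx, my = sum(xs) / window, sum(ys) / window
        sxx = sum((x - mx) ** 2 for x in xs)
        m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        predicted = my + m * (years[i] - mx)
        out.append((years[i], logs[i] - predicted))
    return out


# Synthetic check: 5% exponential growth with a one-off 0.3 drop in ln(P) at year 30.
years = list(range(50))
production = [math.exp(0.05 * t - (0.3 if t >= 30 else 0.0)) for t in years]
residuals = dict(sliding_window_residuals(years, production, window=10))
print(round(residuals[30], 3))
```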
Quote:
Can you give more details about these estimates? In particular, which years did you use?
Here's a lift from the spreadsheet. As you can see, for the first shock I used 1919 to 1923. The data I posted for the linear fits actually came from an old version of the model, which is why it shows the linear fitting around the dates 1919-1920. It wasn't until I saw the Z values that I came to believe that 1923 might be better recognized as the end of the 'shock'. I don't imagine the difference between these cases will have too large an effect. It is very unfortunate that there are so few shocks to work with. In a perfect world we might be temp