Thursday, 27 May 2010

Predicting Race Times Through Curve Fitting

You will find many race time calculators on the web. You enter your time for a specific distance and get back predicted race times for various other distances. More often than not, the predicted times are either higher or lower than what you observe in real life.

Take me, for example. My best 10KM time is 48:48. If I enter this into the runningforfitness.org race predictor, the predicted race times which I get are:

5 KM - 23:26
Half Marathon - 01:47:35
Full Marathon - 03:46:58

Whereas my best times for these distances are:

5 KM - 23:06 (predicted race time higher)
Half Marathon - 01:51:23 (predicted race time lower)
Full Marathon - 04:50:26 (predicted race time significantly lower)

These race predictors take one race time and apply a formula to get all race times.
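One widely used formula of this kind is Riegel's: T2 = T1 x (D2 / D1)^1.06. Whether runningforfitness.org uses exactly this formula is an assumption on my part, so treat the sketch below only as an illustration of the single-input approach. The function names are made up for this example, and the code is written for Python 3.

```python
def riegel_predict(known_km, known_minutes, target_km, exponent=1.06):
    """Scale one known race time to another distance (Riegel's formula)."""
    return known_minutes * (target_km / known_km) ** exponent


def format_minutes(total_minutes):
    """Render a time given in minutes as H:MM:SS."""
    total_seconds = int(round(total_minutes * 60))
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return "%d:%02d:%02d" % (hours, minutes, seconds)


# A single input: the 10 km time of 48:48 entered into the predictor
known_minutes = 48 + 48 / 60.0
for target_km in (5, 21.1, 42.2):
    print(target_km, format_minutes(riegel_predict(10, known_minutes, target_km)))
```

Note that everything hangs off one race and one fixed exponent, so the predictor has no way of knowing how a particular runner fades over longer distances.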

Taking the same race times which I had:


Distance (km)   Hours   Minutes   Seconds   Equivalent Minutes
5               0       23        6         23.1
10              0       48        47        48.78
21.1            1       51        23        111.38
42.2            4       50        26        290.43


Table 1: Race Distances and Times
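The "Equivalent Minutes" column is just hours, minutes and seconds folded into decimal minutes. A minimal sketch of that arithmetic (the helper name `to_minutes` is my own, and the code is written for Python 3):

```python
def to_minutes(hours, minutes, seconds):
    """Convert an H:M:S race time to decimal minutes."""
    return hours * 60 + minutes + seconds / 60.0


# (distance in km, hours, minutes, seconds) taken from Table 1
races = [(5, 0, 23, 6), (10, 0, 48, 47), (21.1, 1, 51, 23), (42.2, 4, 50, 26)]
for km, h, m, s in races:
    print(km, round(to_minutes(h, m, s), 2))
```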

Now if I take that data and plot it, here's what I get:

Figure 1: Distance Vs Time According To Race Times

The blue curve in the chart above is a second-order polynomial fitted to the data points (race times). The curve is almost a perfect fit, with an R-squared value of 0.99.

The formula for the polynomial curve is:

y = 0.0799x^2 + 3.3858x + 5.2075
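A curve like this can also be fitted without a charting tool. The sketch below does an ordinary least-squares fit of a quadratic by solving the normal equations with Cramer's rule. I am assuming this is the kind of fit behind the trendline in Figure 1. The function names are hypothetical and the code is written for Python 3. With the Table 1 data it should land close to the coefficients quoted above, though small differences from rounding are to be expected.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    # Power sums that make up the normal equations
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Rows come from the derivatives with respect to a, b and c
    matrix = [[s[4], s[3], s[2]],
              [s[3], s[2], s[1]],
              [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(matrix)
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side
        swapped = [row[:] for row in matrix]
        for r in range(3):
            swapped[r][col] = rhs[r]
        coeffs.append(det3(swapped) / d)
    return tuple(coeffs)


distances = [5, 10, 21.1, 42.2]          # km, from Table 1
minutes = [23.1, 48.78, 111.38, 290.43]  # equivalent minutes, from Table 1
print(fit_quadratic(distances, minutes))
```

With only four data points and three coefficients to tune, a high R-squared is close to guaranteed, which is worth keeping in mind before trusting the curve.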

Armed with this perfect formula, I thought I could predict race times for any distance. So I fired up IDLE and started putting in values:

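>>> import math
>>> def CalculateTime(distance):
    # Fitted quadratic: distance in km, time in minutes
    time = .0799 * (distance**2) + (3.3858*distance) + 5.2075
    hours = math.floor(time/60)   # whole hours
    mins = math.floor(time%60)    # whole minutes past the hour
    sec = int((time - math.floor(time)) * 60)   # fraction of a minute as seconds, truncated
    print str(hours) + " hours " + str(mins) + " minutes " + str(sec) + " seconds"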

    
>>> CalculateTime(10)
0.0 hours 47.0 minutes 3 seconds
>>> CalculateTime(12)
0.0 hours 57.0 minutes 20 seconds
>>> CalculateTime(21.1)
1.0 hours 52.0 minutes 13 seconds
>>> CalculateTime(42.2)
4.0 hours 50.0 minutes 22 seconds

Everything looks as expected. So I thought, let's predict other times, say for a 50K...

>>> CalculateTime(50)
6.0 hours 14.0 minutes 14 seconds

That seemed reasonable. So what about shorter distances?

>>> CalculateTime(5)
0.0 hours 24.0 minutes 8 seconds
>>> CalculateTime(3)
0.0 hours 16.0 minutes 5 seconds
>>> CalculateTime(2)
0.0 hours 12.0 minutes 17 seconds
>>> CalculateTime(1)
0.0 hours 8.0 minutes 40 seconds


Whoa! According to this model, I would run my 5K at a slower pace than my 10K, and if I were running a 1K, I would be walking?!
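The culprit is the constant term. The formula charges 5.2075 minutes before a single step is taken, and at short distances that fixed cost swamps everything else. Dividing the predicted time by the distance makes this visible. The sketch below uses the same coefficients as above. The function names are my own and the code is written for Python 3.

```python
def predicted_minutes(km):
    """The fitted quadratic from above: distance in km, time in minutes."""
    return 0.0799 * km ** 2 + 3.3858 * km + 5.2075


def pace_min_per_km(km):
    """Average pace implied by the model, in minutes per km."""
    return predicted_minutes(km) / km


for km in (1, 2, 3, 5, 10, 21.1, 42.2):
    print(km, "km:", round(pace_min_per_km(km), 2), "min/km")

# A race of zero distance still "takes" the constant term
print(predicted_minutes(0))
```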

Not satisfied, I tried fitting other curves -- exponential, power, linear. But none actually gave me reasonable predictions across distances.
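As an illustration of one of these alternatives, a power curve y = a * x^b can be fitted by running a straight-line regression on the logarithms of distance and time. This is one common way to do it, and I am assuming rather than claiming that a spreadsheet trendline works the same way. The function name is hypothetical and the code is written for Python 3.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by linear least squares on (ln x, ln y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    sxx = sum((u - mean_x) ** 2 for u in lx)
    b = sxy / sxx                          # slope in log space is the exponent
    a = math.exp(mean_y - b * mean_x)      # intercept in log space gives the scale
    return a, b


distances = [5, 10, 21.1, 42.2]          # km, from Table 1
minutes = [23.1, 48.78, 111.38, 290.43]  # equivalent minutes, from Table 1
a, b = fit_power_law(distances, minutes)
for km in (1, 5, 10, 21.1, 42.2):
    print(km, "km:", round(a * km ** b, 2), "minutes")
```

Unlike the polynomial, a power curve passes through zero time at zero distance, but it has only two parameters to bend around the four race times.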

The polynomial curve was the only one that gave relatively better predictions, and even then only for distances greater than 10KM.

The only place where I can use the polynomial formula would be to define a lower limit of performance in a race or, in other words, the least expected time.

Which leads me to the conclusion that you can't reliably predict race times through a mathematical formula. What a waste of time!