eddington package
One statistic that cyclists might be interested in tracking is their Eddington number. The Eddington number for cycling, E, is the largest number such that a cyclist has ridden at least E miles on E distinct days. So to reach an Eddington number of 30, you need to have ridden 30 miles or more on 30 separate days.
This package allows the user to compute Eddington numbers and more. For example, users can determine if a specific Eddington number is satisfied or how many rides of the appropriate distance are needed to increment their Eddington number. It also contains simulated data to demonstrate the package use.
Loading the simulated data is simple. Let’s take a quick look at the first few lines.
library(eddington)
head(rides)
#> ride_date ride_length
#> 1 2009-01-04 5.97
#> 2 2009-01-06 11.59
#> 3 2009-01-10 17.10
#> 4 2009-01-11 12.51
#> 5 2009-01-18 10.70
#> 6 2009-01-18 7.73
First, we need to establish the granularity of the data. As you can see above, there are at least two entries for 2009-01-18. Since this data simulates a rider who tracked each individual ride, there could be more than one ride per day in this dataset. Therefore, we need to transform the data to aggregate on day.
library(dplyr)
# Collapse individual rides into one row per day
days <- rides %>%
  group_by(ride_date) %>%
  summarize(n = n(), total = sum(ride_length))
head(days)
#> # A tibble: 6 × 3
#> ride_date n total
#> <date> <int> <dbl>
#> 1 2009-01-04 1 5.97
#> 2 2009-01-06 1 11.6
#> 3 2009-01-10 1 17.1
#> 4 2009-01-11 1 12.5
#> 5 2009-01-18 2 18.4
#> 6 2009-01-19 1 13.4
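For readers who would rather not depend on dplyr, the same per-day totals can be sketched in base R with `aggregate()`. This is only an illustration: `toy_rides` and `toy_days` are hypothetical names, and the data is copied from the first six rides printed above rather than loaded from the package. Unlike the dplyr version, this sketch does not compute the ride count `n`.

```r
# Toy data copied from the first six rows printed above; `toy_rides` is a
# hypothetical name chosen to avoid masking the package's `rides` dataset.
toy_rides <- data.frame(
  ride_date = as.Date(c("2009-01-04", "2009-01-06", "2009-01-10",
                        "2009-01-11", "2009-01-18", "2009-01-18")),
  ride_length = c(5.97, 11.59, 17.10, 12.51, 10.70, 7.73)
)

# Sum ride lengths within each date; the result has one row per day.
toy_days <- aggregate(ride_length ~ ride_date, data = toy_rides, FUN = sum)
names(toy_days)[2] <- "total"
toy_days
```

The two rides on 2009-01-18 collapse into a single row, so six rides become five days.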
Let’s just take a quick peek at the summary stats:
summary(days)
#> ride_date n total
#> Min. :2009-01-04 Min. :1.000 Min. : 2.17
#> 1st Qu.:2009-04-07 1st Qu.:1.000 1st Qu.:11.37
#> Median :2009-07-15 Median :1.000 Median :15.02
#> Mean :2009-07-05 Mean :1.404 Mean :19.21
#> 3rd Qu.:2009-09-28 3rd Qu.:2.000 3rd Qu.:25.04
#> Max. :2009-12-31 Max. :5.000 Max. :77.71
This plot provides a histogram of daily mileages. Note the summary Eddington number is in dark red—we’ll see how that’s calculated in the next section.
To compute the Eddington number, we use the E_num() function like so:
E_num(days$total)
#> [1] 29
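To make the definition concrete, here is a minimal base R sketch of the calculation. It is not the package's implementation, and `eddington_sketch()` is a hypothetical helper written only for illustration. The idea is to sort the daily totals from longest to shortest and count the positions k at which the k-th longest day is at least k miles.

```r
# Hypothetical helper, not part of the eddington package.
eddington_sketch <- function(distances) {
  sorted <- sort(distances, decreasing = TRUE)
  # The k-th longest day counts toward E when it is at least k miles long.
  # Because `sorted` decreases while the index increases, the comparison is
  # TRUE for a leading run and FALSE afterwards, so summing gives E.
  sum(sorted >= seq_along(sorted))
}

# Daily totals as printed (rounded) in the first six rows above.
toy_totals <- c(5.97, 11.6, 17.1, 12.5, 18.4, 13.4)
eddington_sketch(toy_totals)
#> [1] 5
```

Five of the six days are at least 5 miles long, but only five are at least 6 miles long, so the toy Eddington number is 5.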
To see how the Eddington number progressed over the year, use E_cum(). It can be useful to add the vector as a new column onto the existing dataset:
# Store the running Eddington number alongside each day's total
days$E <- E_cum(days$total)
head(days)
#> # A tibble: 6 × 4
#> ride_date n total E
#> <date> <int> <dbl> <int>
#> 1 2009-01-04 1 5.97 1
#> 2 2009-01-06 1 11.6 2
#> 3 2009-01-10 1 17.1 3
#> 4 2009-01-11 1 12.5 4
#> 5 2009-01-18 2 18.4 5
#> 6 2009-01-19 1 13.4 5
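The running value can be understood as the Eddington number of the first i days, for each i. The sketch below spells that out in base R. It is an illustration under that assumption, not the package's algorithm; both helper names are hypothetical, and recomputing from scratch for every prefix is slow on long histories.

```r
# Hypothetical helpers, not part of the eddington package.
eddington_sketch <- function(distances) {
  sorted <- sort(distances, decreasing = TRUE)
  sum(sorted >= seq_along(sorted))
}

eddington_cum_sketch <- function(distances) {
  # Recompute the Eddington number on each prefix of the ride history.
  vapply(
    seq_along(distances),
    function(i) eddington_sketch(distances[seq_len(i)]),
    integer(1)
  )
}

# Daily totals as printed (rounded) in the first six rows above.
toy_totals <- c(5.97, 11.6, 17.1, 12.5, 18.4, 13.4)
eddington_cum_sketch(toy_totals)
#> [1] 1 2 3 4 5 5
```

The result matches the E column shown above: the sixth day does not raise the value because the 5.97-mile day falls short of 6 miles.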
It might be more interesting to see that graphically:
Now that we know the summary Eddington number for the year was 29, let’s see how many more rides of 30 miles or more we would have needed to raise E to 30.
E_next(days$total)
#> Your current Eddington Number is 29. You need 3 rides of 30 or greater
#> to get to an Eddington number of 30.
An ambitious rider might be interested in the number of rides required to reach a stretch goal. Say, how many more rides would have been needed to reach an E of 50? For that, we use E_req(), which stands for “required.”
E_req(days$total, 50)
#> [1] 46
We could also check whether we’ve gotten to 30 by using E_sat(), which stands for “satisfies.”
E_sat(days$total, 30)
#> [1] FALSE
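Both checks above come down to counting the days that already meet the target distance. The base R sketch below reflects that reading of the definition; it is not the package's code, and `rides_required()` and `satisfies()` are hypothetical names.

```r
# Hypothetical helpers, not part of the eddington package.
rides_required <- function(distances, target) {
  # Days that already count toward the target Eddington number.
  qualifying <- sum(distances >= target)
  # Additional days of at least `target` miles still needed (never negative).
  max(0, target - qualifying)
}

satisfies <- function(distances, target) {
  # A target E is met when at least `target` days are `target` miles or longer.
  sum(distances >= target) >= target
}

# Daily totals as printed (rounded) in the first six rows above.
toy_totals <- c(5.97, 11.6, 17.1, 12.5, 18.4, 13.4)
rides_required(toy_totals, 10)
#> [1] 5
satisfies(toy_totals, 5)
#> [1] TRUE
```

In the toy data, five days are at least 10 miles long, so five more such days would be needed to reach an Eddington number of 10.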
The text above should give you a good start in using the eddington package. Although this package was developed with bicycling in mind, it has applications for other users as well. The Eddington number is a specific application of computing the side length of a Durfee square. Another application is the Hirsch index, or h-index, which is a popular metric in bibliometrics.