We have two variables, `speed` and `dist`: the speed of a car and the distance it takes to stop from that speed. We can see in the marginal density plots that `speed` is distributed fairly normally, although with a much flatter peak than a Gaussian, and that `dist` is distributed asymmetrically, with one tail noticeably heavier than the other.
Let's work with `dist` for now and try out both a normal and a Johnson's Su on it. First, let's figure out how to use the `optimize` function in R. This is not a hill climbing algorithm (apologies! if you're really curious, there's a rough sketch of a toy one a bit further down); it's a more sophisticated search function, but it only operates on a single variable. We give it a loss function to minimize and a range in which to search:
Code:
x <- cars$dist
optimize(f = function(mu) sum((x - mu)^2),
         interval = range(x))
# > optimize(function(mu) sum((x - mu)^2), range(x))
# $minimum
# [1] 42.98
# $objective
# [1] 32538.98
# > mean(x)
# [1] 42.98
What's going on here? We define our `x` variable as `dist` from the `cars` dataset. We define our loss function as the sum of squared residuals between every value in `x` and a candidate mean (`mu`). We tell the search to proceed within the range of our `x` values. The result, shown in the commented output, is the minimized `objective` and the candidate value that achieved it, labeled `minimum`. As we can verify, this minimum is the same as the mean!
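For the curious, here's a rough sketch of the kind of naive hill climbing search I was alluding to. This is my own toy illustration (the function name, step rule, and tolerance are made up for the example), not how `optimize` actually works; `optimize` combines golden section search with successive parabolic interpolation.
Code:
# Toy hill climber: start in the middle of the interval, keep taking steps
# that reduce the loss, and shrink the step size whenever we get stuck.
hill_climb <- function(f, interval, step = diff(interval) / 4, tol = 1e-6) {
  mu <- mean(interval)
  while (step > tol) {
    if (f(mu - step) < f(mu)) {
      mu <- mu - step          # downhill to the left
    } else if (f(mu + step) < f(mu)) {
      mu <- mu + step          # downhill to the right
    } else {
      step <- step / 2         # stuck: take smaller steps
    }
  }
  mu
}
hill_climb(function(mu) sum((x - mu)^2), range(x))
# should land very close to mean(x), i.e. about 42.98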
Let's subtract the mean and then find the standard deviation:
Code:
x1 <- cars$dist - 42.98
optimize(f = function(sigma) sum((sqrt(mean(x1^2)) - sigma)^2),
         interval = range(abs(x1)))
# > optimize(f = function(sigma) sum((sqrt(mean(x1^2)) - sigma)^2),
# + interval = range(abs(x1)))
# $minimum
# [1] 25.51038
# $objective
# [1] 0
We should note that this standard deviation doesn't use Bessel's correction the way most standard functions (like R's `sd`) do. We found that our `dist` variable has a mean of 42.98 and a raw standard deviation of 25.51. How would we do both of these at the same time?
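To see the difference, here's a quick comparison I'm adding (not part of the fit itself) between the raw estimate and R's `sd`, which divides by n - 1:
Code:
n <- length(x1)
sqrt(mean(x1^2))                     # raw, divide-by-n estimate, about 25.51
sd(cars$dist)                        # Bessel-corrected, divide-by-(n - 1), about 25.77
sd(cars$dist) * sqrt((n - 1) / n)    # rescaling the corrected value recovers the raw one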
Instead of the `optimize` function, we can use the `optim` function, which optimizes multiple variables at once. I like to do this iteratively, first getting a rough parameterization using quantiles, then a closer fit using something like a
K-S test or
negative log-likelihood:
Code:
x <- cars$dist
# Total absolute distance between the sorted model CDF values and a uniform
# grid on [0, 1]: if the model fits, those probabilities should look uniform.
qqtad <- function(p) {
  sum(abs(p - seq(0, 1, length.out = length(p))))
}
# Rough fit: match the model CDF to uniform quantiles (a Q-Q-style loss).
one <- optim(par = c(0, 1),
             fn = function(ms) qqtad(pnorm(x, ms[1], ms[2])[order(x)]))
# Closer fit: minimize the negative log-likelihood, starting from the rough fit.
two <- optim(par = one$par,
             fn = function(ms) -sum(log(dnorm(x, ms[1], ms[2]))))
# > one
# $par
# [1] 40.18027 25.05558
# $value
# [1] 1.594726
# > two
# $par
# [1] 42.98534 25.51202
# $value
# [1] 232.9012
The first fit is like a baby version of the K-S test. In the second fit we're using maximum likelihood instead of a K-S-style statistic, but either could work. Notice we found the mean and the standard deviation at the same time! This is absolutely necessary for working with the Johnson Su. Unlike the normal, whose parameters are simply the moments and can be found one at a time, the Johnson Su's parameters have to be fit jointly.
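As a quick sanity check (my addition, assuming the MASS package is available), the closed-form normal MLE from `MASS::fitdistr` should land on essentially the same parameters as our hand-rolled `two`:
Code:
library(MASS)
fitdistr(x, "normal")   # MLE of the mean and sd; expect values very close to two$par
two$par                 # our optim-based MLE fit, for comparison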
I'll leave it at that for now and come back later to tackle the parameterization of the Johnson Su. There's plenty of meat in this post, but if you can understand optimizers and get comfortable working with them, a whole new world opens up! They are a seriously powerful tool in your toolbox for solving problems.