Category Archives: Mathematics

The Mathematics of Recurrent Saving

This may give you a good idea of just how much you can expect out of that 401(k) contribution:

If you invest a recurring principal p on a yearly basis into an account with an (r-1)*100% APY (e.g. r=1.05 for a 5% APY), your balance after y years is: p * (r^(y+1) – r) / (r – 1), assuming each contribution is made at the start of its year.

(For y >= 1, since we’re starting at the first compounding).

So if you put $5k a year into a 401(k) with 3% interest, you’ll have $59,038 by the end of the 10th year, vs. the $50,000 you’d have without interest.

After 20 years, you’d have $138,382, vs $100,000.

If you contributed $10,000 per year for 20 years, you’d end up with $276,764, vs. $200,000.
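These figures can be checked directly; here is a minimal sketch comparing the closed form against a year-by-year simulation (the function names are mine, not from the post):

```python
def closed_form(p, r, y):
    """Balance after y years of contributing p at the start of each year,
    with growth factor r per year (e.g. r = 1.03 for a 3% APY)."""
    return p * (r ** (y + 1) - r) / (r - 1)

def simulated(p, r, y):
    """The same quantity, computed by simulating each year."""
    balance = 0.0
    for _ in range(y):
        balance = (balance + p) * r  # contribute, then compound
    return balance

# $5k/year at 3% for 10 years: about $59,039, vs. the $50,000 principal.
print(round(closed_form(5000, 1.03, 10)))
print(round(simulated(5000, 1.03, 10)))
```

The loop confirms the geometric-series derivation: each year's contribution p compounds once more than the next year's, so the balance is p·r + p·r² + … + p·r^y.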

Worth it? You decide. But that shocking “you’ll have a $500k nest egg after 30 years” claim, while true, is only true because it’s counting the principal you’re investing.

Granted, locking it away does remove the temptation to spend it.

More generally.

Remember σ(p·n) = p * σ(n) + σ(n / p^α)?

More generally, σ(p^k · n) = p^k * σ(n) + (p^k – 1) / (p – 1) * σ(n / p^α).

Again, p is a prime and α is its multiplicity in n’s prime factorization. σ is the divisor function, of course.
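The recurrence is easy to check by brute force; a quick sketch with naive implementations of σ and the multiplicity (the helper names are mine):

```python
def sigma(n):
    """Sum of the divisors of n."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

def multiplicity(n, p):
    """The exponent (alpha) of the prime p in n's factorization."""
    a = 0
    while n % p == 0:
        n //= p
        a += 1
    return a

# Verify: sigma(p^k * n) == p^k * sigma(n) + (p^k - 1)/(p - 1) * sigma(n / p^alpha)
for p in (2, 3, 5):
    for k in (1, 2, 3):
        for n in range(1, 60):
            a = multiplicity(n, p)
            lhs = sigma(p ** k * n)
            rhs = p ** k * sigma(n) + (p ** k - 1) // (p - 1) * sigma(n // p ** a)
            assert lhs == rhs
print("recurrence verified")
```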

All I need to do now is figure out how to extend it to a composite number and I’ll have a complete multiplicative recurrence on the divisor function, which I can use to obtain a closed-form rate of growth. I’ve empirically calculated it to grow at approximately 1.6449*n on average, but my goal is to obtain a tight worst-case bound. I could not find anything special about this number at first, except that it is the 90% critical value of a normal distribution. It also happens to match π²/6 = ζ(2) ≈ 1.6449, which is the known average order of σ(n)/n.

Here’s a messy Maple worksheet containing the derivation (among a whole bunch of stuff not related to the derivation that I was experimenting with today).

Conjectures

Do highly composite numbers of the form n^2-1 occur infinitely often? What about when n is prime? Most values of n for which this holds seem to be prime. (Which makes sense: if n^2-1 is highly composite, the next number, n^2, is going to be deficient (because n^2-1’s prime factors will all go away), and squares of primes certainly fit the bill.)

(Actually, can we always say that a highly composite number + 1 is either prime or the square of a prime? All of the values I checked were.

Things start to change around 5040…)
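A brute-force sketch for exploring this (the search bound and helper names are mine; it is slow but fine for small ranges):

```python
def num_divisors(n):
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def highly_composite_up_to(limit):
    """n is highly composite if it has more divisors than every smaller number."""
    best, hcns = 0, []
    for n in range(1, limit + 1):
        d = num_divisors(n)
        if d > best:
            best = d
            hcns.append(n)
    return hcns

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def is_prime_square(n):
    r = int(round(n ** 0.5))
    return r * r == n and is_prime(r)

# For each highly composite h, report whether h + 1 is prime or a prime square.
for h in highly_composite_up_to(1000):
    print(h, h + 1, is_prime(h + 1) or is_prime_square(h + 1))
```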

Some financial risk measures simplified.

I just waded through a bunch of papers on the STARR and Rachev risk ratios. These seem to have vastly overcomplicated what you actually need to do, which is unfortunately typical of many mathematical papers (it’s a consequence of the logicians outnumbering the intuitives). I think I finally figured out what they were trying to say, and it turned out to be simple. I might still be wrong about this (after all, it was only an hour or so of deciphering), but here’s how I think the concepts can be summed up easily and intuitively for someone who understands a bit of statistics:

First, get a distribution of excess returns by subtracting the return of a risk-free investment from your returns.

Value at risk: Find the qth percentile/quantile of the return; that is, q% of the time, you’ll make less than the value you find. If you assume a normal distribution, you’re just finding the z-score from the p-value q (hint: the Excel NormInv function will do this for you, or you can grab a normal table if you’re old-fashioned). If q = 0.05, that’s about 1.645 standard deviations below the mean (one-tailed; the familiar 1.96 is the two-tailed value).
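Under the normality assumption, Python’s standard library can play the role of NormInv; a minimal sketch (the 1% mean and 4% standard deviation are illustrative numbers of my own):

```python
from statistics import NormalDist

# One-tailed: the 5th percentile of a standard normal is about -1.645,
# i.e. roughly 1.645 standard deviations below the mean.
z = NormalDist().inv_cdf(0.05)
print(round(z, 3))  # -1.645

# Parametric 5% VaR for a return distribution with mean 1% and standard
# deviation 4%: the return you underperform 5% of the time.
var_5 = NormalDist(mu=0.01, sigma=0.04).inv_cdf(0.05)
print(round(var_5, 4))  # -0.0558
```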

Conditional value at risk/Expected shortfall/Expected tail loss: Average of everything in the distribution where returns fall below the value at risk. Since the value at risk represents bad things with an unlikely probability (q%) of occurrence, this is the average of all of the really bad, really unlikely things that could happen to your portfolio. Oh, and it’s a loss, so if you’re dealing with a distribution of returns, you’ll want to negate the result.
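Empirically, both quantities fall out of a sorted sample of returns; a pure-Python sketch with made-up numbers (function names are mine):

```python
def value_at_risk(returns, q):
    """The return at the q-th quantile: a fraction q of the time you do worse."""
    s = sorted(returns)
    k = max(0, int(q * len(s)) - 1)
    return s[k]

def expected_shortfall(returns, q):
    """Average of all returns at or below the VaR, negated to express a loss."""
    var = value_at_risk(returns, q)
    tail = [r for r in returns if r <= var]
    return -sum(tail) / len(tail)

returns = [0.05, 0.02, -0.01, 0.03, -0.08, 0.04, -0.02, 0.01, 0.06, -0.10]
print(value_at_risk(returns, 0.2))            # -0.08
print(round(expected_shortfall(returns, 0.2), 4))  # 0.09
```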

STARR ratio: Excess returns over expected tail loss. It seems to measure how much you typically make vs. how much you can possibly lose. Average over worst case.

Rachev ratio: Tail loss of losses (just negate the returns distribution!) over tail loss of returns. Loss of losses (let’s call it “gain”) is a good thing, so the numerator represents what you can gain in the best q% of cases, while the denominator represents how much you can lose in the worst q%. Essentially, you’re judging the best possible case against the worst.
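Both ratios then reduce to a few lines on top of an expected-shortfall helper; a self-contained sketch of my reading of them, with an illustrative sample of my own:

```python
def tail_loss(returns, q):
    """Expected shortfall: average of the worst q fraction of returns, negated."""
    s = sorted(returns)
    k = max(1, int(q * len(s)))
    return -sum(s[:k]) / k

def starr_ratio(returns, q):
    """Average (excess) return over expected tail loss: average vs. worst case."""
    return (sum(returns) / len(returns)) / tail_loss(returns, q)

def rachev_ratio(returns, q):
    """Tail loss of the negated distribution (i.e. tail gain) over tail loss:
    the best q% of cases against the worst q%."""
    return tail_loss([-r for r in returns], q) / tail_loss(returns, q)

sample = [0.04, 0.01, -0.03, 0.05, 0.02, -0.06, 0.03, 0.07, -0.01, 0.02]
print(round(starr_ratio(sample, 0.2), 3))   # 0.311
print(round(rachev_ratio(sample, 0.2), 3))  # 1.333
```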

When you get down to it, the intuition behind the concepts is simple. It’s just dressed up funny. Unless I’m totally wrong about this 🙂

You can probably use these sorts of ratios in other fields as well, particularly when the terms “best case”, “average case”, and “worst case” have meaning, such as in the analysis of algorithms.

I need to get back to math…

One consequence of trying to do many things with your life is that certain things get pushed to the side for intervals of a month or two. My mathematical work, however, has been sidetracked for over a year now and I need to get back to it already.

If only I weren’t so busy with all of these freelance projects…

Using tensors for spatial k-means

Using k-means clustering on spatial data requires linearization of the dataset. This causes pixels on the right of one row and on the left of the next to be considered neighbors in a traditional linearization scheme (using a Hilbert curve might be a better idea), which is inaccurate. This stems from the fact that observations are stored as a matrix: observations in the rows, features in the columns. Using tensors, it seems we can store the spatial information as well as any constituent features – the dimensionality of the dataset would then be the rank of the tensor minus 1, with the final dimension reserved for features.
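A common workaround in the same spirit – keeping the spatial coordinates as explicit feature columns instead of discarding them in the flattening – can be sketched in a few lines (the toy "image" and helper names are mine):

```python
# A 2x4 grayscale "image" as a rank-3 tensor: rows x cols x features.
image = [
    [[0.1], [0.2], [0.8], [0.9]],
    [[0.1], [0.2], [0.9], [0.8]],
]

def naive_linearization(image):
    """Row-major flattening: the last pixel of row 0 and the first pixel of
    row 1 become adjacent in the sequence despite being spatially far apart."""
    return [feat for row in image for feat in row]

def with_coordinates(image):
    """Keep the spatial dimensions as features: each observation becomes
    [row, col, *features], so feature-space distance reflects image distance."""
    return [
        [i, j] + image[i][j]
        for i in range(len(image))
        for j in range(len(image[0]))
    ]

print(with_coordinates(image)[3])  # [0, 3, 0.9]
print(with_coordinates(image)[4])  # [1, 0, 0.1]
```

In the coordinate-augmented form, the two printed observations are far apart in feature space, whereas the naive flattening would place them side by side.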

That would seem more efficient. Maybe I should see whether it has been done already.

Simplifying the closed form of linear regression

Here’s one that must already exist:

Linear regression is given by the closed matrix form:

θ = (X^T X)^(-1) X^T Y.

We have a rule we can apply here: (AB)^(-1) = B^(-1) A^(-1).
Which gives us: θ = X^(-1) (X^T)^(-1) X^T Y.

But the transpose and its inverse cancel, yielding the identity matrix when multiplied.
This leaves us with:

θ = X^(-1) Y.

Now, surely there must be some reason that it’s not taught this way. Is it because this requires X to be a square matrix? Can we use a pseudoinverse instead to solve this?

Update: Indeed we can. Now I understand why it’s presented that way; when X has full column rank, the pseudoinverse is given by (X^T X)^(-1) X^T! Therefore, we can also just say the optimal parameters are given by X^+ Y, where X^+ is the pseudoinverse.
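For a single feature the normal-equations form collapses to scalars, which makes the X^+ Y reading easy to check by hand; a minimal sketch with made-up data:

```python
# One-feature regression through the origin: X is an n x 1 matrix, so
# (X^T X)^(-1) X^T y collapses to sum(x*y) / sum(x*x) -- the pseudoinverse
# applied to y, even though X is not square.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]  # roughly y = 2x

theta = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
print(round(theta, 3))  # 1.99
```

Note that plain X^(-1) does not exist here (X is 4x1), which is exactly why the pseudoinverse form is the one that gets taught.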

“Reinvention is talent crying out for background”, I suppose.