Suppose you want to divide a numeric range (such as 0–1 or 0–23 or 1–365) into even segments. If you know in advance that you need N segments, it's easy: divide by N. But if you don't know how many segments you'll need, and you can't undo a division once you've made it, it gets trickier. If you divide into 3 equal segments and need exactly 3, you're at the optimal point. But if it turns out you need 4, you end up subdividing one segment of length 1/3 into 2, leaving you with 4 segments of length 1/6, 1/6, 1/3, and 1/3, which is far from even.
There's a clever division scheme involving the golden ratio:
- Rescale your range to be from 0–1.
- The ith division occurs at the fractional part of i * φ, that is, (i * φ) mod 1
It's so simple. Why does this work? I don't know. But it's pretty neat.
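The whole scheme fits in a few lines of Python (a minimal sketch; the names here are my own):

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, ~1.618

def division_point(i):
    """The ith division point in the range 0-1: the fractional part of i * phi."""
    return (i * PHI) % 1.0

# The first few division points, in the order you'd add them:
points = [division_point(i) for i in range(1, 6)]
# Each new point lands in one of the current segments and splits it,
# keeping the segments close to evenly sized no matter when you stop.
```

To use it on a range other than 0–1, just multiply each point by the range's length and add the range's start.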
I first ran across this when I was looking for a way to pick sample points in 1 year of data. I wanted a set that would be roughly evenly spaced, because I wanted to draw a timeseries chart with the results, but I didn't know how much time it would take to analyze the points. So I analyzed one at a time, using the golden ratio to guide me.
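Applied to that sampling problem, the same idea picks days of the year in an order where any prefix of the sequence is roughly evenly spread across the year (a sketch; numbering days from 1 is my assumption):

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

def sample_order(n_points):
    """Yield points 1..n_points in golden-ratio order: stop after any
    prefix and the points you've covered are roughly evenly spaced."""
    for i in range(1, n_points + 1):
        yield int((i * PHI) % 1.0 * n_points) + 1

days = list(sample_order(365))
# Analyze days[0], then days[1], ... and stop whenever you run out of time;
# the days analyzed so far cover the year fairly evenly.
```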
I've been playing with Amazon's S3 and EC2, and they look potentially useful. S3 is the storage system: you pay for storage and transfers. EC2 is the computation system: you pay for virtual computers. Their Getting Started Guide for EC2 is pretty good. It describes step by step how to set up your development environment, then gives you a starter virtual machine to play with. I followed the instructions and got Apache and SSH running.
The big advantage of EC2 over running your own servers is that you can get more capacity quickly. In fact, EC2 stands for Elastic Compute Cloud for exactly that reason. If you're running a web service on a conventional hosting system and then get mentioned on Digg, you either run out of capacity or pay for extra capacity that sits unused most of the time. With EC2, you can monitor for the Digg Effect, add virtual machines to handle the extra traffic, then release them when the Diggers move on. You only pay for the hours the machines are running.
For my own projects though, I'm never going to get hit by Digg. I was hoping to use EC2 as a cheap low-capacity server, but I misunderstood the pricing. I thought it was $0.10 per CPU-hour; it's actually $0.10 per wall-clock hour, so when my server is sitting idle, I'm still getting charged. At $0.10/hour for roughly 730 hours a month, that's over $70/month, and that's a bit too much to pay for an idle server. I'll use my Mac Mini at home instead.
I might still use S3 for off-site backup. I have regular backups at home, but all the backups are … at home. If anything happens to my home, I lose every copy of my data. S3 charges for storage, uploads, and downloads; I estimate that after I upload all my photos, I'd pay $4.50/month. That's pretty reasonable for off-site backup. I haven't investigated whether there are off-the-shelf backup tools for S3. I want something portable (Linux, Windows, Mac) and command-line (so I can automate it). I might end up writing my own quick & dirty scripts for this.
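A quick & dirty backup script wouldn't need to be much more than a directory walk plus an upload call. Here's a minimal sketch in Python, assuming the boto3 AWS library and a bucket you've already created; everything besides the `upload_file` call itself is my own scaffolding:

```python
import os

def backup_keys(root):
    """Map each file under root to the S3 key it would be stored under."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Use forward slashes so keys look the same on every platform.
            key = os.path.relpath(path, root).replace(os.sep, "/")
            yield path, key

def backup_to_s3(root, bucket):
    """Upload every file under root to the given S3 bucket."""
    import boto3  # assumption: the boto3 AWS SDK is installed and configured
    s3 = boto3.client("s3")
    for path, key in backup_keys(root):
        s3.upload_file(path, bucket, key)
```

This re-uploads everything on every run; a real script would compare timestamps or checksums first to avoid paying for redundant transfers.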
If you're starting a web service, you should definitely take a look at S3 and EC2. They're fairly cheap, and the reliability and flexibility may be worth a lot to your company.