Tuesday, February 19, 2013

Google's fiber leeching caper

Back in 2000, Google had data centers only on the US west coast and was planning an expansion to the east coast to reduce latency to end users. At the time, Google was not hugely profitable like it is today, and was very conscious of costs. One of the biggest costs of the move was duplicating the data contained in its search indexes onto the east coast. Google had just passed 1 billion indexed web pages, and had around 9 terabytes of data in its indexes. They calculated that even at the highest available speed of 1 gigabit per second, transferring all the data would take 20 hours, at a total cost of $250,000.
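That 20-hour figure checks out with a quick back-of-the-envelope calculation (a sketch, assuming decimal units: 1 TB = 10^12 bytes, 1 Gbit = 10^9 bits):

```python
# Rough check of the 20-hour figure: 9 TB pushed at a sustained 1 Gbit/s.
data_bits = 9e12 * 8       # 9 terabytes expressed in bits
seconds = data_bits / 1e9  # divided by 1 gigabit per second
hours = seconds / 3600
print(hours)               # 20.0
```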

Larry and Sergey had a plan, however, and it centered on exploiting a loophole in the common billing practice known as burstable billing, which is employed by most large bandwidth suppliers. Under burstable billing, a bandwidth usage reading is taken every 5 minutes for the whole month. At the end of the month, the top 5% of readings is discarded to eliminate short spikes (bursts), and the customer is billed on the highest remaining reading. They reasoned that if they transferred data for less than 5% of the entire month (i.e. under about 36 hours), and didn't use the connection at all outside that window, every non-zero reading would fall into the discarded 5%, and the bandwidth should effectively be free.

So for 2 nights a month, between 6pm and 6am Pacific time, Google pumped the data from its west coast data center to the new east coast location. Outside of those 2 nights, the router was unplugged. At the end of the month, the bill came out to nothing.
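The billing math behind this can be sketched in a few lines. The sample values and transfer window below are illustrative assumptions, not figures from the book: two 12-hour overnight transfers give 288 busy 5-minute readings, which is fewer than the 432 readings (5% of a 30-day month) that the billing method throws away.

```python
# Sketch of 95th-percentile ("burstable") billing with 5-minute samples
# over a 30-day month. Illustrative numbers, not from the source.

SAMPLES_PER_HOUR = 12                         # one reading every 5 minutes
MONTH_SAMPLES = 30 * 24 * SAMPLES_PER_HOUR    # 8640 readings in the month

def billed_rate(samples):
    """Sort the readings, discard the top 5%, bill on the highest remaining."""
    ordered = sorted(samples)
    cutoff = int(len(ordered) * 0.95) - 1     # index of the 95th-percentile reading
    return ordered[cutoff]

# Two 12-hour overnight transfers at ~1 Gbit/s; the link is idle otherwise.
transfer_samples = 2 * 12 * SAMPLES_PER_HOUR  # 288 busy readings
usage = [1_000_000_000] * transfer_samples + [0] * (MONTH_SAMPLES - transfer_samples)

# 288 busy readings < 432 discarded readings, so all of them are thrown
# away and the billed rate is zero.
print(billed_rate(usage))  # 0
```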

They continued like this every month until the contract with their bandwidth supplier ended, and they were forced to negotiate a new one, which meant actually paying for their bandwidth. By this time, Google had started buying up strategically located stretches of fiber, paving the way for its own fiber network to support its increasing bandwidth needs.

Source:

In The Plex: How Google Thinks, Works, and Shapes Our Lives [Amazon]
By Steven Levy
Published: April 12, 2011
See pages 187-188, Steven Levy's interview with Urs Hölzle and Jim Reese.


11 comments:

  1. Why didn't they just load the data onto a few hard drives and FedEx it over?

    1. I believe this is the practice they use now - but I imagine it was because at the time "free" was a better option than paying someone for it. If you read "I'm Feeling Lucky," there's a chapter where they got everyone in the office to go build as many servers as possible out of as many parts as possible just for the economics of it. It was cheaper to build hundreds of mid-grade servers and just let them die on the rack than it was to buy quality parts.

    2. A few? A quick Google search shows that in 2000, top capacities were around 30-40GB, and cost per GB was $10-15. Using a nice happy medium of $13, that'd be $13,300 per terabyte. And they had "tens" of terabytes to move. 30 terabytes? That's about $400,000.

    3. I doubt they were writing it to /dev/null on the receiving end; they must have already owned the disks needed.

    4. That would have involved a lengthy process of configuring several hundred hard disks. One thing Google is big on is remote access and automation. Once technicians set up servers and hard disks at the new data center, they don't get touched by anyone again, apart from swapping out failures. The process of formatting and uploading data onto them was controlled remotely by engineers in Mountain View.

  2. Unlikely to be true. They would have had a minimum commit; on a gigabit connection in those days the likely minimum would have been 200 Mbit+, plus loop charges if not in an on-net building, and at minimum cross-connect charges if on-premise. Thus it wouldn't make sense to unplug the router. Also bear in mind that routers aren't free, especially in those days, for handling that level of traffic/pps.

    1. I must admit it's hard to believe, but if you read the rest of the book, it sort of fits with their 'flying by the seat of their pants' mindset in the early days of the company. Not long after this happened, Google ended up buying vast swathes of fiber networks criss-crossing the country, reducing their cost of data transfer to a third of the wholesale price or less.

    2. I might not buy "free", but the difference between your minimum commitment and a fully utilized pipe is quite a few dollars.

    4. I worked for an international telco back in 2001, and the main source of metrics for us was the counters on the customer ports of the Cisco routers. I'm not sure instantaneous measurement would even have been considered. Seems very strange.
