Re-encoding media files

At work, we’re converting old audio files (uncompressed WAV) to more modern audio formats. Conversion takes about a second per recording, and we have several million recordings, so the whole job will take over a month. These files are big, and the output lives on a distributed filesystem: each converted file gets copied to many of our servers. So we don’t want to speed the process up so much that it strains our critical servers.
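The month-plus figure is just serial throughput, and the throttling is a matter of pacing the job. A minimal sketch of both ideas follows; this is not our actual tooling — the recording count of three million is my reading of "several million", the rate limit is a made-up parameter, and the encoder call itself is left to the caller.

```python
import time

def conversion_days(num_recordings, seconds_per_file=1.0):
    """Serial wall-clock estimate: recordings * seconds per file, in days."""
    return num_recordings * seconds_per_file / 86400  # 86400 seconds per day

# Roughly 3 million recordings at one second each is about 34.7 days --
# hence "over a month" for a straight serial pass.

def throttled(items, max_per_second):
    """Yield items no faster than max_per_second, leaving headroom
    on the servers the conversion job shares with production."""
    interval = 1.0 / max_per_second
    for item in items:
        started = time.monotonic()
        yield item  # caller converts the file here (runs the encoder)
        elapsed = time.monotonic() - started
        if elapsed < interval:
            time.sleep(interval - elapsed)
```

Capping the rate rather than maximizing parallelism is the point: the bottleneck we care about is not our CPU time but the load on shared infrastructure.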

This gives me some insight into YouTube’s reluctance to embrace additional media formats, even Google’s pet project WebM. Like us, they are sensitive to hardware constraints. Even with virtually unlimited money and resources, it takes time to move gigabytes from one disk to another. Google normally gets around this by splitting data across many servers: rather than touching a gigabyte on a single disk, you move a few megabytes on each of several servers. But when all the data needs to be updated, every server has to move gigabytes, and Google has no advantage over an average Joe restoring his hard disk from backup. Even if Google could build a new datacenter just for video processing, they’d still have to move all the data out of the existing datacenters into the new one and back.
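The sharding argument can be put in rough numbers. The figures below are invented for illustration, not YouTube's actual layout: sharding shrinks the per-server cost of touching one file, but a fleet-wide re-encode still moves every server's entire shard.

```python
def per_server_read_gb(video_gb, num_shards):
    """Reading one video: each shard-holding server moves only its slice."""
    return video_gb / num_shards

def per_server_reencode_gb(total_corpus_gb, num_servers):
    """Re-encoding everything: each server still moves its whole shard."""
    return total_corpus_gb / num_servers

# Illustration only (made-up numbers): a 1 GB video sharded 100 ways
# costs each server ~0.01 GB per read, but a 1 PB corpus spread over
# 1,000 servers still means ~1 TB of I/O per server when every file
# must be rewritten.
```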

My guess is that YouTube is using its existing video servers to create WebM versions of its videos, while continuing to serve those videos to users. But it will take months, if not years, to get to a point where they can serve WebM content to users.