September 29, 2016

Why Has Google Released the Source Code For Two New Hash Functions?

Google has released some of the source code for the new CityHash family of hash functions. In the initial offering Google has published the code, with a friendly MIT license, for CityHash64 and CityHash128. These functions hash strings to 64-bit and 128-bit hash codes, respectively.

Hashes of 64 and 128 bits are considered weak for cryptographic use by today’s standards: by the birthday bound, a 64-bit hash can be expected to collide after roughly 2^32 inputs. Accordingly, Google say that these functions aren’t suitable for cryptography, but do work well for hash tables. The release of this code raises several questions: Why would Google develop new hash functions? Why only 64- and 128-bit? Are there more functions that Google are using and developing? Will CityHash ever be used for cryptography?
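To make the hash-table use concrete, here is a minimal sketch of how a 64-bit non-cryptographic string hash picks a bucket. It does not use CityHash itself: the well-known FNV-1a function stands in for it, and the names `fnv1a_64` and `bucket_index` are illustrative only, not part of Google's code.

```python
# Stand-in 64-bit string hash (FNV-1a), NOT CityHash. It only illustrates
# how any 64-bit hash code is reduced to a hash-table bucket index.
FNV_OFFSET_BASIS_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Return the 64-bit FNV-1a hash of a byte string."""
    h = FNV_OFFSET_BASIS_64
    for byte in data:
        h ^= byte                          # mix in the next byte
        h = (h * FNV_PRIME_64) & MASK_64   # multiply, keep the low 64 bits
    return h


def bucket_index(key: str, num_buckets: int) -> int:
    """Map a string key to a bucket (hypothetical helper for illustration)."""
    return fnv1a_64(key.encode("utf-8")) % num_buckets


if __name__ == "__main__":
    for key in ("alpha", "beta", "gamma"):
        print(key, hex(fnv1a_64(key)), bucket_index(key, 16)) if False else \
            print(key, hex(fnv1a_64(key.encode("utf-8"))), bucket_index(key, 16))
```

A hash table needs only speed and a reasonably even spread of keys across buckets, not resistance to deliberate collisions, which is why a fast 64-bit function is enough for this job.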

On why Google would create new hash functions, the simple answer is speed. Google processes huge amounts of data, and every fraction of a millisecond shaved off runtime overheads is essential in keeping computing costs down. Google are claiming that “under real-life conditions we expect CityHash64 to outperform previous work by at least 30% in speed, and perhaps as much as a factor of two”. That is a significant speed boost for Google. What is also interesting is that Google mention optimizing the code for CPUs that are common in Google’s datacenters. This suggests that Google are turning their attention to hashing, indexing and probably cryptography functions using specialized hardware. It is not uncommon today for hackers to use the power of GPUs to crack codes, and part of that work is generating hash tables on GPUs.

As for the other questions, Google call these two functions “a family” of hash functions. Two hardly constitutes a family, and in fact Google admit to using “variants of CityHash128” internally. It seems likely that Google have CityHash256, CityHash512 and CityHash1024 tucked away somewhere. If so, these new functions could have a future in cryptography.