Google Safe Browsing

Google Safe Browsing is a pretty well-designed service from a user-privacy perspective. The API requires the browser to check in 0-5 minutes after launching and update its local list of truncated hashes. Visited URLs are hashed and checked locally against this list, and if a match is found, the server is queried for the corresponding full hash. If the full hash matches, the user is warned.
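To make the flow concrete, here is a toy model of that check. It is only a sketch under my own assumptions: SHA-256 over the raw URL string (the real API canonicalizes URLs and hashes several host/path expressions per URL, which I skip), a 4-byte prefix, and made-up `ToyServer`/`ToyClient` classes standing in for Google's server and the browser. The point is that the server only hears from the client when a visited URL's prefix is already in the local list.

```python
import hashlib

PREFIX_LEN = 4  # assumed prefix length in bytes (32 bits)


def full_hash(url: str) -> bytes:
    """SHA-256 of the URL string (real clients canonicalize the URL first; skipped here)."""
    return hashlib.sha256(url.encode("utf-8")).digest()


class ToyServer:
    """Hypothetical stand-in for the Safe Browsing server."""

    def __init__(self, bad_urls):
        self.full_hashes = {full_hash(u) for u in bad_urls}
        self.queries = []  # every prefix a client asks about, i.e. what the server learns

    def prefix_list(self) -> set:
        """The truncated hashes shipped to clients on each list update."""
        return {h[:PREFIX_LEN] for h in self.full_hashes}

    def get_full_hashes(self, prefix: bytes) -> set:
        """Full-hash lookup; this request is the only per-URL data the server sees."""
        self.queries.append(prefix)
        return {h for h in self.full_hashes if h.startswith(prefix)}


class ToyClient:
    """Hypothetical stand-in for the browser."""

    def __init__(self, server: ToyServer):
        self.server = server
        self.prefixes = server.prefix_list()  # the periodic list update

    def is_unsafe(self, url: str) -> bool:
        h = full_hash(url)
        prefix = h[:PREFIX_LEN]
        if prefix not in self.prefixes:
            return False  # decided locally; nothing is sent to the server
        # Local hit: ask for the full hashes and warn only on an exact match.
        return h in self.server.get_full_hashes(prefix)
```

For example, with `server = ToyServer(["http://malware.example/"])` and `client = ToyClient(server)`, visiting `http://news.example/` is (barring a one-in-four-billion prefix collision) decided locally, while visiting `http://malware.example/` triggers a single full-hash request and a warning.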

This means the only data Google gets about you is:

  • When you launch a browser (and at each later list update), your IP address and how long it has been since your copy of the list was last updated are revealed.
  • When a URL you visit matches one of the hashes in your list, which entry in the list you matched is revealed (along with your IP address, obviously).

The first one is pretty minor, but the second is where things get a bit interesting. Suppose someone (NSA, FBI, CIA, whatever) wanted to know who visits a particular site. Suppose they simply asked Google to add a truncated hash of that site to the list, but made the associated full hash not match the site. If Google complied with such a request, within a few minutes it would start getting requests for the associated full hash from everyone who visited the target site, and the visitors would get no warnings. Everything would appear exactly as usual, but Google would know someone at your IP address went there. Since they can make the truncated hashes as short as they want, it's easy to make one short enough that they could claim it was simply a hash collision and everything was working as intended, and no one could prove otherwise.
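Here is a rough simulation of that scenario, again on my own toy assumptions (SHA-256 over the raw URL, no canonicalization, a single planted list entry, and made-up `example` URLs and a documentation IP address). A deliberately short 16-bit prefix makes it cheap to brute-force a "malicious" cover URL that collides with the target, which is exactly what would make the collision story plausible. The visitor gets no warning, but the full-hash request still lands in the server's log.

```python
import hashlib
from itertools import count


def full_hash(url: str) -> bytes:
    """SHA-256 of the URL string (canonicalization skipped)."""
    return hashlib.sha256(url.encode("utf-8")).digest()


def find_cover_url(target_url: str, prefix_len: int) -> str:
    """Brute-force a hypothetical 'malicious' URL sharing the target's hash prefix."""
    want = full_hash(target_url)[:prefix_len]
    for i in count():
        candidate = f"http://malware-{i}.example/"
        if full_hash(candidate)[:prefix_len] == want:
            return candidate


TARGET = "http://target.example/"  # hypothetical site under surveillance
PREFIX_LEN = 2  # deliberately short: 16 bits, about 65,536 tries expected

cover = find_cover_url(TARGET, PREFIX_LEN)
planted_full_hash = full_hash(cover)  # what the server returns for the prefix
planted_prefix = planted_full_hash[:PREFIX_LEN]  # equals the target's prefix

server_log = []  # (client_ip, prefix) pairs the server records


def client_visit(url: str, client_ip: str) -> str:
    """Toy browser holding only the planted entry in its local list."""
    h = full_hash(url)
    if h[:PREFIX_LEN] != planted_prefix:
        return "no warning"  # local miss, nothing sent
    server_log.append((client_ip, h[:PREFIX_LEN]))  # the full-hash request reveals the visit
    return "WARNING" if h == planted_full_hash else "no warning"
```

Calling `client_visit(TARGET, "203.0.113.7")` returns `"no warning"`, yet `server_log` now records that address against the target's prefix, and `cover` is a ready-made answer if anyone asks why that prefix was on the list.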

Given how the NSA tends to act here in the US, and that Google is a US company, I'd be surprised if this hadn't at least come up. I haven't heard anything about it, though. Is this an issue, or am I missing something?

Anyway, while I'm here, I might as well propose a fix to the API for this concern: add an API call to get the URL associated with a full hash, and set a minimum length for the truncated hashes (maybe 64 bits). When you get a false positive, there should be a real malicious site associated with it. For the tracking exploit to work, the malicious site must have the same truncated hash as the target site. If the truncated hashes are required to be fairly long (instead of any length up to 256 bits), finding or creating a believable malicious URL with a matching truncated hash gets quite hard. If they can't produce such a URL, you know something suspicious is going on (though by then it's a bit late for your particular privacy). This would not break any existing applications using the API; it's purely an extension. It does require an extra service, though, and would bloat the storage and bandwidth used.
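A client-side check for this proposal might look something like the sketch below. Everything here is hypothetical: `disclose_url` stands in for the proposed "URL for this full hash" call, which does not exist in the real API, and hashing is SHA-256 over the raw URL. The client rejects prefixes shorter than the proposed 64-bit minimum and requires every full hash the server returned for the matched prefix to be backed by a disclosed URL that really hashes to it.

```python
import hashlib
from typing import Callable, Iterable, Optional

MIN_PREFIX_BYTES = 8  # the proposed 64-bit minimum


def full_hash(url: str) -> bytes:
    """SHA-256 of the URL string (canonicalization skipped)."""
    return hashlib.sha256(url.encode("utf-8")).digest()


def audit_prefix_hit(
    visited_url: str,
    prefix: bytes,
    returned_hashes: Iterable[bytes],
    disclose_url: Callable[[bytes], Optional[str]],
) -> bool:
    """Return True if a local prefix hit is backed by real, verifiable list entries.

    disclose_url is a hypothetical stand-in for the proposed API call that maps
    a full hash back to the URL it was computed from (None if unknown).
    """
    if len(prefix) < MIN_PREFIX_BYTES:
        return False  # too short: a 'collision' is too easy to manufacture
    if not full_hash(visited_url).startswith(prefix):
        return False  # the prefix never actually matched this URL
    hashes = list(returned_hashes)
    if not hashes:
        return False  # a listed prefix with no full hash behind it is suspicious
    for h in hashes:
        if not h.startswith(prefix):
            return False  # returned hash doesn't even belong to this prefix
        url = disclose_url(h)
        if url is None or full_hash(url) != h:
            return False  # server can't name a real URL behind this entry
    return True
```

With an 8-byte prefix, a planted entry can only pass if someone finds a URL matching the target's 64-bit prefix, which takes about 2^64 hash attempts on average, versus about 2^16 for the 2-byte prefix in the earlier toy attack.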

Copyright © 2011-2013 Craig Macomber