The National Security Agency is one of the most secretive intelligence agencies in the world. Charged with collecting and analyzing foreign communications and foreign signals intelligence, it has its hand in all kinds of high-tech tradecraft and methods for monitoring the enormous amount of data that flows over the internet. And apparently one of its key tools is a simple Google search.
A recently released book, Untangling the Web: A Guide to Internet Research, serves as a how-to guide for setting up your own spy agency using nothing more than your computer. The guide provides a terrifying look at how much critical information is just sitting out in the open on the internet, and at the power of digital tools to unearth it all.
The NSA did not willingly give up the book. It took a Freedom of Information request by the organization MuckRock, which helps activists by filing FOIA requests for a fee, to unearth the book and have it declassified. The book can be found here.
The guide describes ways to use internet search engines and the Internet Archive along with other tools. The chapter called Google Hacking, however, gives up some of the juiciest tidbits. The guide describes it as such:
"'Google (or search engine) hacking' involves using publicly available search engines to access publicly available information that almost certainly was not intended for public distribution. In short, it’s using clever but legal techniques to find information that doesn’t belong on the public Internet."
The guide goes on to describe the types of information that can be found using this method. Examples include personal and financial information, user IDs, computer or account logins and passwords, private company data, sensitive government information, and vulnerabilities in websites and servers that could facilitate breaking into the site.
Again, the guide stresses that this method is perfectly legal.
The guide provides examples of how to find specific types of information. Want to find an Excel spreadsheet full of confidential information from South Africa? Simply enter the string [filetype:xls site:za confidential] and watch as Google does its best to find spreadsheets on South African domains that contain the word "confidential."
Another key way to find sensitive information is simply to search for files that contain the words login, userid, and password, even on foreign websites. These terms apparently appear in English quite frequently, so a search for passwords on Russian sites might read [filetype:xls site:ru password].
Misconfigured web servers apparently provide a host of information ripe for prying eyes. Servers that expose their directory listings typically title those pages "Index of," so an example of a search for an index of passwords that exploits this loophole reads [intitle:“index of” site:kr password].
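The three example searches above follow one pattern: optional operators (intitle, filetype, site) followed by plain keywords. A minimal Python sketch of that pattern follows. It is an illustration only and does not come from the guide: the function names build_query and search_url are hypothetical, and the base URL with its q parameter is an assumption about how a search engine accepts queries.

```python
from urllib.parse import urlencode


def build_query(*terms, filetype=None, site=None, intitle=None):
    """Assemble a search string from operator values and plain keywords.

    Hypothetical helper: it only mirrors the operator syntax quoted in
    the article (intitle:, filetype:, site:).
    """
    parts = []
    if intitle:
        # Quote multi-word phrases so they are matched as a unit.
        phrase = f'"{intitle}"' if " " in intitle else intitle
        parts.append(f"intitle:{phrase}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    parts.extend(terms)
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """URL-encode the query as the q parameter (base URL is an assumption)."""
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    examples = [
        build_query("confidential", filetype="xls", site="za"),
        build_query("password", filetype="xls", site="ru"),
        build_query("password", intitle="index of", site="kr"),
    ]
    for query in examples:
        print(query)
        print(search_url(query))
```

The sketch only builds strings and does not send any request, so it can be used to compose a search and paste it into a browser by hand.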
Before you go starting your own spy agency, note the guide's warning: Microsoft file types downloaded from an untrusted source should be handled with extreme care.
The guide also comments on getting the data back after it has been revealed to the public. The conclusion is harsh: "Getting private information 'back' is harder than preventing its disclosure in the first place."
First you have to find out what is out there, a process that could take weeks. Then you have to remove the offending information or fix the underlying problems. But that only removes the immediate copy, not the cached copy that is indexed in a search engine database. That copy could persist for "days, weeks, even months," according to the guide.
The next step is to ask Google to remove the sensitive information from its database. But Google is only one search engine. There are hundreds of others, and not all of them are as forthcoming or timely about removing the information.
The final problem is that someone may have already found the sensitive information and thought it was really interesting. They could just copy it to another website, and you may be stuck playing whack-a-mole with legal threats to remove the information. To make things even worse, the Internet Archive may archive the sensitive information, requiring you to make repeated removal requests.
Most people have scarcely comprehended the incredible way in which the internet has changed the way information is treated. Traditional notions of privacy can be bypassed or eliminated by sophisticated heuristics and algorithms that search for data night and day without rest. And anyone can access reams of sensitive information with nothing more than a few keystrokes.