The Data Skeptic’s Guide To Search Engine Algorithms

Google is so synonymous with its search engine that the name itself has become a verb.

For some, it may be hard to remember a time before search engines, when finding answers meant either flipping manually through volumes of data or making a lucky guess. And because of their 21st-century normalization, many of us take search engines for granted without understanding the complex system behind the clean, white box.

Search engines work in mysterious ways to surface content based on your keywords of choice. But how is the ranking decided? And how much should you trust them?

Crawling and indexing

Google has a nifty, interactive web page that describes the method behind their famous engine. Essentially, it boils down to several actions:

  • Crawling: Search engines crawl the visible web (some 60 trillion pages and growing) by following links from page to page, sorting pages based on their content and other factors.
  • Indexing: Engines then index the crawled pages in a step separate from crawling; Google's index alone runs to over 100 million gigabytes.
  • PageRank: Lastly, search engines like Google's use algorithms weighing various factors to decide how to rank the content in their indexes. Factors include quality, relevance, and freshness, and the algorithms themselves, for Google at least, are constantly changing.

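The ranking step above can be illustrated with a toy version of the original PageRank idea: a page's importance flows to the pages it links to. This is a minimal sketch over a made-up three-page link graph; the graph, damping factor, and iteration count are illustrative assumptions, and real engines weigh many more signals than links alone.

```python
# Minimal PageRank sketch using power iteration.
# Assumptions: a toy link graph, damping factor 0.85, fixed iteration count.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Base rank every page receives regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly across all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: "a" links to "b" and "c", etc.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
ranks = pagerank(graph)
```

In this sketch, ranks sum to roughly 1, and "c" ends up with the highest score because both other pages link to it, which is the core intuition: pages with more (and more important) inbound links rank higher.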

The results of a search happen, in Google’s case, in about one eighth of a second. That’s not magic — it’s complex data processing. And it’s far from perfect, hence the constant tweaking of algorithms and data experimentation.

For SEO experts, search engines are shrouded in some secrecy, as the exact details of changes and algorithms are not revealed publicly. Because of this lack of transparency, the task of optimizing websites for search is a difficult one fraught with experimentation and misinformation.

The Skeptic’s approach

There are other issues to bear in mind when considering how search engines work for you. These include:

  • Advertisements: Search engines like Google and Bing place sponsored results at the top of the page, ranked by payment rather than by the engines' regular algorithms. Search users should recognize the difference, and understand that sponsored posts may not be as relevant as their placement suggests. Google also tailors results to what it thinks users want based on their search history.
  • Privacy: It’s been well documented that search engines remember queries and collect data. Users should be aware of the potential permanency of searches, the data from which could represent a treasure trove for overreaching government agencies and advertisers. Google is reportedly investing in search encryption.
  • The right to be forgotten: In Europe, a law was passed that allows citizens to request takedowns of personal data from search indexes. Delisting is not granted when the public interest overrides that of the individual, but where it is granted, search users are denied access to content that would otherwise be available through search.
  • Size of the surface: As we’ve discussed previously, websites crawled by search engines only account for a sliver of the entire internet at 19 terabytes. In contrast, the “deep web” consists of 7500 terabytes — vast amounts of information that remain largely out of reach.

The takeaway

When it comes to having access to massive amounts of data at our fingertips, there are certain trade-offs for the privilege. For example, in spite of search engine algorithms, it often still takes digging to find reliable sources amid growing noise.

Intelligent Googling therefore means understanding, at a basic level, what you're being served up, why, what's being left undiscovered, and what you're potentially giving away in the meantime.

Originally published on December 1, 2014. 


Jennifer Markert