Google – Jackson's Blog

Ever wonder how Google spell check, related topic suggestions, or ranking works? I do.

My theory (yes, theory; I doubt anyone but the two founders truly knows the secret to how Google works) is that Google collects information on how users behave: the clicks on a link, the number of sites pointing to a link, and so on. It uses this information to statistically guess at what the user truly wants, based on data that seem to correlate with the user’s behavior.

In the case of a misspelling, instead of performing some sort of Levenshtein-style edit-distance check to find the best candidate for the word, Google can simply collect data on what users typed right after the typo, and suggest the word that has most frequently been typed in response to it.
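To make the idea concrete, here is a minimal sketch in Python of what I have in mind. It is purely my guess at the approach, not how Google actually does it. The names `build_correction_table` and `suggest`, the `min_count` threshold, and the sample sessions are all made up for illustration; each session is assumed to be the list of queries one user typed in order.

```python
from collections import Counter, defaultdict


def build_correction_table(sessions):
    """Count which query users typed immediately after each query.

    `sessions` is a hypothetical log: one list of queries per user session.
    """
    followups = defaultdict(Counter)
    for queries in sessions:
        # Pair each query with the one typed right after it.
        for typed, retyped in zip(queries, queries[1:]):
            if typed != retyped:
                followups[typed][retyped] += 1
    return followups


def suggest(followups, query, min_count=2):
    """Return the most frequent follow-up query, or None if evidence is thin."""
    if query not in followups:
        return None
    candidate, count = followups[query].most_common(1)[0]
    return candidate if count >= min_count else None


# Made-up sample data, for illustration only.
sessions = [
    ["recieve", "receive"],
    ["recieve", "receive", "receive mail"],
    ["recieve", "recipe"],
    ["weather"],
]
table = build_correction_table(sessions)
print(suggest(table, "recieve"))
```

Notice that nothing here compares the letters of the two words. The suggestion comes entirely from what people did next, and the `min_count` cutoff is just one assumed way to avoid suggesting something based on a single user.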

Topic suggestions probably work the same way as misspellings: they look at behavioral similarities between users to suggest content that you might like.
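Again as a rough sketch of my guess, not anything Google has confirmed: one simple way to measure "behavioral similarity" is to say two topics are related when many of the same users clicked on both. The Python below scores that overlap with Jaccard similarity. The function name `related_topics`, the `(user, topic)` click log, and the choice of Jaccard are all my own assumptions.

```python
from collections import defaultdict


def related_topics(clicks, topic, top_n=3):
    """Rank other topics by how much their audience overlaps with `topic`.

    `clicks` is a hypothetical log of (user, topic) pairs.
    """
    users_by_topic = defaultdict(set)
    for user, clicked in clicks:
        users_by_topic[clicked].add(user)

    target = users_by_topic.get(topic, set())
    scores = []
    for other, users in users_by_topic.items():
        if other == topic:
            continue
        union = target | users
        # Jaccard similarity: shared users divided by all users of either topic.
        overlap = len(target & users) / len(union) if union else 0.0
        if overlap > 0:
            scores.append((overlap, other))

    # Highest overlap first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scores[:top_n]]


# Made-up sample data, for illustration only.
clicks = [
    ("ann", "python"), ("ann", "pandas"),
    ("bob", "python"), ("bob", "pandas"), ("bob", "numpy"),
    ("cat", "python"), ("cat", "numpy"),
    ("dan", "gardening"),
]
print(related_topics(clicks, "pandas"))
```

The code never looks at what "pandas" or "python" actually mean. It only knows that the same people clicked on both.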

My point is that in this process, Google probably doesn’t need to have the slightest idea of what’s in the content that it’s displaying on its pages.

My belief is that Google probably parsed content at first to figure out how to sort it and seed their database, and once the database was well seeded, they collected user behavior and used that to rank relevance.

I came across an article that seems to back this theory of mine, so I decided to post about it today:

http://www.wired.com/science/discoveries/magazine/16-07/pb_theory

I completely disagree with the author’s claim that we can throw the scientific method out the window now that we have so much data, but I did appreciate the possible insight into how Google does things.
