Part 1 of 3 of the Series on Search: The Past

I am a “Search Marketer” by trade. Some people know this, but I am often asked what it means. In this post I want to give a quick explanation, along with some info on how the industry affects the average “searcher”. I find it fascinating, and I hope you will too!

Where Did Search Come From?

The origins of search! If you ever used AltaVista, you got to experience one of the first dedicated “search engines”.

In 2011, searchers used Google Search over 4 billion times a day. I would imagine most of them didn’t think twice about how amazing a product it is, or where it came from. For something that plays such an integral role in our lives, I think it’s important to know a bit about the history and inner workings of the program.

Let’s start with where “search” came from. Many of you probably think Search=Google, and that’s it. In reality, that has not always been the case, and even today it is only part of the picture.

The origin and history of search is pretty amazing, but I’ll summarize. It can be traced as far back as 1945, to an article in The Atlantic called “As We May Think” by Vannevar Bush, written as World War II was ending. I recommend you go read it, but in short, it calls for collecting the world’s knowledge into a permanent, easily searchable record. Science had reached the point where finding and reading past research took more work than producing your own, yet you couldn’t do the latter without the former. The old system of books and libraries was no longer working. In Bush’s words:

“… the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.”

Keep in mind that this was well before you could search for the latest news on Honey Boo Boo or what happened when X football team played Y football team for the 3,000th time. As is often the case, the needs of science and academia forced a world-changing advancement. Something had to be done, and the web was how it was going to happen.

The Birth of the Web

So, a quick delve into the origin of the web. After a number of smaller advances in networking technology, Tim Berners-Lee comes onto the scene around 1991. If you don’t know who he is, go read about him: he created the World Wide Web (the internet itself already existed, and no, it was not invented by Al Gore)! Before Tim, people were sharing files online via direct FTP, but there was nothing like the “web” we have now. Tim took the idea of “hypertext” (links, as we know them today) and applied it to the internet, originally so that researchers could share documents (he worked at a kind of cool place called CERN). By combining hypertext with the internet’s existing plumbing (TCP and DNS, the complicated internet stuff), he created the world’s first website. And it’s still live: http://info.cern.ch/!

Anyway, tons of people picked up on this “web page” idea. It was mostly colleges at first, since running a site required large, centralized computing power. Eventually, there were so many sites that people needed a way to sort through them. A number of sites tried to do this (Excite, WebCrawler, AltaVista, AskJeeves, etc.), but nothing really caught on until the end of 1998, when this happened:

Surprisingly similar to what we use today.

The birth of Google! Now, the creation and rise of Google is a whole post in itself, but let’s talk about why Google was needed and why it did such a good job.

What Makes a Search Engine a Good Search Engine?

How would you make a search engine? Think about it for a second. Really complicated, right? Let’s look at the basic parts of a search engine (SE):

1. The Index: All search engines have to have an index, that is, a stored collection of (ideally) every page on the internet, ever. Sounds a bit ridiculous, but it’s true. According to a March 2012 survey by Netcraft, there are about 644 million websites out there right now. Imagine storing all that data. It would take quite a few thumb drives! Here are some pictures of Google’s data centers so you can see what it actually takes.

Now, Google doesn’t actually store every website. It has crawlers (web programs, essentially) that go out and find websites, but they can only find so many. In theory, Google starts its crawlers at a set of “seed sites” such as cnn.com, nytimes.com, etc. The crawlers follow the links on those websites (and the links on the sites those point to, and so on) until they run out of bandwidth. The sites they reach are added to the index. As you might expect, there are TONS of websites that are not in the index. They are not findable via Google, and are part of what is known as the “deep web” (often loosely called the “dark web”).
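To make the crawling idea concrete, here is a minimal Python sketch. It assumes a tiny, made-up web (the `TOY_WEB` link map and the `.example` sites are hypothetical) and stands in for “running out of bandwidth” with a simple page budget. Real crawlers fetch pages over the network and are far more sophisticated; this only shows the follow-the-links idea.

```python
from collections import deque

# A toy, hypothetical "web": each page maps to the pages it links to.
# Real crawlers download pages and parse out the links; here the links
# are given up front so the example runs offline.
TOY_WEB = {
    "cnn.com": ["nytimes.com", "weather.example"],
    "nytimes.com": ["cnn.com", "recipes.example"],
    "weather.example": ["raleigh-weather.example"],
    "recipes.example": ["bacon.example"],
    "raleigh-weather.example": [],
    "bacon.example": [],
    "orphan.example": [],  # nothing links here, so no crawler can find it
}

def crawl(seeds, web, budget):
    """Breadth-first crawl from seed sites, stopping after `budget` pages."""
    index = []                 # pages we have "stored" in our index
    seen = set(seeds)          # pages already queued, so we never revisit
    queue = deque(seeds)
    while queue and len(index) < budget:
        page = queue.popleft()
        index.append(page)
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

print(crawl(["cnn.com", "nytimes.com"], TOY_WEB, budget=5))
# ['cnn.com', 'nytimes.com', 'weather.example', 'recipes.example', 'raleigh-weather.example']
```

With a budget of 5, `bacon.example` gets left out even though it is reachable, and `orphan.example` never shows up no matter how big the budget is, because nothing links to it.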

Interesting side note: site owners can ask Google not to crawl their site (usually with a file called robots.txt), and some of these unindexed pages are hidden for that reason (others simply haven’t been found). Supposedly, you can even buy a kidney or two through some of these hidden pages, but you’d never find them unless someone told you the URL and you went there directly. Pretty scary!
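For the curious, the usual way to ask crawlers to stay away is a plain text file at `/robots.txt`. Python’s standard library happens to include a parser for these files (`urllib.robotparser`), so here is a small sketch using a made-up robots.txt for a hypothetical site. Keep in mind that robots.txt is a polite request that well-behaved crawlers like Google’s honor, not a lock.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: keep Google's crawler (Googlebot) out of the
# whole site, but let every other crawler in (an empty Disallow allows all).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/secret-page"))     # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/secret-page"))  # True
```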

2. The Interface: This is the simplest part of a search engine, but it is just as important as the rest. It is the actual search engine website: how it looks and how it functions. There have been many variations (think the simple white of Google.com or the beautiful pictures at Bing.com), and they continue to change. A large part of the interface is how the engine returns its results (known as the Search Engine Result Page, or SERP for short), but we’ll get into that later.

3. The Algorithm: This is the final and most complicated part, and by far the most valuable. In short, it answers one basic question: how do you decide which pages are the “best” results for a query?

The first search engines based the answer largely on the physical text on the page (so if you searched for “car accident lawyer”, it would return the page that said “car accident lawyer” on it the most), but that led to the first instances of “gaming” the search engines. Websites would “stuff” their pages with keywords so that they would come up first. Obviously, this caused problems, as the pages that were best for search engines (and were coming up first on the SERPs) were gibberish to users.
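Here is a rough sketch of that early, text-only approach, assuming the simplest possible scoring rule: count how many times the query words appear on the page. The two pages are made up, but they show why keyword stuffing worked so well against engines like this.

```python
import re

def keyword_score(page_text, query):
    """Naive text-only scoring: count how often each query word appears."""
    words = re.findall(r"[a-z]+", page_text.lower())
    return sum(words.count(term) for term in query.lower().split())

# Two hypothetical pages: one written for people, one stuffed for engines.
pages = {
    "honest-law-firm": "Our firm helps clients after a car accident. "
                       "Talk to an experienced lawyer today.",
    "stuffed-page": "car accident lawyer car accident lawyer car accident "
                    "lawyer car accident lawyer best car accident lawyer",
}

query = "car accident lawyer"
ranking = sorted(pages, key=lambda p: keyword_score(pages[p], query), reverse=True)
print(ranking)  # ['stuffed-page', 'honest-law-firm']
```

The stuffed page scores 15 to the honest page’s 3, so the gibberish wins.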

Google to the Rescue!

Enter Larry Page and Sergey Brin. The year is 1995 and the place is Stanford. Larry is considering going there, and Sergey is told to show him around. They apparently bond pretty quickly. A year later, they start working on a search engine, but they decide not to rely on the text of the page alone to determine the rankings. Instead, they borrow an idea from academic research papers: citations. If a website is “cited” a lot (as in, linked to), it is presumably popular and should rank higher. To this day, the number of backlinks (that is, the number of websites linking to your website) is a huge factor in how a website ranks.
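The citation idea boils down to counting links. This minimal sketch counts backlinks over a hypothetical set of sites (the `LINKS` map is invented for illustration). The PageRank algorithm Page and Brin actually published goes further, weighting each link by how important the linking page is, so treat this as the simplest version of the idea, not Google’s formula.

```python
from collections import Counter

# Hypothetical link graph: each site lists the sites it links to.
LINKS = {
    "blog-a.example": ["stanford.example", "news.example"],
    "blog-b.example": ["stanford.example"],
    "news.example": ["stanford.example", "blog-a.example"],
    "stanford.example": ["news.example"],
    "spam.example": ["spam.example"],  # linking to yourself shouldn't count
}

def backlink_counts(links):
    """Count how many *other* sites link to each site (its backlinks)."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):  # one "citation" per linking site
            if target != source:
                counts[target] += 1
    return counts

counts = backlink_counts(LINKS)
print(counts.most_common())
# [('stanford.example', 3), ('news.example', 2), ('blog-a.example', 1)]
```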


Note that the first two factors are based on links, and they make up almost half the circle. This graph is the result of a survey of experts in the search industry, and is by no means exactly what Google uses–if we knew that, our job would be much easier! (Source)

Larry and Sergey name their project “BackRub”, based on the backlinks idea, and get to work. By mid-1998, they receive funding and set up shop in a friend’s garage. Luckily, they change the name before getting too far along. I don’t think saying “I’m gonna go ‘BackRub’ the answer” has the same ring to it as “Google” does–but that could just be me.

The rest is history, as they say. By the end of ’98, PC Magazine publishes a report saying that Google “has an uncanny knack for returning extremely relevant results” and names it the search engine of choice in its Top 100 Web Sites of 1998. Google grows rapidly, becoming the giant it is today. Two college kids built an index of a huge chunk of the internet, and their company is now bringing us amazing things like robotic cars and “augmented reality” glasses.

I’ll stop there with the history (if you want the full history, visit this amazing website I just found). Hopefully that gives you a good idea of what a search engine is and where search engines came from.

Part 2 of 3 of the Series on Search: The Present

Part 1 of this series talked about the history of search engines. It’s really interesting to me, but then again, I work with them. For this part, let’s talk about something more applicable to the average internet user: how search engines work and what that means for you.

Google’s Loyal Customers

You’re probably loyal to a certain search engine. Chances are, it’s Google (you have heard of Bing, right?), because they have about 69% of the market right now. In some cases, you may be so loyal that you’ll type “Google” right into Bing or vice versa, despite having a perfectly good search engine right in front of you (as of December 2011, there were 4 million people searching “Bing” in Google and 117 million searching “Google” in Bing every month).

“Searching” is, for most people, as natural as getting into your car and turning on your music: you don’t think about it. Google, not surprisingly, is trying to make this even easier. Did you know that if you’re using Chrome (and you should be), you can type any Google search directly into the URL bar? If you’ve been typing in “Google.com” and searching there, or even using a special Google toolbar on the right, you’re wasting precious milliseconds of your life!

I’m in Internet Explorer in the picture, but I had to install the Google Toolbar.

The point here is that most people don’t know when a search engine is a good idea and when it isn’t. If you’re looking to buy something, have you ever gone to Amazon and searched there instead of Google? Chances are, your results will be better–you can choose to rank them based on price, relevance, user reviews, etc, as opposed to letting Google decide what is best. After all, Google and Bing are businesses. Their primary goal is to make money. This sometimes means they will provide you with the best results, but often means they will provide you with the highest chance of making them money.

Google, Incorporated

When you search and click on a normal (organic) result, Google doesn’t make money. They make money when you click on their ads. To get you to click on the ads, they try to make the organic results good so that you’ll keep coming back. Eventually, you’ll click on the ads, they think. That is their motivation to make Google a good search engine–so that you’ll make a habit of it and eventually they get paid.

Did you not read what I wrote, Google?

I’m sure you’re familiar with the ads I’m talking about. They can be shopping results like the above example (which vendors now have to pay Google to show–they used to be free), or the standard orange-boxed/promoted results above, to the right, and at the bottom of the organic listings.

This is Google saying “pay me!”

What you might not know is that this is a massive ad network known as Google Adwords. Google made about $36.5 billion from advertising last year alone, which is about 96% of their total revenue. Remember those two kids from Stanford who thought backlinks were a good way to build a search engine? That simple idea had an enormous effect: the majority of the world started using their engine, which allowed them to build a massive search-based advertising system, and they now use the profits from that system, which basically runs on autopilot, to fund projects around the world. Talk about a solid business model.

How does Adwords actually work? It’s called a CPC model, or Cost-Per-Click. It’s just as it sounds–every time someone clicks on an Adwords ad, the advertiser pays Google money. How much can a measly click be worth, you ask? Check out this article by SpyFu that came out last month. The highest priced ads cost the companies $200-$300 every time you click on them. Most clicks are more in the $0.50-$1.00/click range, but you can see how these could add up.
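The cost-per-click math itself is simple multiplication. This tiny sketch uses hypothetical campaigns of 1,000 clicks at prices from the ranges above (the function name is just for illustration, not anything from Adwords itself):

```python
def campaign_cost(clicks, cost_per_click):
    """Under a cost-per-click model, the advertiser pays only for clicks."""
    return clicks * cost_per_click

# Hypothetical campaigns: 1,000 clicks at a typical price vs. a premium one.
print(campaign_cost(1_000, 0.75))   # 750.0     (a typical $0.50-$1.00 click)
print(campaign_cost(1_000, 250.0))  # 250000.0  (a $200-$300 click)
```

Simply showing the ad costs the advertiser nothing here; the meter only runs when someone clicks.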

$300 a click! (Source)

Dealing with Google, Inc

Okay, you know how they make money, but what does that mean for you? First of all, it means that the search “experience” is going to be good. If the search results were bad and the ads misleading, you’d stop using Google. That’s the upside.

Pro #1 of Google Inc: The search engine does a good job. They will keep it that way or you won’t come back.

To keep the search results appropriate, Google is constantly tweaking their ranking algorithm to get rid of spammy or misleading sites. Most tweaks are unnoticeable (they make changes every day, you just can’t tell), but there are the occasional big updates. Here is a list of every change ever, going all the way back to December 2000. The last two, Panda and Penguin, from February 2011 and April 2012 respectively, were possibly the biggest so far. While past updates have been about Google’s speed or brands in the SERPs, Panda/Penguin were specifically about removing spam.

They look innocent, but you don’t want to mess with them. (Source)

Overall, the impact has been positive. A ton of spammy sites were penalized. Unfortunately, some good sites were caught in the crossfire. There were stories of mom-and-pop shops that made 90% of their revenue from search visitors, dropped completely out of Google, and had to shut down. That’s the end of the “pros” of Google, Inc (there was just one), so let’s get into the downsides.

Con #1 of Google Inc: Every website’s success or failure in search is at the whim of Google’s search team.

If an update improves Google’s revenue but your site tanks, Google has no mercy. They are aware of this and play the “greater good” card, saying you gotta break some eggs to make a search engine omelette. This can be heartbreaking (or devastating) if you’re the egg. To make matters worse, Google is weighting big brands’ websites more and more these days, making it even harder to get off the ground if you’re starting a small business.

Con #2 of Google Inc: The first results in Google are often the biggest brands, or at least the ones with the most money.

On the organic side, brands are weighted. If you search for “buy shoes online”, you see the big guys: Zappos, Overstock, DSW–basically the department stores of the internet. Google has deemed these the “best” results for shoe shopping, but what if you want a unique brand that no one has heard of (you frickin’ hipster)? Good luck finding them in Google.

No, I did not make this, it already existed somehow. (Source)

On the paid side, the highest results (the ones at the top of the page vs the bottom) are basically given to the highest bidder. There is some weight given to relevance and quality, but the bid is a large factor. This is good ol’ capitalism at play (which I’m sure you have your own opinion about), which means the little guy has little hope of showing up.
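To picture how “mostly bid, plus some quality” plays out, here is a simplified, hypothetical model in Python: each ad’s position comes from its bid multiplied by a made-up 1-10 quality score. Google has described its ad ranking in roughly these terms, but the real formula is not public and involves more factors, so the advertisers and numbers here are invented.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    advertiser: str
    bid: float      # max the advertiser will pay per click, in dollars
    quality: float  # hypothetical 1-10 relevance/quality score

def order_ads(ads):
    """Simplified, hypothetical ad ranking: bid weighted by quality."""
    return sorted(ads, key=lambda ad: ad.bid * ad.quality, reverse=True)

ads = [
    Ad("big-shoe-store", bid=4.00, quality=6.0),      # 4.00 * 6 = 24.0
    Ad("tiny-hipster-shoes", bid=1.00, quality=9.0),  # 1.00 * 9 = 9.0
    Ad("mid-size-shoes", bid=2.50, quality=7.0),      # 2.50 * 7 = 17.5
]
print([ad.advertiser for ad in order_ads(ads)])
# ['big-shoe-store', 'mid-size-shoes', 'tiny-hipster-shoes']
```

Even with the best quality score, the little guy ends up last because its bid is so much lower.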

Now this can be good. After all, Overstock/Zappos are great sites with huge selections. Sites that are just starting out (mine included) might not have the best design or selection. Overall though, it seems a bit unfair–the big get bigger and the little guys struggle to keep up. I suppose that could be motivating or discouraging, depending on how you look at it, so I’ll let you make the ultimate decision.

Con #3 of Google Inc: Organic search is becoming a completely different landscape than paid.

It used to be that a lot of e-commerce sites would show up in the organic results. Some would use Adwords as well, but it was not at all required to make money. That is no longer the case. A side effect of Panda/Penguin is that organic results are heavily weighted toward websites that have fresh, quality content. That means blogs, review sites, and wikis, not shopping websites that only have lists of products. As a result, e-commerce sites have had to move to paid, where the traffic is no longer free (Google does not mind this).

What does this mean for you? If you’re doing research, use the organic results. If you’re looking to buy something, use the Adwords ads or go straight to the company website. Google will only keep separating the two, so you might want to get in the habit now.

That’s all for today!

I think that’s a good stopping point. If you keep these ideas in mind when you search, I think you’ll not only get better results, you’ll understand them a bit better. Google is not stopping any time soon, so it’s a good idea to learn how it works before it controls every single aspect of your life!

Part 3 of 3 of the Series on Search: The Future

It’s been tough condensing an entire industry into a couple posts, but I’m mostly satisfied with Part 1 and Part 2. Let’s move on to Part 3, The Future Of Search.

A quick recap: Google rose to become the leader in search through their backlink-focused algorithms. They crawl and index a huge portion of the web, and pull from this index when we search. The three parts of the search engine (the crawl/index, interface, and algorithms) are crucial for any engine and, at least for now, most people think Google handles them best.

Looking forward, it’s fairly obvious that these three parts will always be necessary. What will change is what each will look like and how we use them. Let’s start with the crawl/index.

The Semantic Web

The semantic web is the future of search, on the back end. Until recently, search engines could only read and store the words on different parts of webpages. This type of index serves most purposes just fine, until you have an ambiguous query–say “bacon actor”. The engine might not know if you mean Kevin or just a guy acting like the food (because I’m sure that happens?).

That’s where the semantic web can help. Semantics is “the study of meaning…[focusing] on the relationship between signifiers”. Basically, it would involve identifying a collection of “things” (like movies, products, people), and assigning them characteristics and relationships. The “person” Kevin Bacon would have a profession of actor (a characteristic) and would have been a part of the movie Hollow Man (a relationship), and search engines would “understand” this.
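One common way to store this kind of knowledge is as (subject, predicate, object) “triples”, the basic building block of the W3C’s RDF standard. Here is a minimal Python sketch with a handful of hypothetical facts (the predicate names like `profession` and `appeared_in` are made up for the example) showing how “bacon” plus “actor” narrows things down to Kevin:

```python
# Hypothetical "things" with characteristics and relationships, stored as
# (subject, predicate, object) triples, the basic shape RDF uses.
TRIPLES = [
    ("Kevin Bacon", "is_a", "Person"),
    ("Kevin Bacon", "profession", "actor"),
    ("Kevin Bacon", "appeared_in", "Hollow Man"),
    ("Francis Bacon", "is_a", "Person"),
    ("Francis Bacon", "profession", "philosopher"),
    ("bacon", "is_a", "Food"),
]

def find_things(name_word, profession):
    """Return things whose name contains `name_word` and have `profession`."""
    return sorted(
        subject
        for subject, predicate, obj in TRIPLES
        if predicate == "profession"
        and obj == profession
        and name_word.lower() in subject.lower()
    )

def relations(subject):
    """Everything the index "knows" about one thing."""
    return [(p, o) for s, p, o in TRIPLES if s == subject]

matches = find_things("bacon", "actor")
print(matches)               # ['Kevin Bacon']
print(relations(matches[0]))
# [('is_a', 'Person'), ('profession', 'actor'), ('appeared_in', 'Hollow Man')]
```

A plain word index only sees the string “bacon” on three kinds of pages; the triples let the engine know which bacon is the actor and what else is connected to him.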

A crude depiction, but you get the idea!

With this system in place, when you include the characteristic “actor” in the “bacon actor” search, it’s going to assume you mean Kevin (or David?), and can also give you all kinds of information about your choice.

As you may have realized, this is already starting to happen. Go ahead and search “bacon actor” and see what the SERP (Search Engine Result Page) looks like. While you’re at it, try “Raleigh weather” or “define philanthropy” or “10 dollars in pounds”. The break-out results at the top or right of the normal SERP are Google’s progress in Semantic Search, and they’re adding to it all the time.

There was a particularly famous example of this kind of search, made public on Reddit about a year ago. It involved searching for movies only by long descriptions of their plots, and Google did an outstanding job (compared to Bing/Yahoo) finding them. Discussion here if you’re interested. Go ahead and try one yourself; you might be amazed!


So, a semantic-style index will be much easier for us searchers, but it will not come easily. Websites would have to start marking up their sites with “meta-data” that identifies the “things”, and that can be a lot of work. It is happening, but slowly. For example, have you ever searched and seen someone’s picture next to the article they wrote (try “windows 8 review”)? That requires some back-end coding. Luckily, people are starting to embrace it. Once the semantic infrastructure is complete, we will be one step closer to having search engines understand our queries, as opposed to just searching for words.
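As one illustration of that back-end coding, sites can describe their content with the schema.org vocabulary, embedded in the page as JSON-LD. The sketch below builds that kind of metadata for a hypothetical article and author (the helper name, headline, name, and URL are all made up). The author photos mentioned above came from Google’s separate authorship program at the time, so this shows the general idea of marking up “things” rather than that exact feature.

```python
import json

def article_markup(headline, author_name, author_url):
    """Build schema.org Article metadata (JSON-LD) naming the article's author."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        # The author is a "thing" (a Person) with its own characteristics.
        "author": {"@type": "Person", "name": author_name, "url": author_url},
    }

# Hypothetical article and author, for illustration only.
data = article_markup("Windows 8 Review", "Jane Example",
                      "https://example.com/about/jane")

# On a real page this JSON would sit inside
# <script type="application/ld+json"> ... </script>
print(json.dumps(data, indent=2))
```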

If you want a little more info on the Semantic Web, here’s a 3 minute video from Jimmy Wales, or if you’re more into TED-style talks, here’s a 16 minute explanation from Tim Berners-Lee.

The Interface

The interface seems like a minor part of the search engine, but it has large implications. It’s no longer about a white background versus a colorful background; it’s about where and how the results are shown. The obvious example here is mobile. I can’t find the source right now, but I read last week that a representative from Google said that 1 in 3 searches now includes a place. “Raleigh weather” or “temperature NYC” are examples, but these are best shown on a regular computer, since they’re more research-based.

What about when you’re downtown at a friend’s house and you want some pizza? You pick up your phone and type in “pizza downtown raleigh”, but should the results you get be the same as at a computer? Google thinks not. They show more action-based results: phone numbers of restaurants, maps with directions, reviews by your friends, etc.

Notice the “Call” and “Directions” buttons on mobile vs the more research-oriented desktop SERP.

The future is even more action-oriented. Siri gave us the ability to simply speak to our phone and get results, but we still have to use the phone itself to follow directions or read the resulting information.

Google’s research lab, Google X, is working on the next step: Google Glass. Project Glass is an “augmented reality head-mounted display”. Think virtual reality, but instead of a video game or X-Men training room, it’s Google at your fingertips (eye tips?). The idea is to give directions, show calendar reminders, or even identify restaurants/shops right in front of your eyes without having to touch anything.

Here’s an awesome video about what it would look like:

Freakin’ sweet, right? The rumor is that they want to move from a glasses setup to a contact lens, so that no one could even tell you were connected. I see some new rules for bar trivia in the future…

Apparently, the ultimate goal is to embed sensors in your brain so that it integrates with how you think. Sounds a bit scary (and it definitely is an ethical grey area), but it could have huge positive impacts on how the human race works and lives.

The Algorithms

This is going to be the toughest part. It’s by far the most complicated, and it’s what makes Google Google. This is what decides that Gizmodo outranks TechCrunch for the query “xbox 720”. A lot of it comes from the text on the page, the links to the page/domain, social interactions, etc., but the actual ranking methods are far from understood.

I’m quite sure that the algorithms are so complex, even those who work on them at Google don’t understand them completely. It almost seems like the Manhattan Project–the engineers might understand part of the project, but no one knows how the entire thing works.

Larry Page and Sergey Brin (Google’s founders) in place of Einstein and Szilard (Szilard first conceived of the nuclear chain reaction). Maybe a bit obscure? (Source and Source)

I’m actually going to save this part for another time. It’s basically the future of Google, and I’m trying to keep these short(ish). I’d like to say that the future is perfect SERPs, where exactly what you were looking for ranks highest, but I work in the industry, and it’s our job to “unnaturally” change the SERPs so that our clients show up in them. This has gotten more difficult (or at least more complicated) over the years, but I don’t think it’s ever going to be impossible. The question is, how will Google keep giving us good results while only making it possible for deserving websites to show up? Hm…something to think about.

So, thing-based searching on your eyeball that returns perfect results? Sounds unprecedented, revolutionary. And I don’t think it’s far off. Stay tuned.

That concludes the 3 part series on search engines! Hopefully you enjoyed them and learned something cool. Look for quicker, less esoteric posts down the road! As always, thanks for reading.

If you’re still looking for more info, check out these other great resources on the subject!

Alex Miller’s “Student’s Guide To Search Engines”: http://alexmiller.com/the-students-guide-to-search-engines/