A search engine is defined as “a program for the retrieval of data, files, or documents from a database or network.” The most common type of search engine was developed to search what we call the Internet. However, academic and other databases, such as Ebsco, also have search engines that search the content of their holdings. You may be familiar with Westlaw and LexisNexis; both have internal search engines that search the content behind their password walls.
I used a search engine to help me find the above-mentioned definition: Google, my search engine of choice. However, there are quite a few other well-known search engines, such as Dogpile, Yahoo!, and Bing. Did you know that there are over 150 different search engines available for searching web content? So rather than treating one particular search engine as a one-stop shop for all resources available online, consider choosing your search engine, or engines, by the type of resources you want to review. There are two ways to evaluate search engines:
(1) by content, meaning what they are searching, and
(2) by how the information they search is compiled, meaning whether you are searching a directory or an index.
Search engines can be broken down into different categories of expertise. You might consider the following categories when determining what you’re looking for in your search: Blog Search Engines; Book Search Engines; Business Search Engines; Forum Search Engines; Image and Multimedia Search Engines; International Search Engines; Job Search Engines; Legal and Law Enforcement Search Engines; Map Search Engines; Medical Search Engines; News Search Engines; People Search Engines; Price/Shopping Search Engines; and Social Search Engines.[i] Specific search engines may be better suited to one type of search than another. For example, if you were looking for someone’s phone number, you might want to use a people search engine rather than a social search engine, which would return Facebook, Twitter, or MySpace information rather than actual phone numbers and addresses.
Another factor to consider in choosing a search engine is how it searches for information. You might consider whether the search engine is a human-powered directory, searches the invisible or deep web, or is an all-purpose crawler search engine. Here are the differences. Human-powered search engines, also known as web directories, search an index that is compiled by humans. Humans add links that they judge to be high quality to a directory, and when you run a search using a human-powered search engine, you are searching that directory of links for information relevant to your query. The following list includes human-powered search engines:
- Mahalo (Web directory that uses human editors and displays the results beside a Google search)
- Eurekster Swickis (Web directory that you create and have complete control over; still in beta)
- Open Directory (The “largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a vast, global community of volunteer editors.”[ii] The Open Directory Project is also known as DMOZ, or Directory Mozilla.)
- Yahoo! Directory (“The Yahoo Directory is a human-created and maintained library of web sites organized into categories and subcategories. Yahoo editors review these sites for potential inclusion in the Directory, and to evaluate the best place to list them.”[iii])
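To make the idea of searching a directory concrete, here is a minimal Python sketch, assuming a tiny hypothetical directory whose categories, listings, and example.com/.org/.net URLs are invented for illustration. What it shows: a directory search can only return links that a human editor has already filed; it never visits the web itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """One link that a human editor has reviewed and filed in the directory."""
    title: str
    url: str
    description: str


# Hypothetical directory: categories and subcategories chosen by human editors.
DIRECTORY: dict[str, dict[str, list[Listing]]] = {
    "Law": {
        "Legal Research": [
            Listing("Example Case Law Library", "https://example.com/cases",
                    "Court opinions organized by jurisdiction"),
        ],
        "Law Enforcement": [
            Listing("Example Police Records", "https://example.org/records",
                    "Public records search for law enforcement"),
        ],
    },
    "Science": {
        "Biology": [
            Listing("Example Genome Notes", "https://example.net/genome",
                    "Articles on genetics research"),
        ],
    },
}


def search_directory(query: str) -> list[tuple[str, Listing]]:
    """Return (category path, listing) pairs whose title or description
    contains every query term (plain substring match).

    Only links already filed in DIRECTORY can be returned; nothing is fetched.
    """
    words = query.lower().split()
    results = []
    for category, subcategories in DIRECTORY.items():
        for subcategory, listings in subcategories.items():
            for listing in listings:
                text = f"{listing.title} {listing.description}".lower()
                if all(word in text for word in words):
                    results.append((f"{category} > {subcategory}", listing))
    return results


for path, listing in search_directory("records"):
    print(path, "|", listing.title, "|", listing.url)
```

Matching here is a simple substring test, so a query for "law" would also match "lawyer"; this sketch does not model how any particular directory service actually matches queries.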
All-purpose search engines that use crawlers, such as Google, search for information a bit differently. Rather than having humans add information and links to an index, a company such as Google runs a program called a crawler (also known as a spider) that follows links throughout the Internet. As it follows, or crawls, links across the web, the program grabs information from websites and adds it to an index. Note, however, that “the crawler doesn’t rank the pages, it only goes out and gets copies which it stores, or forwards to the search engine to later index and rank according to various aspects.”[iv]
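The crawl-then-index process described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Google or any other company implements its crawler: the hypothetical `WEB` dictionary stands in for real pages fetched over HTTP, and the `crawl` and `build_index` names are invented for this example. As in the quoted description, `crawl()` only collects copies; indexing is a separate step, and ranking is left out entirely.

```python
from collections import deque

# Hypothetical miniature "web": each URL maps to (page text, links on that page).
# A real crawler would download pages over HTTP; this stand-in keeps the sketch offline.
WEB = {
    "a.example/home": ("Welcome to the law library", ["a.example/cases", "b.example/news"]),
    "a.example/cases": ("Court opinions and case law", ["a.example/home"]),
    "b.example/news": ("Legal news and court updates", ["c.example/blog"]),
    "c.example/blog": ("A blog about search engines", []),
    "d.example/orphan": ("No page links here so the crawler never finds it", []),
}


def crawl(seed: str, max_pages: int = 100) -> dict[str, str]:
    """Follow links breadth-first from seed and store a copy of each page's text.

    No ranking happens here; the crawler only collects copies.
    """
    copies: dict[str, str] = {}
    frontier = deque([seed])
    while frontier and len(copies) < max_pages:
        url = frontier.popleft()
        if url in copies or url not in WEB:
            continue  # already visited, or a dead link
        text, links = WEB[url]
        copies[url] = text
        frontier.extend(links)  # queue every link found on this page
    return copies


def build_index(copies: dict[str, str]) -> dict[str, set[str]]:
    """Turn stored copies into an inverted index: word -> set of URLs containing it."""
    index: dict[str, set[str]] = {}
    for url, text in copies.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


pages = crawl("a.example/home")
index = build_index(pages)
print(sorted(pages))
print(sorted(index["court"]))
```

The page `d.example/orphan` is never collected because no crawled page links to it, which is one reason some content never makes it into a crawler-built index.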
Figure 1: How Google Search Works[v]
So, when you run a search on an all-purpose crawler search engine, you are not actually searching the web; you are searching the index that the particular company’s crawlers have built. The following are just a few of the search engines that use crawlers to add information to an internal index that you can search:
- Google (The top-ranked all-purpose search engine)
- Yahoo! (A combined search engine and web directory that compiles results from both services)
- MSN Search
- AOL (Easy to use platform that users often start off with early in their web searching careers)
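The point that a query runs against the index rather than the live web can be illustrated with a small Python sketch. It assumes a hypothetical snapshot of previously crawled pages (`STORED_COPIES`) and uses a toy ranking that simply counts how often the query words appear; real engines weigh many more signals, and nothing here reflects any specific engine’s method.

```python
# Hypothetical snapshot of pages a crawler copied earlier. Queries below never
# touch the live web; if a page changed or was never crawled, results won't show it.
STORED_COPIES = {
    "a.example/cases": "court opinions and case law",
    "b.example/news": "legal news and court updates about court rules",
    "c.example/blog": "a blog about search engines",
}


def build_index(copies: dict[str, str]) -> dict[str, dict[str, int]]:
    """Inverted index: word -> {url: number of times the word appears on that page}."""
    index: dict[str, dict[str, int]] = {}
    for url, text in copies.items():
        for word in text.lower().split():
            postings = index.setdefault(word, {})
            postings[url] = postings.get(url, 0) + 1
    return index


def search(index: dict[str, dict[str, int]], query: str) -> list[str]:
    """Return URLs containing every query word, ranked by total occurrences (toy ranking)."""
    words = query.lower().split()
    if not words:
        return []
    postings = [index.get(word, {}) for word in words]
    # Keep only pages that contain all of the query words.
    matches = set(postings[0]).intersection(*postings[1:])
    # Higher total count first; ties broken alphabetically by URL.
    return sorted(matches, key=lambda url: (-sum(p[url] for p in postings), url))


INDEX = build_index(STORED_COPIES)
print(search(INDEX, "court"))
```

Because `search()` only reads `INDEX`, a page that changed after it was crawled, or was never crawled at all, will not be reflected in results until the index is rebuilt.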
Invisible-web, or deep-web, search engines search inside databases and other sites that are generally overlooked by the crawlers that build search engine indexes. The invisible web holds much more information than can be reached through crawler search engines, so these specialized engines are often more powerful tools for mining that valuable, yet often missed, information. The following deep-web search engines may prove useful when you are looking for hard-to-reach web content:
- The Internet Archive (A database that provides access to multimedia, including music, audio, movies as well as print materials.)
- Scirus (A science search engine that searches over 379 million science-specific web pages, including scientific journal websites and other science-related resources)
- USA.gov (A government website that includes links to the Library of Congress, the Smithsonian, and all the federal agencies)
- Google Scholar (Searches university repositories, journal publishers, author webpages, and other databases)
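Why do crawlers miss deep-web content? One common case is a site whose records are shown only in response to a query typed into a search form. The Python sketch below is a simplified, hypothetical model of that situation: the `archive.example` site, its `LINKS` map, and the `records` table are all invented, and `deep_web_search()` is a stand-in for however a deep-web engine or the site’s own search form actually reaches the database.

```python
import sqlite3

# Hypothetical archive: its records sit in a database and are displayed only in
# response to a search-form query, never on a page that another page links to.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (title TEXT, media TEXT)")
conn.executemany(
    "INSERT INTO records (title, media) VALUES (?, ?)",
    [
        ("Oral argument recording, 1998", "audio"),
        ("Courthouse dedication film", "movie"),
        ("Annual report on court caseloads", "print"),
    ],
)

# The only pages reachable by following links: a home page and an empty search form.
LINKS = {
    "archive.example/": ["archive.example/search"],
    "archive.example/search": [],
}


def crawl(seed: str) -> set[str]:
    """Return every URL a link-following crawler can reach from seed."""
    seen: set[str] = set()
    stack = [seed]
    while stack:
        url = stack.pop()
        if url not in seen:
            seen.add(url)
            stack.extend(LINKS.get(url, []))
    return seen


def deep_web_search(term: str) -> list[str]:
    """Query the database directly, as a site search form would, and return matching titles."""
    rows = conn.execute(
        "SELECT title FROM records WHERE title LIKE ? ORDER BY title",
        (f"%{term}%",),
    )
    return [title for (title,) in rows]


print(sorted(crawl("archive.example/")))
print(deep_web_search("court"))
```

Following links reaches only the home page and the empty search form, while a direct query finds the matching records. SQLite’s `LIKE` is case-insensitive for ASCII letters by default, which is why "court" also matches "Courthouse".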
There are numerous ways to search for content on the web. Understanding how a search engine works and what it is searching will help you become a better researcher. Keep this in mind as you bookmark your favorite search engine: there may be a bigger net to cast into the sea of information available to you, and you don’t want to cast a smaller one that doesn’t get the job done. Don’t limit yourself, or your results; use the appropriate search engine for the task.
[i] About.com, The Search Engine List: A Comprehensive List of Search Engines You Can Use (last accessed Dec. 17, 2012), available at http://websearch.about.com/od/enginesanddirectories/tp/search-engine-list.htm.
[ii] DMOZ Open Directory Project, About the Open Directory Project (last accessed Dec. 17, 2012), available at http://www.dmoz.org/docs/en/about.html.
[iii] Yahoo!, How does the Yahoo! Directory Differ from Yahoo! Search? (last accessed Dec. 17, 2012), available at http://help.yahoo.com/l/us/yahoo/directory/basics/basics-03.html;_ylt=Aj.gU1J7MAqRR.OEQquMUWBGkiN4.
[iv] Loren Baker, Anatomy of a Search Engine Crawler, Search Engine Journal, Sept. 21, 2005, available at http://www.searchenginejournal.com/anatomy-of-a-search-engine-crawler/2230/.
[v] Nishant Pinto, How Google Search Works, Digital Lifestyle: All That You Need to Know, Nov. 24, 2012, available at http://nishantpinto.blogspot.com/2012/11/how-google-search-works.html.