How Search Engines Work | Internet Fundamentals
Search Engine
The Internet contains a vast amount of information, spread across thousands of remote servers around the world. The difficulty of locating the right information on the Internet led to the creation of search technology, known as the search engine. A search engine provides links to relevant information based on your query.
The starting point of every search engine is a spider or crawler, a program that hunts for pages on the Web and grabs the contents of each page it finds. Crawling happens continuously in the background, independently of any particular user query. Once a page has been crawled, the data contained within it is processed.
By creating indexes, or large databases of Web sites, search engines can locate relevant Web sites when users enter search words or phrases. When you are looking for something using a search engine, it is a good idea to use operators like AND, OR, and NOT to refine your search. Using these Boolean operators, you can usually get a list of more relevant sites.
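As a rough illustration of how these operators narrow or widen a result set, the sketch below evaluates AND, OR, and NOT as set operations over a toy inverted index. The terms and page IDs are invented for the example; real engines do the same thing at vastly larger scale:

```python
# Toy inverted index: each term maps to the set of page IDs containing it.
# Terms and page IDs below are invented for illustration.
index = {
    "python":   {1, 2, 5},
    "tutorial": {2, 3, 5},
    "video":    {3, 4},
}

# "python AND tutorial" -> pages containing both terms (set intersection)
print(index["python"] & index["tutorial"])   # {2, 5}

# "python OR video" -> pages containing either term (set union)
print(index["python"] | index["video"])      # {1, 2, 3, 4, 5}

# "tutorial NOT video" -> pages with "tutorial" but not "video" (set difference)
print(index["tutorial"] - index["video"])    # {2, 5}
```

Because AND intersects result sets while OR unions them, AND queries typically return fewer but more relevant pages.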
Steps in the Working of a Search Engine
- Spiders or bots go out hunting for websites on the Internet.
- Bots and spiders find new websites and pages by following links on the sites they visit.
- Once a new page is found, the spider or bot reads its content and also checks its images.
- Everything is stored in a large online library called the “index”.
- This indexed data is stored in an encoded format to save space (see the encoding sketch after this list).
- A user types a query into the search engine’s search bar.
- The search engine goes to its index library to fetch the matching information.
- Because the index usually contains millions of matching pages, the search engine uses an algorithm to decide in which order to display the results.
- The results are ready in less than a second and displayed on a SERP (Search Engine Results Page).
- Regularly updated websites with unique content are given better positions on the SERP.
- The user analyses the search results and clicks through to a website.
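The encoding step mentioned above can be made concrete with one classic space-saving trick: storing each term’s sorted list of page IDs as gaps between consecutive IDs rather than as absolute values, since small gaps compress well. The sketch below is only an illustration with invented page IDs; real engines layer far more elaborate compression on top:

```python
def delta_encode(postings):
    """Store a sorted list of page IDs as gaps between consecutive IDs."""
    gaps, previous = [], 0
    for page_id in postings:
        gaps.append(page_id - previous)
        previous = page_id
    return gaps

def delta_decode(gaps):
    """Rebuild the original page IDs by summing the gaps back up."""
    postings, total = [], 0
    for gap in gaps:
        total += gap
        postings.append(total)
    return postings

postings = [3, 7, 8, 21, 22, 23]           # invented page IDs
encoded = delta_encode(postings)            # [3, 4, 1, 13, 1, 1]
assert delta_decode(encoded) == postings    # round-trips losslessly
```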
Search engines have two major functions:
- Crawling & Indexing
- Providing answers
Crawling & Indexing
Imagine the WWW as a network of stops in a big city. Each stop is a unique document (usually a web page). Search engines need a way to “crawl” the entire city and find all the stops. Through links, search engines’ automated robots, called “crawlers” or “spiders”, can reach the many billions of interconnected documents.
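To make the picture concrete, the following sketch crawls a small invented link graph (an in-memory stand-in for real pages) breadth-first, discovering every document purely by following links:

```python
from collections import deque

# Invented link graph: each page maps to the pages it links to.
links = {
    "home":  ["about", "blog"],
    "about": ["home"],
    "blog":  ["post1", "post2"],
    "post1": ["blog"],
    "post2": [],
}

def crawl(start):
    """Visit every page reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        print("crawling:", page)
        for target in links.get(page, []):
            if target not in seen:   # skip pages already discovered
                seen.add(target)
                queue.append(target)
    return seen

crawl("home")   # reaches all five pages via links alone
```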
Providing answers
Every search engine uses its own complex mathematical formulas to generate search results. The results for a specific query are then displayed on the SERP. Search engine algorithms take the key elements of a web page, including the page title, content, and keyword density, and compute a ranking that determines where the page appears in the results. Each search engine’s algorithm is unique, so a top ranking on Yahoo! does not guarantee a prominent ranking on Google, and vice versa.
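The real formulas are closely guarded, but a drastically simplified sketch of the idea might score each candidate page on the elements mentioned above, a title match plus crude keyword density, and sort the results. The pages and weights here are invented for illustration:

```python
# Invented candidate pages; each has a title and body text.
pages = [
    {"title": "Python Tutorial", "body": "learn python step by step with python examples"},
    {"title": "Cooking Basics",  "body": "a short python mention in a cooking article"},
    {"title": "Python Guide",    "body": "python python python reference guide"},
]

def score(page, term):
    """Toy relevance score: a title match is weighted above body frequency."""
    title_hit = 3.0 if term in page["title"].lower() else 0.0
    words = page["body"].split()
    frequency = words.count(term) / len(words)   # crude "keyword density"
    return title_hit + frequency

# Rank candidates for the query "python", highest score first.
results = sorted(pages, key=lambda p: score(p, "python"), reverse=True)
for page in results:
    print(round(score(page, "python"), 3), page["title"])
```

A different engine weighting the same elements differently would order these pages differently, which is exactly why rankings vary between engines.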
Types of Search Engines
Search engines are the key to finding specific information on the Web. Without sophisticated search engines, it would be practically impossible to locate anything on the Web. There are basically three types of search engines:
- Crawler-based search engines (powered by crawlers)
- Human-Powered Directories (powered by human submissions)
- Hybrid Search Engines or Mixed Results (a hybrid of the above two)
Crawler-based search engines
Google is an example. These engines are powered by robots (called crawlers, ants, or spiders). A crawler is a program that downloads web content and then follows the hyperlinks within that content to download the linked pages. The linked pages can be on the same site or on a different website.
Crawler-based search engines use automated software agents (called crawlers). Crawlers visit a Web site, read the information on the site, and read the site’s meta tags. The crawler returns all of that information to a central repository, where the data is indexed.
Crawling is the method of following links on the web to different websites. The contents of these websites are then stored in search engine databases. Contents can be web pages, images, documents, and other files.
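A bare-bones version of such an agent, using only Python’s standard library, might fetch a single page and pull out its hyperlinks and meta description. The URL below is a placeholder, and real crawlers add error handling, politeness rules such as robots.txt, and recursive link-following:

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class PageReader(HTMLParser):
    """Collects hyperlinks and the meta description from one page."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.description = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content")

# Placeholder URL; a real crawler would also obey robots.txt.
html = urlopen("https://example.com/").read().decode("utf-8", "replace")
reader = PageReader()
reader.feed(html)
print("description:", reader.description)
print("links found:", reader.links)
```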
Human-Powered Directories
A human-powered directory, such as the Open Directory, depends on humans for its listings. You submit a short description of your entire site to the directory, or editors write one for the sites they review. A search looks for matches only in the submitted descriptions.
Changing your web pages has no effect on your listing. Techniques that help improve a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.
Hybrid Search Engines or Mixed Results (a hybrid of the above two)
In the web’s early days, a search engine typically presented either crawler-based results or human-powered listings. Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listing over the other. For example, MSN Search is more likely to present human-powered listings from LookSmart. However, it also presents crawler-based results (as provided by Inktomi), especially for more obscure queries.