What is the Invisible Web and How Can We Find It?


Most of us are comfortable surfing the Internet, but what many of us don’t know is that we are only skimming the surface of the web. The web has two layers: the surface web and the invisible web. Most of you are familiar with the surface web, where we find information through search engines like Google, Yahoo, MSN, etc. Beneath it lies a vast unexplored territory known as the deep web or invisible web. The invisible web refers to the websites and web pages that search engines cannot index.

What is the Invisible Web?
There are some databases and web pages that search engines like Google, Yahoo, Bing, etc., cannot access or reach with their crawlers. These hidden sources and repositories are called the invisible web or deep web. The invisible web has been estimated to be 500 times larger than the visible web or surface web. The vast majority of the deep web is made up of free-floating web directory data and government-released data. For example, NASA keeps a huge amount of data from its scientific missions in the deep web, and the Library of Congress holds thousands of terabytes of historical data, which is mostly used by historians for their research. The deep web consists mostly of raw data, and private academic data is one of its major constituents.
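To make the idea of "pages a crawler cannot reach" concrete, here is a minimal sketch in Python. It assumes a crawler that discovers pages only by following hyperlinks from a start page. The toy site, its page names, and the `crawl` function are all hypothetical and made up for illustration; real search engine crawlers are far more elaborate.

```python
from collections import deque

# Hypothetical toy site: each page maps to the pages it links to.
SITE = {
    "/": ["/about", "/catalog"],
    "/about": ["/"],
    "/catalog": ["/"],        # offers a search form only, no links to results
    "/catalog?item=42": [],   # generated on demand by a database query
    "/members/report": [],    # behind a login, nothing links to it
}

def crawl(site, start="/"):
    """Return the set of pages reachable by following links from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in site.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

surface = crawl(SITE)                    # what a link-following crawler finds
invisible = sorted(set(SITE) - surface)  # pages that exist but are never found
print(invisible)
```

In this toy model, the query result page and the members-only page exist on the site but are never discovered, which is roughly what it means for content to sit in the invisible web.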

Why is it Invisible?
Much of the deep web is invisible to search engines because it consists of dynamic pages within database-driven websites. These deep URLs are recognizable: they are generally long and contain a variety of symbols such as question marks, percent signs, equals signs, etc. Some online catalogs have no hyperlinks pointing to them, so search engine crawlers never find them. Some pages are password-protected to limit access. The content we see in web pages is coded in HTML; content published in other formats may not be indexed and so becomes part of the deep web. Another reason is that a few web pages have scripted content using Flash or JavaScript, which search engine crawlers cannot read, and this makes those pages invisible. Private websites that nothing links to are also part of the deep web. Some websites are geo-restricted, which means that they are accessible only within a particular region or country. Other websites hide their content behind a secure wall and allow you to access it only after you register on the site.
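The symbols that mark a deep URL can be checked with a few lines of Python. The sketch below uses the standard `urllib.parse` module to look for a query string (the part after the question mark, made of `name=value` pairs) and for percent-encoded characters. The function name `looks_dynamic` and the example.com addresses are assumptions made for illustration, and this is only a rough heuristic, not how any search engine actually decides what to index.

```python
from urllib.parse import urlsplit, parse_qs

def query_parameters(url):
    """Return the name=value pairs found after the question mark."""
    return parse_qs(urlsplit(url).query)

def looks_dynamic(url):
    """Rough heuristic: query parameters or percent signs suggest a
    page generated from a database rather than a static HTML file."""
    parts = urlsplit(url)
    has_percent = "%" in parts.path or "%" in parts.query
    return bool(parse_qs(parts.query)) or has_percent

# Hypothetical example addresses
print(looks_dynamic("https://example.com/about.html"))
print(looks_dynamic("https://example.com/catalog?item=42&lang=en"))
print(query_parameters("https://example.com/catalog?item=42&lang=en"))
```

A static page such as `about.html` has no query string, while the catalog address carries the parameters `item` and `lang`, which the site uses to build the page on request.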

How to Access the Invisible Web?
There is software called Tor through which the hidden web can be explored. You just need to download the Tor Browser bundle and start accessing the hidden web through this browser. The Hidden Wiki, which has a .onion address, can also be reached with this browser. Be aware, though, that some sites you reach through this browser may put you and your computer at risk, so I advise you not to explore the deep web through this software often. Some other ways to access the hidden web are discussed below.

  • To find information about a company, you can register on the website Manta and access the company data provided by Dun and Bradstreet, popularly known as D&B. You can access the company database without registering, but registering gives you access to more detailed information.
  • ThomasNet search is an online register that contains information on manufacturing companies, especially in North America. The Thomas Register was a physical directory that has been converted into an online directory.
  • Google Patent Search is an effective patent search engine that provides relevant information on recent patent applications and approved patents.
  • Bizjournals search contains the archives of American business journals, and it will surely be helpful to you if you are an entrepreneur.
  • The Internet Archive (archive.org) keeps historical copies of websites, including company sites that once existed on the Internet and no longer exist.
  • The Virtual Library was started by Tim Berners-Lee, the creator of the web. It is an old catalog that provides relevant information on various subjects.
  • The Find Articles website contains a huge collection of industry articles as well as general articles.
  • Google Blog Search and Infomine (founded by the University of California at Riverside) provide valuable articles, news feeds, etc.
  • SurfWax is a search engine that helps you dig up detailed information on what you need.
  • The TechDeepWeb website offers tools for reaching resources that are hidden in the deep web.
  • Academic Index is a meta-search engine that pulls results from databases approved by scholars and librarians.
  • Intute is a United Kingdom based database that provides a wide variety of information on various academic subjects.
  • Scirus is a scientific search engine that covers science articles, journals, patents, etc.
  • Some of the specialized databases are WorldWideScience, the Library of Congress, ERIC (a digital library of educational research and information), the British database of educational and research resources, authoritative U.S. government science information sources, etc. The required information can be retrieved from these databases.
  • Connecting through a VPN (Virtual Private Network) over the Internet gives you secure access to private data and other resources on that network. This can reveal information that is otherwise inaccessible from the public Internet.

BrightPlanet, an Internet company, has estimated that information in the deep web is growing 10% faster than in the surface web. According to the company, “Information held in the deep web is up to two thousand times better quality than the information easily retrieved by the search engines from the surface web”. However, many scholars advise against accessing the invisible web often, for reasons of safety.