Online sources | Diana L. Ascher, PhD, MBA

An “online source” is generally understood to be any site, database, or other electronic resource from which information can be obtained over a computer network. More specific definitions are rarely attempted. Usually the information in question is accessible via the huge constellation of networks known as the Internet, and more specifically via the portion of it built on HTML pages, otherwise known as the World Wide Web. Given the breadth of what can be defined as “information,” nearly anything accessible via a computer network can be called an online source. Of course, not all online sources are created equal. The following sections are meant as a general guide to finding and evaluating online sources.

The “Deep Web”

The term “deep Web” was coined by a company called BrightPlanet (Cohen, 2006, para. 10). On behalf of BrightPlanet, Michael Bergman published a study in 2001 chronicling the company’s efforts to document what had previously been called the “invisible Web.” Both terms refer to information posted on the Web that is invisible to most conventional search engines (para. 24). Usually, this information is invisible because it is part of an online database. The contents of most online databases are accessible only in response to a direct query, so search engine “spiders” or “crawlers” cannot enter the database to index the information found there (para. 3). Bergman’s study found that, as of March 2000, the deep Web contained 400 to 550 times more information than the “visible” or “surface” Web: about 7,500 terabytes to the surface Web’s 19 terabytes (para. 6). The most recent update I could find, from 2002, estimated 91,850 terabytes in the deep Web to the surface Web’s 167 (Lyman & Varian, 2003, p. 11). Further, Bergman estimates that, again as of 2001, over 95% of the deep Web is available free of charge (para. 6). The remaining five percent or so, however, is extremely important.
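
To make the crawler limitation concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the page paths, the record URLs, and the names LINKED_PAGES, DATABASE, crawl, and query. It is a toy model of link-following, not a description of how any real search engine works. The point is only that a spider that follows links never reaches records that exist solely as answers to a query.

```python
from collections import deque

# Hypothetical toy site: each page maps to the links that appear on it.
LINKED_PAGES = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],  # a search form: it links to no individual records
}

# Hypothetical database records, returned only in response to a direct query.
DATABASE = {
    "aspirin": "/record?id=101",
    "ibuprofen": "/record?id=102",
}


def crawl(start):
    """Follow links breadth-first, the way a simple spider would."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in LINKED_PAGES.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def query(term):
    """Ask the database directly; this is the only route to a record."""
    return DATABASE.get(term)


indexed = crawl("/")
print(sorted(indexed))              # the "surface" pages the spider found
print(query("aspirin") in indexed)  # False: the record was never crawled
```

In this toy model the spider indexes the home page, the about page, and the search form itself, but none of the records behind the form, which is the sense in which such content is “deep.”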

Subscription-Only Databases

Some online databases are behind password protection and are available only to those who have a paid subscription to the service. These databases are usually indexes to scholarly journals. Online indexes to journal articles have been available for many years (Singh, 2004, p. 56), but subscription databases increasingly include abstracts and even full-text access to the articles. This information can be described as “on the Internet” because, theoretically, anyone willing and able to pay can access it from a computer. Practically, however, the price of most subscription databases is high enough that most subscribers are institutions such as universities and corporations. A single institutional membership allows access for all eligible members of the institution via the institutional intranet.

The “Surface” Web

All the Web pages theoretically accessible to a search engine can be referred to as the “surface” Web. These Web pages probably do not require much explanation; they are what appears when a computer user enters a search term in Google or another search engine, follows a link from another site, or types in the address of a familiar site. These Web sites vary wildly in size and quality. Virtually anyone with Internet access, from major universities and the federal government to bored thirteen-year-olds, can upload files onto a Web page. Important information can nonetheless be found there, at such sites as www.internettutorials.net (from SUNY) and www.pubmed.gov (medical information from the government).

Separating the Good from the Bad

The quality of most subscription-only databases is fairly easy to determine; in addition to market forces that select for the best databases, articles indexed there are carefully cited. Some further research into the reputations of the author(s) and journal is all a searcher needs to do. In the case of the deep Web, databases must be evaluated on a case-by-case basis. Usually, though, online databases are maintained by an institution of some kind, which can be held responsible for the contents of the database. In the case of the surface Web (and some deep Web databases), every site and every page must be evaluated individually. Can the author of the page be identified? How current is it? Is the site affiliated with an institution? Is there obvious bias? The answers to these questions and many others besides will determine how useful a given Web site is for a particular project. For a more comprehensive guide, see “Evaluating Internet Resources” at http://library.albany.edu/usered/eval/eresources.html (Jacobson & Cohen, 1996).
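
For searchers who like to keep notes in a structured form, the four questions above can be sketched as a small checklist in Python. This is only an illustration under stated assumptions: the PageNotes fields, the open_concerns function, and the five-year cutoff for currency are all invented here and do not come from any of the cited guides. A real evaluation involves judgment that a checklist cannot capture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageNotes:
    """A searcher's notes on one Web page (hypothetical structure)."""
    author: Optional[str]             # None if no author can be identified
    last_updated_year: Optional[int]  # None if the page gives no date
    institution: Optional[str]        # None if no affiliation is evident
    obvious_bias: bool


def open_concerns(notes, current_year, max_age_years=5):
    """List the evaluation questions this page fails to answer well.

    max_age_years is an arbitrary assumption; what counts as current
    depends on the subject and the project.
    """
    concerns = []
    if not notes.author:
        concerns.append("author cannot be identified")
    if notes.last_updated_year is None:
        concerns.append("no date given")
    elif current_year - notes.last_updated_year > max_age_years:
        concerns.append("may be out of date")
    if not notes.institution:
        concerns.append("no institutional affiliation")
    if notes.obvious_bias:
        concerns.append("obvious bias")
    return concerns


page = PageNotes(author=None, last_updated_year=1996,
                 institution="University at Albany", obvious_bias=False)
print(open_concerns(page, current_year=2006))
# ['author cannot be identified', 'may be out of date']
```

An empty list does not mean a page is trustworthy; it means only that none of these four questions raised a flag.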

Finally, a few general words must be said about online sources in relation to print sources. Though online sources are growing at a prodigious rate, most undergraduate students (the group most often stereotyped as “just Googling” everything) use print sources as often as or more often than they use online sources (Dilevko & Gottlieb, 2002, pp. 381-392). Online sources are important, and increasingly so, but they do not appear to be in danger of supplanting print sources of information any time soon.

References

Bergman, M. (2001). The Deep Web: Surfacing Hidden Value. The Journal of Electronic Publishing, 7(1). Retrieved November 20, 2006, from http://www.press.umich.edu/jep/07-01/bergman.html

Cohen, L. (Ed.). (2006). The Deep Web. Retrieved November 19, 2006, from “Internet Tutorials,” maintained by the University at Albany, SUNY: http://www.internettutorials.net/deepweb.html

Dilevko, J., & Gottlieb, L. (2002). Print sources in an electronic age: A vital part of the research process for undergraduate students [Electronic version]. The Journal of Academic Librarianship, 28(6), 381-392.

Jacobson, T., & Cohen, L. (1996). Evaluating Internet Resources. Retrieved November 19, 2006, from the University at Albany, SUNY library Web site: http://library.albany.edu/usered/eval/eresources.html

Lyman, P., & Varian, H. R. (2003). How Much Information. Retrieved November 19, 2006, from http://www.sims.berkeley.edu/how-much-info-2003

Singh, S. P. (2004). Collection management in the electronic environment [Electronic version]. The Bottom Line: Managing Library Finances, 17(2), 55-60.
