Tuesday, November 16, 2010

Reading Notes, November 16, 2010

David Hawking, Web Search Engines: Part 1 and Part 2

I was unable to access this article, either through the website or through the ULS Find Articles process.

Shreeves, S. L., Habing, T. O., Hagedorn, K., & Young, J. A. (2005). Current developments and future trends for the OAI protocol for metadata harvesting. Library Trends, 53(4), 576-589.

In other classes, we've been reading about the difficulties of creating a common infrastructure that makes scholarly research easily accessible. If I'm interpreting this article correctly, then OAI appears to be one solution, at least with regard to open access archives. As the article points out, OAI allows metadata harvesting only, but many participants in the initiative have built search and retrieval services around the metadata. Of course, there are still problems with interoperability across multiple repository resources, so it's not a complete solution. But it appears to provide a basis from which to begin building access and retrieval services for e-print and other archives.
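To make the harvesting idea concrete, here is a minimal sketch of what an OAI-PMH harvester does with a `ListRecords` response. The sample XML below is a hypothetical, heavily trimmed response (a real harvester would fetch it over HTTP from a repository's OAI endpoint with `verb=ListRecords&metadataPrefix=oai_dc`); the namespaces are the ones the OAI-PMH and Dublin Core specs define.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the Dublin Core element set
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical, trimmed ListRecords response. A real harvester would
# retrieve this via a URL like:
#   http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example E-Print</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def harvest_titles(xml_text):
    """Pull every dc:title out of a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(DC + "title")]

print(harvest_titles(SAMPLE_RESPONSE))  # ['An Example E-Print']
```

A search service built on top of OAI would harvest metadata like this from many repositories and index it locally, which is exactly the division of labor the article describes: the protocol moves only metadata, and search is layered on afterward.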

Michael K. Bergman, “The Deep Web: Surfacing Hidden Value”

This article points out that much Web content is out of the reach of search engines: any search of the Web merely scratches the surface of what is actually available. Bergman estimates that over 200,000 deep websites are out there on the net, and as the Web continues to grow, more and more content falls out of reach of major search engines like Google. Search engines work by sending out "spiders" to retrieve website data, which is then indexed and ranked according to popularity, but there is a limit to how much of the Web can be indexed this way. This reveals one of the failings of search engines like Google. Many people laud Google while dismissing OPACs as sadly outdated and traditional human-created library catalogs and metadata as overly time-consuming and a waste of money. But it would seem that having humans do the work can, in some instances, result in a more complete ability to account for and access data. Neither system is perfect, but since most people assume that Google gets everything right, it's worth pointing out that Google, too, has failings.
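The spider limitation described above can be sketched with a toy example. The "site" below is entirely hypothetical: a link-following crawler reaches only pages connected by hyperlinks, so a database record that is served only through a search form is never visited or indexed — that unreachable content is the deep web in miniature.

```python
# Hypothetical site map: each page lists the pages it links to.
# "/db-record-42" is served only by a search form, so no page links to it.
SITE = {
    "/": ["/about", "/search"],
    "/about": [],
    "/search": [],          # the form's results are not hyperlinks
    "/db-record-42": [],    # deep-web page: reachable only via the form
}

def crawl(start):
    """Follow links breadth-first from `start`, like a spider."""
    seen, frontier = set(), [start]
    while frontier:
        page = frontier.pop()
        if page in seen:
            continue
        seen.add(page)
        frontier.extend(SITE.get(page, []))
    return seen

indexed = crawl("/")
print("/db-record-42" in indexed)  # False: the spider never finds it
```

However large the crawl, any page whose only route in is a query against a database stays invisible to this kind of indexing, which is why deep-web content grows faster than search engines can surface it.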

1 comment:

  1. Kel, I enjoyed reading your notes about the article that discusses the Deep Web. I thought you touched on all of the main points, and especially brought up the ever-important point that Google (just like so many search engines available) is not 100% accurate all the time. There is much information available that many, many users will never even begin to explore because of the barriers presented by surface web limitations. Nice reading notes!
