Monday, December 12, 2011

Secrets of Search – Part One



http://ow.ly/7Wu0D

An article by Ralph Losey posted on his blog e-Discovery Team.

This article examines the search technology used in eDiscovery and several problems with how it is applied. It lays out certain "dirty little secrets" about search methods and offers advice on the role keyword searching should play in the process. The author also links to other articles on the topic.

The article states, "First of all, and let me put this in very plain vernacular so that it will sink in, keyword search sucks. It does not work, that is, unless you consider a method that misses 80% of relevant evidence to be a successful method. Keyword search alone only catches 20% of relevant evidence in a large, complex dataset, such as an email collection. Yes, it works on Google, it works on Lexis and Westlaw, but it sucks in the legal world of evidence gathering."

The author further states, "Keyword search still has a place at the table of Twenty-First Century search, but only when used as part of a multimodal search package with other search tools, and only when the multimodal search is used properly with iterative processes, real-time adjustments, testing, sampling, expert input and supervision, and other quality control procedures." The article goes on to examine more advanced techniques, such as "Predictive Coding," and the use of keywords as one part of a broader workflow.

The article provides further evidence of the inadequacies of keyword searching: "Look at the landmark research on Boolean search by information scientists David Blair and M.E. Maron in 1985. The study involved a 40,000 document case (350,000 pages). The lawyers, who were experts in keyword search, estimated that the Boolean searches they ran uncovered 75% of the relevant documents. In fact, they had only found 20%. Blair, David C., & Maron, M. E., An evaluation of retrieval effectiveness for a full-text document-retrieval system; Communications of the ACM Volume 28, Issue 3 (March 1985).

Delusion is a wonderful thing, is it not? We are confident our search terms uncovered 75% of the relevant evidence. Really? Still, no one likes the fool who points out that the emperor is naked, especially the emperor and his tailors who frequently pay all of the bills. Still, here I must go, where angels fear to tread. I must point out what science says."

This comprehensive article goes on to discuss further studies, and provides links to various articles and tests that measured the recall and precision achieved when retrieving relevant documents from a larger document population. The article states, "The so called gold standard used to judge recall and precision rates in information science studies is human review. This brings up an even more important secret of search, a subtle secret known only to a few. Experiments in TREC conducted well before the legal track even began showed that we humans are very poor at making relevancy determinations in large data sets. This is a very inconvenient truth because it puts all precision and recall measurements in doubt."
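For readers less familiar with the two measures, here is a minimal Python sketch of how recall and precision are usually computed. It does not come from Losey's article. The function name, the document IDs, and the counts are all hypothetical, chosen only so that recall comes out at the 20% figure quoted above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a search result.

    retrieved: IDs of the documents the search returned.
    relevant:  IDs of the documents that are actually relevant
               (in practice this "truth" comes from human review,
               which the article notes is itself unreliable).
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents the search found

    # Precision: share of returned documents that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of all relevant documents that were returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: documents 0..999 are the relevant ones.
relevant_ids = range(1000)

# Hypothetical keyword search: returns 250 documents, 200 of them relevant.
retrieved_ids = list(range(200)) + list(range(5000, 5050))

precision, recall = precision_recall(retrieved_ids, relevant_ids)
print(f"precision={precision:.0%} recall={recall:.0%}")
```

In this made-up case most of what the search returns is relevant, which is one reason a reviewer could feel confident in the results, yet four out of five relevant documents were never retrieved. Both numbers also depend on the set of documents treated as truly relevant, so if the human reviewers who define that set are inconsistent, the measurements inherit that uncertainty.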

P.S. This article is an excellent resource, and should be required reading for all attorneys who are involved in the attorney review process. Both in-house counsel and litigators in law firms need to be concerned about the poor results that current standard reviews seem to produce.
