Friday, June 17, 2011

Predictive Coding Explained


A great deal of discussion has taken place recently about a new form of document review that is taking the eDiscovery industry by storm: Predictive Coding. The reasons for this surge in interest are several, as discussed below, but the timing is not coincidental. Two major trends are colliding: 1) the economics of traditional, linear review have become unsustainable, and 2) the early returns from those employing Predictive Coding are nothing short of phenomenal and have given early adopters a significant competitive advantage. Given the nascent stage of the Predictive Coding world, we thought the timing was right for a quick primer on what Predictive Coding is, what it isn’t, how it came to be and the problems it seeks to address.

Linear document review – where individual reviewers manually review and “code” documents ordered by date, keyword, custodian or some other simple criterion – has been the accepted standard within the legal industry for decades. This was not a big deal when ESI volumes were measured in megabytes or even a few gigabytes. The explosion of data volumes over the past decade, however, has exposed traditional linear review as an exceedingly inefficient, costly and inconsistent approach to document review, a task that by itself accounts for 60-70% of the costs of eDiscovery. There is simply so much data to be coded that the old model has become too slow and expensive to keep up.
Why does linear review perform so poorly in most cases? For starters, many – and often most – documents in a review are “false positives” (i.e. irrelevant, unresponsive, or both), yet they are still reviewed by an attorney, which racks up large and unnecessary costs. Second, documents are typically not organized by topic, which forces reviewers to jump from topic to topic, slowing down the process and leading to inaccurate results. Third, documents aren’t prioritized in any way (e.g. from most important to least important), so reviewers can miss key documents. And finally, because individual reviewing attorneys typically know little about a case’s substance, multiple “passes” must be made over the same documents, each with a different focus (e.g. a first pass for relevance, a second for responsiveness, a third for relationship to a substantive category). Add it all up and one is left with a woefully outdated and extremely expensive approach that is rapidly falling out of favor with clients and outside counsel.
By contrast, Predictive Coding seeks to automate the majority of the review process. Starting with a bit of direction from someone knowledgeable about the matter at hand, typically in the form of coding decisions on a sample of documents, Predictive Coding uses sophisticated technology to extrapolate this direction across an entire corpus of documents. In this way it can “review” and code anything from a few thousand documents to many terabytes of ESI at a fraction of the cost of linear review. The result? A more thorough, more accurate, more defensible and far more cost-effective document review, which allows attorneys to do what they were trained to do, namely use the facts to advocate on behalf of their client. Predictive Coding is so powerful that it actually changes the economics of eDiscovery, allowing law firms to win new business while maintaining or even improving their margins.
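To make the idea of “extrapolating direction” concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how any particular vendor’s product works: it assumes the knowledgeable attorney’s direction takes the form of a small hand-coded seed set, and it uses a simple Naive Bayes word-count model as a stand-in for whatever technology a real system employs. The function names (`train`, `score`, `rank`) and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(seed_set):
    """Count words per label from (text, is_responsive) pairs coded by an attorney.

    Assumes the seed set contains at least one document of each label.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in seed_set:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    return word_counts, doc_counts


def score(text, model):
    """Return the log-odds that a document is responsive (higher = more likely)."""
    word_counts, doc_counts = model
    vocab = set(word_counts[True]) | set(word_counts[False])
    totals = {label: sum(word_counts[label].values()) for label in (True, False)}
    log_odds = math.log(doc_counts[True] / doc_counts[False])
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in the seed set carry no evidence
        for label, sign in ((True, 1), (False, -1)):
            # Add-one (Laplace) smoothing keeps every probability above zero.
            p = (word_counts[label][word] + 1) / (totals[label] + len(vocab))
            log_odds += sign * math.log(p)
    return log_odds


def rank(corpus, model):
    """Order uncoded documents from most to least likely responsive."""
    return sorted(corpus, key=lambda doc: score(doc, model), reverse=True)


# Hypothetical seed set: a handful of documents coded by the case expert.
seed_set = [
    ("pricing agreement with competitor on widget prices", True),
    ("meeting to discuss price fixing with rival", True),
    ("lunch order for the office party", False),
    ("reminder: parking garage closed friday", False),
]

# Hypothetical uncoded corpus that the model codes and prioritizes.
corpus = [
    "office party lunch menu",
    "competitor call about widget pricing",
    "garage parking passes",
]

model = train(seed_set)
ranked = rank(corpus, model)
for doc in ranked:
    print(f"{score(doc, model):+.2f}  {doc}")
```

In this sketch the expert codes four documents, and the model then scores and orders the rest so that the likeliest responsive documents surface first. A real workflow would add sampling, quality-control checks and repeated rounds of training, none of which are shown here.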
Given the number of vendors, practitioners, outside counsel and clients in the eDiscovery space, there has been a lot of confusion about what Predictive Coding is and what it can do. Here are several of the most common ways in which various commentators have inaccurately characterized Predictive Coding:
  • Merely culling, threading, categorizing and/or clustering. These techniques can be helpful in organizing documents for review. However, they do not themselves predictively “code” documents, nor do they prioritize documents automatically, nor do they provide quality control after the fact, so they are not Predictive Coding. Put another way, they address one symptom of linear review (the lack of topical organization of documents) but do not address its fundamental flaws, and they still require huge review teams (often contract attorneys).
  • A replacement for attorneys. Simply put, Predictive Coding makes seasoned attorneys more valuable (not less) as it allows them to focus on the most important part of any matter: defending or prosecuting their client’s interests. Predictive Coding also allows attorneys to take on more business by expediting the most tedious element of eDiscovery – document review – which is especially important in the current economic cycle.
  • Subject to defensibility issues. This is a classic red herring. Some linear review vendors and practitioners have reflexively voiced concerns about defensibility, namely that Predictive Coding may carry risk because it is not linear review. In fact, the opposite is true: more and more courts are pushing litigants to pursue alternative approaches to document review (like Predictive Coding) due to the risk and costs associated with legacy eDiscovery methods. When part of a thorough, documented process, Predictive Coding is actually more defensible than linear review.
  • Solely about technology. The technology aspect of Predictive Coding is not trivial and cannot be discounted; it is not easy to do, which is why linear review has continued to outlive its useful lifespan. But what makes Predictive Coding so defensible and effective are the processes, workflows and documentation that surround the technology. Although technology is at its core, Predictive Coding includes all of these parts as one integrated whole.
  • Solely for big cases and/or big law firms. One of the most common misperceptions is that Predictive Coding is the province of the rich (i.e. AmLaw 100 firms and Fortune 100 clients). This is simply not true. As many small and mid-sized yet forward-looking law firms like Eimer Stahl have begun to realize, Predictive Coding is useful for anyone dealing with litigation or with a regulatory or internal investigation.
With today’s huge volumes of ESI pressuring inside and outside counsel alike to embrace new approaches to eDiscovery, it’s no surprise that Predictive Coding has become so popular in such a short period of time. Look for this trend to continue throughout 2011 and beyond.

Written by Chris Carpenter
