The January 2010 issue of the Journal of the American Society for Information Science and Technology (61(1):1–11) has an article by Roitblat, Kershaw, and Oot describing a study that compared computer classification of eDiscovery documents with manual review. The study found that computer classification was at least as consistent as human review in distinguishing responsive from nonresponsive documents. If having attorneys review documents is a reasonable approach to identifying responsive documents, then any system that does as well as human review should also be considered a reasonable approach.
The study compared an original categorization, done by contract attorneys in response to a Department of Justice Second Request, with categorizations done by two new human review teams and by two computer systems. The two re-review teams were employees of a service provider specializing in legal reviews of this sort. Each team consisted of five reviewers who were experienced in the subject matter of this collection. The two teams independently reviewed a random sample of 5,000 documents. The two computer systems were provided by experienced eDiscovery service providers, one in California and one in Texas. The authors of the study had no financial relationship with either service provider or with the company providing the re-review. The companies donated their time and facilities to the study.
The documents used in the study were collected in response to a "Second Request" concerning Verizon's acquisition of MCI. The documents were collected from 83 employees in 10 US states. In total, the collection comprised 1.3 terabytes of electronic files in the form of 2,319,346 documents: about 1.5 million email messages, 300,000 loose files, and 600,000 scanned documents. After duplicates were eliminated, 1,600,047 items were submitted for review. The attorneys worked on the review for about four months, seven days a week, 16 hours per day, at a total cost of $13,598,872.61, or about $8.50 per document. After review, a total of 176,440 items were produced to the Justice Department.
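The per-document figure is the total cost divided by the number of de-duplicated items submitted for review. The short Python calculation below reproduces it from the numbers reported above and also shows what share of the reviewed items was ultimately produced. The variable names are illustrative only.

```python
# Figures reported for the original Verizon/MCI review.
TOTAL_COST_USD = 13_598_872.61
ITEMS_REVIEWED = 1_600_047   # items submitted for review after de-duplication
ITEMS_PRODUCED = 176_440     # items produced to the Justice Department

# Average cost per reviewed item and the fraction of reviewed items produced.
cost_per_item = TOTAL_COST_USD / ITEMS_REVIEWED
produced_fraction = ITEMS_PRODUCED / ITEMS_REVIEWED

print(f"cost per reviewed item: ${cost_per_item:.2f}")                   # $8.50
print(f"fraction of reviewed items produced: {produced_fraction:.1%}")   # 11.0%
```

In other words, roughly one reviewed item in nine was produced.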
Accuracy was measured as agreement with the decisions made by the original review team. The level of agreement between the two human review teams was also measured.
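Simple percent agreement is one straightforward way to express this kind of measurement. The Python sketch below assumes that each review is recorded as a sequence of True (responsive) / False (nonresponsive) calls over the same documents in the same order. The function name and the toy data are hypothetical and do not come from the study.

```python
from typing import Sequence


def percent_agreement(reference: Sequence[bool], other: Sequence[bool]) -> float:
    """Return the fraction of documents on which two reviews make the same call.

    Each review is a sequence of responsiveness calls over the same documents,
    in the same order: True = responsive, False = nonresponsive.
    """
    if len(reference) != len(other):
        raise ValueError("both reviews must cover the same documents")
    if len(reference) == 0:
        raise ValueError("no documents to compare")
    # Count positions where the two calls match.
    matches = sum(r == o for r, o in zip(reference, other))
    return matches / len(reference)


# Hypothetical toy data, not taken from the study.
original = [True, True, False, False, False]
rereview = [True, False, True, False, False]
print(percent_agreement(original, rereview))  # 0.6
```

Percent agreement treats both kinds of disagreement the same way, which is why the breakdown by category in the following paragraphs adds information.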
The two re-review teams identified a greater proportion of the documents as responsive than the original review had. Overall, their decisions agreed with the original review on 75.6% and 72.0% of the documents, respectively. The two teams agreed with one another on about 70% of the documents.
About half of the documents that the original review identified as responsive were also identified as responsive by each of the re-review teams. In the other direction, about a quarter of the documents that the original review identified as nonresponsive were identified as responsive by the new teams.
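Both figures in this paragraph are conditional proportions, each computed only over the documents that the original review placed in one category. A minimal Python sketch follows. It assumes each review is a sequence of True (responsive) / False (nonresponsive) calls over the same documents, and the function name and toy data are hypothetical. The data are chosen only to mirror the rough proportions described, not drawn from the study.

```python
from typing import Sequence, Tuple


def conditional_rates(original: Sequence[bool],
                      rereview: Sequence[bool]) -> Tuple[float, float]:
    """Split a re-review's responsive calls by the original review's call.

    Returns (a, b) where
      a = share of originally responsive documents the re-review also marked responsive,
      b = share of originally nonresponsive documents the re-review marked responsive.
    """
    if len(original) != len(rereview):
        raise ValueError("both reviews must cover the same documents")
    on_responsive = [r for o, r in zip(original, rereview) if o]
    on_nonresponsive = [r for o, r in zip(original, rereview) if not o]
    if not on_responsive or not on_nonresponsive:
        raise ValueError("original review must contain both categories")
    # sum() over booleans counts the True (responsive) calls.
    return (sum(on_responsive) / len(on_responsive),
            sum(on_nonresponsive) / len(on_nonresponsive))


# Hypothetical toy data mirroring "about half" and "about a quarter".
original = [True, True, True, True, False, False, False, False]
rereview = [True, True, False, False, True, False, False, False]
print(conditional_rates(original, rereview))  # (0.5, 0.25)
```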
Although the original review and the re-reviews were conducted by comparable people with comparable skills, their level of agreement was only moderate. We do not know whether this was due to variability in the original review or to some other factor, but these results are comparable to those seen in other situations where people make independent judgments about how to categorize documents (for example, in the TREC studies). A senior attorney then reclassified the documents on which the two teams disagreed. After this reclassification, agreement between the adjudicated set and the original review rose to 80%.
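The adjudication step can be sketched in Python. The text says only that a senior attorney reclassified the documents on which the two teams disagreed. This sketch therefore assumes that the adjudicated set keeps every call the teams agreed on and takes the attorney's call for each disagreement. The function and variable names are hypothetical.

```python
from typing import Callable, List, Sequence


def adjudicate(team_a: Sequence[bool], team_b: Sequence[bool],
               adjudicator: Callable[[int], bool]) -> List[bool]:
    """Build an adjudicated set from two independent reviews.

    Assumption: where the teams agree, their shared call is kept unchanged;
    where they disagree, adjudicator(i) supplies the final call for document i.
    """
    if len(team_a) != len(team_b):
        raise ValueError("both teams must review the same documents")
    return [a if a == b else adjudicator(i)
            for i, (a, b) in enumerate(zip(team_a, team_b))]


# Hypothetical example: the senior attorney's calls on the disputed documents.
team_a = [True, False, True, False]
team_b = [True, True, False, False]
senior_attorney = {1: False, 2: True}
print(adjudicate(team_a, team_b, senior_attorney.__getitem__))
# [True, False, True, False]
```

Only the disputed documents go to the adjudicator, so this step costs far less attorney time than a full third review.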
The two computer systems identified fewer documents as responsive than the human review teams did, but still somewhat more than the original review had. One system agreed with the original classification on 83.2% of the documents and the other on 83.6%. As with the human review teams, about half of the documents that the original review identified as responsive were also classified as responsive by the computer systems.
As legal professionals search for ways to reduce the costs of eDiscovery, this study suggests that it may be reasonable to employ computer-based categorization. The two computer systems agreed with the original review at least as often as a human team did.
The computer systems did not create their decisions out of thin air. One system based its classifications in part on the adjudicated results of the two review teams and the senior attorney. The other based its process on an analysis of the Justice Department request, the training documents given to the reviewers (for both the original review and the two re-review teams), and a proprietary ontology. In other words, these two systems implemented a set of human judgments, and they succeed to the extent that they can capture and reliably apply those judgments. The computers and their software do not get tired, are not distracted, and can work 24 hours a day. These results imply that a computer-based classification system is a viable way to produce reasonable eDiscovery document categorization.
Please contact me at firstname.lastname@example.org or email@example.com if you would like a copy of the full paper.