Monday, April 13, 2009

TREC Legal Track 2008

I have been going through some of the results from the TREC legal track for 2008. It is a veritable goldmine of interesting information.

TREC (Text Retrieval Conference) is a major annual research effort sponsored by the National Institute of Standards and Technology and other government agencies. The current report covers the 17th annual conference, and the third version of the legal track. The 2008 legal track was coordinated by a group including Doug Oard, Bruce Hedin, Stephen Tomlinson, and Jason Baron. The goal of the legal track is "evaluation of search technology for discovery of electronically stored information in litigation and regulatory settings."

There were three kinds of task evaluated last year:
  • Ad hoc retrieval involving automated search, where each team used its own search technology to retrieve documents.
  • Relevance feedback, where each team retrieved some documents, got feedback after this pass and then modified their searches to take advantage of this feedback.
  • Interactive, where each team was allowed to interact with a topic authority and revise their queries based on this feedback. Each team was allowed ten hours of access to the authority. In addition, they were allowed to appeal reviewer decisions that the team thought were inconsistent with the instructions from the topic authority.
Each team was free to use whatever technology they chose.

Two other kinds of searches were also employed. One was a Boolean search negotiated between a "plaintiff" and a "defendant." The second was a search that retrieved all of the documents in the collection.

A sample of the documents that were retrieved was then judged by volunteer assessors (reviewers) to determine whether these documents were responsive to the topic. Finally, a random set of documents that were not retrieved by any of the technologies was sampled and assessed in an attempt to find out whether some responsive documents might have been missed by all of the teams.

Some of the more interesting findings in this report concern the levels of agreement seen between assessors. Some of the same topics were used in previous years of the TREC legal track, so it is possible to compare the judgments made during the current year with those made in previous years. For example, the report gives the level of agreement between the 2008 assessors and those from 2006 and 2007. For each of the repeated topics, ten documents that were previously judged to be relevant and ten that were previously judged to be non-relevant were assessed by 2008 reviewers. It turns out that "just 58% of previously judged relevant documents were judged relevant again this year." Conversely, "18% of previously judged non-relevant documents were judged relevant this year." Overall, the 2008 assessors agreed with the previous assessors 71.3% of the time. Unfortunately, this is a fairly small sample, but it is consistent with other studies of inter-reviewer agreement. In 2006, the TREC coordinators gave a sample of 25 relevant and 25 non-relevant documents from each topic to a second assessor and measured the agreement between the two assessors. Here they found about 76% agreement. Other studies outside of TREC have found similar levels of (dis)agreement.
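To make the agreement arithmetic concrete, here is a minimal Python sketch. The counts are illustrative assumptions, not the actual TREC sample: I assume 100 previously relevant and 100 previously non-relevant documents and apply the 58% and 18% rates quoted above. The function name agreement_stats is my own, and Cohen's kappa is an extra chance-corrected measure that I have added for comparison; the report itself quotes raw agreement.

```python
def agreement_stats(both_rel, first_only, second_only, both_nonrel):
    """Return (percent agreement, Cohen's kappa) for two assessors.

    both_rel:    documents both assessors judged relevant
    first_only:  documents only the first assessor judged relevant
    second_only: documents only the second assessor judged relevant
    both_nonrel: documents both assessors judged non-relevant
    """
    total = both_rel + first_only + second_only + both_nonrel
    observed = (both_rel + both_nonrel) / total

    # Agreement expected by chance, from each assessor's overall
    # rate of calling a document relevant.
    first_rate = (both_rel + first_only) / total
    second_rate = (both_rel + second_only) / total
    expected = first_rate * second_rate + (1 - first_rate) * (1 - second_rate)

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts: 100 previously relevant documents, 58 of them judged
# relevant again; 100 previously non-relevant documents, 18 of them now
# judged relevant.
observed, kappa = agreement_stats(both_rel=58, first_only=42,
                                  second_only=18, both_nonrel=82)
print(f"raw agreement: {observed:.1%}")
print(f"Cohen's kappa: {kappa:.2f}")
```

With equal numbers of relevant and non-relevant documents, raw agreement comes out at 70%, close to the 71.3% reported; the small difference presumably reflects the actual mix of documents in the TREC sample, which this sketch does not try to reproduce.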

The interactive task also allowed the teams to appeal reviewer decisions, if they thought that the reviewers had made a mistake. Of the 13,339 documents that were assessed for the interactive task, 966 were appealed to the topic authority. This authority played the role, for example, of the senior litigator on the case, with the ultimate authority to overturn the decisions of the volunteer assessors. In about 80% of these appeals, the topic authority agreed with the appeal and recategorized the document. In one case (Topic 103), the appeal allowed the team that already had the highest recall (the percentage of relevant documents that were retrieved) to improve its performance by 47%.
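Because measured performance depends on which documents the assessors call relevant, an overturned assessment changes a team's scores even though the team retrieved exactly the same documents. The following is a minimal Python sketch of that effect, using made-up document IDs and a hypothetical helper, precision_recall. Recall is the share of relevant documents that were retrieved; precision is the share of retrieved documents that are relevant.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs
    measured against the set of IDs judged relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: the assessors initially judged d1..d4 relevant.
relevant_before = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d5"}
p_before, r_before = precision_recall(retrieved, relevant_before)

# On appeal, the topic authority recategorizes d5 as relevant.
relevant_after = relevant_before | {"d5"}
p_after, r_after = precision_recall(retrieved, relevant_after)

print(f"before appeal: precision={p_before:.2f} recall={r_before:.2f}")
print(f"after appeal:  precision={p_after:.2f} recall={r_after:.2f}")
```

In this toy case the same retrieved set goes from 0.50 to 0.60 recall once the judgment on d5 is corrected. The numbers are for illustration only and are not drawn from the TREC results.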

How do we interpret these findings?

These levels of (dis)agreement do not appear to be wildly different from those found in other studies. Inter-assessor consistency presents challenges to any study of information retrieval effectiveness. TREC studies have found repeatedly that this inconsistency does not affect the relative ranking of different approaches, but it could affect how we interpret the absolute levels of performance. TREC may substantially underestimate how well a system could do in a real-world setting, such as discovery of electronically stored information, if its performance were measured consistently.

Like most studies of information retrieval, the TREC legal track takes assessor judgments to be the standard against which to judge the performance of various systems and approaches. The legal track used tens of assessors, primarily second and third year law students. With the volume of documents involved in the TREC legal track, the limited resources, and so on, there may not be a practical alternative to getting these judgments from many different reviewers. The assessors averaged only 21.5 documents per hour, so the average assessor took 23.25 hours to review 500 documents. That is a substantial commitment of time and effort from a volunteer.

The inconsistency in assessor judgments limits the ability of any system to yield reliable results. The appeal process of the interactive task (Topic 103), for example, demonstrates what can be gained by increasing consistency. Practically every system showed an improvement in recall as a result of the appeal process, whether or not it was the one that submitted the appeal. Improving consistency appears to improve the measured level of performance, sometimes substantially.

The use of multiple assessors matches well the standard practice in electronic discovery of distributing documents to multiple reviewers. The results described here, and others, suggest that there are likely to be similar levels of inconsistency in these cases. If we take the prior-year reviews as the standard against which to measure the 2008 assessors, the 2008 assessors found only 58% of the documents deemed relevant by the prior review, which amounts to 58% recall. Similarly, the 2006 study found that the second reviewer recognized as relevant, again, only 58% of the documents deemed relevant by the first reviewer. I do not believe that these results are an artifact of the TREC processes or procedures. Rather, I think that this level of inconsistency is endemic in the process of having multiple reviewers review documents over time.

In the practice of eDiscovery, human review suffers from unknown inconsistencies. There is no reason to think that actual legal review should be any more consistent than that found in the TREC studies. For that reason, standard review practice may be grossly under-delivering responsive documents. At the very least, attorneys should seek to measure the consistency of their reviewers and the effectiveness of their classifications.

The TREC legal track represents a tremendous resource for the legal community and for the information retrieval community as a whole. It is a monumental effort, representing untold hours and uncounted dollars. In future articles I plan to describe other interesting findings to come out of this study.

Thursday, April 9, 2009

Analytics in Electronic Discovery

The goal of eDiscovery analytics is to understand your data: its volume, its content, and its challenges. This information is critical to evaluating the risks inherent in the case and the resources that will be needed to advance it, and to winnowing and organizing the data for efficient and effective processing. Said another way, the goal of eDiscovery analytics is to have as much useful knowledge as possible about the documents and other sources of information that are potentially discoverable. A related goal, early in the development of a case, is to have the information needed to prepare for an effective discussion with the other side on discovery plans. It is difficult to formulate an effective eDiscovery strategy without broad and deep knowledge of the data and of the issues in the case. And there is typically great pressure to obtain this knowledge as quickly and as inexpensively as possible. Analysis is not a substitute for document review, but it can facilitate it and reduce the amount of time, effort, and cost it requires.

eDiscovery analytics is a kind of text analytics directed at the kinds of information that attorneys will find useful for managing the case. Fundamentally, it is intended to say what a document collection is about. Are there obvious smoking guns? What proportion of the documents are likely to be responsive? How can we distinguish between documents that are potentially responsive and those that are not? Are there individuals that we have not yet identified who may be important to the case? Are there topics that we have not yet considered that may be important to the case? What documents are likely to be pertinent to each topic?

There is a wide range of tools available for eDiscovery analytics. These include, for example, linguistic tools that identify the nouns, noun phrases, people, places, organizations, and other "entities" in the documents. Conceptual tools identify the concepts that appear in the documents. Clustering tools organize documents into groups based on their similarity. On top of these, there are visualization tools that help to display this information in ways that are easily understood.

Social network analysis is often used with emails to identify who is "talking" to whom. This tool can be effective for identifying custodians who may have important information. The patterns of communication do not always follow the pattern that one would expect from an organizational chart and the people with knowledge may not be the same as the people charged with the responsibility by the management structure.
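At its simplest, this kind of analysis is just counting who sends mail to whom. Here is a minimal Python sketch under stated assumptions: the messages have already been reduced to (sender, recipients) pairs, the names are invented, and the helpers communication_counts and most_connected are hypothetical, not part of any particular product.

```python
from collections import Counter


def communication_counts(messages):
    """Count messages for each (sender, recipient) pair.

    messages: iterable of (sender, list_of_recipients) tuples.
    """
    edges = Counter()
    for sender, recipients in messages:
        for recipient in recipients:
            if recipient != sender:  # ignore self-addressed copies
                edges[(sender, recipient)] += 1
    return edges


def most_connected(edges, n=3):
    """Rank people by total messages sent plus received."""
    volume = Counter()
    for (sender, recipient), count in edges.items():
        volume[sender] += count
        volume[recipient] += count
    return volume.most_common(n)


# Invented example data.
messages = [
    ("alice", ["bob", "carol"]),
    ("bob", ["alice"]),
    ("carol", ["alice"]),
    ("dave", ["alice"]),
]
edges = communication_counts(messages)
print(most_connected(edges, n=1))  # [('alice', 5)]
```

A person who turns up at the top of such a ranking but not near the top of the organizational chart may be a custodian worth a closer look. Real tools go further (threads, time windows, graph layouts), but the underlying data is this kind of edge count.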

Analysis, especially during early case assessment, is often conducted in the context of great uncertainty. After all, it is an effort directed at reducing that uncertainty. Data may not be fully collected, and it may not yet be fully decided whether the cost of eDiscovery is justified by the merits of the case and the amount at risk. For this reason, sampling is often used in eDiscovery analysis. If, because of time or cost constraints, you cannot analyze the complete collection, analyze a sample.

The ideal sample is one that is randomly chosen from all of the documents that could be considered for responsiveness. A random sample is one where every document or record has an equal probability of being included in the sample. A random sample is desirable because statisticians have found that it is the most reliable way to get a representative sample of items and the best way to infer the nature of the population (all of the documents and records) from the sample. In practice, however, especially during early case assessment, a truly random sample may not be possible, so any inferences from an almost-random sample need to be drawn cautiously, but they are often still valuable and useful. The closer you can come to a truly random sample, the more reliable your analysis will be. For large collections, the sample size needed for a given level of precision depends very little on the size of the population, but the larger the sample, all other things being equal, the better will be the extrapolation from the sample to the population.
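As a rough illustration of sampling in practice, here is a minimal Python sketch that draws a simple random sample and estimates the proportion of responsive documents with an approximate 95% margin of error. The function names, the synthetic collection, and the stand-in is_responsive test are all assumptions for the example; in a real matter the responsiveness call would come from a reviewer. The margin uses the usual normal approximation and ignores the finite-population correction, which would only make it slightly smaller.

```python
import math
import random


def estimate_proportion(population, is_responsive, sample_size, seed=0, z=1.96):
    """Estimate the responsive proportion from a simple random sample.

    Returns (estimated proportion, approximate margin of error).
    z=1.96 corresponds to roughly 95% confidence.
    """
    rng = random.Random(seed)
    # Every document has the same chance of being selected.
    sample = rng.sample(population, sample_size)
    hits = sum(1 for doc in sample if is_responsive(doc))
    p = hits / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, margin


def worst_case_margin(sample_size, z=1.96):
    """Largest possible margin for a given sample size (at p = 0.5).

    Note that the population size does not appear in the formula.
    """
    return z * math.sqrt(0.25 / sample_size)


# Synthetic collection: 100,000 document IDs, of which every fifth one
# is treated as responsive (a stand-in for a reviewer's judgment).
collection = list(range(100_000))
p, margin = estimate_proportion(collection, lambda doc: doc % 5 == 0, 400)
print(f"estimated responsive: {p:.1%} +/- {margin:.1%}")

for n in (400, 1600):
    print(n, f"{worst_case_margin(n):.3f}")
```

Quadrupling the sample from 400 to 1,600 documents halves the worst-case margin, from about 4.9 to about 2.5 percentage points, whether the collection holds a hundred thousand documents or ten million.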

Some specific questions that can be addressed through analysis include:

  • What topics are discussed in this ESI collection?
  • Which individuals are likely to have pertinent knowledge?
  • What is the time frame that should be collected?
  • How can we identify those documents that are very likely to be responsive or nonresponsive?
  • What search terms should we use to identify potentially responsive documents? Are they too vague, yielding too many documents to review or too specific, missing responsive documents?
  • What resources will be needed to process and review the documents (e.g., languages, file types, volume)?
  • Is there evidence apparent in the data that would support or compel a rapid settlement of the case?
  • How can we respond with particularity to the demands of the other party?

Analysis is not a single stage in the processing of electronically stored information; rather, it is an ongoing process that continuously reduces uncertainty. It is a systematic approach to understanding the information you have in the context of specific issues and matters. As Pasteur said, "chance favors the prepared mind." eDiscovery favors the prepared attorney, and analysis is the means by which to be prepared.

OrcaTec LLC has prepared a report on early case assessment analysis techniques that is available for a fee. This report discusses about 35 different reports, analytic techniques, and visualizations and about 48 questions to address during early case assessment. Its target audience is electronic discovery service providers, but it may also be of interest to attorneys. Contact for more information.

Related posts include:

Considering Analytics

Welcome to Information Discovery

"Information Discovery" is an occasional blog about finding and discovering information. A major component of what we do, and of information discovery in general, is semantic search. In the legal space, this is called "concept search." Information discovery also involves other text analytic tools, including near duplicate detection, semantic clustering, language identification, and others.

You can find out more about OrcaTec from or visit our Green Web search engine, Truevert, at