Da Silva Moore has been generating a lot of attention in eDiscovery circles, first for Judge Peck's decision supporting the use of predictive coding, and then for the challenges to that ruling presented by the Plaintiffs. The eDiscovery issues in this case are undoubtedly important to the legal community so it is critical that we get them right.
The Plaintiffs play fast and loose with the facts in the matter, fail to recognize that they have already been given the very things that they ask for, and employ a rash of ad hominem attacks on the judge, the Defendant, and the Defendant's predictive coding vendor, Recommind. Worse still, what they ask for would, in my opinion, actually disadvantage them.
If we boil down this dispute to its essence, the main disagreement seems to be about whether to measure the success of the process using a sample of 2,399 putatively non-responsive documents or a sample of 16,555. The rest is a combination of legal argumentation, which I will let the lawyers dispute, some dubious logical and factual arguments, and personal attacks on the Judge, attorneys, and vendor.
The current disagreement embodied in the challenge to Judge Peck's decision is not about the use of predictive coding per se. The parties agreed to use predictive coding, even if the Plaintiffs now want to claim that that agreement was conditional on having adequate safeguards and measures in place. Judge Peck endorsed the use of predictive coding knowing that the parties had agreed. It was easy to order them to do something that they were already intending to do.
Now, though, the Plaintiffs are complaining that Judge Peck was biased toward predictive coding and that this bias somehow interfered with his rendering an honest decision. Although he has clearly spoken out about his interest in predictive coding, I am not aware of any time that Judge Peck endorsed any specific measurement protocol or method. The parties to the case knew about his views on predictive coding, and, for good measure, he reminded them of these views and gave them the opportunity to object. Neither party did. In any case, the point is moot in that both sides stated that they were in favor of using predictive coding. It seems disingenuous to then complain about the fact that he spoke supportively of the technology.
The Plaintiff brief attacking Judge Peck for his support of predictive coding reminds me of the scene from the movie Casablanca where Captain Renault says that he is shocked to find out that gambling is going on in Rick's café just as he is presented with his evening's winnings. If the implication is that judges should remain silent about methodological advances, then that would have a chilling effect on the field and on eDiscovery in particular. A frequent complaint that I hear from lawyers is that the judges don't understand technology. Here is a judge who not only understands the technology of modern eDiscovery, but works to educate his fellow judges and the members of the bar about its value. It would be disastrous for legal education if the Plaintiffs were to succeed in sanctioning the Judge for playing this educational role.
The keys to the candy store
The protocol gives to the Plaintiffs the final say on whether the production meets quality standards (Protocol, p. 18):
If Plaintiffs object to the proposed review based on the random sample quality control results, or any other valid objection, they shall provide MSL with written notice thereof within five days of the receipt of the random sample. The parties shall then meet and confer in good faith to resolve any difficulties, and failing that shall apply to the Court for relief. MSL shall not be required to proceed with the final search and review described in Paragraph 7 above unless and until objections raised by Plaintiffs have been adjudicated by the Court or resolved by written agreement of the Parties.
The Plaintiffs make a number of other claims in their brief about things not being specified when, in fact, the protocol gives them the power to specify the criteria as they see fit. They get to define what is relevant. They get to determine whether the results are adequate, so it is not clear why they complain that these things are not clearly specified.
Moreover, the Defendant is sharing with them every document used in training and testing the predictive coding system. The Plaintiffs can object at any point in the process and trigger a meet and confer to resolve any possible dispute. It's not clear, therefore, why they would complain that the criteria are not clearly spelled out when they can object for any valid reason. Any further specificity would simply limit their ability to object. If they don't like the calculations or measures used by the Defendant, they have the documents and can do their own analysis.
The Plaintiffs are being given more data than they could reasonably expect from other defendants or when using other technology. I am not convinced that it should be necessary in general to share the predictive coding training documents with opposing counsel. These training documents provide no information about the operation of the predictive coding system. The documents are only useful for assessing the honesty or competence of the party training the predictive coding system; they presume that the predictive coding system will make good use of the information they contain. I will leave to lawyers any further discussion of whether document sharing is required or legally useful.
Misuse of the "facts"
The Plaintiffs complain that the method described in the protocol risks failing to capture a staggering 65% of the relevant documents in this case. They reach this conclusion based on their claim that Recommind’s “recall” was very low, averaging only 35%. This is apparently a fundamental misreading or misrepresentation of the TREC (Text Retrieval Conference) 2011 preliminary results (attached to Neale's declaration). Although it may be tempting to use the TREC results for this purpose, TREC was never designed to be a commercial "bakeoff" or a certification of commercial products. It is designed as a research project, and it imposes special limitations on the systems that participate, limitations that might not be applicable in actual use during discovery. Moreover, Recommind scored much better on recall than the Plaintiffs claim, about twice as well.
The Plaintiffs chose to look at the system's recall level at the point where the measure F1 was maximized. F1 is a measure that combines precision and recall with an equal emphasis on both. In this matter, the parties are obviously much more concerned with recall than precision, so the F1 measure is not the best choice for judging performance. If, rather, we look at the actual recall achieved by the system, while accepting a reasonable number of non-responsive documents, then Recommind's performance was considerably higher, reaching an estimated 70% or more on the three tasks (judging from the gain curves in the Preliminary TREC report). To claim that the data support a recall rate of only 35% is misleading at best.
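To see why the recall at the point of maximum F1 can understate what a system is able to retrieve, here is a minimal sketch. The two operating points are made-up numbers chosen only for illustration; they are not taken from the TREC report, and the function and variable names are my own.

```python
def f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall (equal weight on both)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical operating points on one ranked list: (label, precision, recall).
# These values are illustrative assumptions, not measured results.
operating_points = [
    ("shallow cutoff", 0.80, 0.35),
    ("deeper cutoff", 0.30, 0.70),
]

# The cutoff that maximizes F1 is not necessarily the one with the best recall.
best_by_f1 = max(operating_points, key=lambda pt: f1(pt[1], pt[2]))
best_by_recall = max(operating_points, key=lambda pt: pt[2])

for label, p, r in operating_points:
    print(f"{label}: precision={p:.2f} recall={r:.2f} F1={f1(p, r):.3f}")
```

With these assumed numbers, F1 favors the shallow cutoff even though the deeper cutoff retrieves twice as large a share of the responsive documents, which is the tradeoff a recall-focused party would care about.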
The Plaintiffs complain that there are a number of methodological issues that are not fully spelled out in the protocol. Among these are how certain standard statistical properties will be measured (for example, the confidence interval around recall). Because they are standard statistical properties, they should hardly need to be spelled out again in this context. These are routine functions that any competent statistician should be able to compute.
The biggest issue that is raised, and the only one where the Plaintiffs actually have an alternative proposal, concerns how the results of predictive coding are to be evaluated. Because, according to the protocol, the Plaintiffs have the right to object to the quality of the production, it actually falls on them to determine whether it is adequate or not. The dispute revolves around the collection of a sample of non-responsive documents at the end of predictive coding (post-sample), and here the parties both seem to be somewhat confused.
According to the protocol, the Defendant will collect 2,399 documents designated by the predictive coding system as non-responsive. The Plaintiffs want them to collect 16,555 of these documents. They never clearly articulate why they want this number of documents. The putative purpose of this sample is to evaluate the system's precision and recall, but in fact, this sample is useless for computing these measures.
Precision concerns the number of correctly identified responsive documents relative to the number of documents identified by the system as responsive. Precision is a measure of the specificity of the result. Recall concerns the number of correctly identified responsive documents relative to the total number of responsive documents. Recall is a measure of the completeness of the result.
The sample that both sides want to draw contains, by design, no documents that have been identified by the system as responsive, so it cannot be used to calculate either precision or recall. Any argument about the size of this sample is meaningless if the sample cannot provide the information that they are seeking.
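The definitions of precision and recall, and the reason a sample drawn only from documents coded non-responsive cannot supply them, can be sketched as follows. The counts are hypothetical and the function names are my own; this is an illustration, not anything specified in the protocol.

```python
from typing import Optional


def precision(tp: int, fp: int) -> Optional[float]:
    """Correctly identified responsive documents / all documents called responsive."""
    called_responsive = tp + fp
    return tp / called_responsive if called_responsive else None


def recall(tp: int, fn: int) -> Optional[float]:
    """Correctly identified responsive documents / all truly responsive documents."""
    truly_responsive = tp + fn
    return tp / truly_responsive if truly_responsive else None


# Hypothetical counts from a sample spanning BOTH predicted classes.
print(precision(tp=70, fp=30), recall(tp=70, fn=30))

# A sample drawn only from documents coded non-responsive has tp = 0 and
# fp = 0 by construction; it can only contain misses (fn) and true negatives.
print(precision(tp=0, fp=0))   # undefined: nothing was called responsive
print(recall(tp=0, fn=21))     # always 0.0, whatever the system's real recall
```

In the second case the sample size makes no difference: with no documents called responsive in the sample, precision is undefined and the sample-based recall is zero by construction.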
A better measure to use in this circumstance is elusion. Rather than the percentage of responsive documents that have been found, elusion is the percentage of the rejected documents (those classified as non-responsive) that are actually responsive. I have published on this topic in the Sedona Conference Journal, 2007. Elusion can be used to create an accept-on-zero quality control test, or one can simply measure it. Measuring elusion would require the same size sample as the original 2,399-document pre-sample used to measure prevalence. The methods for computing the accept-on-zero quality control test are described in the Sedona Conference Journal paper. The parties could apply whatever acceptance criterion they want, without having to sample huge numbers of documents to evaluate success.
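As a rough sketch of both uses of elusion: the first function simply measures it from a random sample of rejected documents. The second sizes an accept-on-zero sample using a standard binomial zero-defect calculation, which is my assumption for illustration and not necessarily the exact procedure in the Sedona Conference Journal paper. The function names, the 1% threshold, and the count of 21 are all hypothetical.

```python
import math


def elusion(responsive_in_sample: int, sample_size: int) -> float:
    """Share of a random sample of rejected documents that is actually responsive."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return responsive_in_sample / sample_size


def accept_on_zero_sample_size(max_elusion: float, confidence: float) -> int:
    """Smallest n such that, if true elusion were max_elusion or higher,
    a random sample of n rejected documents would contain zero responsive
    documents with probability at most 1 - confidence.
    Assumes independent draws (binomial model), i.e. a large rejected set."""
    if not (0 < max_elusion < 1 and 0 < confidence < 1):
        raise ValueError("max_elusion and confidence must be between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_elusion))


# Simple measurement with a hypothetical count of 21 responsive documents.
print(f"measured elusion: {elusion(21, 2399):.4f}")

# Accept-on-zero: sample size needed to rule out elusion of 1% or more
# at 95% confidence if the sample turns up no responsive documents.
print(accept_on_zero_sample_size(max_elusion=0.01, confidence=0.95))
```

Under these assumptions, the accept-on-zero sample is far smaller than either of the sample sizes in dispute, at the cost of failing the test whenever even one responsive document appears in the sample.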
Another test that could be used is a z-test for proportions. If predictive coding works, then it should decrease the number of responsive documents that are present in the post-sample, relative to the pre-sample. The pre-sample apparently identified 36 responsive documents out of 2,399 in a random sample (about 1.5%). A post-sample of 2,399 documents, drawn randomly from the documents identified as non-responsive, would have to have 21 or fewer responsive documents for it to be significantly different (by a conservative 2-tailed test) at the 95% confidence level.
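The threshold of 21 can be checked with a short calculation. The sketch below assumes a pooled two-proportion z-test with a two-tailed 95% critical value; that choice of variant is my assumption, and the function names are my own.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z statistic for x1/n1 versus x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err


def max_significant_count(x_pre: int, n_pre: int, n_post: int,
                          confidence: float = 0.95) -> "int | None":
    """Largest post-sample count that is still significantly below the
    pre-sample rate by a two-tailed test; None if no count qualifies."""
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    largest = None
    for x_post in range(n_post + 1):
        # z shrinks as x_post grows, so stop at the first non-significant count.
        if two_proportion_z(x_pre, n_pre, x_post, n_post) < critical:
            break
        largest = x_post
    return largest


# Pre-sample from the text: 36 responsive out of 2,399; post-sample of 2,399.
print(max_significant_count(36, 2399, 2399))
print(round(two_proportion_z(36, 2399, 21, 2399), 3))
print(round(two_proportion_z(36, 2399, 22, 2399), 3))
```

Under this assumed variant, 21 responsive documents in the post-sample gives a z statistic just above the critical value and 22 falls below it, which agrees with the figure stated above.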
The Parties in this case are not arguing about the appropriateness of using predictive coding. They agreed to its use. The Plaintiffs are objecting to some very specific details of how this predictive coding will be conducted. Along the way they raise every possible objection that they can imagine, most of which are beside the point; they misinterpret or misrepresent data; they fail to realize that they have the very information they are seeking; and they seek data that will not do them any good, all while vilifying the judge, the other party, and the party's predictive coding service provider. It is as if, given the keys to the candy store, they are throwing a tantrum because they have not been told whether to eat the red whips or the cinnamon sticks. Their slash-and-burn approach to negotiation is far beyond zealous advocacy and far from consistent with the pattern of cooperation that has been promoted by the Sedona Conference and by a large number of judges, including Judge Peck.
So that there are no surprises about where I am coming from, let me repeat some perhaps pertinent facts. Certain other bloggers have recently insinuated that there might be some problem with the credibility of the paper that Anne Kershaw, Patrick Oot, and I published in the peer-reviewed Journal of the American Society for Information Science and Technology on predictive coding. Judge Peck mentioned this paper in his opinion. The technology investigated in that study was from two companies with which none of the authors had any financial relationship.
I am the CTO and Chief Scientist for OrcaTec. I designed OrcaTec's predictive coding tools, starting in February of 2010, after the paper mentioned earlier had already been published and after it became clear that there was interest in predictive coding for eDiscovery. OrcaTec is a competitor of Recommind, and uses very different technology. Our goal is not to defend Recommind, but to try to bring some common sense to the process of eDiscovery.
Neither I nor OrcaTec has any financial interest in this case, though I have had conversations in the past with Paul Neale, Ralph Losey, and Judge Peck about predictive coding.
I have also commented on this case in an ESI-Bytes podcast, where we talk more about the statistics of measurement.
Thanks to Rob Robinson for collecting all of the relevant documents in one easy-to-access location.