Seven years after it first rose to prominence in eDiscovery, technology-assisted review remains an important, and at times controversial, tool in the eDiscovery practitioner’s toolkit.
In “Still Crazy after All These Years,” we discussed the slow but steady growth in the importance of TAR. In “In the Beginning Was da Silva Moore,” we discussed the first case to address TAR. In “Questions of Choice in Kleen Products,” we discussed an attempt to force the use of TAR, and in “Reported Results in Global Aerospace,” we discussed the first instance of reported TAR results. In “A Negotiated Protocol in In Re: Actos,” we discussed a successfully negotiated TAR protocol. In “At a Judge’s Direction in EORHB,” we discussed a Judge ordering TAR use unprompted. In this Part, we review the process and transparency debates in Biomet.
The next prominent TAR case was In Re: Biomet M2a Magnum Hip Implant Products Liability Litigation (N.D. Ind. Aug. 21, 2013), a consolidated multidistrict product liability litigation. The litigation was eventually settled, but not before substantial discovery work was done and two orders concerning the use of TAR were issued.
The initial collection work done by the Defendants yielded about 19.5 million documents. They then used keyword searching to cut that number down to 3.9 million documents, and then deduplication further reduced the population to about 2.5 million documents.
The Defendants also used random sampling to estimate the prevalence of relevant materials at three points in this process. In the initial collection of 19.5 million documents, they estimated (with a 99% confidence level) that only about 1.37% to 2.47% of the documents were actually relevant. After performing the keyword searches, they estimated (also with a 99% confidence level) that just 0.55% to 1.33% of the 15.6 million documents excluded by the searches were relevant. And, after deduplication, they estimated (with a 95% confidence level) that their final pool of 2.5 million documents contained about 14.41% to 17.91% relevant documents.
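For readers unfamiliar with how a prevalence range like these is produced, the following is a minimal sketch using the common normal-approximation interval for a sampled proportion. The sample size and the number of relevant documents found are hypothetical values chosen only for illustration; the source does not report the Defendants’ actual sample sizes or the exact method they used.

```python
from math import sqrt
from statistics import NormalDist


def prevalence_interval(sample_size, relevant_found, confidence):
    """Normal-approximation confidence interval for a sampled proportion.

    Returns (low, high) as fractions of the population.
    """
    p = relevant_found / sample_size
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p * (1 - p) / sample_size)
    return max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical sample: 4,000 documents drawn at random, 38 coded relevant.
# These figures are NOT from the Biomet orders.
low, high = prevalence_interval(sample_size=4000, relevant_found=38, confidence=0.99)
print(f"Estimated prevalence: {low:.2%} to {high:.2%}")
```

Raising the confidence level or shrinking the sample widens the range, which is why the confidence level is reported alongside each estimate.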
Still facing review of 2.5 million documents to locate the estimated 360,000 to 448,000 relevant documents contained within, the Defendants opted to leverage technology-assisted review, and the Plaintiffs objected to this approach.
Specifically, the Plaintiffs objected to the Defendants’ decision to apply TAR (in this case, “predictive coding”) after filtering with keyword searches, rather than applying it to the full collection of 19.5 million documents. To make this argument, the Plaintiffs pointed to the same studies used in prior cases to support the use of TAR, which show the superiority of TAR to keyword searching.
To address the Plaintiffs’ concern, the Defendants offered to run additional keywords chosen by the Plaintiffs or make other process adjustments, but the Plaintiffs were adamant that the Defendants should be compelled to restart their TAR effort using all 19.5 million documents.
Despite the Plaintiffs’ focus on the alleged superiority of implementing TAR without keyword searching first, the Judge framed the issue in a different way:
The issue before me today isn’t whether predictive coding is a better way of doing things than keyword searching prior to predictive coding. I must decide whether [the Defendants’] procedure satisfies [their] discovery obligations and, if so, whether [they] must also do what the [Plaintiffs] seek. [emphasis added]
After reviewing the work performed, the Judge concluded that the Defendants’ process did, in fact, satisfy their discovery obligations.
The Judge reached this conclusion, in part, by comparing the prevalence of relevant documents in the deduplicated search results (14.41% to 17.91%) to the prevalence of relevant documents in the excluded leftovers (0.55% to 1.33%) and pointing to this as evidence of the keyword search effort’s efficacy. It should be noted, however, that this comparison of percentages is somewhat misleading, as the populations to which those percentages apply are very different in size (2.5 million and 15.6 million documents, respectively): a small percentage of the much larger excluded population can still represent a substantial number of relevant documents.
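The point about population sizes is easier to see when the percentages are converted into estimated document counts. The sketch below does only that arithmetic, using the population sizes and prevalence ranges reported above; the helper name `count_range` is ours. Note as a caveat that the 2.5 million figure is post-deduplication while the 15.6 million figure is not, so the two counts are not perfectly comparable.

```python
def count_range(population, low_pct, high_pct):
    """Convert a prevalence range (in percent) into an estimated document-count range."""
    return (
        round(population * low_pct / 100),
        round(population * high_pct / 100),
    )


# Figures as reported in the article.
review_set = count_range(2_500_000, 14.41, 17.91)   # deduplicated keyword hits
excluded = count_range(15_600_000, 0.55, 1.33)      # documents the keywords left behind

print(f"Review set: {review_set[0]:,} to {review_set[1]:,} relevant documents")
print(f"Excluded:   {excluded[0]:,} to {excluded[1]:,} relevant documents")
```

On these figures, the excluded population is estimated to hold roughly 86,000 to 207,000 relevant documents, against roughly 360,000 to 448,000 in the review set. The gap in absolute numbers is far narrower than the gap in percentages suggests.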
The Judge also considered the costs that would have been associated with the Plaintiffs’ requested approach and concluded that a cost of millions of additional dollars would outweigh the potential benefit gained from finding some more of the remaining relevant documents.
A few months after the dispute discussed above, a new TAR-related dispute arose. The Plaintiffs requested access to the seed set documents used to train the predictive coding tool so that they could look for any gaps and suggest new keyword searches accordingly. The Defendants informed the Plaintiffs that all of the seed set documents had already been produced but declined to identify which of the produced documents they were, and the Judge concluded that he had no authority to compel the kind of disclosure the Plaintiffs sought:
The [Plaintiffs] want to know, not whether a document exists or where it is, but rather how [the Defendants] used certain documents before disclosing them. Rule 26(b)(1) doesn’t make such information disclosable. [emphasis added]
Though he concluded he could not compel the Defendants to make the requested disclosure, the Judge did describe the Defendants’ refusal as “troubling” and “below what the Sedona Conference endorses.” He also warned the Defendants that there could be consequences for uncooperativeness:
An unexplained lack of cooperation in discovery can lead a court to question why the uncooperative party is hiding something, and such questions can affect the exercise of discretion. [emphasis added]
Upcoming in this Series
In the next Part of this series, we will review the Progressive and Bridgestone cases, which both involve attempts to switch to TAR mid-stream.