Combating Big Data with Cost-Effective TAR
Yup, this is another article about utilizing TAR/CAR/predictive coding/analytics. It seems like the names for it are growing faster than the number of people utilizing it (I’ll opt for the term TAR). Color me guilty as charged. I’ve been in this industry since it first became an industry. Sure, I’ve read white papers, attended seminars, WebExes, webinars and presentations. I’ve seen demos and discussed TAR with vendors. I’ve recommended it to attorneys in the firms where I’ve worked. But until now, I’ve been among the ranks of litigation support/eDiscovery professionals who had yet to actually use it.
When predictive coding first became a buzzword, one particular vendor tried to tell others not to even use the term (thus, TAR/CAR/analytics, etc. were spawned as comparable terms). From my perspective, the main efficacy and selling point were based on how many documents one didn’t have to review (i.e., not lay eyes on). The pitch went like this: after our process/workflow concludes and our algorithm does its handiwork, there will be X amount/percentage of documents that won’t need to be reviewed, thereby lowering review time, effort and cost.
Possibly because attorneys aren’t statisticians or because they have the mindset of looking for the smoking gun, sometimes as a needle in a haystack, that sales pitch encountered heavy pushback from litigators. The TAR industry then, in this man’s opinion, changed its approach somewhat. The overall pitch is still about saving review costs, but it has become more of a resource management approach. That is, rather than pushing a concept where it’s about all the documents that don’t need to have eyes laid on them, it’s now more about having the more knowledgeable (and higher billing) attorneys review the documents that are scored/ranked higher (and thus more likely to be responsive) and the less knowledgeable (and lower billing) attorneys, possibly even contract attorneys, review the documents that are scored/ranked lower (and thus less likely to be responsive).
Since, in almost every corpus of data, there are far more nonresponsive documents than responsive ones, this management of human resources results in cost savings for the client because the lower billing professionals are doing most of the heavy lifting.
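To put rough numbers on that resource management argument, here is a minimal back-of-the-envelope sketch. Every figure in it (the corpus size, the review speed, the billing rates, the share of documents ranked high) is a hypothetical assumption chosen for illustration, not data from any real matter or product, and the "senior attorneys review everything" baseline is only a point of comparison.

```python
def review_cost(num_docs, docs_per_hour, hourly_rate):
    """Cost of one tier of reviewers working through num_docs."""
    return num_docs / docs_per_hour * hourly_rate


def tiered_cost(total_docs, high_ranked_share, docs_per_hour,
                senior_rate, junior_rate):
    """Senior attorneys take the high-ranked share; junior or contract
    attorneys take everything else."""
    high_ranked = round(total_docs * high_ranked_share)
    low_ranked = total_docs - high_ranked
    return (review_cost(high_ranked, docs_per_hour, senior_rate)
            + review_cost(low_ranked, docs_per_hour, junior_rate))


# Hypothetical inputs, assumed purely for illustration.
TOTAL_DOCS = 100_000
DOCS_PER_HOUR = 50
SENIOR_RATE = 400       # dollars per hour (assumed)
JUNIOR_RATE = 100       # dollars per hour (assumed)
HIGH_RANKED_SHARE = 0.20

# Baseline: senior attorneys lay eyes on every document.
all_senior = review_cost(TOTAL_DOCS, DOCS_PER_HOUR, SENIOR_RATE)

# Tiered: every document is still reviewed, but by the cheaper tier
# wherever the ranking says responsiveness is less likely.
tiered = tiered_cost(TOTAL_DOCS, HIGH_RANKED_SHARE, DOCS_PER_HOUR,
                     SENIOR_RATE, JUNIOR_RATE)

print(f"All senior review: ${all_senior:,.0f}")
print(f"Tiered review:     ${tiered:,.0f}")
```

Note that in this sketch no document goes unreviewed. The savings come entirely from who does the reviewing, which is the shift in the pitch described above.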
Despite the slight shift in the sales pitch regarding TAR, the process remains the same. Sure, there are different flavors and approaches, but the overall process involves knowledgeable people teaching the system and then the system applying that information to the rest of the documents.
The way the seed/sample/test/alpha set is gathered differs from product to product, as does the size of that set. But however the set is gathered, it needs to be reviewed by the folks most knowledgeable about the case. This is so the decisions on that set are accurate and consistent. It is important that the initial set be tagged/coded accurately and consistently because those decisions will be used to teach the system and will be applied against the other documents in the population.
Once the decisions have been made on the initial set of documents and those decisions have been applied to the rest of the documents, others can then start reviewing the scored or ranked documents. Though the specific approaches of different products vary, the basic thinking is that the system will continue to learn about what is important (and, just as importantly, what is not important) to the case through subsequent decisions and retraining/teaching of the system.
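For readers who want to see the teach, score and retrain cycle in miniature, here is a toy sketch. It is not how any particular TAR product works; commercial tools use their own (often proprietary) algorithms. The word-weighting approach, the function names `train`, `score` and `rank`, and the sample documents are all assumptions made up for illustration.

```python
import math
from collections import Counter


def train(coded_docs):
    """Learn a weight per word from (text, is_responsive) decisions.

    Positive weights mean the word appeared more often in documents
    coded responsive; negative weights mean the opposite.
    """
    responsive, nonresponsive = Counter(), Counter()
    for text, is_responsive in coded_docs:
        target = responsive if is_responsive else nonresponsive
        target.update(text.lower().split())
    vocab = set(responsive) | set(nonresponsive)
    # Add-one smoothing so unseen words never produce log(0).
    resp_total = sum(responsive.values()) + len(vocab)
    nonresp_total = sum(nonresponsive.values()) + len(vocab)
    return {
        word: math.log((responsive[word] + 1) / resp_total)
        - math.log((nonresponsive[word] + 1) / nonresp_total)
        for word in vocab
    }


def score(weights, text):
    """Higher score = more likely responsive, per the coded set."""
    return sum(weights.get(word, 0.0) for word in text.lower().split())


def rank(weights, docs):
    """Order unreviewed documents from most to least likely responsive."""
    return sorted(docs, key=lambda doc: score(weights, doc), reverse=True)


# Hypothetical seed set, coded by the people who know the case best.
seed_set = [
    ("pricing agreement with competitor", True),
    ("lunch menu for friday", False),
]
unreviewed = ["friday lunch order", "competitor pricing call"]

weights = train(seed_set)
ranked = rank(weights, unreviewed)

# A reviewer codes the top-ranked document, and the system is retrained
# with that new decision folded in.
seed_set.append((ranked[0], True))
weights = train(seed_set)
```

The point of the last three lines is the loop itself: each new coding decision goes back into the training set, so the ranking of whatever remains unreviewed keeps getting refreshed.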
Various metrics and diagrams are used to track the learning/statistics throughout the process. At some point, a decision is made that reviewing more documents won’t be productive. At that point, samples of the unreviewed documents are collected and reviewed to (hopefully) support that decision. If the sampling shows that there are few, if any, responsive documents in the unreviewed population, you are done reviewing. The next step is to convince the other side of your rationale. Sharing your sampling results with them helps in this process.
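The sampling step lends itself to a little arithmetic. The sketch below draws a random sample from the unreviewed documents and, once reviewers have counted the responsive documents in that sample, estimates an upper bound on the rate of responsive documents left behind. The choice of a Wilson score interval at roughly 95% confidence is an assumption made here for illustration; your product or your statistician may well use a different method. The population size, sample size and count are also hypothetical.

```python
import math
import random


def draw_sample(unreviewed_ids, sample_size, seed=0):
    """Pick a simple random sample of unreviewed document IDs.

    A fixed seed makes the draw reproducible, which helps if you
    later need to show how the sample was selected.
    """
    rng = random.Random(seed)
    return rng.sample(unreviewed_ids, sample_size)


def wilson_upper_bound(responsive_found, sample_size, z=1.96):
    """Upper end of the Wilson score interval for the responsive rate.

    z=1.96 corresponds to roughly 95% confidence (an assumed choice).
    """
    p = responsive_found / sample_size
    z2 = z * z
    center = p + z2 / (2 * sample_size)
    margin = z * math.sqrt(p * (1 - p) / sample_size
                           + z2 / (4 * sample_size ** 2))
    return (center + margin) / (1 + z2 / sample_size)


# Hypothetical numbers, assumed for illustration.
unreviewed_ids = list(range(80_000))
sample = draw_sample(unreviewed_ids, 1_500)

# Suppose reviewers find 6 responsive documents in the sample.
responsive_found = 6
point_estimate = responsive_found / len(sample)
upper = wilson_upper_bound(responsive_found, len(sample))

print(f"Observed rate in sample: {point_estimate:.2%}")
print(f"Upper bound on rate:     {upper:.2%}")
print(f"Upper bound on responsive documents left unreviewed: "
      f"{math.ceil(upper * len(unreviewed_ids)):,}")
```

Whether the resulting bound is low enough to stop reviewing is a judgment call for the case team, not something the arithmetic decides for you.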
How much or little to include opposing counsel in your process is up to you. I err on the side of cooperation and disclosure. However, I would caution you not to confuse cooperation and disclosure with letting the other side feel it can dictate the process. Much case law has shown that courts are leery of mandating one process or another. Courts tend to hold off on ruling until a production can be proven to be faulty and thus lacking.
So now that I’m an actual user of this technology, I can see how it can help deal with the big data that we work with all too often in our cases these days. One benefit that I’ve experienced (and so will you, perhaps without having meant to) is that the technology has been around long enough and been vetted enough that your first foray into its use won’t be challenged nearly as much as if you had been on the leading edge of the curve.
So do like I did: take the plunge.