Rethinking Keyword Culling in Assisted Review Workflows


The use of keyword searching to cull an ESI data set prior to electronic document review has long been part of the “typical” e-discovery workflow. The idea that you can identify both the data that requires review and the data that can be discarded without review, simply by searching the data for a list of key terms, is attractive, particularly if all parties can agree on the list of terms. However, with the advent of technology-assisted review and its increasing acceptance by courts and practitioners alike, perhaps it is time to re-examine the importance of keyword searching in your e-discovery workflow.

Many attorneys and litigation support professionals have become quite adept at crafting complex, “comprehensive” keyword searches that find the potentially relevant portion of their (or their opposition’s) ESI. However, it goes without saying that a keyword search can only be as good and as thorough as the person who crafted it. If any key term is left out – because it was not thought of, was unknown at the time the search was created, or, really, for any reason – your keyword search will not return the desired results and you will miss relevant documents in your data set. Some recent industry studies conclude that keyword searches can miss as much as 80 percent of the relevant ESI in a data set. Your mileage may vary, but there is little doubt that even the best keyword search is likely to miss some amount – possibly a significant amount – of potentially relevant data.
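To make the recall problem concrete, here is a minimal Python sketch that measures what share of known-relevant documents a keyword list actually hits. Everything in it is hypothetical and made up for illustration: the four sample documents, the keyword list, and the `matches_any` and `keyword_recall` helpers. In practice the relevance labels would come from a reviewed sample of your own data.

```python
import re

def matches_any(text, keywords):
    """Return True if any keyword appears in text as a whole word or phrase (case-insensitive)."""
    return any(
        re.search(r"\b" + re.escape(k) + r"\b", text, re.IGNORECASE)
        for k in keywords
    )

def keyword_recall(docs, keywords):
    """Share of the relevant documents that the keyword search returns."""
    relevant = [d for d in docs if d["relevant"]]
    if not relevant:
        raise ValueError("no relevant documents in sample")
    hits = [d for d in relevant if matches_any(d["text"], keywords)]
    return len(hits) / len(relevant)

# Hypothetical hand-labeled sample: three relevant documents, one not.
docs = [
    {"text": "Please review the merger agreement draft.", "relevant": True},
    {"text": "Project Bluebird closes Friday, keep it quiet.", "relevant": True},
    {"text": "The deal team needs the signed term sheet.", "relevant": True},
    {"text": "Lunch menu for the holiday party.", "relevant": False},
]

# The search misses the code name and the deal jargon nobody thought to list.
keywords = ["merger", "acquisition"]
recall = keyword_recall(docs, keywords)
print(f"Recall: {recall:.0%}")
```

In this toy sample the search finds only one of the three relevant documents, because the other two use a project code name and deal jargon that were left off the list.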

One alternative to using keyword search to select a data set for review is an assisted review tool, such as Relativity Assisted Review. As the universe of ESI continues to expand, and review sets in litigation and investigations grow with it, more and more matters lend themselves to assisted review. A predictive coding platform, when applied and managed correctly, can do a much better job of finding relevant documents within your data set than a keyword search could. With a properly trained assisted review tool, you can review just a portion of your data set and validate, at about a 95 percent confidence level, that you’ve found nearly all of the relevant documents.
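For a sense of what a 95 percent confidence figure implies in practice, the sketch below applies the standard sample-size formula for estimating a proportion, with an optional finite population correction. This is general statistics, not a description of how Relativity Assisted Review or any other product computes its samples; the `sample_size` function, the 5 percent margin of error, and the 100,000-document collection are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def sample_size(confidence=0.95, margin=0.05, population=None, p=0.5):
    """Documents to sample so an estimated proportion is within +/- margin.

    p=0.5 is the most conservative assumption about the true proportion.
    If population is given, the finite population correction is applied.
    """
    # Two-sided z-score for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

print(sample_size())                    # 385
print(sample_size(population=100_000))  # 383
```

Under these assumptions, a random sample of a few hundred documents is enough to estimate a rate (for example, the share of responsive documents left in the unreviewed set) to within five percentage points, whether the collection holds 100,000 documents or far more. A tighter margin of error requires a larger sample.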

Not only does the use of assisted review eliminate the practical need for keyword culling, but the use of keyword culling in an assisted review workflow can be problematic. Essentially, predictive coding itself serves as a culling tool, by scanning data to determine what is likely responsive and what is not, to arrive at a set of data that is at least worthy of review, if not ready to produce. To properly train the assisted review tool, you need to provide the system with good examples of both responsive and non-responsive documents. Culling a large percentage of your non-responsive documents prior to training the system will yield fewer examples of the various types of non-responsive documents that exist in your data collection. This means that your seed documents are less likely to include examples of each type of non-responsive document, and your system will not be properly trained to categorize all of the non-responsive documents in your data set.
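The effect of culling on training examples can be illustrated with a toy collection. In this sketch, each document carries a hypothetical `kind` label so we can count which types of non-responsive documents survive a keyword cull. Real collections have no such labels, and the documents, the keyword list, and the helper functions are all invented for the example.

```python
from collections import Counter

# Hypothetical collection: one responsive document, four non-responsive types.
collection = [
    {"kind": "contract email", "responsive": True, "text": "merger terms attached"},
    {"kind": "newsletter", "responsive": False, "text": "weekly industry merger news roundup"},
    {"kind": "HR notice", "responsive": False, "text": "open enrollment ends Friday"},
    {"kind": "spam", "responsive": False, "text": "you have won a prize"},
    {"kind": "calendar invite", "responsive": False, "text": "standup moved to 10am"},
]
keywords = ["merger"]

def cull(docs, keywords):
    """Keep only documents containing at least one keyword (simple substring match)."""
    return [d for d in docs if any(k.lower() in d["text"].lower() for k in keywords)]

def nonresponsive_kinds(docs):
    """Count the non-responsive documents of each kind."""
    return Counter(d["kind"] for d in docs if not d["responsive"])

before = nonresponsive_kinds(collection)
after = nonresponsive_kinds(cull(collection, keywords))

# Kinds of non-responsive document the system can no longer see during training.
missing = sorted(set(before) - set(after))
print(missing)
```

After the cull, the only non-responsive example left is the newsletter that happens to mention a merger. The HR notice, the spam, and the calendar invite are gone, so any seed set drawn from the culled data cannot contain them.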

Likewise, keyword searching can bias your data set toward the particular types of responsive documents that most often contain the keywords you’ve searched for. These documents may or may not be representative of all of the different types of responsive documents that exist in your review database. There may also be responsive documents that contain keyword hits but differ conceptually from the bulk of the documents containing those same keywords. As a result, your seed sets may be composed of only a portion of the various types of responsive documents that exist in your database. You are then less likely to find examples of every type of responsive document in your seed sets when training the system, and your predictive coding results will be similarly limited.

Before creating and running a list of keyword searches against a data set, ask yourself whether this is the best possible method to yield the result you seek. Is this a case in which the use of technology-assisted review makes practical and financial sense? If the answer to that question is “Yes,” it makes sense to put technology to work, searching the data set to find relevant documents based on the words and concepts that actually exist within the data, rather than making an educated guess as to what those words and concepts are and hoping your results are complete.
