A multi-part series on targeting the right data to reduce your downstream review costs
In the first Part of this series, we reviewed the relationship between data volumes and eDiscovery costs, particularly review costs, and the data targeting steps you can take before litigation arises. In this Part, we continue our discussion of data targeting with a review of targeting options during collection and processing.
The extent to which the right data can be targeted during the collection phase will depend primarily on the specific devices and systems from which you are collecting. For example, some enterprise systems will have useful integrated search tools, and others will offer only basic exports. Different types of mobile devices may also have different options for capture. This is one of the reasons why maintaining a data map (as discussed in Part 1) is valuable, and why you will need to start your project with some similar research if you don’t currently maintain one.
Common choices available to you for employee computers and devices will include: capturing full physical images; capturing logical images at the file system level; and doing targeted acquisitions of specific directories or file types. Common choices available to you for enterprise systems will include: applying date limitations; exporting by user or mailbox; and exporting based on keyword search responsiveness.
When considering which choices to make, it is important to balance the cost of repeating collection activities later against the cost of sifting through more material during processing and ECA – and to consider whether there will be time to collect again later, if that’s needed. The industry trend is currently towards narrower, more-targeted collection.
In addition to narrowing your scope of collection using these capture choices, you can also target your collection by engaging with the custodians from whom you intend to collect. Learning about how those custodians actually work, what they generate, and where they store it can save you from collecting any more broadly from each of them than necessary.
Once your material has been collected and you are ready to begin processing the data, another range of data targeting options is available to you. In addition to standard removal of known system files (“de-NISTing”) and standard removal of duplicate files and messages (“deduplication”), the following options are typically available to you:
At the start of the processing phase you will have another opportunity to target the right data by custodian, mailbox, or directory, if you did not do so already during collection. Prioritizing key custodians and focusing on user document directories, for example, are both common.
Beyond standard de-NISTing, you also typically have the option to perform additional filtering by file type. This is done with the goal of eliminating additional system files not removed by de-NISTing and/or with the goal of narrowly targeting the specific user-generated file types believed most likely to be relevant.
This may be accomplished either through “stop filters” (also called “exclusion filters”) or “go filters” (also called “inclusion filters”). Stop filters exclude specified file types and include everything else, while go filters do the opposite, including only the specified types and excluding anything else. The practical difference is what happens to your unknown unknowns, the file types you did not anticipate: a stop filter lets them through, while a go filter drops them.
The most common approach is to apply a stop filter designed to clear out system files missed by de-NISTing.
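To make the contrast between the two filter styles concrete, here is a minimal sketch in Python. It assumes file types can be judged by file extension alone, which is a simplification (processing tools generally identify types more reliably than that), and the function names, file names, and extension lists are all hypothetical illustrations rather than any particular platform's settings.

```python
from pathlib import PurePath


def stop_filter(paths, excluded_exts):
    """Exclusion filter: drop the listed types and keep everything else."""
    excluded = {ext.lower() for ext in excluded_exts}
    return [p for p in paths if PurePath(p).suffix.lower() not in excluded]


def go_filter(paths, included_exts):
    """Inclusion filter: keep only the listed types and drop everything else."""
    included = {ext.lower() for ext in included_exts}
    return [p for p in paths if PurePath(p).suffix.lower() in included]


# Hypothetical collected files; "design.dwg" stands in for a file type
# that nobody thought about when the filter lists were written.
collected = ["memo.docx", "budget.xlsx", "driver.dll", "design.dwg", "cache.tmp"]

kept_by_stop = stop_filter(collected, {".dll", ".tmp"})
kept_by_go = go_filter(collected, {".docx", ".xlsx"})

# Files the stop filter keeps but the go filter drops: the unanticipated types.
only_in_stop = [p for p in kept_by_stop if p not in kept_by_go]
print(only_in_stop)
```

In this sketch the unanticipated file survives the stop filter and is removed by the go filter, which is the trade-off to weigh: a stop filter errs toward over-inclusion, and a go filter errs toward missing something.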
The processing phase provides another opportunity to apply date range filters to eliminate any collected materials too new or old to be relevant to your matter. Targeting the right materials by date range requires consideration of a few factors:
Finally, you will also typically have the opportunity to apply some form of keyword filtering during the processing phase. In cases where the search terms to be used are fixed through negotiation or court order, this can be an effective step to take. In projects where you are using your best judgment to develop keywords through trial and error, there are advantages to waiting on keyword filtering until early case assessment, when the tools available for your use are more sophisticated and the ease of testing and iteration is greater.
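As a rough illustration of why testing and iteration matter when you are developing your own keywords, the following Python sketch counts how many documents each candidate term hits before any filter is applied. It assumes the extracted text is already available as plain strings keyed by document ID; the document IDs, sample text, terms, and function names are hypothetical, and real platforms offer far richer search syntax than the simple whole-word matching used here.

```python
import re


def compile_terms(terms):
    """Compile each term as a case-insensitive whole-word pattern."""
    return {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }


def term_hit_counts(documents, terms):
    """Return the number of documents that contain each term at least once."""
    patterns = compile_terms(terms)
    return {
        term: sum(1 for text in documents.values() if pattern.search(text))
        for term, pattern in patterns.items()
    }


def keyword_filter(documents, terms):
    """Return the IDs of documents that hit on at least one term."""
    patterns = compile_terms(terms).values()
    return {
        doc_id
        for doc_id, text in documents.items()
        if any(pattern.search(text) for pattern in patterns)
    }


# Hypothetical extracted text, keyed by a made-up document ID.
documents = {
    "DOC-001": "Please review the merger agreement before Friday.",
    "DOC-002": "Lunch order for the team offsite.",
    "DOC-003": "Draft agreement attached; merger timeline to follow.",
}

counts = term_hit_counts(documents, ["merger", "agreement", "offsite"])
responsive = keyword_filter(documents, ["merger"])
print(counts)
print(sorted(responsive))
```

Reviewing per-term counts like these before committing to a filter shows which terms are overbroad or return nothing, which is the kind of trial and error that is easier to run during early case assessment.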
Upcoming in this Series
In the next Part of this series, we will review options for targeting the right data during early case assessment and discuss strategic considerations.
About the Author
Matthew Verga, JD
Director, Education and Content Marketing
Matthew Verga is an electronic discovery expert proficient at leveraging his legal experience as an attorney, his technical knowledge as a practitioner, and his skills as a communicator to make complex eDiscovery topics accessible to diverse audiences. A ten-year industry veteran, Matthew has worked across every phase of the EDRM and at every level from the project trenches to enterprise program design. He leverages this background to produce engaging educational content to empower practitioners at all levels with knowledge they can use to improve their projects, their careers, and their organizations.