A multi-part series providing guidance on how to effectively scope and plan eDiscovery projects
In the first Part of this series, we reviewed the value of preparation, planning, and checklists, as well as the evolving challenges and expectations associated with eDiscovery project planning. In the second Part, we discussed the initial eDiscovery project scoping steps you must take. In the third Part and fourth Part, we discussed some of the investigative steps that can follow, including targeted interviews, reactive data mapping, surveying, and sampling. In this Part, we turn our attention to volume estimation and cost estimation.
Once you have completed your initial planning and completed your investigation activities to validate and flesh out your initial assumptions, you should be equipped with enough information to proceed to estimations of project volumes and potential costs. At a minimum, you need a reasonably accurate count of:
You should also have some sense of how large each category of sources is (e.g., laptop size, mailbox size) and how broadly you expect to have to collect (e.g., full images vs. logical images vs. pre-filtered collections/exports). With this information, making an educated guess as to your initial collected volume becomes simple math:
Custodians x (Sum of Issued Devices’ Typical Sizes)
+ Mailboxes x Typical Size
+ Network Shares x Typical Size
+ Sum of Enterprise and Departmental Systems’ Sizes
+ Backup Tapes/Storage Media x Tape/Media Sizes
= Approximate Total ESI Volume to Collect
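The arithmetic above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, parameter names, and every number in the example are hypothetical placeholders, not figures from any real matter, and you would substitute the counts and typical sizes from your own investigation.

```python
def estimate_collected_volume_gb(
    custodians,
    device_sizes_gb,      # typical size of each device issued to one custodian
    mailboxes,
    mailbox_size_gb,
    network_shares,
    share_size_gb,
    system_sizes_gb,      # one entry per enterprise or departmental system
    media_count,          # backup tapes or other storage media
    media_size_gb,
):
    """Return the approximate total ESI volume to collect, in GB."""
    return (
        custodians * sum(device_sizes_gb)
        + mailboxes * mailbox_size_gb
        + network_shares * share_size_gb
        + sum(system_sizes_gb)
        + media_count * media_size_gb
    )


# Hypothetical inputs for illustration only.
total_gb = estimate_collected_volume_gb(
    custodians=20,
    device_sizes_gb=[250, 64],   # e.g., a laptop and a phone per custodian
    mailboxes=20,
    mailbox_size_gb=10,
    network_shares=5,
    share_size_gb=100,
    system_sizes_gb=[300, 150],
    media_count=4,
    media_size_gb=800,
)
print(f"Approximate total ESI volume to collect: {total_gb:,} GB")
```

Keeping each source category as its own parameter makes it easy to rerun the estimate as interviews and sampling refine your counts.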
Once you have this number, you will need to make some additional assumptions and adjustments to project your likely downstream volumes.
First, you’ll need to consider the expansion of the collected data volume that will occur at the beginning of processing. For example, your collected data volume will include some number of compressed container files (e.g., ZIP, RAR), each of which will expand into one or more files that are larger than they were when compressed. Other types of compressed and nested content also exist (e.g., local PST and OST email stores), and during processing all of it will be fully expanded so each element can be individually normalized, tracked, and reviewed. The amount of expansion can vary widely, from as little as 10% to more than 40%, depending on just what was in the original collection.
Second, you’ll need to consider the immediate reductions that will occur from de-NISTing, deduplication, and the application of any objective filters:
As both parties and collection tools have grown more sophisticated in recent years, the trend has been towards smaller, more-targeted initial collections, which therefore reduce less during this phase. Additionally, when estimating volume for hosting costs, note that the final, post-processing volume will expand slightly again when loaded into a review platform to accommodate the review platform’s database file, extracted text files, etc.
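To see how these adjustments compound, here is a minimal Python sketch that chains them together. The 10% and 40% expansion rates come from the range mentioned above; the reduction rate and the hosting overhead rate are purely hypothetical placeholders, since the right values depend on your data and your review platform. The function and key names are also assumptions made for illustration.

```python
def project_downstream_volume_gb(
    collected_gb, expansion_rate, reduction_rate, hosting_overhead_rate
):
    """Project volumes after expansion, reduction, and loading for review.

    All rates are fractions (0.10 means 10%).
    """
    if not 0 <= reduction_rate <= 1:
        raise ValueError("reduction_rate must be between 0 and 1")
    # Containers and nested content expand at the start of processing.
    expanded = collected_gb * (1 + expansion_rate)
    # De-NISTing, deduplication, and objective filters then reduce the volume.
    processed = expanded * (1 - reduction_rate)
    # Loading into a review platform adds database and extracted-text overhead.
    hosted = processed * (1 + hosting_overhead_rate)
    return {
        "expanded_gb": expanded,
        "processed_gb": processed,
        "hosted_gb": hosted,
    }


# Low and high expansion scenarios for a hypothetical 1,000 GB collection.
for label, expansion in (("low", 0.10), ("high", 0.40)):
    result = project_downstream_volume_gb(
        collected_gb=1000,
        expansion_rate=expansion,
        reduction_rate=0.50,          # hypothetical placeholder
        hosting_overhead_rate=0.10,   # hypothetical placeholder
    )
    summary = ", ".join(f"{name}={value:,.0f}" for name, value in result.items())
    print(f"{label} expansion: {summary}")
```

Running a low and a high scenario side by side gives you a range to plan against rather than a single number that implies more precision than the assumptions support.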
A variety of tools and analyses are available to help you select your assumptions and do these sorts of estimations. The EDRM group has collected several free calculators, and their own EDRM Data Calculator is a good place to start. For moving on to cost estimations and other downstream planning, you will need to estimate not only total volumes, but also potential file/document counts. EDRM has published metrics for this, and the 2016 results of John Tredennick’s annual study of “How Many Documents in a Gigabyte” are also available.
At this point in your process, you have completed your initial planning, completed your investigation activities to validate and flesh out your initial assumptions, and you’ve completed your project volume estimations. You now have enough information to also do cost estimation:
As with volume estimation, cost estimation is now largely a matter of math, multiplying your projected volumes and counts by your preferred service provider’s price list (or bundle, etc.). Several of the calculators linked above for data volume estimation can be used to help you estimate pricing as well, and many service providers also offer their own calculator built to reflect their specific pricing model.
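As a rough illustration of that multiplication, the Python sketch below applies a price list to a set of projected quantities. Every line item name, quantity, and unit price here is a made-up placeholder chosen only to make the arithmetic easy to follow; none reflects actual market pricing or any provider's model, and a bundled or tiered price list would need different logic.

```python
def estimate_costs(quantities, unit_prices):
    """Multiply each projected quantity by its unit price and add a total.

    Both arguments are dicts keyed by line item name. A quantity without a
    matching price raises KeyError so that nothing is silently left out.
    """
    costs = {}
    for item, quantity in quantities.items():
        if item not in unit_prices:
            raise KeyError(f"no unit price for line item: {item}")
        costs[item] = quantity * unit_prices[item]
    costs["total"] = sum(costs.values())
    return costs


# Hypothetical projections and prices for illustration only.
projected_quantities = {
    "processing_gb": 1000,
    "hosting_gb_months": 3600,   # e.g., 600 GB hosted for 6 months
    "review_hours": 2000,
}
placeholder_prices = {
    "processing_gb": 25,
    "hosting_gb_months": 10,
    "review_hours": 50,
}

for item, cost in estimate_costs(projected_quantities, placeholder_prices).items():
    print(f"{item}: {cost:,}")
```

Keeping quantities and prices in separate tables lets you reprice the same projected volumes against several providers' price lists without touching the volume assumptions.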
Estimation of the review costs portion does require some additional work. Simply dividing your projected document count by 50 documents per hour to get a total number of hours of review to be performed will not give you an accurate estimate. Instead, you must consider a number of additional variables:
All of these variables will affect how much must be reviewed, how fast it can be reviewed, and how many labor hours the total effort will take. A deep dive into document review design and management is beyond the scope of this series, but any experienced eDiscovery project manager can help you think through the options and their effects.
As we noted in the first Part, checklists are invaluable tools for ensuring consistency and completeness in your efforts. Here are model checklists for volume and cost estimation, which you can customize for your organization:
Upcoming in this Series
In the next Part of this series, eDiscovery Project Roles and Communication, we will continue our discussion of eDiscovery project planning by considering project roles and communications.