De-Mystifying Statistical Sampling

Contributor: Jeremy Guinta, Ankura Consulting Group LLC

What Litigators Should Know About Statistical Sampling in Labor and Employment Disputes

With statistical sampling, counsel can simplify damages analyses, avoid potential issues with incomplete or missing data, and minimize the risk of error.

Questions Counsel Should Ask When Determining if Sampling Is Appropriate:

Are there significant gaps in the data required to complete an analysis?
Is the class too large to review each member individually?
Is the data unorganized, messy, or too complicated to analyze in a timely fashion?
Would a representative sample allow for more detail and care to be paid to a review or analysis?
How will the sampling process be documented and shared between parties?
Do I have an expert who can conduct the sampling, and is the sampling methodology reproducible?

Sampling in Labor and Employment Litigation

Statistical sampling is a generally accepted methodology used to make inferences about populations. When done correctly, statistical samples can produce valid and reliable results that are used in academic research and by courts and regulatory agencies. In Tyson Foods, Inc. v. Bouaphakeo, No. 14-1146, 2016 WL 1092414 (U.S. March 22, 2016), respondents introduced "a representative sample to fill an evidentiary gap created by the employer's failure to keep adequate records." The Supreme Court upheld the use of statistical sampling, explaining that the permissibility of a representative sample turns on the degree to which it is "reliable in proving or disproving the elements of the relevant cause of action." Consistent with Tyson, in instances where the data is incomplete or an individualized review of all class members is necessary but infeasible, statistical sampling can be used to simplify an otherwise complicated damages analysis.

Simplifying Complicated Data

In labor and employment class actions, a wide variety of data sources are needed to complete a thorough analysis. Typically, sensitive data records from human resources, timekeeping, and payroll sources are produced and transmitted to multiple parties to complete the analysis. Depending on the size of the class, this data can quickly become large, unorganized, and unwieldy to work with. Statistical sampling can be used to select a subset of employees, pay periods, and/or locations to limit the amount of data that needs to be analyzed. A valid, statistically representative sample can provide results nearly as precise as those from an analysis of the entire population.
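To illustrate how a subset can track the full population, the following is a minimal sketch in Python using synthetic data. The population of 50,000 weekly-hours values, the sample size of 1,000, and the seed are all assumptions made for illustration and are not drawn from any actual case.

```python
import random
import statistics

# Synthetic population: hypothetical weekly hours for 50,000 pay periods.
# The distribution (mean 40, standard deviation 5) is an assumption.
rng = random.Random(42)
population = [rng.gauss(40.0, 5.0) for _ in range(50_000)]

# Draw a simple random sample of 1,000 pay periods without replacement.
sample = rng.sample(population, 1_000)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
print(f"Difference:      {abs(sample_mean - population_mean):.2f}")
```

In this sketch, reviewing 2 percent of the records yields an average close to the one obtained from all 50,000. How close a real sample comes depends on the sample size and on how variable the underlying data is.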

Filling in Gaps in Data

In some industries or lines of work, employee time records are more likely to be incomplete. In Tyson, timekeeping records did not specifically record the amount of time employees spent putting on and taking off specific equipment. A statistical sample of employee shifts was used to infer the average amount of unpaid time spent putting on and taking off specific equipment before and after a shift. Similarly, in other instances where all worked time has not been appropriately captured or data was not properly retained, statistical sampling of complete time periods can be used to fill in the gaps.
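The gap-filling idea can be sketched with a short Python example. The observed minutes, the number of shifts at issue, and the variable names below are hypothetical and are not taken from Tyson or any other matter. The sketch assumes the sampled shifts were drawn at random and simply applies the sample average to every shift.

```python
import statistics

# Hypothetical unpaid minutes observed in a random sample of shifts
# (illustrative values only).
observed_minutes = [18.0, 21.5, 16.0, 24.0, 19.5, 22.0, 17.5, 20.0]

# Average unpaid time per sampled shift.
mean_minutes = statistics.mean(observed_minutes)

# Standard error of the mean: a rough gauge of how much the sample
# average could vary from one random sample to another.
std_error = statistics.stdev(observed_minutes) / len(observed_minutes) ** 0.5

# Extrapolate to the full set of shifts at issue (hypothetical count).
total_shifts = 12_000
estimated_total_hours = mean_minutes * total_shifts / 60

print(f"Average unpaid minutes per shift: {mean_minutes:.2f}")
print(f"Standard error (minutes):         {std_error:.2f}")
print(f"Estimated unpaid hours, all shifts: {estimated_total_hours:,.1f}")
```

A sample this small is used only to keep the arithmetic visible. In practice, the sample size would be chosen so that the resulting estimate is precise enough for its intended use.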

Limiting Specific Review

If each individual in the class requires a detailed, individualized review, the analysis may become too time-consuming or costly. Statistical sampling addresses this in several ways. First, when done properly, it allows review of a subset of the larger population while maintaining results nearly as accurate as if the entire population had been analyzed. Second, a well-designed representative sample may actually yield more accurate results than individualized review of the entire population because more care and supervision can be applied to a smaller subset of information.¹ Third, statistical sampling can limit the quantity of sensitive data that is transferred between parties in the litigation proceedings. Fourth, when electronic data is not available (for example, when the company maintains timekeeping information on paper time cards or log sheets), a random sample of documents can yield accurate results without necessitating an extensive and expensive data-entry process.

Sampling Process

For a sample to provide statistically valid results, certain steps must be taken to ensure the sample is random² and reliable:

1. Define the Sampling Frame. Before the sample is drawn, the entire population must be identified. This can be a specific list of employees, shifts, pay periods, or locations that are at issue in the litigation. Similarly, the sampling unit, or level of data that will be randomly selected, must be properly defined.
2. Order the Sampling Units. The data must be sorted and ordered using a specific data field or methodology that can be replicated. For example, you can order the data based on employee name, date, and shift start time, or the original order in which the data was produced.
3. Generate Random Numbers. After the data has been properly ordered, a random number is assigned to each sampling unit. Random numbers can be generated using a number of tools. Conceptually, these tools generate random numbers so that each number has an equal chance of being generated, which in turn gives each sampling unit an equal chance of being selected. For litigation, it is equally important that the sample be reproducible and verifiable, which is achieved by using a seed value when generating the random numbers.
4. Reorder the Sampling Frame. Once the random numbers have been assigned to the data, the data is re-sorted using those random numbers.
5. Select the Sample. Given a set sample size, sampling units are selected from the top of the data, working downwards. For example, if the sample size was determined to be 75 units, the first 75 units listed after ordering by the generated random number would be selected as the sample.
6. Document the Sampling Process. Following the selection of the sample, the details pertaining to the five steps listed above must be properly documented and shared with relevant parties so that the sampling process can be replicated and reviewed.
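The selection steps above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not a prescribed implementation: the function name `draw_sample`, the shift identifiers, and the seed value are hypothetical, and Python's standard `random` module stands in for whatever random number tool the parties agree to use.

```python
import random

def draw_sample(sampling_frame, sample_size, seed):
    """Draw a reproducible simple random sample from a list of unit IDs."""
    # Order the sampling units in a replicable way (here: sort by ID).
    ordered = sorted(sampling_frame)

    # Assign a random number to each unit, using a seed so the same
    # numbers are generated every time the process is rerun.
    rng = random.Random(seed)
    keyed = [(rng.random(), unit) for unit in ordered]

    # Reorder the sampling frame by the assigned random numbers.
    keyed.sort(key=lambda pair: pair[0])

    # Select the sample from the top of the reordered list.
    return [unit for _, unit in keyed[:sample_size]]

# Hypothetical sampling frame of 1,000 shift IDs.
frame = [f"SHIFT-{i:04d}" for i in range(1, 1001)]

sample = draw_sample(frame, sample_size=75, seed=20160322)

# Rerunning with the same seed reproduces the sample, even if the data
# arrives in a different order, because the units are sorted first.
rerun = draw_sample(list(reversed(frame)), sample_size=75, seed=20160322)

print(len(sample), sample == rerun)
```

Documenting the sort field, the tool and version, the seed, and the sample size is what allows another party to rerun the process and arrive at the same 75 units.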

Takeaways

Statistical sampling, when done properly, can allow for a simplified analysis while producing reliable results. Courts and regulatory agencies alike have acknowledged and allowed for the use of statistical sampling in situations where data may be incomplete or too unorganized to analyze in its entirety. Litigators should consider the option of statistical sampling in their future labor and employment class action cases as the size of data continues to grow and companies continue to use varying technologies to capture relevant employee data.

In our next installment of this three-part series, we will explore the key questions counsel should consider once the decision to sample has been made:

What is statistical inference?
What do the margin of error (MOE) and confidence level mean?
What types of sampling methods are there?
How can I be confident in the results of the sample?

Footnotes

1. Cochran, W. G. (1977). Sampling Techniques (3rd ed.). New York, NY: John Wiley & Sons.

2. A random sample minimizes any systematic bias that could be introduced if the sample was not drawn randomly. (See Dattalo, P. (2010). Strategies to Approximate Random Sampling and Assignment. New York, NY: Oxford University Press, p.20.)

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
