RUFE, District Judge.
Plaintiffs in this multi-district litigation (MDL 2342) allege that the antidepressant Zoloft, when taken during pregnancy, caused birth defects in the children born to exposed mothers. The Plaintiffs' Steering Committee ("PSC") in MDL 2342 proposes to offer the testimony of various expert witnesses on the issue of general causation. These expert witnesses include Anick Bérard, a perinatal pharmacoepidemiologist, who holds a Ph.D. in Epidemiology and Biostatistics from McGill University, and who teaches at the Université de Montréal. Dr. Bérard has conducted research on the effect of drugs, including antidepressants, on human fetal development, and opines that Zoloft, when used at therapeutic dose levels during human pregnancy, is capable of causing a range of birth defects (i.e., is a teratogen).
Before the Court is the Motion to Exclude the Testimony of Dr. Bérard, filed by Defendants Pfizer Inc. and Greenstone LLC ("Defendants" or "Pfizer"). Pfizer does not challenge Dr. Bérard's academic qualifications, but argues that unreliable methods and principles were used to reach her conclusion that Zoloft may cause birth defects in the children of exposed mothers. The Court has reviewed Dr. Bérard's report, as well as Defendants' rebuttal expert reports and the briefs of the parties, and held a Daubert hearing at which testimony and evidence were presented in support of each position.
Federal Rule of Evidence 702 reads:
The Third Circuit has distilled this rule to two essential inquiries: 1) is the proffered expert qualified to express an expert opinion; and 2) is the expert opinion reliable?
Under the Third Circuit's framework, the focus of the Court's inquiry must be on the experts' methods, not their conclusions. Therefore, the fact that Plaintiffs' experts and Defendants' experts reach different conclusions does not factor into the Court's assessment of the reliability of their methods.
Here, the scientific question that Dr. Bérard has been asked to address is whether she believes that Zoloft may cause birth defects in children born to exposed mothers, to a reasonable degree of scientific certainty. To meet the Daubert standard, she must demonstrate that she has good grounds for her causation opinion (i.e., the opinion is based on methods and procedures of science, not subjective belief) and a reasonable degree of scientific certainty regarding her causation opinion.
Expert evidence must be relevant and reliable to be admissible. The Court must consider: 1) whether the expert's theory can be tested; 2) whether studies have been subject to peer review and publication; 3) the potential for error in a technique used; and 4) the degree to which a technique or theory (but not necessarily a conclusion) is generally accepted in the scientific community.
Zoloft is a prescription antidepressant, commonly used to treat depression, anxiety, and other mental health conditions. The active ingredient in Zoloft is sertraline. Zoloft is one of a class of drugs known as selective serotonin reuptake inhibitors (SSRIs). Serotonin is a neurotransmitter produced endogenously by humans and other animals. The SSRIs do not contain serotonin; rather, they alter the availability in the nervous system of the serotonin produced by the body.
The parties agree that birth defects, including every type of birth defect alleged in this litigation, have occurred throughout history. For example, major congenital heart defects, which are among the most prevalent birth defects, occur in as many as 1% of live births. Expanding the scope to include all cardiac defects, one finds an incidence of approximately 7.5% of live births. Although some birth defects are caused by known genetic sources or environmental agents (such as certain viruses, radiation exposure, or teratogenic medications), most are due to currently unknown causes. Teratology is the scientific field which deals with the cause and prevention of birth defects.
Where plaintiffs allege that a medication, such as Zoloft, is a teratogen, it is common to put forth experts whose opinions are based upon epidemiological evidence. Although the "gold standard" for epidemiological studies is the double-blind, randomized controlled trial, such studies may not ethically be conducted on pregnant women. Therefore, in this context, epidemiologists must rely upon observational evidence.
Epidemiological studies examining the effects of medication taken during pregnancy on birth defects calculate a relative risk (RR) or odds ratio (OR).
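By way of illustration only, both ratios can be computed from a two-by-two table of exposure and outcome counts. The sketch below is in Python; the counts are hypothetical and are not drawn from any study in the record, and the function names are assumptions made for this example.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of the outcome among the exposed divided by risk among the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


def odds_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Odds of the outcome among the exposed divided by odds among the unexposed."""
    a = exposed_cases                      # exposed, with the outcome
    b = exposed_total - exposed_cases      # exposed, without the outcome
    c = unexposed_cases                    # unexposed, with the outcome
    d = unexposed_total - unexposed_cases  # unexposed, without the outcome
    return (a * d) / (b * c)


# Hypothetical counts: 12 affected births among 1,000 exposed pregnancies,
# 100 affected births among 10,000 unexposed pregnancies.
rr = relative_risk(12, 1000, 100, 10000)
or_ = odds_ratio(12, 1000, 100, 10000)
print(f"RR = {rr:.3f}, OR = {or_:.3f}")
```

A ratio of one indicates no difference between the exposed and unexposed groups. When the outcome is rare, as in this hypothetical, the two measures are close to one another.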
Researchers often statistically control for certain suspected and measurable confounding factors (e.g., factors such as maternal
Because an RR or OR calculation is only an estimate, the precision of which may be affected by general or study-specific factors (including confounders and biases, sample sizes, study methods, etc.), researchers also use statistical formulas to calculate a 95% confidence interval, which is an estimated range of plausible ratio values. A 95% confidence interval means that there is a 95% chance that the "true" ratio value falls within the confidence interval range. Some confidence intervals are narrow, indicating that the calculated rate ratio is fairly precise, and some are wide, indicating that it is not and that additional research is warranted. If the lower bound of the confidence interval is greater than one, researchers say that the ratio is "statistically significant" (i.e., there is only a 5% chance that the increased risk reflected in the ratio is the result of chance alone), and will report finding a statistically significant correlation or association between the medication exposure and the birth defect at issue.
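As a minimal sketch, one common way to compute such an interval for an odds ratio is the log-based (Woolf) approximation. The opinion does not specify which formula any particular study used, so the choice of approximation here is an assumption, as are the function names and the hypothetical counts.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% interval (log / Woolf method).

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    or_ = (a * d) / (b * c)
    # Standard error of the natural log of the odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


def lower_bound_exceeds_one(lower):
    """The significance convention described above: lower bound greater than one."""
    return lower > 1.0


# Hypothetical small study: 12 of 1,000 exposed vs. 100 of 10,000 unexposed.
small = odds_ratio_ci(12, 988, 100, 9900)
# Hypothetical study with the same proportions but 100 times the sample size.
large = odds_ratio_ci(1200, 98800, 10000, 990000)

for label, (or_, lo, hi) in (("small", small), ("large", large)):
    print(f"{label}: OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}, "
          f"significant = {lower_bound_exceeds_one(lo)}")
```

In this hypothetical, the point estimate is identical in both studies. Only the width of the interval changes with sample size, which is why the same estimated ratio can be statistically significant in a large study and not in a small one.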
Even where the confidence interval is narrow and the increased risk is statistically significant, teratologists will not draw firm conclusions from a single study, as apparent associations may reflect flaws in methodology, including multiple comparisons, bias, or confounding, or may be incongruous with existing scientific knowledge about biological mechanisms. When specific potential confounders or biases are identified, researchers will attempt to design studies in such a way that they can determine the degree to which those factors contributed to an outcome. In general, before concluding that there is a "true" association between a medication and an adverse outcome, the teratology community requires repeated, consistent, statistically significant human epidemiological findings, and studies which address suspected confounders and biases.
Epidemiological studies alone can only inform scientists that two events (e.g., medication use and a birth defect) are associated. For this litigation, the experts have been asked to opine as to whether Zoloft causes the birth defects at issue, which requires analysis beyond the identification of statistical correlations reported in published epidemiological studies. To infer a causal relationship from an association, scientists look at well-established factors sometimes referred to as the Bradford-Hill criteria. These include: the strength of the association between the exposure and the outcome; the temporal
Dr. Bérard has conducted epidemiological studies and published peer-reviewed papers examining the effect of maternal use of antidepressants during pregnancy. Although her opinions on the issues relevant to this litigation have evolved over time, Dr. Bérard's current opinions on the teratogenic effects of Zoloft are summarized in two paragraphs of the expert report she prepared for this litigation:
The Court must examine the reliability of the methods Dr. Bérard used to arrive at these opinions.
As discussed above, in the field of epidemiology, the generally accepted method for determining whether a substance is a potential teratogen is to look for statistically significant associations between medication exposure and a pattern of birth defects, which are consistent and replicated across epidemiological studies, and to then apply the Bradford-Hill criteria. Dr. Bérard derives her conclusions about causation, in large part, by charting published findings from various studies (sometimes inaccurately) on a "forest plot" (a graphical depiction of the odds ratios and confidence intervals from multiple studies), and drawing conclusions from trends in odds ratios depicted on the forest plot without regard to whether the underlying published findings were statistically significant, and without further statistical analysis. Dr. Bérard testified that, in her view, statistical significance is certainly important within a study, but when drawing conclusions from multiple studies, it is acceptable scientific practice to look at trends across studies, even when the findings are not statistically significant.
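For contrast, one form of further statistical analysis that can be applied to the estimates charted on a forest plot is a fixed-effect, inverse-variance pooled estimate. The sketch below is illustrative only: the study values are hypothetical, the function names are assumptions, and it is not offered as the method used by Dr. Bérard or by any study cited in the record.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Recover the standard error of log(OR) from a reported 95% interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def pooled_odds_ratio(studies, z=1.96):
    """Fixed-effect, inverse-variance pooling of (OR, lower, upper) tuples."""
    weights = []
    weighted_logs = []
    for or_, lower, upper in studies:
        se = se_from_ci(lower, upper, z)
        w = 1.0 / se ** 2            # more precise studies receive more weight
        weights.append(w)
        weighted_logs.append(w * math.log(or_))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * pooled_se),
            math.exp(pooled_log + z * pooled_se))


# Hypothetical studies, each reporting a non-significant odds ratio.
studies = [(1.2, 0.6, 2.4), (1.2, 0.6, 2.4), (1.1, 0.7, 1.73)]
pooled, lo, hi = pooled_odds_ratio(studies)
print(f"pooled OR = {pooled:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

A pooled interval of this kind gives a quantitative answer to whether a visual trend across studies exceeds what chance would produce. A fixed-effect model assumes that every study estimates the same underlying ratio; where that assumption is doubtful, a random-effects model is the usual alternative.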
Epidemiology is not a novel form of scientific expertise.
This is not a case where an expert is simply moving into novel terrain, wherein no methodology has yet been well established. There exists a well-established methodology used by scientists in her field of epidemiology, and Dr. Bérard herself has utilized it in her published, peer-reviewed work. The "evolution" in thinking about the importance of statistical significance Dr. Bérard refers to does not appear to have been adopted by other epidemiologists, even the very researchers she cites in her report. Her departure from that methodology in her litigation report and testimony requires more thorough justification than she has presented to the Court. Although she cited the Rothman textbook as support for her methods, she has advanced no evidence indicating that this is a "methodology [that] has been exposed to critical scientific scrutiny,"
The Court is particularly concerned about the risk of reaching an erroneous conclusion using Dr. Bérard's methodology. Dr. Bérard opines that, although one cannot assume teratogenicity from one weak association in one study, one can assume teratogenicity based upon multiple weak associations found across many studies. However, an equally plausible conclusion from multiple studies finding only weak associations, not greater than one would expect by chance, is that the true association is weak; so weak that one cannot conclude that the risk is greater than that seen in the general population. This is, in fact, the conclusion most researchers in Dr. Bérard's field have reached regarding the association between Zoloft and birth defects, even those cited by Dr. Bérard in support of her contrary opinion.
The Court is mindful of its function as a gatekeeper; it is not for the courts to be the pioneers, forging new trails in scientific thinking, especially when that means departing from well-established research principles, such as the principle of statistical significance. The Court understands that it is difficult to measure small increases in risk when the risk is for a rare event. However, Dr. Bérard testified that the precision of an estimate decreases when an event is rare (as in the case of most birth defects);
In Wade-Greaux, a well-respected pediatric, developmental, and genetic pathologist, Dr. Gilbert, "testified that she does not believe that repeated, consistent epidemiological studies showing a statistically
Because birth defects are rare, Dr. Bérard testified that the scarcity of replicated, statistically significant findings may be the result of insufficient power in the studies, even in large, population-based studies that include thousands of exposed women. Therefore, Dr. Bérard testified, her approach was to look for "trends" in ratio estimates reported in (selected) studies, and to draw conclusions about teratology from those trends rather than from statistically significant results. She demonstrated this method to the Court using a forest plot.
"[T]he party presenting the expert must show that the expert's findings are based on sound science, and this will require some objective, independent validation of the expert's methodology."
Dr. Bérard's opinion relies, in large part, on her presumption that SSRIs, although distinct from one another in chemical structure and pharmacokinetic properties,
Examination of the very peer-reviewed, published epidemiology literature cited by Dr. Bérard in her report yields little evidence of a class effect. If there were a class effect, one would expect to find consistent associations between each drug in the class and a given outcome. Instead, there are only scattered statistically significant associations, both within and between studies. For example, in the Louik 2007 study, researchers found a statistically significant association between Zoloft (sertraline) and septal defects, and between paroxetine and right ventricular outflow tract obstruction, but no association between other commonly prescribed SSRIs (fluoxetine and citalopram) and either of those birth defects. Similarly, that study found only one of the studied drugs was significantly associated with neural tube defects (paroxetine), limb reduction defects (sertraline), and omphalocele (sertraline). The Malm 2011 study found a statistically significant association between fluoxetine and cardiovascular anomalies, and between citalopram and neural tube defects. Other studies which studied multiple birth defects and multiple SSRIs also found scattered, statistically significant associations between certain birth defects and one (or occasionally two) of the four or more SSRIs studied.
Dr. Bérard opines that the absence of evidence supporting her opinion that every SSRI is associated with a similar increase in birth defects can be explained by study samples which are too small to measure the "true" association between individual SSRIs and birth defects. But this assertion that class effects would be detectable given sufficiently large samples of pregnant women is itself only a hypothesis, and
At first glance one such study, the Jimenez-Solem 2012 paper, appears to support a limited class effect for cardiovascular defects.
The Myles meta-analysis was designed with the goal of directly comparing the teratogenic potential of individual SSRIs, to determine whether any are comparatively safer. That study noted that "the teratogenic potential of specific agents might differ from the aggregate result for SSRI ... medications as a class,"
The Court also notes that the FDA does not treat SSRIs as a class with regard to warnings about use during pregnancy. Paroxetine is in category D, while all other SSRIs are in category C.
In her own published, peer-reviewed research, Dr. Bérard has opined that paroxetine may have uniquely teratogenic properties among the SSRIs.
Accordingly, under the principles set forth in Daubert, Dr. Bérard's opinion that SSRIs as a class of drugs cause an increased risk of adverse pregnancy outcomes must be excluded pursuant to Rule 702.
Pfizer argues that Dr. Bérard reaches her conclusions using flawed methods, including the "cherry-picking" of studies and of findings within studies which support her position.
The Court finds that the expert report prepared by Dr. Bérard does selectively discuss studies most supportive of her conclusions, as Dr. Bérard admitted in her
At the hearing, Dr. Bérard provided her rationale for excluding certain studies from her report. For example, she testified that she excluded studies, including some of her own, which used an active comparison group (i.e., a group of women taking another type of antidepressant which is not a suspected teratogen).
Dr. Bérard also pointed out methodological flaws or weaknesses in other studies. The Court notes that all epidemiological studies will have some weaknesses; studies of potential teratogens are particularly prone to biases and confounders, because it is unethical to utilize a randomized, double-blind study to examine possible teratogenic effects of a drug. It is not entirely clear that the weaknesses in the studies Dr. Bérard excludes are greater than those in the studies upon which she relies. Moreover, rather than simply ignoring certain studies, the accepted scientific practice is for an expert to explain why she gives more weight to certain studies in forming her opinion, discussing methodology, power, and other key factors.
"Cherry-picking" is always a concern, but is of heightened concern in this case, where many of Dr. Bérard's conclusions and opinions were formulated by identifying trends in odds ratio estimates selected from the published literature, rather than being based upon replicated, statistically significant odds ratio estimates. The fact that her conclusions are drawn from
Moreover, the studies Dr. Bérard does rely upon in her report do not adequately support her opinions, especially in light of her change in opinion from 2012 to the present. Only one of the studies she relies upon was published in 2012, and none were published later. Two of the studies she relies upon were published in 2007, when Dr. Bérard was firmly of the opinion (expressed both professionally and as an expert witness for plaintiffs in the Paxil litigation) that Paxil was uniquely teratogenic and other SSRIs should be used as first-line medications for treating depression in pregnancy.
When drawing causal inferences from associations between exposure to a drug and an adverse outcome, scientists consider certain well-established causation factors, the Bradford-Hill criteria. These criteria include the strength of the association between the exposure and the outcome; the temporal relationship between the exposure and the outcome; the dose-response relationship; replication of findings; the biological plausibility of or mechanism for such an association; alternative explanations for the association; the specificity of the association; and the consistency with other scientific knowledge. An expert need not consider or satisfy all criteria in order to support a causal inference.
Pfizer argues that the Bradford-Hill criteria should only be applied after an association is well established, and that there is no well-established association between Zoloft exposure during pregnancy and birth defects. However, because the Bradford-Hill criteria include as factors the strength of the association between exposure and outcome, and replication of findings, the Court will not adopt Pfizer's view but rather will examine the evidence put forth by Dr. Bérard with regard to each Bradford-Hill criterion.
As discussed above, despite her assertions
Additionally, sound scientific methodology requires that a scientist consider all of the scientific evidence when making causation determinations. The evidence of an association between Zoloft and birth defects, when one looks at the entirety of the literature, and not just the studies cited in Dr. Bérard's report, is even weaker. As Pfizer notes, "Dr. Bérard is only able to present an illusion of 'consistency' because she selectively cites only findings that (she says) support her opinions."
The next consideration is the temporal relationship between the exposure and the outcome. Scientists agree that organogenesis occurs in the first trimester of pregnancy, and therefore most of the studies reviewed by Dr. Bérard focus on first trimester exposure to SSRIs. However, as discussed above, one study found a similarly increased risk of birth defects even when women stopped taking sertraline months before becoming pregnant.
With regard to dosage effects, one study conducted by Dr. Bérard found that paroxetine was associated with an increased risk of major cardiac and major congenital malformations only when pregnant women were exposed to more than 25 mg/day during the first trimester; exposure was not associated with birth defects when lower doses of paroxetine were used.
In drawing conclusions about causation, researchers must also consider the biological plausibility of the association. Dr. Bérard testified that she has reviewed the in vitro and in vivo animal studies on the impact of serotonin on fetal development, and concludes that they support her opinions about causation. The Court will not discuss the substance of those studies in detail here, as Plaintiffs' biological mechanism experts' methods and conclusions will be examined in a separate opinion. However, the Court notes that the biological mechanism research does not, at this time, establish: 1) that each of the three developmental pathways hypothesized to be impacted by serotonin exist in humans; 2) the ideal range of serotonin in the developing organism (of any species); or 3) the range of serotonin present in the developing embryo when a pregnant woman is exposed (or unexposed) to Zoloft in pregnancy. In addition to the many unanswered questions about the proposed mechanism, in vitro and in vivo animal studies are "unreliable predictors of causation in humans," in the absence of consistent data from human epidemiologic studies.
Dr. Bérard must also consider alternative explanations for the associations seen in the studies she relies upon, especially in light of the lack of consistency and replication discussed above. Some of these associations may be statistical artifacts of multiple comparisons.
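The arithmetic behind the multiple-comparisons concern can be sketched briefly. The example assumes that the comparisons are independent and that each is tested at the conventional 5% level; real drug-by-defect comparisons within a single study are not fully independent, so the figures are illustrative only, and the function names are assumptions.

```python
def chance_of_any_false_positive(num_comparisons, alpha=0.05):
    """Probability that at least one of several independent tests is
    'statistically significant' by chance alone when no true effect exists."""
    return 1.0 - (1.0 - alpha) ** num_comparisons


def expected_false_positives(num_comparisons, alpha=0.05):
    """Average number of chance findings across the comparisons."""
    return num_comparisons * alpha


# Hypothetical study examining several drugs against several defect categories.
for k in (1, 10, 20, 40):
    print(f"{k:>2} comparisons: "
          f"P(at least one chance finding) = {chance_of_any_false_positive(k):.2f}, "
          f"expected chance findings = {expected_false_positives(k):.1f}")
```

Under these assumptions, a study that tests dozens of drug and defect pairings would be expected to report a few isolated significant associations even if no drug had any effect, which is one reason scattered findings are given less weight than replicated ones.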
Dr. Bérard opines that SSRIs, in general, and Zoloft, in particular, cause a wide range of birth defects when used during pregnancy. Other researchers in her field have concluded that the epidemiological research on which Dr. Bérard relies provides no conclusive evidence of an association between Zoloft and birth defects. Many go further and advocate use of Zoloft as a first-line drug for treating depression in pregnancy. This does not represent a mere professional difference of opinion; Dr. Bérard's opinions regarding Zoloft are only made possible by her departure from use of well-established epidemiological methods. Dr. Bérard's methodology involved a rejection of the importance of replicated statistically significant epidemiological findings demonstrating an association between Zoloft and a pattern of birth defects, substituting a novel technique of drawing conclusions by examining "trends" (often statistically non-significant) across selected studies. Her methods are not scientifically sound. Additionally, in her report, Dr. Bérard failed to acknowledge and distinguish or otherwise address the research findings contrary to her litigation opinion, including her own peer-reviewed, published research. In summary, Dr. Bérard takes a position in this litigation which is contrary to the opinion she has expressed to her peers in the past, relies upon research which her peers do not recognize as supportive of her litigation opinion, and uses principles and methods which are not recognized by the relevant scientific community and are not subject to scientific verification. Because the methodology and reasoning underlying Dr. Bérard's opinion are not scientifically valid, the Court holds that Dr. Bérard's opinion is not grounded in the methods and principles of science, and therefore it does not satisfy Rule 702, and must be excluded.
It is so ORDERED.
As Dr. Bérard pointed out in her testimony, Myles did exclude some studies from his meta-analysis. However, there is no evidence that he selected studies for inclusion and exclusion based upon the extent to which they supported his a priori hypothesis. In contrast, Dr. Bérard admitted that her expert report and forest plot focused upon published studies which were most supportive of her opinion.
The Court further notes that Myles concluded that Zoloft was not significantly associated with major or cardiac malformations. While Dr. Bérard's conclusions are not at issue here, the contrary conclusions Myles reached using a well-established method raise the possibility that Dr. Bérard's decision to rely on an alternative, non-statistical method of assessing data from multiple studies may be driven by her desire to confirm her a priori hypothesis that Zoloft is a teratogen, rather than by her desire to test the possibility that individual studies were underpowered to detect true associations.