OPINION AND ORDER
SHIRA A. SCHEINDLIN, District Judge.
I. INTRODUCTION
Police officers are permitted to briefly stop any individual, but only upon reasonable suspicion that he is committing a crime.1 The source of that limitation is the Fourth Amendment to the United States Constitution, which guarantees that "the right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated." The Supreme Court has explained that this "inestimable right of personal security belongs as much to the citizen on the streets of our cities as to the homeowner closeted in his study to dispose of his secret affairs."2 The right to physical liberty has long been at the core of our nation's commitment to respecting the autonomy and dignity of each person: "No right is held more sacred, or is more carefully guarded, by the common law, than the right of every individual to the possession and control of his own person, free from all restraint or interference of others, unless by clear and unquestionable authority of law."3 Safeguarding this right is quintessentially the role of the judicial branch.
No less central to the courts' role is ensuring that the administration of law comports with the Fourteenth Amendment, which "undoubtedly intended not only that there should be no arbitrary deprivation of life or liberty, or arbitrary spoliation of property, but that equal protection and security should be given to all under like circumstances in the enjoyment of their personal and civil rights."4
On over 2.8 million occasions between 2004 and 2009, New York City police officers stopped residents and visitors, restraining their freedom, even if only briefly.5 Over fifty percent of those stops were of Black people and thirty percent were of Hispanics, while only ten percent were of Whites. The question presented by this lawsuit is whether the New York City Police Department ("NYPD") has complied with the laws and Constitutions of the United States and the State of New York. Specifically, the four named plaintiffs allege, on behalf of themselves and a putative class, that defendants have engaged in a policy and/or practice of unlawfully stopping and frisking people in violation of their Fourth Amendment right to be free from unlawful searches and seizures and their Fourteenth Amendment right to freedom from discrimination on the basis of race.
To support their claims, plaintiffs have enlisted the support of Jeffrey Fagan, a professor of criminology at Columbia Law School, who has submitted an extensive report analyzing the NYPD's practices.6 The City of New York ("City") and the other defendants object to the introduction of Fagan's opinions, arguing that he lacks the qualifications to make the assessments that he makes, that his methodologies are fatally flawed, and that many of his opinions constitute inadmissible conclusions of law.7
NYPD officers are required to fill out a detailed worksheet describing the events before and during every stop that they perform. All of these records are compiled in a database — a database that now contains a wealth of information about millions of interactions between police officers and civilians. The information is both incredibly rich and inevitably incomplete: rich because the dozens of boxes on the worksheet are designed to solicit the very information — who, when, where, why and how — that courts (and the NYPD itself) use to evaluate whether a stop was lawful; incomplete because a fill-in-the-blank document can never fully capture the nuances of a human interaction, because these worksheets capture only the quick responses of police officers rather than of the civilians who have been stopped, and because police officers do not always fill them out perfectly.
How should a jury evaluate the NYPD's stop-and-frisk policy? What should attorneys and witnesses be permitted to tell the jury about the 2.8 million interactions between officers and the people they have stopped? And what should the Court tell those jurors? Both parties agree that the database contains valuable and relevant information. But they disagree vehemently over how to accurately summarize the information and how to fairly describe it to the jury. Defendants' motion to exclude the opinions of Professor Fagan therefore presents this Court with important questions regarding expert testimony and trial management.
With one important exception, Fagan's report is methodologically sound and, under the Federal Rules of Evidence, admissible. I will permit Fagan's generalizations where they are reasonable interpretations of the data and I will prohibit them where I find that they are inaccurate or have little probative value. For the reasons below, defendants' motion is granted in part and denied in part.
II. THE FAGAN REPORT
A. Professor Fagan's Qualifications
Fagan is the Isidor and Seville Sulzbacher Professor of Law at Columbia Law School; director of the school's Center for Crime, Community, and Law; a Senior Research Scholar at Yale Law School; and a Fellow of the American Society of Criminology.8 He has published dozens of refereed journal articles and chapters on an array of topics in criminology including issues related to juveniles, deterrence, capital punishment, race, and New York City.9 He has been studying and writing about the policies at issue in this case for over a decade.10 Perhaps most prominently, in 1999 Fagan conducted a study for the Civil Rights Bureau of the New York State Office of the Attorney General, statistically analyzing the NYPD's data on approximately 175,000 stops and frisks and "focusing specifically on racial disparities in stop rates and the extent to which stops complied with the Fourth Amendment."11 The results of his analysis were published that year in The New York Police Department's "Stop and Frisk" Practices: A Report to the People of the State of New York from the Office of the Attorney General.12
As defendants point out, however, Fagan is not a lawyer and has never taken courses at a law school.13 His graduate degrees are in industrial and civil engineering, with a focus on policy science and criminal justice.14 Furthermore, Fagan "has never worked in a law enforcement field, has never completed a [stop and frisk] form, never conducted a Stop, Question & Frisk ("SQF") and never observed more than a few SQFs or gone for a ride along with a NYPD officer to even observe a SQF."15
B. Fagan's Data Sources
After conducting a stop, NYPD officers are required to fill out a "Stop, Question and Frisk Report Worksheet," which is a two-sided form commonly known as a UF-250.16 Approximately 2.8 million of these worksheets were filled out between 2004 and 2009 and the NYPD entered the information from each of the worksheets into a database and produced it to plaintiffs and Fagan as electronic files.17 Each UF-250 includes information about the suspect's demographic characteristics (age, gender, race/ethnicity); the date, time, duration, location, and outcome of the stop (e.g., frisk, search, type of weapon seized if any, type of other contraband found if any, summons issued, arrest); the suspected crime for which the person was stopped; and whether and what kind of physical force was used. Because the suspected crimes were recorded "using individualized and often idiosyncratic notation," Fagan coded the notations into a set of 131 specific criminal charges and then distributed each "suspected crime" into one of twenty aggregate crime categories (e.g., violent crime, minor violent crime, fraud, drugs).18
On each UF-250, there are twenty boxes that can be checked by police officers regarding the factors — or as Fagan calls them, the "indicia of suspicion" — that motivated the stop. There are ten indicia on Side 1 of the worksheet ("circumstances of stop" or "stop circumstances") and ten more on Side 2 ("additional factors"). The worksheet also contains nine checkboxes regarding the indicia of suspicion that motivated any frisk that took place and four checkboxes regarding the indicia of suspicion that motivated any search.
Fagan's report relied on detailed demographic information, organized by police precinct and census tract, which he compiled from a variety of resources including the United States Census, the federal government's American Community Survey, and a commercial database called ESRI. Fagan used police precincts as his principal unit of analysis because "precincts are the units where police patrol resources are aggregated, allocated, supervised and monitored" and because "precinct crime rates are the metrics for managing and evaluating police performance."19 The demographic data he collected includes information on race, ethnicity, age, income, unemployment, housing vacancy, residential mobility, and physical disorder.20 The City provided him with data on crime complaints from 2004-2009. This data specifies the location of a complaint and type of alleged crime; Fagan categorized the alleged crimes using the same categories that he used to analyze the UF-250s, which "provided a foundation for benchmarking the types and rates of suspected crimes in the stops with the observed rates of reported specific crimes in each police precinct."21 The City also provided Fagan with "patrol strength data" regarding the allocation of police resources to particular neighborhoods. Finally, Fagan included in his analysis information about the location of public housing (where there is often a large police presence) and population density (which impacts the likelihood of police-civilian interactions).22
C. Fagan's Analysis Regarding Plaintiffs' 14th Amendment Equal Protection Claims and Defendants' Criticism of That Analysis
In order to test plaintiffs' 14th Amendment claim that defendants' stop-and-frisk practices treat Blacks and Hispanics differently than they treat Whites, Fagan designed and ran regressions that sought to determine the impact of a person's race on outcomes such as being stopped, being frisked, being subjected to force during an arrest, etc.23 Fagan's regressions compared the influence of race on these outcomes with the influence of non-race factors such as residency in a poor or high crime neighborhood. These analyses control for the fact that in New York City, as a general matter, Blacks and Hispanics live in higher crime neighborhoods than do Whites.24
Fagan created a benchmark against which "to determine if police are selectively, on the basis of race or another prohibited factor, singling out persons for stops, questioning, frisk or search."25 Police officers may lawfully stop an individual only when they have reasonable suspicion to believe that the person has committed, is committing, or is about to commit a crime. The rates at which different groups of people engage in behavior that raises such reasonable suspicion is therefore relevant to the determination of whether the police are treating people equally. According to Fagan, "a valid benchmark requires estimates of the supply of individuals of each racial or ethnic group who are engaged in the targeted behaviors and who are available to the police as potential targets for the exercise of their stop authority."26 Fagan used two variables in constructing a benchmark that would fulfill these requirements: the local rate of crime and the racial distribution of the local population.27 This benchmark was designed, in part, "to test the extent to which the racial composition of a precinct, neighborhood, or census tract — separate and apart from its crime rate — predicts the stop-and-frisk rate in that precinct, neighborhood, or census tract."28
Based on his statistical analyses, Fagan reached the following conclusions regarding disparate treatment:
The racial composition of a precinct, neighborhood, and census tract is a statistically significant, strong and robust predictor of NYPD stop-and-frisk patterns even after controlling for the simultaneous influences of crime, social conditions, and allocation of police resources.
NYPD stops-and-frisks are significantly more frequent for Black and Hispanic residents than they are for White residents, even after adjusting for local crime rates, racial composition of the local population, police patrol strength, and other social and economic factors predictive of police enforcement activity. Blacks and Latinos are significantly more likely to be stopped by NYPD officers than are Whites even in areas where there are low crime rates and where residential populations are racially heterogenous or predominately White. Black and Hispanic individuals are treated more harshly during stop-and-frisk encounters with NYPD officers than Whites who are stopped on suspicion of the same or similar crimes.29
Notably, Fagan did not include in his benchmark the rates of criminal activity by race. This decision is the subject of the parties' central disagreement regarding Fagan's analysis of disparate treatment. Defendants believe that crime rates by race, as reflected in the complaints of crime victims and in the NYPD's arrest data, are the best benchmark: "In an analysis concerned with whom the police are stopping, a reliable benchmark must take into account who is committing the crime."30 Defendants argue that "Blacks and Hispanics comprise a majority of violent crime suspects in all precincts except one in the City, and in most precincts are the overwhelming majority of suspects."31 Defendants point out that Fagan has used arrest data in at least two previous studies, even though arrest data was less complete at the time of those studies than it is today.32
Fagan explains that he chose not to use data from arrests and suspect identifications here because that data is incomplete; imputing the characteristics of the known data to the missing data, Fagan believes, would raise serious risks of selection bias.33 Because suspect race is only known in fifty to sixty percent of cases, extrapolation of that known racial distribution to the remaining forty or fifty percent of cases may not be appropriate, Fagan argues, particularly if the suspect crimes that animate a large share of stops (such as drug possession) do not correlate well to crime reports that identify the race of a suspect (such as assault). In the years since his earlier reports were written, Fagan explains, "the weight of opinion among researchers who were doing this kind of work" is that his current benchmark is an improvement on his earlier benchmarks.34
D. Fagan's Analysis Regarding Plaintiffs' Fourth Amendment Reasonable Suspicion Claim and Defendants' Criticism of That Analysis
In order to assess plaintiffs' claim that defendants have engaged in a practice of stopping and frisking New Yorkers without reasonable suspicion and in violation of the Fourth Amendment, Fagan analyzed the combinations of boxes that officers checked on the UF-250s. He did this in two ways. First, he assumed that the forms had been filled out accurately and completely and sought to determine whether reasonable suspicion existed in any given stop based on the boxes that were checked off on the worksheet. Second, by searching for patterns in the worksheet data from across the City and over the 2004-2009 period, Fagan sought to determine whether the data on the forms is accurate and whether the NYPD's use of the forms is an effective way to ensure that officers are complying with the law.35
1. Analysis and Findings Regarding UF-250s, Assuming Their Veracity and Completeness
Because there are ten "stop circumstances" on Side 1 of the form and ten "additional factors" on Side 2, and because officers are not limited in the number of boxes they can check (although they are required to check at least one Side 1 stop circumstance), there are an enormous number of potential combinations of boxes that can be checked. Fagan created the following system for determining whether or not a stop was lawful: First, he categorized the stop factors on Side 1 as either "justified" or "conditionally justified." Second, he defined a stop itself as "justified," "unjustified," or "indeterminate" based on which boxes had been checked. He did this by analyzing case law, as described in Appendix D of his report. The following is a summary of Fagan's algorithm and categorization scheme:
Category 1: Stops are justified if one or more of the following three "justified" stop circumstances on Side 1 are checked off: (1) "Actions Indicative Of 'Casing' Victims Or Location"; (2) "Actions Indicative Of Engaging In Drug Transaction"; (3) "Actions Indicative Of Engaging In Violent Crimes."
Category 2: Stops are justified if at least one of the following six "conditionally justified" stop circumstances on Side 1 is checked off and at least one of the additional circumstances on Side 2 is checked off. The conditionally justified stop circumstances are (1) "Carrying Objects In Plain View Used In Commission Of Crime e.g., Slim Jim/Pry Bar, etc."; (2) "Suspicious Bulge/Object (Describe)"; (3) "Actions Indicative Of Acting As A Lookout"; (4) "Fits Description"; (5) "Furtive Movements"; (6) "Wearing Clothes/Disguises Commonly Used In Commission Of Crime."
Category 3: Stops are unjustified if no stop circumstances on Side 1 are checked off, even if one or more additional circumstances on Side 2 are checked off.
Category 4: Stops are unjustified if only one conditionally justified stop circumstance on Side 1 is checked off and no additional circumstances on Side 2 are checked off.
Category 5: Stops are justified if two or more conditionally justified stop circumstances on Side 1 are checked off.
Category 6: Stops are indeterminate if "Other Reasonable Suspicion Of Criminal Activity (Specify)" is the only stop circumstance checked off on Side 1, regardless of whether one or more additional circumstances on Side 2 are checked off and regardless of what is written in the blank space under the "Other" box.
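The six categories summarized above can be read as a small decision procedure. The following is a minimal sketch of that reading, not Fagan's actual code: the box labels are hypothetical shorthand for the worksheet text, the order in which the rules are tested is an assumption, and the summary does not say how to treat the "Other" box when it is checked together with another Side 1 box, so the sketch assumes that "Other" is simply ignored in that situation.

```python
from enum import Enum

# Hypothetical shorthand labels for the ten Side 1 boxes (not the form's wording).
JUSTIFIED_BOXES = frozenset({"casing", "drug_transaction", "violent_crime"})
CONDITIONAL_BOXES = frozenset({
    "objects_in_plain_view", "suspicious_bulge", "lookout",
    "fits_description", "furtive_movements", "clothes_disguises",
})
OTHER_BOX = "other"


class Outcome(Enum):
    JUSTIFIED = "justified"
    UNJUSTIFIED = "unjustified"
    INDETERMINATE = "indeterminate"


def classify_stop(side1_boxes, side2_count):
    """Return (category number, outcome) for one worksheet.

    side1_boxes: iterable of Side 1 labels that were checked.
    side2_count: number of Side 2 "additional circumstances" checked.
    """
    side1 = set(side1_boxes)
    unknown = side1 - JUSTIFIED_BOXES - CONDITIONAL_BOXES - {OTHER_BOX}
    if unknown:
        raise ValueError(f"unrecognized Side 1 labels: {sorted(unknown)}")

    # Category 1: any "justified" circumstance suffices on its own.
    if side1 & JUSTIFIED_BOXES:
        return 1, Outcome.JUSTIFIED

    conditional = side1 & CONDITIONAL_BOXES
    # Category 5: two or more conditionally justified circumstances.
    if len(conditional) >= 2:
        return 5, Outcome.JUSTIFIED
    if len(conditional) == 1:
        # Category 2: one conditional circumstance plus a Side 2 factor.
        if side2_count >= 1:
            return 2, Outcome.JUSTIFIED
        # Category 4: one conditional circumstance and nothing on Side 2.
        return 4, Outcome.UNJUSTIFIED

    # Category 6: "Other" is the only Side 1 box, whatever Side 2 shows.
    if side1 == {OTHER_BOX}:
        return 6, Outcome.INDETERMINATE
    # Category 3: no Side 1 box at all, whatever Side 2 shows.
    return 3, Outcome.UNJUSTIFIED
```

A worksheet with two or more conditionally justified boxes and a Side 2 factor satisfies both Category 2 and Category 5; the sketch reports Category 5, and the outcome is "justified" either way.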
Based on this classification system, Fagan concluded the following about the stops conducted by the NYPD:
More than 170,000 stops, or 6.41% of all stops (6.71% of non-radio run stops, and 5.26% of radio runs), recorded by NYPD officers between 2004 and 2009 were Unjustified.
For more than 400,000 stops, or approximately 15%, the corresponding UF250 forms do not provide sufficient detail to determine the stops' legality.36
Defendants level many criticisms at Fagan's classification system,37 including the following: First, the legality of a given stop cannot be determined based solely on the information on the UF-250, since the worksheet is simply a summary of the events and cannot substitute for a proper evaluation of the totality of the circumstances. Second, Fagan's descriptions of stops as justified, unjustified, or indeterminate constitute inadmissible legal conclusions. Third, Fagan did not incorporate into his analysis the handwritten notes on the worksheets that are made when the box marked "Other" is checked (Category 6), even when those notes provided an explanation of why reasonable suspicion existed. Fourth, Fagan classified Category 3 stops as unjustified even when multiple Side 2 circumstances were checked and Category 6 stops as indeterminate even when the "Other" box was coupled with multiple Side 2 circumstances; these decisions are not supported by the caselaw, which permits some stops that fall into those categories. Fifth, Fagan classified Category 4 stops as unjustified even though courts have permitted stops on the basis of only one "Conditionally Justified" factor. Sixth, Fagan failed to incorporate the location of a stop in determining whether it took place in a high crime area, relying instead on whether the Side 2 high crime area box had been checked, and he failed to incorporate descriptive information about the person stopped (such as height, weight, etc.) that might explain why an individual fit the description of a perpetrator of a crime.
2. Analysis of the Accuracy and Effectiveness of the UF-250s and the Stop-and-Frisk Policy
Fagan also sought to determine the extent to which the information on the UF-250s was accurate and complete. This analysis was largely independent of the justified/unjustified classification model described above. The most important elements of Fagan's analysis involved the trends in the usage of various stop factors and the rates at which stops yielded arrests, summonses, and seizures of weapons and contraband (what he calls the "hit rate").
For example, Fagan found that police officers check the Side 2 box "Area Has High Incidence of Reported Offense Of Type Under Investigation" in approximately fifty-five percent of all stops, regardless of whether the stop takes place in a precinct or census tract with average, high, or low crime.38 Relatedly, the Side 1 box "Furtive Movements" is checked in over forty-two percent of stops; in 2009 it was checked off in nearly sixty percent of stops.39 However, the arrest rates in stops where the high crime area or furtive movement boxes are checked off are actually below average.40
Fagan has found that over the study period, "the percentage of stops whose suspected crime is uninterpretable has grown dramatically from 1.12% in 2004 to 35.9% in 2009."41 Fagan calculates that "5.37 percent of all stops result in an arrest," that "[s]ummonses are issued at a slightly higher rate: 6.26 percent overall," and that "[s]eizures of weapons or contraband are extremely rare. Overall, guns are seized in less than one percent of all stops: 0.15 percent.... Contraband, which may include weapons but also includes drugs or stolen property, is seized in 1.75 percent of all stops."42
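A "hit rate" of the kind described above is the share of stops, overall or among stops with a particular box checked, that produced a given outcome. The following is a minimal sketch of that calculation under stated assumptions: the record layout, field names, and box labels are hypothetical, and the four sample records are invented solely to exercise the function; they are not drawn from the UF-250 database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    """One stop record; the fields are hypothetical stand-ins for database columns."""
    boxes: frozenset          # labels of the boxes checked on the worksheet
    arrest: bool = False
    summons: bool = False
    gun_seized: bool = False
    contraband_seized: bool = False


def hit_rate(stops, outcome, box=None):
    """Percentage of stops with the named outcome.

    If box is given, only stops with that box checked are counted.
    Returns None when no stop qualifies, to avoid dividing by zero.
    """
    subset = [s for s in stops if box is None or box in s.boxes]
    if not subset:
        return None
    hits = sum(1 for s in subset if getattr(s, outcome))
    return 100.0 * hits / len(subset)


# Invented sample records, for illustration only.
sample = [
    Stop(frozenset({"furtive_movements", "high_crime_area"})),
    Stop(frozenset({"furtive_movements"})),
    Stop(frozenset({"casing"}), arrest=True),
    Stop(frozenset({"fits_description", "high_crime_area"}), arrest=True),
]

overall_arrest_rate = hit_rate(sample, "arrest")
furtive_arrest_rate = hit_rate(sample, "arrest", box="furtive_movements")
```

Comparing a box-specific rate with the overall rate is the comparison behind the observation that arrest rates are below average in stops where the high crime area or furtive movement boxes are checked.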
Defendants respond to these findings and conclusions with a number of different criticisms. For example, they argue that the reliance on hit rates "ignores deterrence as an outcome of a stop, which is perhaps the most successful outcome" and "conflates the legal standards required for stops [i.e., reasonable suspicion] and arrests [i.e., probable cause]."43 Furthermore, "Fagan has no basis and is unqualified to render an opinion as to what might be the appropriate frequency for officers to conduct stops based in part on observed 'furtive movements' or on presence in a 'high crime area' or under which circumstances it would be proper for an officer to check off these boxes."44 Finally, Fagan's "groundless, highly speculative exposition insinuates that NYPD officers routinely do not adhere to the requisite legal standard of [reasonable suspicion],"45 supplants the role of the jury by reaching ultimate legal conclusions, and is "tantamount to an impermissible credibility assessment."46
III. LEGAL STANDARDS
A. Expert Evidence in General
The proponent of expert evidence bears the initial burden of establishing admissibility by a "preponderance of proof."47 Rule 702 of the Federal Rules of Evidence states the following requirements for the admission of expert testimony:
If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify thereto in the form of an opinion or otherwise, if (1) the testimony is based upon sufficient facts or data, (2) the testimony is the product of reliable principles and methods, and (3) the witness has applied the principles and methods reliably to the facts of the case.
Under Rule 702 and Daubert, the district court must determine whether the proposed expert testimony "both rests on a reliable foundation and is relevant to the task at hand."48 The district court must act as "'a gatekeeper to exclude invalid and unreliable expert testimony.'"49 However, "the Federal Rules of Evidence favor the admissibility of expert testimony, and [the court's] role as gatekeeper is not intended to serve as a replacement for the adversary system."50 In serving its gatekeeping function, the court's focus must be on the principles and methodologies underlying the expert's conclusions, rather than on the conclusions themselves.51 In assessing an expert's methodology, courts may consider (1) "whether [the method or theory] can be (and has been) tested," (2) "whether [it] has been subjected to peer review and publication," (3) "the known or potential rate of error [associated with the technique] and the existence and maintenance of standards controlling the technique's operation," and (4) whether the method has achieved "general acceptance" with the relevant community.52
The courts' gatekeeping function under Daubert applies not only to "scientific" evidence, but also to proffers of "technical, or other specialized knowledge" under Rule 702.53 The objective of this function is to "make certain that an expert, whether basing testimony upon professional studies or personal experience, employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field."54 However, recognizing that "there are many different kinds of experts, and many different kinds of expertise," the Supreme Court has emphasized that the reliability inquiry "is a flexible one."55 Accordingly, the factors "identified in Daubert may or may not be pertinent in assessing reliability, depending on the nature of the issue, the expert's particular expertise, and the subject of his testimony."56 Ultimately, the inquiry "depends upon the particular circumstances of the particular case at issue."57 In sum, the trial court has "the same kind of latitude in deciding how to test an expert's reliability ... as it enjoys when it decides whether or not that expert's relevant testimony is reliable."58
In addition, Rule 403 of the Federal Rules of Evidence states that relevant evidence "may be excluded if its probative value is substantially outweighed by the danger of unfair prejudice, confusion of the issues, or misleading the jury." "Expert evidence can be both powerful and quite misleading because of the difficulty in evaluating it. Because of this risk, the judge in weighing possible prejudice against probative force under Rule 403 ... exercises more control over experts than over lay witnesses."59 Generally, "the rejection of expert testimony is the exception rather than the rule."60 "The admission of expert testimony is committed to the broad discretion of the District Court and will not be disturbed on review unless found to be 'manifestly erroneous.'"61
B. Expert Evidence Regarding Mixed Questions of Fact and Law
As a general matter, experts may not testify as to conclusions of law.62 Doing so would usurp the role of the court in determining the applicable legal standards.63 Although Federal Rule of Evidence 704 says that "[a]n opinion is not objectionable just because it embraces an ultimate issue,"64 the Second Circuit has held that Rule 704 "was not intended to allow experts to offer opinions embodying legal conclusions."65 However, the Circuit has also explained that "experts may testify on questions of fact as well as mixed questions of fact and law."66 In United States v. Scop, the impermissible testimony "deliberately tracked the language of the relevant regulations and statutes [and] was not couched in even conclusory factual statements" whereas in Fiataruolo v. United States, the permissible legal conclusions were accompanied by detailed factual background and explanation that gave the jury "helpful information beyond a simple statement on how its verdict should read."67 This was true even though the expert shared his legal conclusions regarding the ultimate issue that was presented to the jury. The trial court admonished the jury that the expert's opinions were "not binding" and that warning, in combination with the factual support that the expert provided, made his testimony admissible.68
C. Reasonable Suspicion to Conduct A Stop
"'[T]he police can stop and briefly detain a person for investigative purposes if the officer has a reasonable suspicion supported by articulable facts that criminal activity may be afoot, even if the officer lacks probable cause.'"69 This form of investigative detention has become known as a Terry stop.70 "While 'reasonable suspicion' is a less demanding standard than probable cause and requires a showing considerably less than preponderance of the evidence, the Fourth Amendment requires at least a minimal level of objective justification for making the stop."71 "'The officer [making a Terry stop] ... must be able to articulate something more than an inchoate and unparticularized suspicion or hunch.'"72 "Reasonable suspicion is an objective standard; hence, the subjective intentions or motives of the officer making the stop are irrelevant."73
It is sometimes the case that a police officer may observe "a series of acts, each of them perhaps innocent in itself, but which taken together warrant[] further investigation."74 "An individual's presence in an area of expected criminal activity, standing alone, is not enough to support a reasonable, particularized suspicion that the person is committing a crime."75 However, "the fact that the stop occurred in a 'high crime area' [may be] among the relevant contextual considerations in a Terry analysis."76 A court "must look at the totality of the circumstances of each case to see whether the detaining officer has a particularized and objective basis for suspecting legal wrongdoing."77 "[T]he proper inquiry is not whether each fact considered in isolation denotes unlawful behavior, but whether all the facts taken together support a reasonable suspicion of wrongdoing."78
IV. DISCUSSION
A. Fagan's Disparate Treatment Analysis Is Admissible
Defendants make one central critique of Fagan's disparate treatment model: that it uses the wrong benchmark to measure bias. Fagan's benchmark relies on local demographic characteristics and local rates of crime. According to defendants and their expert,
the most logical and reliable method to assess the question of whether police are stopping individuals based on race or on [reasonable articulable suspicion] is to use a benchmark of rates of criminal participation by race.... Fagan's choice of local crime rate as a benchmark to measure possible evidence of bias in NYPD stop-and-frisk activity is a fundamental methodological flaw which robs his analysis of any probative value.79
The Supreme Court has explained that "[n]ormally, failure to include variables will affect the [regression] analysis' probativeness, not its admissibility" but that "[t]here may, of course, be some regressions so incomplete as to be inadmissible as irrelevant."80 The question, then, is whether Fagan's analysis is so incomplete as to be irrelevant or so misleading as to be unhelpful to the jury. It is neither.
Fagan explains that he has used the current benchmark in four published studies, including two that were peer reviewed, and in the study for the Attorney General's office.81 One major reason for his use of this benchmark is that he believes there are no better alternatives: suspect race data, which defendants argue is the appropriate benchmark, is only known for sixty-two percent of crimes from 2009 and 2010 (and for fewer crimes before 2009), and the extrapolation of that data to the thirty-eight percent of unknown suspects "would result in sample selection bias."82 Although he has used suspect data in previous studies, "the weight of opinion among researchers who were doing this kind of work" is that his current benchmark is an improvement on his earlier benchmarks. Furthermore, he used this benchmark "to test the extent to which the racial composition of a precinct, neighborhood, or census tract-separate and apart from its crime rate — predicts the stop-and-frisk rate in that precinct, neighborhood, or census tract."83 Defendants' proposed benchmark would not permit Fagan to conduct such an analysis.
Defendants point to Wards Cove Packing Co. v. Atonio84 to support their argument that because Fagan's analysis ignores data on who is committing crimes, it "fails to capture the information necessary to support a valid causal inference of racial discrimination."85 Indeed, the Supreme Court did hold in Wards Cove that the "proper comparison is between the racial composition of the at-issue jobs and the racial composition of the qualified ... population in the relevant labor market."86 But Wards Cove did not hold that the statistical evidence at issue should not have been admitted; it held only that a prima facie case of discrimination could not be based "solely on respondents' statistics"87 showing that Whites were generally hired for high-skilled jobs and non-Whites were hired for low-skilled jobs. The question here is not whether Fagan's analysis, standing alone, would suffice to establish a claim of disparate treatment; it is simply whether Fagan's analysis will be helpful to the jury in assessing such a claim.88
Furthermore, Fagan has designed his benchmark in order to capture the underlying rate at which New Yorkers of different races and ethnicities engage in behavior that raises reasonable suspicion that crime is afoot: the population equivalent to what in Wards Cove was called "the racial composition of the qualified population." He has simply done so using a method that defendants find inadequate.89
Fagan's conclusions do not misrepresent his methodology. He does not claim that Blacks and Hispanics are stopped more frequently than Whites, even controlling for rates of criminal participation by race. Instead, he concludes that (1) the racial composition of a local area is a significant, strong, and robust predictor of stop-and-frisk patterns even after controlling for crime, social conditions, and police resources; (2) Blacks and Latinos are more likely to be stopped by NYPD officers, even in low-crime and racially heterogeneous neighborhoods and when controlling for neighborhood crime rates and police patrol strength; and (3) Blacks and Hispanics are treated more harshly during stop-and-frisk encounters with NYPD officers than Whites who are stopped on suspicion of the same or similar crimes.90 These are the conclusions of an expert criminologist, based on his methodologically sound analyses. At trial, defendants will be permitted to present evidence and argument that the rates of criminal participation explain Fagan's findings and that the NYPD is not discriminating on the basis of race or ethnicity. When they cross-examine Fagan, defendants will surely challenge his opinions vigorously. But they may not prevent plaintiffs from presenting those opinions in the first place.
B. Fagan's Reasonable Suspicion Analysis Is Largely Admissible
1. As a General Matter, Paperwork Is Admissible and Probative
Defendants begin their critique of Fagan's Fourth Amendment analysis by arguing that the UF-250 database cannot be used to establish the existence of a policy or practice of suspicionless stops:
[A]n analysis of check-boxes on a UF250 form cannot be used to establish that a particular stop was not justified. That determination depends on an analysis of the totality of the circumstances of the stop, a fact-intensive inquiry that amounts to far more than whether a box is checked or not. What cannot be done based on a single form cannot be done in the aggregate, either.91
Defendants are correct that as a general matter, courts do not rely solely on police paperwork to determine whether a stop was lawful. Paperwork offers only a limited summary of the events preceding a stop and only from the perspective of the police officer. Faced with suppression motions or section 1983 claims, judges and juries listen to live testimony from officers, suspects, and witnesses with first-hand knowledge of the stop. But courts also review the paperwork. Sometimes paperwork corroborates the officer's testimony; sometimes it undercuts that testimony. Even the absence of paperwork can be probative and admissible.92 In short, while courts rarely, if ever, rely solely on paperwork, courts almost always consider it.
Plaintiffs allege a practice of unconstitutional policing that spans half a decade and 2.8 million stops. Taking live testimony on each of these stops is impossible; taking live testimony on some small sample of the stops would present more problems than it would solve, because there would be no way to confidently generalize from the sample to the entire population. Neither party disagrees with this reality. But in the face of this challenge, the parties offer radically different solutions: plaintiffs seek to use the database to make general statements about the number of "justified" and "unjustified" stops; defendants seek to exclude the database entirely from the analysis of how often stops are constitutional or unconstitutional.
Defendants are correct that it would be improper to declare certain stops "unjustified" and others "justified" on the basis of paperwork alone without offering any qualifications: a perfectly lawful stop cannot be made unlawful because the arresting officer has done a poor job filling out the post-arrest paperwork; nor can an egregiously unlawful stop be cured by fabrication of the paperwork. Indeed, Fagan has presented evidence — entirely independent of his classification system — that would permit a reasonable juror to conclude that a large number of the UF-250s include incorrect information.
But it would be an injustice to prevent the jury from hearing about the extremely rich and informative material contained in the 2.8 million forms and the 56 million boxes on Sides 1 and 2 of the UF-250s. Thousands of New York City police officers have spent an enormous amount of time documenting, in significant detail, the circumstances that led to the stops at issue in this lawsuit; the NYPD has invested tremendous time, money, and energy in compiling, reviewing, and analyzing that data. Although by no means perfect, this information can surely help the jury to evaluate the parties' claims and defenses.93 The data will not be presented in a vacuum; it will be accompanied by the testimony of numerous witnesses and the presentation of much other documentary evidence.94 Plaintiffs will not be asking the jury to find a pattern of suspicionless stops on the basis of the UF-250 database alone; just as during the adjudication of a single stop, they will present "the paperwork" alongside much other evidence. The purpose of the Federal Rules of Evidence is to help courts "administer every proceeding fairly ... to the end of ascertaining the truth and securing a just determination."95 I have no doubt that those purposes are best served by permitting plaintiffs to present this evidence to the jury. The remaining question, therefore, is how to ensure that the presentation is accurate. The short answer is that I will permit generalizations where they are reasonable interpretations of the data and I will prohibit them where they are inaccurate and thus have little or no probative value. During trial, Fagan (and defendants' witnesses) will be required to acknowledge the limitations and shortcomings of the data.
2. Fagan's Classification System Is Largely Admissible But Must Be Modified Before Being Presented to the Jury
Defendants raise numerous concerns with Fagan's classification system. I address each of them in turn. My conclusions require plaintiffs to make some limited modifications to the way that Fagan's opinions are presented to the jury.
a. Expert Legal Opinions
Defendants believe that the use of Fagan's classification system constitutes an inadmissible legal conclusion.96 They cite to Bilzerian for the proposition that expert testimony "must be carefully circumscribed to assure that the expert does not usurp the role of the trial judge in instructing the jury as to the applicable law or the role of the jury in applying that law to the facts before it."97 Fagan will not be permitted to do either of those things.
First, the Court, and not he, will instruct the jury on the law of reasonable suspicion. Fagan will be permitted to describe his analysis of the 2.8 million UF-250s in light of the legal criteria articulated in this Opinion and Order and in any other pre-trial instructions that I give to the parties.98 Any statements he makes regarding reasonable suspicion will have to "`be phrased in terms of adequately explored legal criteria.'"99 As described below in Part IV.B.2.d, he has misinterpreted the relevant caselaw in one important respect and his findings will need to be revised. In addition, his use of the phrase "Indeterminate" with respect to an entire category of stops will not be permitted. His statistical analysis, as revised, is nonetheless admissible.
Second, Fagan's testimony will not usurp the role of the jury: the ultimate question at issue in this suit is whether defendants have a policy and/or practice of conducting suspicionless stops. Although Fagan's testimony will be helpful to the jury in resolving that question — as it must be to be admissible — Fagan does not seek and will not be allowed to express an opinion on that question.
Defendants cite to Cameron v. City of New York, in which the Second Circuit explained that in a malicious prosecution suit against police officers, it was clear error to allow prosecutors "to testify to the officers' credibility and to the existence of probable cause" and that such testimony "violated bedrock principles of evidence law that prohibit witnesses ... from testifying in the form of legal conclusions."100 In Cameron, the prosecutors testified that the arresting police officers were credible. They also testified that they believed, based on the totality of the circumstances, that probable cause had in fact existed to arrest Cameron. The Second Circuit held that such testimony was highly prejudicial.101 Cameron thus would preclude Fagan from expressing his opinion about whether defendants' stop-and-frisks of David Floyd or Lalit Clarkson were lawful and from opining about the credibility of another witness. But plaintiffs do not seek to solicit such testimony. Instead, they seek to solicit testimony that will help a jury of lay people understand the significance of 2.8 million stops and the 56 million boxes describing the indicia of suspicion that led to those stops.
b. Use of the Terms "Justified" and "Unjustified"
For the reasons discussed in Part IV.B.1 above, Fagan's use of the terms "justified" and "unjustified" may improperly suggest that the (il)legality of a stop can be conclusively determined on the basis of paperwork alone. But this danger can be prevented by a limiting instruction to the jury at trial clarifying that the database is necessarily an incomplete reflection of the totality of the circumstances leading to each stop. Fagan will be permitted to explain that if the forms are assumed to be accurate and complete, a certain percentage contain information sufficient to suggest that the stop was lawful and a certain percentage do not contain sufficient information to make such a generalization. The parties will be permitted to introduce evidence and make arguments about when and whether those assumptions regarding accuracy and completeness are appropriate. The parties will inevitably use shorthand to describe these categories — perhaps using phrases such as "apparently justified based on reasonable suspicion" and "apparently unjustified based on the lack of reasonable suspicion" — and it will be the responsibility of the Court and the skilled litigators involved in this case to ensure that the jury is not being presented with misinformation. But the complexity involved in describing the relationship between the worksheets and the stops that they summarize is not a reason to exclude all generalizations about the information that the worksheets contain.
c. Classification of "Other" Stops
Professor Fagan classified as "Indeterminate" the UF-250s on which "Other Reasonable Suspicion Of Criminal Activity (Specify)" was the only stop circumstance checked off on Side 1, regardless of whether one or more additional circumstances on Side 2 were checked and regardless of what was written in the blank space underneath the "Other" option. More than 400,000 stops, or approximately fifteen percent of all stops, fall into this category.102 According to defendants, in approximately 99.8 percent of the UF-250s on which police officers checked off the "Other" box on Side 1, they also wrote something in the narrative field.103 Fagan chose not to use that narrative information, however, because "what was specified was not something that was usable to us in making a systematic analysis."104 Fagan explained that many of the narratives were either gibberish (such as the letter X or NA) or uninterpretable abbreviations;105 others listed a crime such as "trespass" or an activity such as "hanging out in the hallway" but, according to Fagan, "that didn't help us ascertain what the basis of suspicion was for that stop."106 He explains that trying to classify the narratives "would invite a host of potential biases and errors, and would render any conclusions statistically meaningless."107
At the Court's request, Fagan submitted a random sample of 1,000 handwritten entries corresponding to the "Other" stop circumstance that he had evaluated.108 The first page of the Narrative List, which contains forty-one entries, is attached to this opinion as Appendix 2. Standing alone, perhaps a dozen of those forty-one narratives suggest that there was reasonable suspicion to make a stop — these include narratives such as "inside bak [sic] w/no pass code (set off alarm)," "appeared to be smoking marij," "no headlights," and "person stopped by store manager for suspicion of petit larceny." Many of the other narratives, however, do not explain why the officer had reasonable suspicion to believe that a crime had occurred, was occurring, or was about to occur. These include narratives such as "hanging out in lobby," "TAP building," "waistband," "crim tress," "cell phone," "deft observed in NYCHA building," "proximty [sic] to crime location." Although some of these narratives might help establish reasonable suspicion when combined with other factors, standing alone they do not.
Particularly noteworthy is the narrative "keyless entry," which appears four times in the first forty-one narratives and which defendants say appears, in one form or another, approximately 52,500 times throughout the database.109 According to the City, approximately 50,000 of these narratives were completed by a "housing officer," which I presume means that they are related to patrols in or around New York City Housing Authority buildings.110 Defendants argue that "Fagan did not account for the significance of this ["keyless entry"] narrative on its own, in conjunction with the place of the stop or in combination with any [additional circumstances] on Side 2, all of which may be sufficient to qualify the stop as Justified."111 To support this claim, they point to United States v. Pitre, in which Judge Michael Mukasey held that reasonable suspicion existed based on "defendant's entry into the lobby by catching what otherwise would have been a locked door, and his nervous and confused response when asked whether he lived in the building and where he was going."112 Defendants are mistaken that the narrative keyless entry "on its own" may be sufficient to qualify the stop as justified. As Judge Mukasey explained very clearly, standing on its own a keyless entry is not suspicious behavior:
Pitre claims his keyless entry just behind the unidentified woman was not suspicious behavior because he could easily have been a resident of the building walking just behind another resident, and did not want to let the door close and then stand out in the cold — this was mid-December — fumbling for his keys. True enough, but there was more to the encounter before Pitre was stopped within the meaning of Terry.113
It was only after Pitre was unable to clearly answer the police officers' question "where are you going?" and he repeatedly touched the pocket of his jacket and his right side as if feeling for contraband, that the police had reasonable suspicion to stop him. This all occurred in the lobby of a building that the police officers knew was the site of frequent drug and firearms activity. By no means did a keyless entry alone, or even keyless entry plus high crime area, raise reasonable suspicion.
Also noteworthy is the narrative "Loitering," which appears ten times in the first eighty-five narratives. Some of these narratives describe the loitering as happening "in lobby," "in halls," or "in hallway," but others contain only that single word. Although parts of New York State's prohibition on loitering remain good law,114 and some of the narratives might plausibly refer to those genuine violations, the NYPD's misuse of this statute has a long and ugly history: "[t]he City of New York, operating principally through the [NYPD], has continuously enforced three unconstitutional loitering statutes for decades following judicial invalidation of those laws and despite numerous court orders to the contrary.... The human toll, of course, has been borne by the tens of thousands of individuals who have, at once, had their constitutional rights violated and been swept into the penal system."115 Although "loitering" may at times be an officer's shorthand way of describing criminal trespass, its use is often more probative of an unlawful stop than a lawful one. Furthermore, merely naming a penal code violation does not constitute reasonable suspicion.
In short, the narratives accompanying the "Other" stop circumstance are extremely difficult to summarize and Professor Fagan is correct that they cannot be uniformly placed into either his "justified" or "unjustified" categories. However, at least to the extent that other groups of checked boxes are probative of a stop's (il)legality, it is misleading to say, as he does, that for all 400,000 of these "Other" stops, "the corresponding UF250 forms do not provide sufficient detail to determine the stops' legality"116 and that these stops are therefore "Indeterminate." That is to say, many of these forms do provide as much or more detail than the ones that Fagan classifies as "justified." If the jury assumes that it was filled out accurately, a form that contains the narrative "smoking cigarette strong smell of marijuana"117 would be strong evidence of reasonable suspicion. In contrast, if the jury assumes that it was filled out completely, a UF-250 containing no circumstances beyond the "Other" narrative "licking rolling paper" would be strong evidence that no reasonable suspicion existed.118
The UF-250s containing only "Other" on Side 1 are thus not properly described as "Indeterminate." It is most accurate to say that one cannot fairly generalize about them. In many individual instances, when reviewing a particular UF-250, one can make certain determinations — or at least make determinations with the same or more confidence than one could as to other UF-250s. But one cannot make such determinations in a systematic or general way.
This distinction matters because plaintiffs seek to use the fifteen percent of forms that Fagan calls "Indeterminate" as evidence for their claim that the City is liable for a failure to monitor and supervise. Plaintiffs claim that "[t]he NYPD's reliance on information provided by officers on UF-250 forms to assess whether stops are based on reasonable articulable suspicion is an ineffective way to regulate the constitutionality of officer stop-and-frisk practices."119
Fagan may not opine that all 400,000 of the UF-250s on which the only box checked on Side 1 is "Other" are "Indeterminate." Instead, he may testify that his classification system does not permit him to draw general conclusions about this group of UF-250s. Similarly, defendants cannot make wholesale generalizations about these forms. However, the parties will be permitted to introduce a number of "Other" UF-250s and make arguments to the jury about what conclusions it should or should not draw from those forms; determining the form and scope of that evidence and argument will be a matter of trial management.
d. Forms Containing Multiple Side 2 Circumstances
Defendants' fourth criticism of Fagan's reasonable suspicion analysis addresses his classification of some of the UF-250 forms in which two or more Side 2 circumstances are checked off. Fagan labeled Category 3 stops (those with no Side 1 circumstances checked off) as unjustified even when two or more Side 2 circumstances were checked off. He also labeled Category 6 stops (those with only "Other" checked off on Side 1) as indeterminate even when two or more Side 2 circumstances were checked off. Defendants argue that this was improper because "caselaw holds that any number and combination of these `additional circumstances' could support a finding of [reasonable suspicion]."120
Defendants point to a number of cases in which they argue that only Side 2 circumstances existed but that courts nonetheless found reasonable suspicion for a stop.121 Most of the cases, however, do not support defendants' argument because they presented circumstances that are captured by the boxes on Side 1.122 Two of defendants' cases do, however, lend some support to their argument. In United States v. McCargo, the Second Circuit found that reasonable suspicion existed when officers responded to a 911 call for an attempted burglary at 1:00 a.m. and observed the defendant walking alone in a high crime area 200 feet from the crime scene.123 Defendants point out that all of the circumstances that clearly fit this fact pattern are on Side 2 — "Report From Victim/Witness," "Proximity to Crime Location," "High Crime Area," and "Time of Day ... Corresponding to Reports of Criminal Activity." Plaintiffs argue that the Side 1 circumstance "Furtive Movement" is also applicable, since the court found that the defendant had been staring so intently at one police car that was at the scene of the crime that he did not notice a second police car pulling up alongside him.124 Because a Side 1 box is applicable, they argue, McCargo does not undercut Fagan's classification of Category 3 worksheets as "unjustified." This is a rare instance in which plaintiffs — whose expert strongly criticizes the NYPD's use of "furtive movement" to justify stops and (perhaps fairly) derides the term as so ambiguous as to be "almost meaningless"125 — are seeking to describe what might arguably be considered an innocent action as furtive and suspicious. Like Judge Richard Posner, I am skeptical that staring intently can constitute suspicious behavior,126 but I recognize that the Second Circuit considered McCargo's staring in its reasonable suspicion analysis. Although the Circuit never used the term "furtive," McCargo's stare could only be classified on the UF-250 under either the "Furtive Movement" box or under one of the two "Other" boxes. This case therefore arguably supports defendants' criticism of Fagan's Category 3.
The second case cited by defendants that arguably supports their claim that two or more Side 2 factors can indicate reasonable suspicion even in the absence of a Side 1 factor is Sutton v. Duguid, in which Judge Joseph Bianco of the Eastern District of New York found that reasonable suspicion existed to stop Sutton "based on: (1) the observed narcotics activity in a high crime area; (2) plaintiff's proximity to the individual identified as involved in the sale of narcotics; and (3) plaintiff's effort to walk away from the commotion as soon as it broke out."127 As defendants point out, "High Crime Area," "Proximity to Crime Location," and "Changing Direction at Sight of Officer/Flight" are all Side 2 circumstances. Plaintiffs again argue that Sutton's sudden movement away from the commotion could be characterized as a Furtive Movement on Side 1. Again, I am skeptical of the argument, although it is plausible.
Illinois v. Wardlow, however, is more problematic for Fagan's Category 3 than any of the cases cited by defendants. There, the Supreme Court held that a defendant's "presence in an area of heavy narcotics trafficking" and "unprovoked flight upon noticing the police" were together sufficient to raise reasonable suspicion and justify a stop.128 These two factors align most closely with the Side 2 circumstances "High Crime Area" and "Changing Direction at Sight of Officer/Flight." The Supreme Court did not base its decision on any other indicia of suspicion, although it did note that headlong flight is the "consummate act" of nervous, evasive behavior. Again, a police officer might in this instance check the Side 1 "Furtive Movement" box, although the far more appropriate boxes would be the ones on Side 2.
In combination, McCargo, Sutton, and Wardlow suggest that stops may be lawful even if they are based only on factors described on Side 2 of the UF-250s. It is also clear, however, that some combinations of Side 2 factors would be insufficient to establish reasonable suspicion. The two most frequent Side 2 factors were "High Crime Area" and "Time of Day, Day Of Week, Season Corresponding To Reports Of Criminal Activity," which were checked off on 55.4% and 34.1% of all worksheets.129 Reasonable articulable suspicion does not exist merely on the basis of those two factors: many people live in high crime areas and many crimes occur at night; simply being in a high crime area at night is not suspicious behavior.130 It is very difficult to generalize, therefore, about UF-250s that contain two or more Side 2 factors but no Side 1 factors.
The importance of this complexity is mitigated in part because, as plaintiffs point out, police officers have marked very few UF-250s with no Side 1 factors and two or more Side 2 factors. Of the 2.8 million worksheets, only 7,295 — or approximately 0.26% — fit this description.131 "Thus Fagan's inclusion of these stops in this category, even if erroneous, had no meaningful impact on the overall results of his analysis, and therefore would not warrant exclusion."132 At trial, these few stops will be included in the category of stops for which generalization is impossible.
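The share described above can be checked with simple arithmetic. The following is only an illustrative sketch: it assumes the rounded figure of 2.8 million stops as the denominator (the exact total is not stated in this Part), and the names `TOTAL_STOPS` and `share_percent` are hypothetical labels chosen for the illustration.

```python
# Assumption: the rounded total of 2.8 million stops is used as the denominator.
TOTAL_STOPS = 2_800_000


def share_percent(count, total=TOTAL_STOPS):
    """Return `count` as a percentage of `total`."""
    return 100.0 * count / total


# Worksheets with no Side 1 factors and two or more Side 2 factors.
side2_only = share_percent(7_295)
print(f"{side2_only:.2f}%")  # prints 0.26%

# Worksheets with only "Other" checked on Side 1. The opinion says "more than
# 400,000", so this result is a lower bound on the "approximately fifteen
# percent" figure.
other_only_floor = share_percent(400_000)
print(f"at least {other_only_floor:.1f}%")
```

Because the denominator is rounded, the results are approximations and should be read only as a consistency check on the reported percentages.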
The larger problem, however, relates to stops in Category 6 in which only the "Other" circumstance was checked on Side 1 and two or more circumstances were checked on Side 2. There are 161,130 of these stops, which make up 5.7% of all stops. Fagan marked them as "Indeterminate." As I discussed above, the narratives on the first page of Fagan's random sample exemplify the reason why categorization of these stops is difficult. One narrative reads "dismatling [sic] 95 Honda DLJ6727."133 Without more, this information would not raise reasonable suspicion — mechanics and car owners regularly dismantle cars. However, if the car's alarm was going off and the individual was unable to give a clear answer to the officers' questions, then the two additional circumstances — best categorized by the Side 2 boxes "Sights and Sounds of Criminal Activity, e.g., Bloodstains, Ringing Alarms" and "Evasive, False or Inconsistent Response To Officer's Questions" — in combination with the "Other" narrative likely would give rise to reasonable suspicion.134 Or, to take another example, "Evasive, False or Inconsistent Response To Officer's Questions" and "Changing Direction At Sight Of Officer/Flight" might sufficiently contextualize one of the many "keyless entry" notations to suggest that reasonable suspicion existed in that case as well.135
Some of the "Other" narratives, however, probably would not suggest reasonable suspicion even when combined with two Side 2 factors. I doubt that the narrative "loitering" indicates reasonable suspicion, even when combined with "High Crime Area" and "Time of Day," the two most common Side 2 factors. The same could be said for the many "keyless entry" narratives. As Judge Mukasey noted in Pitre, the fact that the defendant entered a building lobby in a high crime area without a key on a cold December night was not in itself suspicious behavior.136
In short, it is very difficult to generalize about the worksheets that contain only an "Other" factor on Side 1, even if two or more "additional circumstances" are checked off on Side 2. Defendants will surely be able to present to the jury many individual forms in this category that do appear to indicate that reasonable suspicion existed; plaintiffs will likely be able to present many that suggest that no reasonable suspicion existed. I find that admitting expert testimony that makes generalizations about the level of reasonable suspicion indicated by the forms in this group would mislead the jury. The parties' experts will be permitted to testify about verifiable aspects of these forms (e.g., how often certain Side 2 boxes are checked or how often the phrase "keyless entry" or "loitering" appears) and counsel will be able to make arguments about what inferences and conclusions the jury should draw from this data.
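The distinction drawn in this Part, between forms about which no generalization is permitted and the verifiable counts that experts may still report, can be sketched as follows. This is a hypothetical illustration, not a description of either expert's method: the record layout (`side1`, `side2`, `narrative`), the function names, the shortened box labels, and the four sample records are all assumptions made up for the example and are not drawn from the UF-250 database.

```python
from collections import Counter

# Illustrative records only; NOT taken from the actual UF-250 database.
# Box labels are shortened for readability.
SAMPLE_FORMS = [
    {"side1": {"Other"}, "side2": {"High Crime Area", "Time of Day"},
     "narrative": "keyless entry"},
    {"side1": {"Other"}, "side2": set(), "narrative": "Loitering in lobby"},
    {"side1": set(), "side2": {"High Crime Area", "Proximity to Crime Location"},
     "narrative": ""},
    {"side1": {"Furtive Movement"}, "side2": {"High Crime Area"},
     "narrative": ""},
]


def no_generalization(form):
    """True for the two groups this Part places outside general conclusions:
    (a) only "Other" checked on Side 1, whatever is checked on Side 2, and
    (b) no Side 1 box checked but two or more Side 2 boxes checked."""
    only_other = form["side1"] == {"Other"}
    side2_only = not form["side1"] and len(form["side2"]) >= 2
    return only_other or side2_only


def count_phrase(forms, phrase):
    """Count forms whose narrative contains `phrase`, ignoring case."""
    needle = phrase.lower()
    return sum(1 for form in forms if needle in form["narrative"].lower())


def side2_frequency(forms):
    """Count how often each Side 2 box is checked across `forms`."""
    counts = Counter()
    for form in forms:
        counts.update(form["side2"])
    return counts


flagged = [form for form in SAMPLE_FORMS if no_generalization(form)]
print(len(flagged), "of", len(SAMPLE_FORMS), "sample forms flagged")
print("keyless entry:", count_phrase(flagged, "keyless entry"))
print("loitering:", count_phrase(flagged, "loitering"))
print(side2_frequency(flagged).most_common())
```

The sketch reports only counts. It deliberately assigns no "justified" or "unjustified" label to the flagged forms, since what inferences to draw from such counts is left to the arguments of counsel.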
e. Forms Containing Only One "Conditionally Justified" Factor
Defendants point to a number of cases in which, they argue, courts have found stops lawful even though only one Side 1 "conditionally justified" indicium of suspicion was present. Over 137,000 worksheets were filled out with only one of these factors and they constitute the large majority of the stops in Fagan's "unjustified" category. Defendants' reading of the caselaw, however, is incorrect.
Plaintiffs have properly identified the components of the various courts' decisions that were excluded from defendants' case summaries and that, if reflected on the arresting officer's UF-250, would have placed the stops in Fagan's "justified" category.137 Even People v. Fernandez, which plaintiffs appear willing to concede arguendo because it would impact the classification of very few worksheets, does not support defendants' argument.138 In Fernandez, the New York Court of Appeals held that a police officer could lawfully stop a person for carrying what the officer had reason to believe was a "gravity knife" based on the "identifiable characteristics of the knife."139 The possession of such knives is per se illegal because of the ease with which they can be used for violence. Defendants argue that Fernandez therefore justifies stops solely on the basis of the Side 1 box "Carrying Objects In Plain View Used in Commission of Crime, e.g., Slim Jim/Pry Bar, etc.," which Fagan deemed "Conditionally Justified," not "Justified." But unlike gravity knives, it is not per se illegal to possess slim jims or pry bars. Possession of those items is not in itself suspicious behavior that justifies a stop because there are many lawful uses of those items. An officer who observes what he believes to be an illegal weapon should also check the boxes "Suspicious Bulge/Object," "Actions Indicative Of Engaging In Violent Crimes," and/or "Other Reasonable Suspicion." Fernandez does not support the argument that a person can be stopped based solely on the fact that he is carrying a pry bar or a slim jim.
f. Location and Time of Stops
Defendants' final criticism of Fagan's classification system is that it fails to incorporate the location of the stop and other writings on the form (beyond those in the line under the circumstance "Other"). Officers are required to note on the worksheet the address or intersection where the stop takes place and defendants argue that this information may support a finding of reasonable suspicion if the location is in a high crime area; this is the case, they argue, even if the officer did not check off "High Crime Area" on the worksheet.140 During certain years, the entirety of the 73rd and 75th Precincts was classified as high crime "impact zones." Defendants argue that "High Crime Area" should be imputed to all stops from those precincts during those years, converting approximately 33,000 stops from "unjustified" to "justified."141 That number would grow significantly if stops in other impact zones were treated similarly.
Professor Fagan provides a reasonable explanation of why he chose not to impute that category onto worksheets on the basis of location:
[W]e assumed and based our decision on the fact that officers were trained to check all [boxes] that applied. And we assumed that if, in fact, the stop took place in a high crime area, they would have checked the box accordingly. So we really didn't want to second guess the decision of the officer.
Second, we didn't want to impose our decision or criteria about what's a high crime area versus a low crime area. I think as you can see from some of our charts, crime distributes very widely across the city from very low crime rates in some places to high crime rates in other places. We didn't know what the cut-off was. We couldn't say how officers are trained to think about high crime area. Was it very high in the last month or week? What constitutes high? Three [ ] robberies[? T]en total felony crimes? Does it include felonies plus misdemeanors?142
Fagan's explanation is certainly reasonable. Rather than try to develop his own complex formula for determining what is or is not a high crime area for the purpose of reasonable suspicion, he deferred to the police officers' simple binary decision to check or not to check the "High Crime Area" box. When evaluating reasonable suspicion in an individual suppression hearing or Section 1983 case, such blind deference is inappropriate and officers should be required to support their claims with evidence.143 But when trying to generalize about 2.8 million stops, Fagan's choice was reasonable. Defendants correctly note some of the drawbacks of that methodological decision but, at best, their arguments impact the weight of Fagan's opinion, not its admissibility. The same is true of his decision not to use the time of a stop as a substitute for the Side 2 circumstance "Time of Day, Day of Week, Season Corresponding To Reports Of Criminal Activity" and his decision not to substitute any notation about a suspect's height/weight/tattoos in place of the Side 1 circumstance "Fits Description."144 If police officers chose not to check those boxes, it was reasonable of Fagan not to second guess that choice.
3. Fagan's Opinions Regarding the Results of the Stop-and-Frisk Policy Are Admissible
Finally, defendants argue that Fagan makes speculative and conjectural opinions about the process by which officers complete the UF-250 and about the outcomes of the stops. Specifically, defendants object to Fagan's hypotheses regarding the frequent use of "high crime area" and "furtive movements" on the UF-250s and his use of a "hit rate" in assessing the effectiveness and legality of the NYPD's stop-and-frisk policy. Neither argument has merit.
Fagan notes that officers check the "High Crime Area" box in approximately fifty-five percent of all stops, regardless of whether the stop takes place in a precinct or census tract with average, high, or low crime.145 Defendants believe that this analysis is "misleading" because there are high crime pockets even in low crime precincts and "it is not unreasonable for officers to check this box when a stop occurs" in such an area.146 Fagan rebuts defendants' argument by noting that his analysis is true at the census tract level as well, and plaintiffs correctly note that this is simply a disagreement over the expert's conclusions, not his methodology.147 The same is true for Fagan's observation that when the "High Crime Area" and "Furtive Movement" boxes are checked off, police officers are less likely to make an arrest than when those boxes are not checked off.148 Fagan hypothesizes that this result may occur because officers are marking these two "broad and subjective" boxes after conducting stops for which they actually did not have objective reasons to be suspicious. Or, as retired NYPD officer Peter Mancuso said at a 2010 New York City Bar Association forum, "[f]urtive movements ... tells me that the cops are out there winging it a bit ... they're really not looking for individuals."149 Defendants object to this hypothesis because "[e]xpert testimony offering `interpretations of conduct or views as to the motivation of parties' has been excluded on the grounds that it invades the province of the jury and addresses matters that jurors are capable of understanding on their own" and that it constitutes "an impermissible credibility assessment" of the police officers who fill out the forms.150 But the testimony excluded in Rezulin was (a) the opinion of an "expert" on what he believed constituted ethical medical behavior151 and (b) speculation about the motivations of individual defendants on the basis of what those defendants had said and written.152 This is entirely different from Fagan's proposed testimony, in which he offers hypotheses regarding the causes of trends that he has observed by performing statistical analyses of complicated data sets. Unlike in Rezulin, the expert's testimony will not address "`lay matters which a jury is capable of understanding and deciding without the expert's help.'"153 Fagan is indisputably a criminology expert who is qualified to offer opinions about trends that he observes in the interactions between the police and civilians; he is not passing judgment about the credibility of any one witness but is instead offering theories about what kinds of behavior might lead to certain results that are evident in the data. Defendants may dispute these conclusions but they may not prevent their admission.
Defendants also object to Fagan's reliance on "hit rates." He calculates that "5.37 percent of all stops result in an arrest," that "[s]ummonses are issued at a slightly higher rate: 6.26 percent overall," and that "[s]eizures of weapons or contraband are extremely rare. Overall, guns are seized in less than one percent of all stops: 0.15 percent ... Contraband, which may include weapons but also includes drugs or stolen property, is seized in 1.75 percent of all stops."154
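For a sense of scale, the quoted hit rates can be converted into approximate counts. The sketch below is illustrative arithmetic only and is not part of Fagan's report: it assumes a round total of 2.8 million stops (the record says "over 2.8 million"), and the names TOTAL_STOPS, RATES_PERCENT, implied_count, and stops_per_outcome are hypothetical labels introduced here.

```python
# Illustrative arithmetic only; not Fagan's own computation.
# Assumption: a round total of 2.8 million stops ("over 2.8 million" in the record).
TOTAL_STOPS = 2_800_000

# Hit rates, in percent of all stops, as quoted from Fagan's report.
RATES_PERCENT = {
    "arrest": 5.37,
    "summons": 6.26,
    "gun seized": 0.15,
    "contraband seized": 1.75,
}


def implied_count(total_stops: int, rate_percent: float) -> int:
    """Return the approximate number of stops implied by a percentage rate."""
    return round(total_stops * rate_percent / 100)


def stops_per_outcome(rate_percent: float) -> float:
    """Return how many stops occur, on average, for each such outcome."""
    return 100 / rate_percent


for outcome, rate in RATES_PERCENT.items():
    count = implied_count(TOTAL_STOPS, rate)
    ratio = stops_per_outcome(rate)
    print(f"{outcome}: about {count:,} stops, roughly 1 in {ratio:.0f}")
```

Under that assumed total, a gun seizure rate of 0.15 percent corresponds to roughly 4,200 stops. The counts scale linearly with whatever total is used, so they should be read as orders of magnitude rather than as findings.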
Defendants argue that Fagan "conflates the legal standards required for stops [i.e., reasonable suspicion] and arrests [i.e., probable cause]."155 While of course it is true that "`reasonable suspicion' is a less demanding standard than probable cause,"156 the requisite level of confidence that officers must have in either event relates to the same question: whether or not crime is afoot. If the underlying data is reliable, arrest or "hit rates" are probative — although perhaps not dispositive — of whether or not officers are making stops and arrests on the basis of reasonable suspicion and/or probable cause. This analysis is properly facilitated by comparing the hit rates based on "reasonable suspicion" to hit rates based on random stops.157
The City argues that the use of hit rates "ignores deterrence as an outcome of a stop, which is perhaps the most successful outcome," and posits as its example of such deterrence a scenario in which an officer "stops a person for casing an individual or property, before such person has an opportunity to commit an offense" and thereby prevents the commission of a crime.158 However, in such a scenario, where the suspect has already taken significant steps towards the commission of a crime, there would in fact be probable cause to arrest that suspect for an "attempt" crime. It is notable that the City acknowledges that "deterrence" is a goal of its stop-and-frisk policy. Deterrence is of course a crucial aspect of law enforcement (and criminal justice policy in general) and it may lawfully be pursued in many different ways — more cops walking their beats, better detective work, etc. But it may not be accomplished through the use of unlawful stops.159 A Terry stop may only be used when the police have reasonable suspicion that a crime has taken, is taking, or is about to take place.
Plaintiffs have submitted a sworn affidavit from New York State Senator Eric Adams, who retired as a police captain after more than twenty years of service in the NYPD. Senator Adams says that in July 2010 he met with Defendant Police Commissioner Raymond Kelly to discuss proposed legislation regarding stop and frisk practices and that during the meeting
Commissioner Kelly stated that the NYPD targets its stop-and-frisk activity at young black and Latino men because it wants to instill the belief in members of these two populations that they could be stopped and frisked every time they leave their homes so that they are less likely to carry weapons.160
Commissioner Kelly denies Senator Adams' claim:
At that meeting I did not, nor would I ever, state or suggest that the New York City Police Department targets young black and Latino men for stop and frisk activity. That has not been nor is it now the policy or practice of the NYPD. Furthermore, I said nothing at the meeting to indicate or imply that such activity is based on anything but reasonable suspicion. At the meeting, I did discuss my view that stops serve as a deterrent to criminal activity, which includes the criminal possession of a weapon.161
Although by no means dispositive of the question, Fagan's finding that guns are seized in approximately 0.15% of all stops is at least relevant to an assessment of Commissioner Kelly's claim that the NYPD's policy is a deterrent to the illegal possession of weapons. Fagan's findings related to seizure of other contraband and to the arrest and summons rates are also admissible, even if defendants object strenuously to the conclusions that plaintiffs will ask the jury to draw from those statistical observations.
V. CONCLUSION
For the reasons explained above, defendants' motion is granted in part and denied in part. The Clerk of the Court is directed to close this motion [Docket No. 178].
SO ORDERED.
SIDE 1
APPENDIX 2:
PAGE 1 OF "OTHER" NARRATIVE LIST
MISSING FRONT PLATE
HANGING OUT IN LOBBY
PROS PRONE LOCATION
TAP BUILDING
BURG PATTERN INVESTIGATION
INSIDE BAK W/NO PASS CODE (SET OFF ALARM)
APPEARED TO BE SMOKING MARIJ
NO HEADLIGHTS
LOITERING IN LOBBY
WAISTBAND
XNE
KEYLESS ENTRY
LOITERING ON 2FL HALLWAY
DISMATLING 95 HONDA DLJ6727
CRIM TRESS
KEYLESS ENTRY
WAS NOT OWNER DID NOT KNOW OWNER.
OPEN DOOR 10-11
PLATES DID NOT MATCH VEHICLE
XNE
KEYLESS ENTRY
CELL PHONE
UNREGISTERED VEHICLE
LEANING ON LOBBY HALL
PERSON STOPPED BY STORE MANAGER FOR SUSPICION OF PETIT LARCENY
10-39 LEAVING BUILDING
10-11
REAR ENTRY
REPORT FROM WITNESS
NO FRONT PLATE ON VEHICLE/TRUNK LOCK BROKEN
FORD PROBE PINK ECK 87D2
VENDING ON STREET
CRIM TRES
BANGING OUT OUTSIDE ON BALCONY OF NYCHA BUILDING
DEFT OBSERVED IN NYCHA BUILDING
THROWING TRASH, YELLING
TRESPASS
LOITERING
KEYLESS ENTRY
LOITERING IN HALLS
PROXIMTY TO CRIME LOCATION