WOOD, District Judge:
From 1993 to 2012, New York City's Board of Education (the "BOE") required all applicants for public school teaching positions to pass a qualifying examination called the Liberal Arts and Sciences Test, often referred to as the "LAST." There were two incarnations of the exam: the LAST-1, administered from 1993-2004, and the LAST-2, a significantly revised version administered from 2004-2012. These tests were not intended to evaluate an applicant's mastery of the particular subject areas she might teach, or an applicant's capacity to respond to pedagogical challenges that might arise in the classroom — the BOE evaluated those abilities with separate qualifying examinations. Rather, as their full names suggest, the LAST-1 and LAST-2 were designed solely to test an applicant's understanding of the liberal arts and sciences.
This Court previously held that the BOE unfairly discriminated against African-American and Latino applicants, in violation of Title VII of the Civil Rights Act, by requiring them to pass the LAST-1.
Exercising its broad remedial authority, the Court then appointed a neutral expert, Dr. James Outtz, who was acceptable to the parties, to evaluate whether the LAST-2 also had a disparate impact on African-American or Latino test takers — and if so, whether the exam had been properly validated as job related. The Court permitted the BOE to submit a rebuttal expert report from Dr. Chad Buckendahl, and held a hearing during which both parties and the Court questioned the experts. Dr. Outtz concluded that the LAST-2 had a disparate impact on African-American and Latino test takers and had not been properly validated as job related. Dr. Buckendahl and the BOE did not dispute the exam's disparate impact, but they argued that the LAST-2 had been properly validated.
After reviewing all of the evidence offered by Dr. Outtz and the parties, including expert opinions and the Equal Employment Opportunity Commission's Uniform Guidelines on Employee Selection Procedures, the Court holds that the BOE unfairly discriminated against African-American and Latino applicants by requiring them to pass the LAST-2. Like its predecessor, the LAST-2 had a disparate impact on African-American and Latino test takers. And like its predecessor, the LAST-2 was not properly validated as job related, because the exam's designers did not employ procedures to identify the specific areas and depth of knowledge of the liberal arts and sciences that any competent teacher would need to understand. The BOE's use of the LAST-2 was thus unfairly discriminatory under Title VII.
In reaching that conclusion, the Court does not suggest that it would be unhelpful or unwise for the BOE to test applicants' knowledge of the liberal arts and sciences with a properly validated exam. It may be the case that all teachers, whether they instruct kindergarteners or high school seniors, must understand certain areas of the liberal arts and sciences (separate and apart from the particular subject matter they teach) in order to be competent in the classroom. But the Court is not permitted to simply intuit that fact; test designers must establish it through adequate validation procedures. In that regard, both the LAST-1 and the LAST-2 were deficient, which renders them indefensible under Title VII.
The New York State Education Department ("the SED") requires the BOE to hire only New York City public school teachers who have been certified by the State. Gulino III, 907 F.Supp.2d at 498. If the BOE were to hire teachers who have not been certified by the State, New York City could lose as much as $7.5 billion a year in state funding. See (Oct. 23, 2014 Jt. Ltr. [ECF No. 515] at 2-3).
Beginning in 1993, the SED required teachers seeking certification to pass the LAST-1, a new test developed at the State's request by National Evaluation Systems ("NES"),
In 2004, the SED phased out the LAST-1 and replaced it with the LAST-2. See (Dec. 8, 2009 Order [ECF No. 243] at 3). The LAST-2 was first used for teacher certification on February 14, 2004. (Id.) Prior to using the LAST-2, NES and SED documented the process by which they sought to validate the test as job related. See generally (Clayton Decl.).
At the time the LAST-2 was implemented, prospective teachers were required to pass two additional written exams: the Assessment of Teaching Skills — Written ("ATS-W"), and the Content Specialty Test ("CST") applicable to the teacher's subject area. See (BOE Ltr., Attachment A, [ECF No. 504-1]) (listing the different certification requirements mandated by the SED over time). According to Pearson, the ATS-W was "designed to assess pedagogical (teaching) skills that New York educators determined to be important to the adequate performance of the job of ... public school teachers." (Pearson Ltr. [ECF No. 500] at 2). The CST was designed to "assess the specific knowledge and skills needed to teach specific subject matter in New York public schools, such as mathematics, physics, chemistry, American Sign Language, Cantonese, Japanese, etc." (Id.) A prospective teacher was required to pass the ATS-W, any applicable CST, and the LAST-2 in order to receive a teaching license. Applicants were not permitted to compensate for a poor score on one exam with a high score on another. See (Feb. 3, 2015 Ltr., Attach. I ("Outtz Report") [ECF No. 549-1] at 37).
The nineteen-year history of this case is long and winding, and has been set out in the Court's prior opinions, familiarity with which is assumed.
Plaintiffs, who represent a class of African-American and Latino applicants for teaching positions in the New York City public school system, originally alleged that the BOE had violated Title VII by requiring applicants to pass the LAST-1. Plaintiffs claimed that the exam had a disparate impact on African-American and Latino test takers, which was unfairly discriminatory because the exam was not job related.
The case was initially assigned to the Honorable Constance Baker Motley in 1996. In 2003, following "an epic bench trial that lasted more than eight weeks and filled over 3,600 pages of trial transcript," Gulino I, 2003 WL 25764041, at *1, Judge Motley ruled that the BOE had not violated Title VII by adopting the SED's requirement that teachers pass the LAST-1.
On appeal, the Second Circuit affirmed in part and reversed in part. Relevant to the instant proceedings, the panel held that Judge Motley had erred by not assessing the LAST-1's job-relatedness under the standard established in Guardians Association of New York City Police Department, Inc. v. Civil Service Commission of the City of New York ("Guardians"), 630 F.2d 79 (2d Cir.1980), and remanded so that the district court could apply that standard. Gulino II, 460 F.3d at 385, 388.
On remand, this Court held that the LAST-1 was not job related because it had not been properly validated by the State and NES. Accordingly, the Court concluded the BOE had violated Title VII by requiring prospective teachers to pass the test. Gulino III, 907 F.Supp.2d at 516-23.
By the time the Court decided Plaintiffs' challenge to the LAST-1, the SED had retired the exam in favor of the LAST-2. Exercising its remedial authority to require that a "subsequent exam" comply with Title VII, Guardians, 630 F.2d at 109, the Court then sought to ensure that the LAST-2 was not unfairly discriminatory. The Court appointed Dr. Outtz to serve as a neutral expert and assess whether the LAST-2 had a disparate impact on African-American or Latino test takers — and if so, whether the exam qualified as job related. See (Apr. 29, 2014 Hearing Tr.
On February 3, 2015, Dr. Outtz concluded that the LAST-2 had a disparate impact on African-American and Latino test takers and did not qualify as job related, because it had not been properly validated. See generally (Outtz Report). In response, the BOE submitted the report of Dr. Buckendahl, which did not address the issue of disparate impact but argued that the LAST-2 had been properly validated. See generally (Buckendahl Response [ECF No. 592]). The SED also submitted a response, which asserted that Dr. Outtz's report was flawed and the LAST-2 had been properly validated.
Under Title VII, a plaintiff can make out a prima facie case of discrimination with respect to an employment exam by showing that the exam has a disparate impact on minority candidates. See N.A.A.C.P., Inc. v. Town of E. Haven, 70 F.3d 219, 225 (2d Cir.1995).
The defendant can rebut that prima facie showing by demonstrating that the exam is job related. Id. To do so, the defendant must prove that the exam has been validated properly. Validation requires showing, "by professionally acceptable methods, [that the exam is] `predictive of or significantly correlated with important elements of work behavior which comprise or are relevant to the job or jobs for which candidates are being evaluated.'" Gulino II, 460 F.3d at 383 (quoting Albemarle Paper Co. v. Moody, 422 U.S. 405, 431, 95 S.Ct. 2362, 45 L.Ed.2d 280 (1975)).
In Guardians, the Second Circuit devised a five-part test for determining whether an employment exam, such as the LAST-2, has been properly validated and is thus job related for the purposes of Title VII
Guardians, 630 F.2d at 95; see also Gulino II, 460 F.3d at 384. The first two elements of this test, which concern the quality of the test's development, are "particularly crucial" because "validity is determined by a set of operations, and one evaluates ... validity by the thoroughness and care with which these operations have been conducted." Id. at 95 n. 14 (internal citation and quotation omitted).
Because validation requires expertise that courts lack, validation is "not primarily a legal subject." Guardians, 630 F.2d at 89. Accordingly, to determine whether an employment exam is properly validated, a court "must take into account the expertise of test validation professionals." Gulino II, 460 F.3d at 383. There are two primary sources of expertise on which courts rely to assess validation: (1)
"The threshold task in determining the validity of a challenged examination is to select the appropriate method for assessing its job-relatedness." Guardians, 630 F.2d at 91. The Guidelines detail two validation techniques that are relevant here content validation and construct validation.
Guardians defined "content validation" as "a technique appropriate for tests that measure `knowledges, skills, or abilities' [or `KSAs'] representative of the `content' of the job." Guardians, 630 F.2d at 92 (quoting Guidelines § 1607.14(C)(1)). It defined "construct validation" as a technique that "attempts to measure `constructs,' that is, inferences about mental processes or traits, such as `intelligence, aptitude, personality, commonsense, judgment, leadership and spatial ability.'" Id. at 92 (quoting Guidelines § 1607.14(C)(1)). These definitions suggest that content validation is appropriate when testing for job-specific abilities — for example, the ability of a carpenter to measure the dimensions of building materials — and construct validation is appropriate when testing more general, non-job specific abilities, such as spatial reasoning, or problem solving. See id. at 92-93 (discussing the relationship between job specificity and the construct-content distinction).
The Second Circuit has accepted the proposition that these two validation methods differ from one another. Content validation requires a test's proponent to show that "the content" of the test "is representative of important aspects of performance on the job for which the candidates are to be evaluated." Guidelines § 1607.5(B). As will be discussed more thoroughly below, this can be demonstrated in a fairly straightforward manner by showing a link between the abilities being tested for and the tasks required by the job in question. See infra Part V.c.ii. Construct validity, however, "requires `an extensive and arduous effort.'" Guardians, 630 F.2d at 92 (quoting Guidelines § 1607.14(D)(1)). It demands "a demonstration from empirical data that the test successfully predicts job performance." Id. "Developing such data is difficult, and tests for which it is required have frequently been declared invalid." Id. (citing cases). The result is that "[t]his content-construct distinction... frequently determines who wins the lawsuit. Content validation is generally feasible while construct validation is frequently impossible." Id.
Notwithstanding the "sharp distinction" that the Guidelines draw "between tests that measure `content' ... and tests that
Guardians held that construct validation is not necessary in every instance where a job involves abilities that are somewhat general. "[I]f the test attempts to measure general qualities such as intelligence or commonsense, which are no more relevant to the job in question than to any other job, then insistence on the rigorous standards of construct validation is needed."
Guardians also held that "the degree to which content validation must be demonstrated should increase as the abilities tested for become more abstract."
In 1988, a New York State task force studying teacher qualifications determined that all teachers should have a basic understanding of the liberal arts in order to be competent to teach. Commissioner's Task Force on the Teaching Profession, The New York Report: A Blueprint for Learning and Teaching ("Blueprint Report") 15-19 (1988). It recommended that the state require teachers to pass a liberal arts exam before they receive certification. Id. In 1990, the SED sought to implement the task force's recommendation by contracting with NES to develop the LAST-1. Gulino III, 907 F.Supp.2d at 512-13. Beginning in 1993, the LAST-1 was administered to prospective teachers as a part of their licensure requirement. Id. at 499-500. Several years after the LAST-1 was first administered, the Board of Regents of the State of New York issued new regulations governing teacher certification that required the SED to update the exam. (Clayton Decl. ¶ 13). Accordingly, the SED worked with NES to "extensive[ly] redevelop[]" the test through a "process similar in scope" to the initial development of the LAST-1. (Id.) That redevelopment, which took place between 2000 and 2004, ultimately resulted in the LAST-2, which was first administered to prospective teachers on February 14, 2004. (Dec. 08, 2009 Order 3).
To begin developing the LAST-2, the staff at NES first created a test "framework." The staff relied on this framework to construct each of the individual test questions contained in the LAST-2. (Clayton Decl. ¶ 10). Ms. Clayton defines a test "framework" as "a document that describes the overall structure and content of a test." (Id. ¶ 5). She testified that the framework is then broken down into "major groups of content" called "subareas." (Id.) The LAST-2 covered five subareas: (1) "Scientific, Mathematical and Technical Processes;" (2) "Historical and Social Scientific Awareness;" (3) "Artistic Expression and the Humanities;" (4) "Communication and Research Skills;" and (5) "Written Analysis and Expression." (Outtz Report 30).
Each objective is further delineated in several "focus statements," which "provide details about the nature and range of content covered by the objectives. They are intended to suggest the types of content that may be included in the test items." (Clayton Decl. ¶ 7). Focus statements for the above objective included: "analyzing problem solutions for logical flaws," and "examining problems to determine missing information needed to solve them." (Outtz Report, App. 2, at 51).
NES developed this framework by using two sources. The first was the LAST-1 framework, which NES "thorough[ly] review[ed]" and then "revised." (Clayton Decl. ¶ 16). The second was a set of documents describing common liberal arts and science course requirements at New York state colleges and universities. (Outtz Report 29); (Clayton Decl. ¶ 14).
Once the LAST-2's framework was completed, it was reviewed by two committees of New York state educators: the Bias Review Committee ("BRC") and the Content Advisory Committee ("CAC"). (Clayton Decl. ¶¶ 19-24). The BRC evaluated the framework for potential sources of bias, including offensive language, stereotypes, fairness, and diversity. (Outtz Report 31); (Clayton Decl. ¶¶ 20-21).
Next, NES sent out two separate surveys to educators across the state of New York. The goal of these surveys was to "determine from a broader population the importance of the content objectives of [the LAST-2] to the job of a public school teacher in the State of New York." (Clayton Decl. ¶ 26). Each survey asked the respondent to rate the importance of the individual objectives that made up NES's LAST-2 framework. (Outtz Report 32-33). Specifically, the respondent was asked: "How important is the knowledge
The samples for both surveys, however, were small. The first survey was sent to 500 certified public school teachers, 320 of which (64%) were completed and returned to NES. (Outtz Report 33). Contrast M.O.C.H.A. Soc'y Inc. v. City of Buffalo ("M.O.C.H.A. Soc'y II"), 689 F.3d 263, 269-70 (2d Cir.2012) (describing a job analysis survey of firefighters, which was sent out to 5,934 individuals, and completed and returned by 2,502 individuals). Only twenty-four of the respondents were African-American, and only ten were Latino. (Outtz Report 33). The responses from these two groups were not analyzed separately to determine if their responses differed from those of Caucasian respondents. (Id.) The second survey was distributed to 181 faculty members, but only 45 (25%) were returned. (Id.) None of the survey responses came from African-American faculty members, and only three came from Latino faculty. (Id. at 33-34).
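By way of illustration only, the survey figures recited above can be checked with simple arithmetic. The following Python sketch is not part of the record; the helper name `pct` is hypothetical, and the only inputs are the counts stated in the paragraph above.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 1)

# Counts taken from the paragraph above (Outtz Report 33-34).
teacher_sent, teacher_returned = 500, 320
faculty_sent, faculty_returned = 181, 45

teacher_response_rate = pct(teacher_returned, teacher_sent)   # 64.0
faculty_response_rate = pct(faculty_returned, faculty_sent)   # 24.9, reported as 25%

# Minority respondents as a share of the returned teacher surveys.
african_american_share = pct(24, teacher_returned)            # 7.5
latino_share = pct(10, teacher_returned)                      # 3.1

# Latino respondents as a share of the returned faculty surveys.
latino_faculty_share = pct(3, faculty_returned)               # 6.7
```

The sketch only restates the reported counts as percentages; it does not attempt any test of statistical significance.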
Survey results indicated that respondents believed all of the objectives were at least somewhat important, and most of them were of "great importance." (Outtz Report 34); (Clayton Decl. ¶¶ 30-31).
After the SED approved the framework, NES began the process of item development, whereby the individual test questions were drafted, reviewed, and refined. It appears that some of these questions were derived from test questions in the existing LAST-1 item bank. Ms. Clayton states that LAST-1 questions "were given preliminary designations for continued use, for revision, or for deletion" based on their relevance to the LAST-2 framework. (Clayton Decl. ¶¶ 34-35). The newly-drafted test questions were then reviewed by the BRC and the CAC. (Id. ¶ 37). However, it appears that those LAST-1 questions that were designated for continued use were reviewed only by the CAC. See (id. ¶ 45) ("All items designated for continued use from the existing [LAST-1] item bank were reviewed by the [LAST-2] Content Advisory Committee to verify their continued match to the revised objectives and their continued job-relatedness, accuracy and freedom from bias.").
Next, the new and revised test questions were "pilot tested," in a two-pronged process. NES included some of the potential questions in officially-administered LAST-1 exams, but designated those questions as non-scorable items, such that they did not count towards a test-taker's score. Additional questions were separately administered to volunteer examinees as a means of independently analyzing how test takers responded to the questions. (Id. ¶ 46). The results from this pilot testing were reviewed by the BRC and the CAC. (Id. ¶¶ 49-50).
Finally, NES created a Passing Score Review Panel. The Panel consisted of New York educators, who provided the information the New York Commissioner of Education used to set the passing score for the LAST-2. (Id. ¶ 52). The Panel was asked to "[i]magine a hypothetical individual who is just at the level of knowledge and skills required to perform the job of an educator receiving a teaching certificate
Based on the reports and testimony of Dr. Outtz and Dr. Buckendahl, and the submissions made by both parties and the SED with respect to the development of the LAST-2, the Court finds that Plaintiffs have made a prima facie showing of discrimination, by demonstrating that the exam causes a "`disparate impact on the basis of race, color, religion, sex, or national origin.'" Ricci v. DeStefano, 557 U.S. 557, 578, 129 S.Ct. 2658, 174 L.Ed.2d 490 (2009) (quoting 42 U.S.C. § 2000e-2(k)(1)(A)(i)). The BOE has failed to rebut that prima facie showing because it has not demonstrated that the LAST-2 was properly validated. As explained below, NES's test development process did not comport with the five factors the Guardians court deemed critical to exam validation.
A prima facie showing of discrimination "requires plaintiffs to establish by a preponderance of the evidence that the employer uses a particular employment practice that causes a disparate impact on the basis of race, color, religion, sex, or national origin." United States v. City of N.Y. ("Vulcan Soc'y"), 637 F.Supp.2d 77, 86 (E.D.N.Y.2009) (internal quotation marks omitted). To do so, a party must "(1) identify a policy or practice, (2) demonstrate that a disparity exists, and (3) establish a causal relationship between the two." Id. (internal quotation marks omitted). A party can meet the second and third requirements by relying on the "80% rule." The rule is described in Guidelines § 1607.4(D):
In other words, "if the minority group performs less than 80% as well as the highest performing group, disparate impact will generally be inferred." Vulcan Soc'y, 637 F.Supp.2d at 87.
Plaintiffs have satisfied all three of the requirements, as is demonstrated in Dr. Outtz's report. Dr. Outtz's report identifies the specific employment practice at issue: the BOE's requirement (mandated by the SED) that prospective teachers pass the LAST-2 before they can be hired. (Outtz Report 5). Although the pass rates varied year to year, Dr. Outtz's report demonstrates disparate impact by showing that the pass rates for African-American and Latino applicants were between 54% and 75% of the pass rate for Caucasians. (Id. at 14-16); (Outtz Rebuttal [ECF No. 565] at 16-22).
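As an illustration only, the 80% comparison described above reduces to a single ratio between two pass rates. The following Python sketch is not drawn from the record: the function names are hypothetical, and the reference pass rate of 90% is an assumed figure chosen solely to reproduce impact ratios at the 54% and 75% bounds that Dr. Outtz reported.

```python
def impact_ratio(group_rate: float, reference_rate: float) -> float:
    """Ratio of a group's pass rate to the highest-performing group's pass rate."""
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return group_rate / reference_rate

def shows_adverse_impact(group_rate: float, reference_rate: float,
                         threshold: float = 0.8) -> bool:
    """Apply the 80% rule: a ratio below the threshold generally supports
    an inference of disparate impact."""
    return impact_ratio(group_rate, reference_rate) < threshold

# ASSUMPTION: a hypothetical reference-group pass rate; not a figure from the record.
reference_rate = 0.90

# Pass rates that would yield the lower and upper impact ratios reported (54% and 75%).
low_group_rate = 0.54 * reference_rate
high_group_rate = 0.75 * reference_rate

low_flagged = shows_adverse_impact(low_group_rate, reference_rate)    # True
high_flagged = shows_adverse_impact(high_group_rate, reference_rate)  # True
```

Under these assumed inputs both ratios fall below the 0.8 threshold, which is consistent with the conclusion that the reported range of 54% to 75% satisfies the 80% rule.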
The SED disputes Dr. Outtz's calculations and argues that Dr. Outtz should have "take[n] into account the best attempt by candidates prior to applying for a license to teach," rather than their first attempt, as Dr. Outtz did. (SED Response 1). Dr. Outtz disagrees, stating:
(Outtz Rebuttal 22). The Court agrees with Dr. Outtz's rationale, and holds that it was proper for him to calculate adverse impact based on first attempts. See Ass'n of Mexican-Am. Educators v. California, 937 F.Supp. 1397, 1407 (N.D.Cal.1996), aff'd, 231 F.3d 572 (9th Cir.2000) ("[A]dverse impact is appropriately measured by the first time a candidate sits for the [employment exam] and fails it.").
Accordingly, the Court holds that Plaintiffs have met their prima facie burden of demonstrating disparate impact. The burden then shifts to the proponent of the test to demonstrate that the test was "job related for the position in question and consistent with business necessity." 42 U.S.C. § 2000e-2(k)(1)(A)(i); see also M.O.C.H.A. Soc'y II, 689 F.3d at 274. The BOE has failed to make such a showing.
To prove job-relatedness, a test proponent must demonstrate the test's compliance with each of the five factors set forth in Guardians and discussed in Part III, supra. The Court's analysis of the LAST-2 here focuses primarily on the first of those five factors: the sufficiency of NES's job analysis.
NES's job analysis involved the use of a content validation methodology to validate the LAST-2. In light of the content-construct continuum described by Guardians, this was the correct methodology to use. An understanding of the liberal arts and sciences is not a KSA so general that it is needed to perform nearly every job.
The SED contends that the LAST-2 tests for specific content, such as "math, science, and technology," "art, literature, religion, and philosophy," and "geography and culture." (SED Ltr. [ECF No. 590] at 5); but see (id.) ("The LAST measures general knowledge."). After reviewing several LAST-2 exams, the Court finds that although the texts as to which an applicant is questioned are on topics such as math, art, etc., the questions themselves do not appear to require any significant outside knowledge to answer correctly. They test less for content than for such abilities as reading comprehension, logical thinking, and problem solving. See (March 20, 2015 Hearing Tr. 26-27). Those abilities are quite general; many, if not most, jobs require at least some level of reading comprehension, for instance. Based on Guardians' sliding scale approach, the Court therefore must rigorously assess the LAST-2's content validity, beginning with NES's job analysis.
A job analysis is an "assessment `of the important work behavior(s) required for successful performance'" of the job in question and the "`relative importance'" of these behaviors. Guardians, 630 F.2d at 95 (quoting Guidelines § 1607.14(C)(2)). The purpose of a job analysis is to ensure that an exam adequately tests for the KSAs that are actually needed to perform the daily tasks of the job. See Vulcan Soc'y, 637 F.Supp.2d at 111. The test developer must be able to explain the relationship between the subject matter being assessed by the exam and the job tasks identified. Compare id. (finding that defendant's job analysis for a test given to firefighter candidates was inadequate because no effort had been made to explain the relationship between the knowledge, skills, and abilities being tested on the exam and the tasks involved in being a firefighter), with M.O.C.H.A. Soc'y Inc. v.
To perform a suitable job analysis, a test developer must: (1) "identify the tasks involved in performing the job," Gulino III, 907 F.Supp.2d at 516; see also Guardians, 630 F.2d at 95 (describing defendant's job analysis as adequate because, inter alia, "the work behaviors involved... were identified by extensive interviewing, and subjected to serious review"); (2) "includ[e] a thorough survey of the relative importance of the various skills involved in the job in question," M.O.C.H.A. Soc'y II, 689 F.3d at 278 (internal quotation omitted) (emphasis added); and (3) define "the degree of competency required in regard to each skill." Id. (internal quotation omitted) (emphasis added).
As Dr. Outtz points out in his report,
Instead of beginning with ascertaining the job tasks of New York teachers, the two LAST examinations began with the premise that all New York teachers should be required to demonstrate an understanding of the liberal arts. The impetus for the LAST-2, as it was for the LAST-1, appears to have been the 1988 report by the Commissioner's Task Force on the Teaching Profession, discussed in Part IV, supra, which recommended that New York include a liberal arts requirement in its licensing procedure.
In time, the SED adopted the Task Force's recommendation, and contracted
In other words, NES started with the unproved assumption that specific facets of liberal arts and science knowledge were critically important to the role of teaching, and then attempted to determine how to test for that specific knowledge. This is an inherently flawed approach because at no point did NES ascertain, through an open-ended investigation into the job tasks a successful teacher performs, whether its conception of the liberal arts and sciences was important to even some New York public school teachers, let alone to all of them. See Guidelines § 1607.5(B) ("Evidence of the validity of a test or other selection procedure by a content validity study should consist of data showing that the content of the selection procedure is representative of important aspects of performance on the job for which the candidates are to be evaluated.").
Dr. Buckendahl, the BOE's expert, argues that NES's failure to identify job tasks does not necessarily render its job analysis unacceptable. He contends that because NES surveyed several hundred teachers about the importance of the KSAs that NES identified, and those teachers affirmed their importance, NES sufficiently demonstrated that those KSAs are necessary to the job of teaching. (Buckendahl Response 6-10). The Court finds these contentions unpersuasive, particularly given the high degree of validation required for tests, such as the LAST-2, that measure highly general abilities.
The problem with NES's approach, and with Dr. Buckendahl's endorsement of it, is that it assumed, without investigation or proof, that specific KSAs are important to
As an example, and as a way of making these issues somewhat more concrete, assume that the KSA of reading comprehension has an importance value of 9, the KSA of logical reasoning has an importance value of 4, and the KSA of leadership has an importance value of 20. Assume that NES's survey would have queried the value of both reading comprehension and logical reasoning, but not of leadership. Ranked relative to each other, reading comprehension would be very important, while logical reasoning might be somewhat important. But in this example, neither is nearly as important as leadership. In this way, NES's survey would have greatly exaggerated the importance of both reading comprehension and logical reasoning.
This defect would not have existed had NES used an appropriate method to identify the job tasks of New York teachers in the first place. If leadership were important to the job of teaching, the identified job tasks would have made that clear, and the survey NES sent to educators would have included the KSA of leadership alongside the KSA of, for example, reading comprehension. This process would have provided NES with a much more accurate understanding of what KSAs are most important to the job of teaching, both overall and relative to one another.
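The hypothetical above can be made concrete with a short calculation. The following Python sketch is offered only as an illustration: it uses the three importance values assumed in the example, and it treats a KSA's "relative importance" as its share of the total importance of the KSAs included in a survey, which is an assumption made for this sketch rather than a method described in the record. The function name is likewise hypothetical.

```python
def relative_shares(values: dict[str, float], included: list[str]) -> dict[str, float]:
    """Each included KSA's share of the total importance of the included KSAs.

    ASSUMPTION: 'relative importance' is modeled as a simple share of the total.
    """
    total = sum(values[name] for name in included)
    return {name: values[name] / total for name in included}

# Importance values assumed in the Court's hypothetical.
importance = {
    "reading comprehension": 9,
    "logical reasoning": 4,
    "leadership": 20,
}

# A survey that asks only about the two KSAs the developer chose in advance.
surveyed_only = relative_shares(importance, ["reading comprehension", "logical reasoning"])

# A survey built from a full job analysis that also surfaces leadership.
full_analysis = relative_shares(importance, list(importance))

most_important_surveyed = max(surveyed_only, key=surveyed_only.get)  # "reading comprehension"
most_important_full = max(full_analysis, key=full_analysis.get)      # "leadership"
```

In this sketch reading comprehension accounts for 9 of 13 points when leadership is omitted, but only 9 of 33 points when it is included, which is the exaggeration the example describes.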
NES's survey thus failed to ascertain what KSAs are most important to the job of teaching. Although this sort of survey might be an appropriate way of confirming information gathered through a proper job task investigation, or of determining the relative importance of already-ascertained job tasks, it is not an appropriate way of initially identifying KSAs.
Additionally, Dr. Buckendahl's reliance on NES's surveying of educators is further undermined by the deficiency in the survey's sample. NES should have determined the size and construction of its sample by taking into account the makeup and number of all of the subgroups (e.g. kindergarten teachers, special education teachers, African-American teachers, New York City teachers) NES needed to survey in order to achieve a representative sample.
As the Court discussed in Part IV, supra, very few of the survey respondents were African-American or Latino.
Although the percentages match approximately, the raw number of minority respondents — twenty-four African-American respondents and ten Latino respondents — was too small to permit NES to determine whether the answers received from minority teachers differed from those of majority teachers in any statistically meaningful way. (Outtz Report 33); see also Exxon Corp. v. XOIL Energy Res., Inc., 552 F.Supp. 1008, 1021 (S.D.N.Y.1981) (Broderick, J.) (taking issue with a survey because "[o]nly 97 respondents were interviewed in each of the two segments of the survey" and "all of the interviewees came from the vicinity of New York City," concluding that "the survey was not conducted on a properly selected and representative sample of the population."); cf. Vista Food Exch., Inc. v. Vistar Corp., No. 03-CV-5203, 2005 WL 2371958, at *7 (E.D.N.Y. Sept. 27, 2005) (rejecting a consumer survey after finding the sample of 75 respondents to be too small); Bonechi v. Irving Weisdorf & Co., Ltd., No. 95-CV-4008, 1995 WL 731633, at *8 (S.D.N.Y. Dec. 8, 1995) (Schwartz, J.) (finding sample of 69 participants too small to adequately represent universe of potential consumers of New York City souvenir books). It was incumbent on NES to use a larger sample, and to consider whether it should oversample certain demographic groups that are small in number, to ensure that each of those groups is meaningfully represented in the survey. See Irwin L. Goldstein, et al., An Exploration of the Job Analysis-Content
Because this is the second time during this case that NES has failed to properly complete a job analysis with respect to an employment exam, it may be useful to describe how a lawful job analysis should proceed.
NES should begin by identifying the necessary job tasks for a New York public school teacher. Necessary job tasks could be identified through some combination of (1) teacher interviews, (2) observations of teachers across the state performing their day-to-day duties, see (Outtz Report 23-25) (discussing methods used for gathering job task information); cf. Guardians, 630 F.2d at 95 (approving of a portion of the defendant's job analysis, where "work behaviors involved in being a police officer were identified by extensive interviewing, and subjected to serious review"), and (3) the survey responses of educators who have been given open-ended surveys requiring them to describe the job tasks they perform and to rank the importance of those tasks, see (Outtz Report 23-25); cf. Vulcan Soc'y, 637 F.Supp.2d at 111 (describing the use of "job questionnaires" to develop a list of job tasks). Simply consulting educational curricular documents is not a sufficient way of identifying job tasks or KSAs. Job tasks must be ascertained from the source, in this case from public school teachers.
Using the data culled from such an investigation, NES could then analyze these job tasks, and from that analysis determine what KSAs a teacher must possess to adequately perform the tasks identified. See Guidelines § 1607.14(C)(4) ("For any selection procedure measuring a knowledge, skill, or ability the user should show that (a) the selection procedure measures and is a representative sample of that knowledge, skill, or ability; and (b) that knowledge, skill, or ability is used in and is a necessary prerequisite to performance of critical or important work behavior(s)."). NES should document precisely how those KSAs are necessary to the performance of the identified job tasks. See Guidelines § 1607.15(A)(3). It is those KSAs that should provide the foundation for the development of the test framework.
The importance of identifying these job tasks is amplified here because every teacher in New York must be licensed, whether she teaches kindergarten or advanced chemistry. See (Feb. 19, 2015
Last, NES needs to make sure that the relevant test (here, the LAST-2) tests for abilities not already tested for by related exams. Here, applicants must also pass the ATS-W and the appropriate CST before they can become licensed.
A job analysis serves as the foundation for every other aspect of the validation process Guardians requires. NES's failure to perform a proper job analysis infected every other part of its validation process, rendering each similarly deficient.
Reasonable Competence. Testmakers are generally viewed as having used reasonable competence if the exam was created by professional test preparers, and if a sample study was performed that "ensure[d] that the questions were comprehensible and unambiguous." M.O.C.H.A. Soc'y II, 689 F.3d at 280. Here, NES, a professional test preparer, see Gulino III, 907 F.Supp.2d at 519, conducted a sample study, see (Clayton Decl. ¶¶ 46-48). This showing is insufficient, however, when a portion of the test development process — in this case, the job analysis — is so wholly deficient. Such a pervasive error inherently negates what might otherwise be a finding of reasonable competence. The LAST-2 thus fails to conform to the second Guardians factor.
Content Relatedness. Assessing the content relatedness of an exam "is intertwined with the job analysis." Gulino III, 907 F.Supp.2d at 520. Content relatedness is demonstrated by showing that the "'abilities tested for ... adequately relate[] to most of the identified tasks.'" Vulcan Soc'y, 637 F.Supp.2d at 116 (quoting Guardians, 630 F.2d at 98). Because the law requires a job analysis to begin with the identification of job tasks, NES's failure to identify job tasks makes it impossible to assess the content relatedness of the LAST-2.
Representativeness. For the same reasons, NES has also failed to demonstrate that the content of the exam is "a representative sample of the content of the job." Guardians, 630 F.2d at 98. The representativeness requirement has two components: "[t]he first is that the content of the test must be representative of the content of the job; the second is that the procedure, or methodology, of the test must be similar to the procedures required by the job itself." Id. Because NES never identified the tasks that make up the job, it is impossible to determine whether the content of the LAST-2 is representative of that job, or whether the test's procedures are similar to those of the job.
Scoring. Nor is it possible for the Court to determine whether the LAST-2's scoring system "usefully selects from among the applicants those who can better perform the job." Guardians, 630 F.2d at 95. Because NES did not initially define what the job of teaching entails, it is not possible to determine whether the scoring system used by the LAST-2 selects those applicants who can better perform that job.
The LAST-2 thus fails to meet any of the five criteria set forth in Guardians to assess whether an exam has "sufficient content validity to be used notwithstanding its disparate racial impact." Id. Therefore, the Court finds that the LAST-2 was
For the reasons set forth above, the Court finds that the BOE violated Title VII by requiring Plaintiffs to pass the LAST-2 in order to receive a permanent teaching license. The parties shall submit a joint status letter to the Court by June 29, 2015, identifying what steps need to be taken in accordance with this Opinion.
SO ORDERED.
David M. Van De Voort, et al., Work Analysis Questionnaires and App Interviews, in The Handbook of Work Analysis 58 (Mark A. Wilson, et al. eds., 2012).