The National Center for Education Statistics (NCES) funded my proposal for a five-year, $5 million project to conduct analyses comparing state assessment results to the National Assessment of Educational Progress (NAEP). These analyses are designed to serve NAEP's "confirming" function vis-a-vis state assessment results under the No Child Left Behind Act. The reports on the 2003 assessments indicated that state achievement standards vary haphazardly from state to state and bear little relation to variation in actual achievement across states. The contract is with AIR. I have asked Victor Bandeira de Mello to act as Project Director, following my retirement from AIR in 2005.
Analysis of School-Level State Assessment Scores
I used the school-level scores in the National Longitudinal School-Level State Assessment Score Database (NLSLSASD) to produce reports (a) identifying schools in high-poverty areas throughout the country that are beating the odds to produce high levels of student achievement and (b) graphically comparing the relations among poverty, Title I program participation, and reading and mathematics achievement. The latter reports are available at www.schooldata.org.
Addition of Education Finance Survey Data (F33) to the Longitudinal Common Core of Data
Building on the CCD longitudinal research file I built previously, I added school district finance information (i.e., federal, state, and local revenues and instructional and other expenditure categories) to create a file covering 1989-1990 through 1999-2000. This database contains data on all regular public school districts serving students in at least one of the 11 years. It also includes a separate file of uniform pseudo-unified districts, in which each feeder elementary district is aggregated with the high school district to which it sends most of its students. The files and documentation can be downloaded from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2005863.
Development of the Common Core of Data Longitudinal Research File
This started in 1993 as a task, done jointly with Roger Levine, on my Elementary and Secondary Education (ELSEC) Task Order Contract with NCES. We were to produce a report using CCD to address useful research issues about the nation’s public schools. I found that the CCD data were badly in need of both editing and imputation of missing values. Based on previous work on PROC IMPUTE, I carried out longitudinal editing of the CCD School District file for the school-years from 1986-87 through 1991-92 for the report. Following the completion of the ELSEC project, Lee Hoffman, at NCES, asked me to continue to add years to the file, to create a CCD Longitudinal Research Database. I added years through 1996-97, and then I undertook the larger task of adding the CCD School Universe Data to the Longitudinal Research Database. That database, with documentation, can be downloaded from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2003420.
With Ann Win's help, I developed a webpage for The Education Trust (http://www2.edtrust.org/edtrust/dtm/), which allows users to identify all schools in a state that exceed a specified level of poverty and have achievement scores above a specified criterion. The user can set both criteria, drawing on various subjects, grades, and years. This webpage makes use of the NLSLSASD, a database of school-level state assessment scores available at www.schooldata.org.
Assessment Data Collection and Analysis
This project, awarded in 2001, represented the first full funding for the National Longitudinal School-Level State Assessment Score Database (NLSLSASD). That effort started in 1995, when, on a subcontract to one of the NAEP Secondary Analysis Grants, AIR acquired school-level state assessment scores for 23 states and matched them to NAEP to test the validity of NAEP's selection of substitute schools for schools that refused to participate (the result was that there was no noticeable difference in average state assessment scores between refusing and substitute schools). In 1997, I merged the database with the Schools and Staffing Survey (SASS) to carry out analyses of the relations of school characteristics to school achievement, showing that class size was more strongly related to reading achievement than to mathematics achievement. Starting in 1998, Victor Bandeira de Mello and I expanded the database to cover all states, as a part of the NAEP contract, to help states understand the relations between their assessment results and NAEP results. Then, in 2001, we obtained funding from Alan Ginsburg to collect the data for the purpose of supporting federal program evaluation efforts.
Development of Racial Isolation Indicators for The Condition of Education
At Michael Ross' suggestion, I used the CCD Longitudinal Research Database, which I had created, to produce a brief report for the 1999 Condition of Education on trends in school segregation over the period from 1987 to 1997 (http://nces.ed.gov/pubs99/condition99/pdf/section5.pdf, indicator 47). For this, I developed a measure of relative racial isolation, which I later generalized to provide a method for partitioning total racial isolation into hierarchical levels (e.g., region, state, district, school) and multiple subgroups (e.g., Blacks, Hispanics, Asians, American Indians) (Racial Isolation Decomposition).
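To illustrate the general idea of partitioning isolation across hierarchical levels, the sketch below computes a standard isolation (exposure-to-own-group) index, not the relative racial isolation measure developed for the report, first with schools as the units and then with districts as the units; the difference between the two reflects isolation attributable to the sorting of students among schools within districts. The code is a minimal Python illustration, and the enrollment data are hypothetical.

    # Minimal illustration; not the measure used in the Condition of Education report.
    # School records are (district_id, subgroup enrollment, total enrollment); all hypothetical.

    def isolation_index(units):
        """Exposure of subgroup members to their own group: sum_i (g_i/G) * (g_i/t_i)."""
        G = sum(g for g, t in units)
        return sum((g / G) * (g / t) for g, t in units if t > 0)

    def aggregate_to_districts(schools):
        """Collapse school records to district totals."""
        districts = {}
        for district_id, g, t in schools:
            dg, dt = districts.get(district_id, (0, 0))
            districts[district_id] = (dg + g, dt + t)
        return list(districts.values())

    schools = [
        ("D1", 120, 300), ("D1", 30, 400),
        ("D2", 200, 250), ("D2", 10, 350),
    ]

    school_level = isolation_index([(g, t) for _, g, t in schools])
    district_level = isolation_index(aggregate_to_districts(schools))
    print(f"Isolation with schools as units:   {school_level:.3f}")
    print(f"Isolation with districts as units: {district_level:.3f}")
    print(f"Within-district sorting component: {school_level - district_level:.3f}")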
NAEP Secondary Analysis Grant to Evaluate an Item-Based Method of Test Linkage
Darrell Bock proposed using a variation of his BILOG program (the "variant item technique") to obtain optimal linkages between NAEP and state assessments. With his help, I compared the resulting item-based linkages in two states with the results from a scale-based linkage and found that using the item-specific information did little to improve the accuracy of the linkage. (Email me for a copy of the paper.)
Development of Guidelines for Linking State Mathematics Assessments to the National Assessment of Educational Progress
As the initial substantive task of the Education Statistical Services Institute, I statistically linked student-level NAEP and state assessment scores for the 1996 NAEP mathematics assessment in four states. The purpose of the study was to determine how accurate such a linkage would be. I found the accuracy of the linkage to be sufficient in three of the four states, but only for making aggregate judgments, not for projecting individual NAEP scores from state assessment scores (because the linkage error was too great). I submitted guidelines for developing, evaluating, and using the projection type of linkage I implemented, so that others could carry out similar analyses (email me for a copy of the paper).
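The sketch below conveys the basic logic of the projection type of linkage, assuming a simple least-squares regression of NAEP scores on state assessment scores in a linking sample. It ignores NAEP's plausible-values methodology, sampling weights, and measurement error, all of which the actual study had to address, and the data and names are hypothetical.

    # Minimal sketch of a projection linkage; not the procedure used in the study.
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical linking sample: students with both a state score and a NAEP score.
    state_scores = rng.normal(300, 40, size=2000)
    naep_scores = 0.8 * state_scores + rng.normal(0, 25, size=2000)

    # Fit the projection (least-squares line of NAEP on state scores).
    slope, intercept = np.polyfit(state_scores, naep_scores, deg=1)

    # Project NAEP scores for one school's students from their state scores.
    school_state_scores = rng.normal(310, 35, size=60)
    projected_naep = intercept + slope * school_state_scores

    # Aggregate projections are far more stable than individual ones, because each
    # individual projection carries the full residual (linkage) error.
    residual_sd = np.std(naep_scores - (intercept + slope * state_scores))
    print(f"Projected school mean NAEP score: {projected_naep.mean():.1f}")
    print(f"Individual projection error (SD): {residual_sd:.1f}")
    print(f"Approximate error of the school mean: {residual_sd / len(school_state_scores) ** 0.5:.1f}")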
Development of Guidelines for Linking State Reading Assessments to the National Assessment of Educational Progress
I replicated the 1996 analyses for the 1998 NAEP reading assessments, this time in six states. The results were very similar to the results for mathematics. The linkages were sufficiently accurate for aggregate reporting in five of the six states. As part of this project, I also evaluated the NAEP-TIMSS linkage that was being constructed.
Data Analysis and Support for Elementary and Secondary Education Statistics (ELSEC)
At AIR, I put together a consortium with Research Triangle Institute (RTI) and Policy Studies Associates (PSA), and took this task order contract with NCES away from the consortium headed by Pelavin Associates. The contract lasted for four years, and in that time we produced more than 30 reports and issue briefs for NCES. These included reports using the Common Core of Data (CCD) (NCES 95-300, NCES 96-399, NCES 97-529), the Schools and Staffing Survey (SASS), the Private School Survey (PSS) (NCES 95-330, NCES 97-459), the National Education Longitudinal Study of 1988 (NELS:88) (NCES 97-052), and the national libraries survey. The AIR task leaders on this project included Bob Rossi, Victor Bandeira de Mello, Roger Levine, Tom Parrish, and Jay Chambers. Halfway through this project, Pelavin Associates joined AIR, and former Pelavin Associates staff worked on the project.
Development of a Schools and Staffing Survey Student Achievement Subfile: Pilot Test
This project arose from the combination of the initial version of the school-level state assessment score database with the availability of SASS and NAEP data for the same school year: 1993-1994. I merged the state assessment score data we had collected from 20 states on the NAEP secondary analysis grant with the SASS public schools in those states, and used the prior merge with NAEP to add a national context to the state assessment data. I carried out structural equation modeling to estimate the strength of association between a variety of school-level factors and school average mathematics and reading achievement, controlling for demographic differences between schools. The results are presented in Analytic Issues in the Assessment of Student Achievement (NCES 2000-050). Among other things, class size was found to have a stronger association with reading achievement than with mathematics achievement, at the school level.
Secondary Analyses of the 1990 Trial State Assessment
In this 1992 project, which I directed jointly with Liz Hartka, we carried out several analyses of the properties of the NAEP "Trial State Assessment," as it was called then. First, we explored methods for adjusting NAEP achievement differences between states for demographic differences. Second, we explored the similarity of item omissions to incorrect responses, based on other responses by the same student. Third, we examined the local independence assumption, comparing probabilities of correct responses on item pairs to the products of the probabilities for the individual items. Finally, we compared performance on early versus late items in a block to estimate the extent of speededness in NAEP.
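As an illustration of the item-pair comparison described above (a simplified sketch, not the actual analysis), the code below compares the observed proportion of examinees answering both items of a pair correctly with the product of the two items' individual proportions correct; the formal local independence assumption conditions on ability, so this shows only the marginal version of the check, and the response matrix is hypothetical.

    # Marginal item-pair independence check on a hypothetical 0/1 response matrix.
    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(1)
    responses = (rng.random((500, 6)) < 0.6).astype(int)  # examinees x items, scored 0/1

    p_correct = responses.mean(axis=0)  # proportion correct for each item

    for i, j in combinations(range(responses.shape[1]), 2):
        joint = (responses[:, i] & responses[:, j]).mean()  # observed P(both correct)
        expected = p_correct[i] * p_correct[j]               # product of the marginals
        print(f"items {i},{j}: observed {joint:.3f}  expected {expected:.3f}  difference {joint - expected:+.3f}")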
Interactive Pedagogical Skills Assessment Design
In consortium with CTB/McGraw Hill and a technology company, I contributed to the development of a prototype interactive video-based teacher assessment tool, in which teacher candidates were placed in simulations of real-world classrooms and asked to indicate how they would proceed at various points. I conducted focus groups with leading teachers from around California, identifying key teacher skills needed for effectively teaching math, reading, arts, science, and other subjects. My responsibilities also included designing the method for scoring the test. The project was funded by the state of California; although teachers acclaimed the prototype, the state did not fund large-scale implementation.
Evaluation of the NAEP Trial State Assessment
When NAEP was expanded in 1990 to provide state-level reports, Congress mandated an evaluation to determine its feasibility and validity. This evaluation was carried out by a National Academy of Education blue-ribbon panel, for which AIR, led by George Bohrnstedt, served as the professional staff. TSA Panel members included Robert Linn, Robert Glaser, Lorrie Shepard, Edward Haertel, Lauress Wise, Gordon Ambach, and Al Shanker, among others. From 1991 to 1996, I served as the lead analyst on the evaluations. In that role, I designed and carried out analyses and data collection efforts for evaluations of the validity of (1) the 1990, 1992, and 1994 assessments, (2) the NAEP standard-setting process, and (3) the student exclusion process. The TSA Panel's conclusion, that the standard-setting process was fundamentally flawed, was not accepted by the National Assessment Governing Board.
Evaluation of the Dropout Statistics Field Test
In 1989, the National Center for Education Statistics selected AIR to manage a field test of the collection of dropout statistics for the Common Core of Data. We worked with school district personnel in 187 districts in 30 states to develop measures that would be both feasible and valid. The difficult step was to differentiate transfers from dropouts. School districts were to count as dropouts any students whose extended absence could not otherwise be explained. Roger Levine and I followed up 456 alleged transfers and 192 assumed dropouts and found that fewer than 8% of the alleged transfers were actually dropouts, while nearly a quarter of the assumed dropouts were transfers.
Civilian Occupational Validation of ASVAB
To increase the attractiveness of taking the Armed Services Vocational Aptitude Battery (ASVAB) for high school students, the Defense Manpower Data Center contracted with AIR to determine the validity of the ASVAB for predicting performance in civilian occupations. To begin this project, we selected 10 widely varying occupations and, through critical incident interviews with supervisors in each occupation, identified performance evaluation dimensions for each and developed forms to be used as the dependent variables in the validations. The plan was to administer the ASVAB to a sample of incumbents in each occupation and correlate the scores with performance ratings.
This project was notable for a difficulty that arose. We submitted a form for OMB approval for the data collection, and it was turned down. The reason given was that AIR had done nothing wrong but that the sponsor had failed to notify OMB in advance of publishing the RFP for the study that data collection on civilians would be involved. The study was cancelled until the Department of Defense could issue a new RFP. AIR bid successfully on the new RFP (we already had the performance evaluation instruments for the study) and carried out the study four years later than originally planned. Marie Dalldorf was a key contributor to this study for AIR.
Study of the Validation of JTPA Postprogram Follow-up Data
The Department of Labor sponsored the Job Training Partnership Act (JTPA), aimed at providing unemployed people with entry-level job skills. The programs trained people for specific careers, partnering with businesses that agreed to hire the graduates. It was important that the jobs not be dead-end jobs but offer opportunities for career development. DOL required reports of the job status of graduates after six months and a year, and AIR's project was to validate those reports. Among other things, I traveled to local JTPA centers in a number of states to find out about their transitioning and follow-up procedures. Jean Wolman was a key contributor on this project, developing a guidebook for JTPA follow-up reporting.
Statistical Support for School Desegregation Litigation
In the late 1980s, the U.S. Department of Education continued to sue school districts that maintained racially segregated schools. AIR was awarded a project to support lawyers for the Office of Civil Rights as they gathered evidence of school segregation. This project was to consist of a series of task orders for different cases, but due to the administration’s decision to cut back on its efforts to enforce desegregation, only a single task order was issued. We put together a database of students’ switches between schools from one grade to the next in a rural southern school district and analyzed the results to determine whether there was a pattern of racial segregation. Although the analytical work was completed, the Office of Civil Rights did not pursue the litigation.
EEOC Workshop on Analytic Methods
Expert Reports for Federal Court Title VII Employment Discrimination Litigation
Expert Reports for ____ v. ____.
During the 1980s, I developed analytical programs for a series of projects in which AIR provided consultation on employment discrimination litigation. I wrote three major computer programs: "MULTEVENTS," "CONLOG," and "WAITING." The first of these, originally suggested by Laurie Wise, provides exact probabilities of discrete outcomes of processes such as the promotion of specified numbers of focal- and referent-group members (e.g., women and men) across a series of promotional opportunities with specified numbers of applicants. The second generalizes the first program to allow controlling for differences in qualifications; it is based on an innovative solution to a mathematical problem previously considered intractable and includes a method, not available in standard statistical packages, for dealing with infinite beta weights. The third generalizes the first program to test whether members of a focal group have to wait longer for promotions than others; it is an adaptation of the biostatistical methods of survival analysis and includes a solution to the problem of partial likelihood maximization, which in standard procedures assigns non-zero probabilities to logically impossible outcomes. The defining characteristic of these programs is the replacement of approximations by exact calculation of the probabilities of outcomes of employment decisions.
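As an illustration of the kind of exact calculation MULTEVENTS performs (a simplified sketch, not the program itself), the code below computes the exact distribution of the total number of focal-group promotions across a series of promotion opportunities, assuming random selection from each applicant pool and no adjustment for qualifications; the data are hypothetical.

    # Exact distribution of total focal-group promotions under random selection.
    from math import comb

    def hypergeom_pmf(n_focal, n_referent, n_promoted):
        """Exact probabilities of promoting k focal-group members at one opportunity."""
        total = n_focal + n_referent
        return {
            k: comb(n_focal, k) * comb(n_referent, n_promoted - k) / comb(total, n_promoted)
            for k in range(max(0, n_promoted - n_referent), min(n_focal, n_promoted) + 1)
        }

    def total_promotions_dist(opportunities):
        """Convolve per-opportunity distributions across all opportunities."""
        dist = {0: 1.0}
        for n_focal, n_referent, n_promoted in opportunities:
            pmf = hypergeom_pmf(n_focal, n_referent, n_promoted)
            new_dist = {}
            for t, p in dist.items():
                for k, q in pmf.items():
                    new_dist[t + k] = new_dist.get(t + k, 0.0) + p * q
            dist = new_dist
        return dist

    # Hypothetical rounds: (focal applicants, referent applicants, number promoted).
    opportunities = [(4, 10, 2), (3, 12, 1), (5, 8, 3)]
    dist = total_promotions_dist(opportunities)
    observed = 1  # hypothetical observed total of focal-group promotions
    print(f"Exact P(total <= {observed}) = {sum(p for k, p in dist.items() if k <= observed):.4f}")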
MULTEVENTS was originally written as a SAS procedure, when AIR’s computation was on the mainframe computer at Stanford University. I rewrote that program and wrote the other programs in FORTRAN, and they presently exist as standalone executable modules for PCs. They can be called from SAS data preparation programs, for example, or invoked from the Windows command line ("Run"). These programs continue to be used for employment discrimination analysis. (Please email me if you would like to download and use these programs.)
Needs for Educational Software: Reading, Writing and Communication
In the early 1980s, there was great optimism about the potential of educational software to boost achievement levels in schools. Effective software could target each individual student’s special needs and problems. The main problem, it appeared, was that there were still too few PCs in classrooms. This project, carried out for the National Institute of Education (NIE), was a study of the state of the art in reading and writing software and an attempt to identify the most important barriers to realizing the potential for computers in the classroom. We inventoried the available software, held focus groups with leading educators and software designers in Minnesota, Massachusetts, California, and elsewhere, and prepared a summary report for NIE. Darlene Russ-Eft conducted this project with me.
Development of the National Assessment of Educational Progress Methods
In 1982, the U.S. Department of Education decided to put the National Assessment of Educational Progress (NAEP) up for bids. Until that time, the project was carried out by the Education Commission of the States (ECS), with subcontractors. NIE announced a grant competition, from which five winners would be selected, whose unstated but known purpose was to partially fund the development of full-blown proposals to conduct NAEP. I wrote the proposal that won AIR one of the five places. However, in the fall of 1982, I became fully committed to work on the ASVAB validation and the study of needs for reading and writing software and turned over the task of writing the full NAEP proposal to others at AIR. In hindsight, we should have taken Bob Krug's advice and not tried to write a proposal to beat the Educational Testing Service (ETS). ETS put far more effort than AIR did into its proposal and won the competition to conduct NAEP. As of 2005, ETS was still the prime contractor on NAEP. AIR's chance to become centrally involved in NAEP did not come until 1990, when we became responsible for the National Academy of Education's evaluation of the "Trial State Assessment," the dramatic expansion of NAEP to support comparisons between states.