Executive Summary

Since 2007, New York City has been the proving ground for a grand experiment in school governance. That's the year Chancellor Joel Klein replaced a tightly controlled, top-down administrative structure with one that gave school principals new powers to shape the culture and practice of their own schools. 

The chancellor's "Children First" reform is designed to free principals from day-to-day supervision by local district superintendents and instead allow them latitude in matters such as hiring, curriculum and budget. In exchange, principals must demonstrate steady and marked improvement in student performance as measured by data such as standardized test scores and graduation rates.

In this way, the new governance structure simultaneously centralizes authority over what is to be achieved, and decentralizes responsibility for how to achieve it. Colloquially, the Children First management strategy is known among principals and others as "empowerment." It rests on an elaborate accountability system designed to focus attention on gains made by the weakest students in each school while also accounting for the demographic differences among schools. 

This report examines the impact of these reforms from 2007 to today, especially in the city's high-poverty communities. Over the course of a year, the Center for New York City Affairs interviewed hundreds of principals and school administrators, visited several dozen schools (with a special focus on District 7 in the South Bronx) and analyzed volumes of statistics on school performance.

The story is mixed. The system by which principals and schools are held accountable has serious flaws, which are outlined in detail in this report and summarized below. Yet the overarching concept of localized principal empowerment appears to be producing positive results. We have concluded that the accountability system could well be substantially improved if the measures of school success or failure were more diverse, rooted in a greater variety of information, and not overly reliant on scores from state standardized tests, which were never designed for the purpose to which they are now applied.

The empowerment structure has allowed some effective principals to turn around failing schools and to create new schools from scratch, forging their own vision and assembling their own faculty without bureaucratic interference. Overall, we found that the schools of District 7 in the South Bronx, one of the city's poorest neighborhoods, have improved significantly since Klein became schools chancellor in 2002. (See "Measuring Progress in the South Bronx")

The system by which principals and schools are held accountable has serious flaws. 

We found that the chancellor's data-based accountability system has forced principals to pay attention to student achievement in schools plagued for decades by a culture of low expectations and poor academic performance. Klein has focused his reforms on schools serving the city's poorest neighborhoods. (See map below) His approach has appropriately identified dozens of failing schools that have since been closed or reorganized. In many instances, the schools that replaced the failing schools are better, according to the Center's analysis.

Most of Chancellor Klein's school restructuring work has been in high-poverty neighborhoods


The accountability system has accurately identified high schools that graduate struggling students at higher-than-expected rates. Some of these schools may offer only a bare-bones curriculum, but students are undeniably better off with a diploma, even one that reflects minimum standards, than they would be as dropouts.

The system has also effectively identified otherwise well-regarded high schools that appear to have a sink-or-swim approach to struggling students. Schools where large numbers of students fail their classes may not be giving them the support they need. (See  "What Makes an 'A' School?")

Along with the new accountability reforms, the city has experimented with different methods of providing support to principals as they seek to achieve progress and proficiency for more students. We found that the latest version of this support infrastructure, known as Children First Networks, allows principals to share ideas with colleagues in other parts of the city, rather than being bound by geography as they were under the districts.

At the same time, we found the decision to abandon geographically based districts and to free principals from the day-to-day supervision of a superintendent has substantial costs. Some principals, particularly new and inexperienced ones, are floundering without adequate direction and support. Schools in the same neighborhoods typically have no connection to one another and therefore no way of learning from one another. Parents and other community members no longer have a formal role in decision-making; parent leaders complain about being left in the dark about important decisions regarding their schools and their neighborhoods.

Perhaps most significantly, we found that the city's accountability system—which gives each public school a grade from "A" to "F" on an annual Progress Report and helps determine whether principals receive bonuses or are removed from their posts—is deeply flawed. Designed to provide parents and the general public with a clear snapshot of school quality, the "A"-to-"F" grading system has proven to be confusing and misleading. The Center found that in some cases it rewards mediocrity and fails to recognize gains made by schools that are striving for excellence.

While the city's accountability system has appropriately focused attention on how schools serve their lowest achieving students, the year-to-year volatility of the Progress Reports has undermined its credibility. In addition to receiving a letter grade, schools are given a percentile ranking. The Center discovered that schools may go from the very bottom of the city's rankings to the very top—and vice versa—in just one year. The Center found that more than half the city's elementary schools and 43 percent of its middle schools had swings totaling more than 50 percentage points in their rankings over a three-year period. (See charts below.) The Department of Education (DOE) acknowledges this problem and has taken steps to address it in 2010. (See "Building a Better Yardstick")

NYC Progress Report grades and percentile rankings for District 7 elementary schools, SY 2006-07 to SY 2008-09

Comparison of school ranking volatility by school type
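
To make the volatility finding concrete, the sketch below shows one way such swings might be tallied. It assumes that "swings totaling more than 50 percentage points" means the sum of absolute year-to-year changes in a school's citywide percentile rank; the function, data and threshold interpretation are illustrative and are not the Center's or the DOE's actual methodology.

```python
# Illustrative only: one plausible reading of "swings totaling more than 50
# percentage points" -- the sum of absolute year-to-year changes in a school's
# citywide percentile rank. Neither the data nor the formula comes from the DOE.

def total_swing(percentile_ranks: list[float]) -> float:
    """Sum of the absolute year-to-year changes in percentile rank."""
    return sum(abs(later - earlier)
               for earlier, later in zip(percentile_ranks, percentile_ranks[1:]))

# A hypothetical school: near the bottom in 2006-07, near the top in 2007-08,
# back toward the middle in 2008-09.
ranks = [12.0, 88.0, 47.0]
print(total_swing(ranks))  # 117.0 -- well past a 50-point total swing
```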

Problems are most glaring in the elementary and middle school Progress Reports, which are based almost exclusively on the results of state reading and math tests. While the state designed the reading and math tests to measure "proficiency" (that is, how many students achieved state learning standards for their grade), the city uses them to measure "growth" (that is, how much progress students made each year). For technical reasons, using a test for a purpose other than the one for which it was designed leads to unreliable results, according to city, state and independent testing experts. (See "What's Wrong With Using State Tests")

Data-based accountability has helped to reverse a culture of low expectations. 

Even more significant is the fact that the tests cover only a small portion of what the state says children should learn. For example, the state learning standards for English Language Arts say children should learn to use a library, select appropriate books, speak clearly, express opinions, and write and revise their work using multiple sources of information. Examples of meeting these standards include delivering a campaign speech, writing a letter to the editor, reciting a poem, performing a dramatic reading or writing a research paper using interviews, databases, magazines and science texts.

These are the skills, many educators say, that prepare children for high school and college. Yet none of these skills are measured by the state's elementary and middle school tests. Under the city's current accountability system, a school that focuses exclusively on boosting performance on standardized tests and ignores all the other voluminous state standards—for English and math as well as music, art, science, social studies and physical education—may receive the same grade on the city's Progress Reports as a school that works diligently to meet all the state standards.

The high school Progress Reports are less volatile because they depend on more sources of data, including graduation rates, the rate at which students pass Regents exams and the proportion of students who pass their classes each year. However, here, too, there are significant issues: A school in which students meet the bare minimum requirements may receive the same Progress Report grade as a school that offers a rich, broad curriculum that better prepares students for college, the Center has found. (See "A Tale of Two High Schools")

Recognizing that schools serving lots of poor children face extra challenges, the DOE compares each school to others with similar demographics using what is called a "peer index." For elementary schools, this is a score from 1 to 100 that weighs such statistics as the number of students who qualify for free lunch and the number who receive special education services. For middle and high schools, this is a number from 1 to 4 that represents an average of the proficiency levels on state tests of entering students. The Center found that slight variations in the peer index can lead to large variations in a school's Progress Report score, particularly in elementary schools. Elementary school principals complain that the peer index doesn't account for the number of homeless children a school has, for example, or the number of children who enroll in the middle of the year. It does take into account the number of children with disabilities, but can favor schools that make inappropriate referrals to special education. High school principals complain that their peer index doesn't take into account the students who arrive without school records, such as those coming from a foreign country.
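
A minimal sketch of how a demographically weighted peer index might behave appears below. The weights, field names and formula are assumptions made for illustration, not the DOE's actual calculation; the point is simply that small differences in the inputs can shift a school's index, and with it the peer group against which the school is judged.

```python
# Hypothetical peer index, loosely modeled on the elementary-school version
# described above (a 1-100 scale built from demographic statistics). The
# weights and inputs are invented for illustration.

def elementary_peer_index(pct_free_lunch: float, pct_special_ed: float) -> float:
    """Toy 1-100 need index: higher values indicate a higher-need student body."""
    return round(0.7 * pct_free_lunch + 0.3 * pct_special_ed, 1)

school_a = elementary_peer_index(pct_free_lunch=85.0, pct_special_ed=12.0)
school_b = elementary_peer_index(pct_free_lunch=79.0, pct_special_ed=12.0)
print(school_a, school_b)  # 63.1 58.9 -- a modest gap that could place the
                           # two schools in different peer groups
```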

The notion of holding schools accountable for students' progress is a good one. The city's attempt to measure gains—and not just overall proficiency levels—is worthwhile. Because schools in wealthy neighborhoods tend to have higher-performing students than those in poor ones, it is important to evaluate schools on the gains their pupils make, rather than simply on the performance levels they achieve. But this is easier said than done.

The high school Progress Reports are less volatile because they depend on more sources of data. 

Both the city DOE and the state, which administers the reading and math tests given in grades three to eight, acknowledge these difficulties. Because state tests are designed to measure proficiency, most of the questions are designed to distinguish a student at "Level 2" (below grade level) from one at "Level 3" (at grade level). This means there are few questions on each test geared for a child at "Level 1" (far below grade level) or at "Level 4" (exceeds grade level standards). At these levels, a lucky guess or one wrong answer can lead to a score going sharply up or down. The problem is not that the tests are bad; in fact, they provide a good indication of whether a child can understand a short reading passage or complete basic math problems. But these tests are being asked to do something they weren't designed to do—judge year-to-year progress. The Progress Reports therefore may overestimate the gains made by some schools, and underestimate the gains of others.
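
The measurement problem described above can be illustrated with a simple item-response calculation. The sketch below is a toy model built on assumptions, not the state's actual scoring method: it treats the test as a set of Rasch-style items clustered near the Level 2/Level 3 cut point and computes how precisely such a test measures students at different ability levels. Precision is high near the cut point and low at the extremes, which is why scores for Level 1 and Level 4 students can jump on a lucky guess or a single wrong answer.

```python
import math

# Toy Rasch-style model (not the state's actual test design): 30 items whose
# difficulties sit near the Level 2 / Level 3 cut point (ability = 0).
ITEM_DIFFICULTIES = [-0.5, -0.25, 0.0, 0.0, 0.25, 0.5] * 5

def prob_correct(ability: float, difficulty: float) -> float:
    """Chance that a student at this ability level answers the item correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def standard_error(ability: float) -> float:
    """Measurement error of the whole test at one ability level
    (1 / sqrt of the test's Fisher information)."""
    information = sum(p * (1.0 - p)
                      for p in (prob_correct(ability, b) for b in ITEM_DIFFICULTIES))
    return 1.0 / math.sqrt(information)

for label, ability in [("Level 1 (far below standard)", -3.0),
                       ("Level 2/3 cut point", 0.0),
                       ("Level 4 (far above standard)", 3.0)]:
    print(f"{label:30s} measurement error ~ {standard_error(ability):.2f}")
```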

Creating new tests is expensive, complicated and time-consuming. The state plans to revise the tests for grades three to eight beginning in 2011. In the meantime, the DOE is taking a number of steps to improve its accountability system using existing data.

Officials acknowledge that the formula they used from 2007 to 2009 shortchanged schools that serve higher-achieving children and led to extraordinary volatility in the elementary and middle school Progress Reports. For 2010, the DOE is changing the formula in an attempt to give more credit to schools that make gains with higher-achieving children. (See "Building a Better Yardstick") The city may also add to the formula the course grades that teachers give to middle school students and, eventually, grades given to elementary school pupils.

More promising is the department's attempt to improve the qualitative portion of its accountability system, called the Quality Review. Established in 2007, the Quality Review consists of a one- to three-day school visit by a superintendent or DOE consultant. It is designed to supplement the data in the Progress Report with qualitative information drawn from visits to classrooms and interviews with teachers, administrators and even students.

The rubric for the Quality Review has changed each year. Principals complain that it is a moving target and that the quality, experience and biases of the reviewers are variable and unpredictable.

Yet the Quality Review has the potential to be a very effective tool. In 2010, under the direction of Shael Polakow-Suransky, the DOE's deputy chancellor for accountability, the Quality Review has been revamped to emphasize fundamental elements such as a school's curriculum, culture and atmosphere. Schools are graded on measures such as safety, the level of engagement of students, the coherence of the curriculum, and the staff's ability to work as a team. The methodology has been tightened up, and the department has invested significant time and money training reviewers so their reports will be more consistent, Polakow-Suransky says. It will take time for the Quality Reviews to reach their potential, but they may well capture many important features of a school that can't be quantified or measured by standardized test scores.

The city has put more weight on the standardized tests than they were designed to bear. 

Statistics have their place. The state tests are useful for measuring limited but important skills in reading and math. The DOE insists the state tests are useful in predicting which children will graduate from high school and which will drop out. Only 10 percent of eighth graders scoring a low "Level 2" will graduate, compared with 90 percent of those scoring a "Level 4". Because of federal and state mandates, the city could not abandon the use of standardized tests, even if it wanted to.

However, the DOE's current method of measuring progress may undermine public confidence in the department's assertion that schools are improving overall. While the gains may not be as dramatic as officials claim, there is significant external evidence that the city's schools are improving, at least in the elementary grades. The National Assessment of Educational Progress, considered the gold standard of testing, has shown slow but steady gains in fourth-grade reading and math for New York City students between 2002 and 2009.

NYC students gained ground on state and federal tests over the Klein years, though federal tests show far fewer students performing at grade level 

Unfortunately, the city has put more weight on the standardized tests than they were designed to bear. If schools with lackluster teaching and inattentive children are ranked above schools in which the sophisticated level of children's work is apparent from stepping inside a classroom or scanning a bulletin board, the DOE runs the risk of rewarding mediocrity and punishing excellence. Statistics cannot replace human judgment. The city must recognize the limitations of its Progress Reports, and rely instead on a greater range of qualitative and quantitative measures to gauge how well schools are educating their pupils.