At some point very soon, DPS will issue its 2013 School Performance Framework (SPF). The SPF performs a valuable function by aggregating multiple metrics on school performance and presenting the data so that parents can understand it at a high level while educators have enough depth to drill down into specific areas. Performance mechanisms are often difficult, and the positive news is that we are now having conversations in Denver about how to make good things like the SPF better (instead of questioning their basic premise). But it is also time to give the SPF a hard look and work to improve its credibility and usefulness.
Here are my top suggestions for improving the SPF. I hope others will weigh in as well.
1. Raise the Proficiency Bar: The low levels at which proficiency is rewarded on the SPF are shocking. Examine the 2012 SPF rubrics (under “2. Student Achievement Level”).
High schools get maximum points for having 20% of their students at proficiency in math. That’s not a misprint – if one in five kids is doing math at grade level, that school receives all possible points in the category. Other subjects are not much better: maximum points are awarded for 30% proficiency in science, 40% in writing, and 50% in reading. Middle schools are similar: maximum points for proficiency levels of 30% (science), 40% (math and writing), and 50% (reading). Elementary schools are 30% (science), 40% (writing), and 50% (math and reading). Not one level or subject has a proficiency goal above 50%.
Performance goals are a signal about what is important to a school or district. Personally, I greatly prefer exit-level proficiency (instead of school-wide) as a metric, provided there is some adjustment for student demographics (as the SPF does with its similar school analysis). But currently on the SPF, there is no incentive or reward, in any subject or at any level, for a school to raise its proficiency level above 50%. That is indefensible.
2. Detail selective admissions. A number of DPS schools have selective admission policies, where students apply based on specific skills, talents or potential. Many of these processes are comparable to college admissions, requiring test scores and letters of recommendation. DPS should publish the percentage of students at each school who are admitted through a selective process (much as it now reports the percentage of FRL, ELL and minority students). Some schools are entirely selective admissions, others are partially selective through specific programs, and others are cleanly divided (at one school, high school is a magnet program but middle school is not).
There has not been nearly enough acknowledgement of the disproportionate demographics of schools that select their students (see the selective admissions graph near the bottom here), nor of the obvious impact on school performance. DPS needs to be far more open and explicit about these differences in school populations.
3. Compare schools at specific grade levels. Currently the SPF compares all schools on a single scale, regardless of what grades they serve. By doing so, it is not comparing apples to apples. The SPF should explicitly compare schools at each level: high, middle, and elementary (as CDE does), as well as disaggregating programs that span multiple levels, such as K-8 and 6-12 programs. This presentation would also be easier for parents comparing options, who want to see their choices side-by-side. Here are some reasons why this change is important.
School levels are compared on different metrics: High schools are measured under criteria including college readiness metrics (ACT, AP, etc.). This makes sense. But elementary and middle schools have no similar category (nor should they). One should not really be comparing two groups that are measured differently. It’s a little like ranking baseball and football players on the same index: both are athletes who require speed and strength, but the specific metrics for success are different.
School levels have different margins for error: Most of the academic criteria — and specifically growth metrics, which are the single largest component in the SPF — are based on TCAP tests. But different school levels have widely varying percentages of students taking these tests. Elementary schools offer 6 grades (K-5), in which academic growth data is available only for 4th and 5th graders. Assuming every grade has an equal number of students, an elementary school has growth scores for just 2 of 6 grades (or just 33% of students). The SPF judges elementary schools based on a pretty small sample of their students.
Contrast this with middle schools, in which all grades take the TCAP, or high schools, in which 50% of grades take the TCAP. For an elementary school of 300 kids, a sample of 100 (2 grades) carries a margin of error of about +/- 8 points at a 95% confidence level, so a reported growth score of 50% could reflect a true score anywhere from 42% to 58%, a range that spans 3 SPF categories. That’s significant enough to warrant separating levels.
School levels provide clarity to averages: With K-8 and 6-12 schools, an average across so many grades can hide remarkable differences. There are schools in Denver that are (for example) excellent in elementary school and lousy in middle school. That’s important to know, and an average across multiple levels obscures performance. Some schools have chosen to report individual levels differently; DPS should move to a consistent framework where all schools do.
4. Count Students, not Schools. The SPF treats a school with 2,000 students the same as a school with 200 students. That makes sense when looking at individual schools, but in the aggregate it’s silly. In 2013 the 10 highest-ranked schools had 2,934 kids enrolled. The 10 lowest-ranked schools had 4,568.
This problem is compounded because new schools often rank high in their first year or two before regressing closer to the average. In 2012, of the 15 schools that placed in the highest level of “Distinguished,” there were three new schools with combined enrollment of 381 students. Two more distinguished schools were in their second year with total enrollment of 462 students. Add them all together and these five schools, one-third of all distinguished schools, had enrollment of 843 kids. The positive performance of all five of these distinguished schools can be eclipsed by the underachievement of just one: Montbello High School, ranked 8th from the bottom, with enrollment of 1,067 students. When calculating system-wide improvement or decline, DPS needs to count total students, not schools.
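A quick sketch of the difference between counting schools and counting students, using only the enrollment figures quoted above. The variable names are mine, and the comparison is limited to the 20 schools at the two ends of the ranking, since those are the only totals given here.

```python
# Enrollment figures quoted in the text above.
top_10_enrollment = 2934
bottom_10_enrollment = 4568

# Counting schools: the two groups look equal (10 schools each).
top_share_of_schools = 10 / (10 + 10)

# Counting students: the top group is noticeably smaller.
combined_enrollment = top_10_enrollment + bottom_10_enrollment
top_share_of_students = top_10_enrollment / combined_enrollment

# The five new or second-year "Distinguished" schools versus one large school.
new_distinguished_enrollment = 381 + 462
montbello_enrollment = 1067

print(f"Top 10 as share of these 20 schools:  {top_share_of_schools:.0%}")
print(f"Top 10 as share of their students:    {top_share_of_students:.1%}")
print(f"Five newer distinguished schools:     {new_distinguished_enrollment} students")
print(f"Montbello High School alone:          {montbello_enrollment} students")
```

Weighting by enrollment is the simplest version of this idea; a fuller analysis would weight every school in the district, which requires enrollment data not quoted in this post.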
There are several other things DPS should probably do – for example, the cut points for some categories are absurdly broad (a range of 50% to 80% of potential points in the category of “meet expectations”); the use of cut points instead of a linear scale is really silly; re-enrollment data should be published as a specific percentage for each school; and I remain perplexed by the tactic of clustering schools based on similar MGP scores when individual MGP scores are already adjusted for the academic trajectory of similar students. Some of these are simple; some are complex. There are probably more that should be added to the list.
Unfortunately the incentive to change the SPF is probably pretty low. One of the reasons that even these major changes are unlikely is that they would illuminate some real deficiencies. Using some reasonable proficiency targets is likely to bring overall scores down. Delineating which schools choose to select their students will reveal programmatic inequality. Segmenting schools by level will display some considerable gaps in quality (particularly in traditional high schools). If you count students instead of schools, performance improvements are likely to be far smaller (or nonexistent).
But I also believe it is past time to face these issues honestly and directly. Evaluating schools based on multiple metrics is a basic tenet of public sector responsibility. The ability to capture and report on relevant data has increased remarkably in the past few years, and DPS is to be commended for instigating a serious and nuanced performance framework. It is now time to make it better.