Revised Teacher Evaluation Criteria — Sometimes informally called the “state 8,” the new criteria in RCW 28A.405.100 were signed into law on March 29, 2010. The criteria form the backbone of the new evaluation system. According to the RCW, “the four-level rating system used to evaluate the certificated classroom teacher must describe performance along a continuum that indicates the extent to which the criteria have been met or exceeded.”
Criteria Definitions — Based on feedback from experts and our TPEP districts, we have created definitions for each of the new teacher criteria. The definitions are intended to delineate the criteria and minimize the overlap among them, creating more consistency across the state in setting clear evaluation targets for teachers and principals. These will be defined in WAC.
Instructional Framework — The instructional framework provides a common language/model of instruction and sets a shared vision of good teaching within the district. Marzano states that teachers and principals use the instructional framework “to converse about effective teaching, give and receive feedback, collect and act upon data to monitor growth regarding the reasoned use of the strategies, and align professional development needs against the framework.”
Rubrics — Each instructional or leadership framework provides rubrics clearly defining the continuum of teaching performance, from unsatisfactory through distinguished teaching, based on the eight teacher evaluation criteria. The rubrics should be used to train principals to identify strengths and weaknesses in practice based on clearly defined evidence and measures. They also take into account the variation between novice and expert teachers.
Measures and Evidence — (Defined in draft by TPEP Pilots) The measures and evidence are used to determine the “teacher’s performance along a continuum that indicates the extent to which the criteria have been met or exceeded.” The measures used in the evaluation system should have a strong correlation to the criteria being evaluated. There are four areas under the “measures and evidence” section: classroom observation, teacher self-assessment, student growth data, and other measures/evidence. This section should represent the district’s system for determining the final summative evaluation score.
Final Summative Evaluation — (Defined in WAC and not determined until the conclusion of TPEP Pilot) The final summative evaluation is a critical definition in order to ensure consistency across the state as teachers are evaluated and data is submitted in aggregate. In the late fall, 8 of the 9 TPEP sites and WASA submitted a summative evaluation statement for each of the 4 tiers. Similar to the standards-based system for students, clear targets for both the distinct criteria and the final summative evaluation will drive principals and teachers to an evaluation system that promotes growth and prevents stagnation.
Regional Implementation Grants (RIG) — A TPEP ESD regional implementation
consortium will involve 5–10 districts, which will collaborate around the evaluation
implementation activities by agreeing to the identified assurances as a group. Each
ESD will coordinate the work of the districts within the regional consortium. OSPI along
with the rest of the TPEP steering committee determined a list of assurances each
district must agree to when applying.
eVal (Evaluation Management Tool) — A web-based system developed and designed
by the WEA, ESD 113, and OSPI with input and feedback from our TPEP Steering
Committee and pilot sites. The system will allow teachers, principals, and district
administrators to coordinate, schedule, review, and upload all applicable
evaluation materials. eVal is currently being tested by our pilot sites before
expanding to state-wide availability.
TPEP Task Force Terms: Student Assessment
Assessment — the process of gathering information, both formally and informally, about students’ understandings and skills.
Authentic Assessment — demonstration or application of a skill or ability within a real-life context.
Criterion-referenced — criterion-referenced tests measure student performance against a set of standards with determined levels (advanced, proficient, basic).
Diagnostic assessment — information collected before learning that is used to assess prior knowledge and identify misconceptions.
Evaluation — the process of making judgments about the level of students’ achievement for accountability, promotion, and certification.
Fairness — addresses the issue of possible bias or discrimination of an assessment toward any individual or group (race, gender, ethnicity).
Formative assessment — information collected during learning that is used to make instructional decisions.
Grade equivalent — uses a scale based on grade levels and months to establish students’ level of performance.
Norm-referenced — norm-referenced tests compare student performance to a national population of students who served as the ‘norming’ group.
Performance assessment — students show that they can perform specific tasks or demonstrate specific behaviors and abilities.
Percentile — a statistical device that shows how a student compares with students in the ‘norming’ group who had the same or lower score.
Portfolio — a collection of student work with reflections.
Reliability — the degree to which an assessment will produce dependable results consistently and over time.
Rubrics — a scoring strategy that defines criteria and describes levels of quality (Unsatisfactory, Basic, Proficient, Distinguished).
Standardized test — a uniformly administered and scored summative assessment designed to provide information on the performance of schools and districts.
Summative assessment — information collected after instruction that is used to summarize student performance and determine grades.
Validity — the degree to which an assessment measures what it claims to measure.
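The percentile definition above (a student’s standing relative to members of the ‘norming’ group with the same or lower score) lends itself to a short computation. The following is a minimal sketch in Python; the scores and the `percentile_rank` function name are hypothetical, invented for illustration:

```python
# Hypothetical scale scores for a small 'norming' group.
norming_scores = [52, 61, 61, 70, 74, 78, 81, 85, 90, 95]

def percentile_rank(score, group):
    """Percent of the norming group scoring at or below the given score."""
    at_or_below = sum(1 for s in group if s <= score)
    return 100.0 * at_or_below / len(group)

# 5 of the 10 norming scores are at or below 74, so this student
# is at the 50th percentile of the group.
print(percentile_rank(74, norming_scores))  # -> 50.0
```

Note that real norming groups are large national samples and published percentile tables may use slightly different conventions (e.g., counting only scores strictly below, or averaging ties).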
TPEP Task Force Terms: Student Growth
Expected Growth — a student’s expected/predicted performance on a current year test given his or her previous year’s test score. This is obtained by regressing the current year test score on the prior year test score. In other words, estimating expected growth addresses the question, “Compared to students with the same prior test score, is the current year test score higher or lower than would be expected?”
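The regression described above can be sketched in a few lines. This is a minimal illustration in Python using NumPy; the scores are hypothetical and the variable names are invented for the example, not drawn from any actual assessment:

```python
import numpy as np

# Hypothetical scale scores: each pair is one student's prior-year
# and current-year test score.
prior = np.array([410.0, 455.0, 430.0, 500.0, 470.0, 445.0])
current = np.array([430.0, 480.0, 445.0, 520.0, 495.0, 450.0])

# Regress the current-year score on the prior-year score
# (ordinary least squares fit of a line).
slope, intercept = np.polyfit(prior, current, deg=1)

# A student's expected score is the regression's prediction
# given his or her prior-year score.
expected = intercept + slope * prior

for p, c, e in zip(prior, current, expected):
    direction = "above" if c > e else "at or below"
    print(f"prior={p:.0f}  current={c:.0f}  expected={e:.1f}  ({direction} expectation)")
```

Comparing each observed current-year score to its expected value answers the question in the definition: is this student’s score higher or lower than that of students with the same prior score?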
Growth Models — measure student achievement growth from one year to the next by tracking the same students. This type of model addresses the question “How much, on average, did students’ performance change from one grade to the next?” To permit meaningful interpretation of student growth, the model implicitly assumes the measurement scales across grades are vertically linked (i.e., that student scores on different tests across grades are directly comparable and represent a developmental continuum of knowledge and skill).
Residualized Growth — the difference between a student’s observed current year test score and the score that would be expected given his or her prior year test scores (i.e., expected growth). This difference is the residual, referred to as “residualized growth,” and it quantifies whether a student’s change in performance from the prior year to the current year is higher or lower than that of students with similar performance in prior years.
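Once expected growth has been estimated, residualized growth is simply observed minus expected. A minimal sketch with hypothetical scores (not actual assessment data):

```python
import numpy as np

# Hypothetical prior- and current-year scores for a small group of students.
prior = np.array([410.0, 455.0, 430.0, 500.0, 470.0, 445.0])
current = np.array([430.0, 480.0, 445.0, 520.0, 495.0, 450.0])

# Expected growth: OLS prediction of the current-year score
# from the prior-year score.
slope, intercept = np.polyfit(prior, current, deg=1)
expected = intercept + slope * prior

# Residualized growth: observed minus expected. Positive values mean the
# student grew more than peers with the same prior score; negative, less.
residualized_growth = current - expected

print(np.round(residualized_growth, 1))
```

By construction, OLS residuals average to zero across the group, so residualized growth is always a comparison to similar students, not an absolute measure of learning.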
Teacher Effect — a teacher’s contribution to student performance growth compared with that of the average (or median, or otherwise defined) teacher in the district or the state. In essence, the teacher effect is the difference between observed student achievement growth and expected student achievement growth (controlling for confounding factors, such as prior student achievement and sometimes student background factors), a difference that is interpreted as representing differences in student achievement growth due to differences in teacher effectiveness. Note that the description of “school effect” or “principal effect” is less straightforward because it will depend on decisions about how to aggregate grade- or subject-level estimates based on the specific model employed to determine teacher effects.
Value-Added Estimate — to determine the value-added estimate, teacher effects are compared with the counterfactual (sometimes referred to as a “typical” teacher). If the teacher effect is higher than the counterfactual, then we may claim the teacher is effective (i.e., positive value-added). Conversely, if the teacher effect is lower than the counterfactual, then we may claim that the teacher is not effective (i.e., negative value-added). The number or rating produced in the comparison is the value-added estimate.
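The comparison described above can be illustrated with a toy computation. This sketch assumes hypothetical residualized-growth values for three invented teachers and uses the district-wide average residual as the counterfactual (“typical” teacher); real value-added models are far more elaborate:

```python
import numpy as np

# Hypothetical residualized-growth values (observed minus expected score)
# for the students of three invented teachers.
residuals = {
    "teacher_a": np.array([+4.0, +6.5, +2.0]),
    "teacher_b": np.array([-1.0, +0.5, -2.5]),
    "teacher_c": np.array([-5.0, -3.5, -1.0]),
}

# A simple counterfactual: the average residual across all students,
# which is near zero by construction in an OLS-based model.
all_residuals = np.concatenate(list(residuals.values()))
counterfactual = all_residuals.mean()

# Value-added estimate: each teacher's mean student residual relative
# to the counterfactual. Positive => more growth than the typical teacher.
for teacher, r in residuals.items():
    vam = r.mean() - counterfactual
    label = "positive value-added" if vam > 0 else "negative value-added"
    print(f"{teacher}: {vam:+.2f} ({label})")
```

The sign of each estimate is what drives the effective/not-effective claim in the definition; the magnitude is the value-added estimate itself.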
Value-Added Models (VAMs) — complex statistical models that attempt to determine how specific teachers and schools affect student achievement growth over time. These models generally use at least two years of students’ test scores and may take into account other student- and school-level variables, such as family background, poverty, and other contextual factors. VAMs address the question, “To what extent can changes in student performance be attributed to a specific school and/or teacher compared with that of the average school or teacher?”