I’ll begin this post by writing about how assessment works in higher education, and then explain how this shows us potential shortcomings in the OFSTED system of grading schools, with reference to one Lead Inspector and a sample of 50 inspections.
In universities, we do a lot of high-stakes marking, so we have to be very careful about our processes to make sure they are transparent and defensible at all times. There are different ways of approaching this: for example, second marking (where a colleague marks papers to check the original marks make sense), blind second marking (where said colleague has no idea what the original marks were) and even third marking (where the first two colleagues are in dispute and another view is felt to be appropriate). These approaches generally apply to essay-based answers, projects, dissertations and reports. Marking is double-checked by an external examiner, to ensure conformity with national standards on comparable courses, and any anomalies are investigated.
For certain subjects that involve more scientific or mathematical answers, with technical or numerical content, techniques such as scaling are typically used. This means that if a cohort of, say, 200 students all score uncharacteristically badly on a question, the pass mark for that question can be raised or lowered if it is felt that the original question was pitched wrongly in the context of the overall examination, and of what is usually expected at a particular level. That way students get a fair and reasonable result, and standards do not fluctuate wildly if there happens to have been a change in the assessment team, for example. Again, marking is checked by an external examiner, and any scaling has to be justified to him/her in the context of national standards.
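To make the idea concrete, here is a minimal sketch of one common form of scaling, adjusting the marks themselves rather than the pass mark (either approach achieves the same effect). The numbers, target mean and function name are invented for illustration; real examination boards each have their own approved procedures:

```python
def scale_marks(raw_marks, target_mean, max_mark=100):
    """Linearly scale a cohort's marks on one question so that the
    cohort mean lands on the target, capped at the maximum mark."""
    actual_mean = sum(raw_marks) / len(raw_marks)
    factor = target_mean / actual_mean
    return [min(round(m * factor, 1), max_mark) for m in raw_marks]

# A (hypothetical) cohort that scored uncharacteristically badly,
# with a mean of 38 on a question normally expected to average 50:
cohort = [30, 35, 40, 45, 40]
print(scale_marks(cohort, target_mean=50))
```

Any such adjustment would, as described above, still need to be justified to the external examiner against national standards.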
If an examination board really wants to probe assessment standards, it is possible to track how individual colleagues or groups of colleagues assess over time, in terms of a particular set of criteria, or statistical norm, depending on the subject under consideration and the group size. This then feeds into ongoing staff training. Academics tend to see assessment standards as an ongoing work in progress, routinely checked and altered, and underpinned by principles of fairness and parity. People take it very, very seriously indeed. This is why I am not being my usual jokey self in this blog post.
With this in mind, recently I spent some time looking at how OFSTED inspection grades vary amongst inspectors, and which factors influence this. To that end, I would like to present a case study of an individual inspector to give an example of quite how variable grades can be in comparison with a national norm. I am not saying this is the case for everyone, or making a pronouncement about OFSTED in general. I am just saying that, in this case, there is a case for OFSTED/Serco/Tribal to moderate grades internally, as part of ensuring professional standards are met. Then, and only then, can the public have real confidence in inspection findings.
- I carried out an internet search to locate all OFSTED officially published inspection reports with the same Lead Inspector (n=50), who has inspected primary schools for two different subcontracting agencies, but who does not appear to have been an HMI (a centrally employed inspector).
- I have gone to considerable lengths to find as complete a data set as possible, by digitally searching published OFSTED reports, with triangulation against relevant newspaper articles reporting school inspections. I was also allowed access to the Watchstead website to check the data, which was very useful (thank you, Watchstead).
- I logged the overall inspection grade given by this Lead Inspector in each case.
- I calculated the overall proportion of inspection grades in each category given by this Lead Inspector, in percentage terms.
- I compared this percentage to the officially published OFSTED average grades overall for all inspectors in each category, which were available on the OFSTED website.
- Some caution is required in interpreting the data, as with a sample of 50 inspections one or two unusual instances may skew the findings more than they should, more so than if we had a sample of, say, 100 inspections.
- The table below lists the 50 inspections carried out by the Lead Inspector over the last 9 years, and the overall inspection grade awarded in each case. I have removed the names of the schools as they would identify the Lead Inspector concerned very easily, and that is not the point of the exercise here.
- The figure below demonstrates the pattern of the Lead Inspector’s inspection grades over time. It tells us that in recent years, the inspector has become considerably more likely to award Level 4 grades to schools. This corresponds to a decreased frequency of inspections carried out by this Lead Inspector during the period 2010-2014, when the new regulations applied. Click the icon in the bottom right hand corner if you want to enlarge the chart/table.
- As stated above, the inspection regulations changed after 2010, but the overall OFSTED proportions of schools getting Level 2 or 4 have stayed roughly the same during that period.
- In the case of this Lead Inspector, the table below represents the proportion of grades given during the period 2005-2014. The second column represents the OFSTED average for the same period. I have listed the 2010-2014 OFSTED averages in column 3, but I have not done so for the Lead Inspector as we only have data for fifty school inspections, so that seems unhelpful. Note: columns will not add up to 100% due to rounding.
|         | Inspector | OFSTED 05-14 | OFSTED 10-14 |
|---------|-----------|--------------|--------------|
| Level 1 | 10%       | 13%          | 10%          |
| Level 2 | 30%       | 50%          | 50%          |
| Level 3 | 40%       | 34%          | 36%          |
| Level 4 | 20%       | 7%           | 6%           |
- It would therefore be fair to conclude, with the caveat that this is a relatively small sample of 50 inspections, that on the basis of the publicly available data, this Lead Inspector appears to be around three times more likely to give a Level 4 grade to a school than the overall OFSTED average.
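The arithmetic behind that conclusion, together with a quick check of whether a sample of 50 could plausibly produce it by chance, can be sketched with nothing but the Python standard library. The counts below are reconstructed from the percentages in the table (10/30/40/20 per cent of 50), so treat them as illustrative, and note that a fuller analysis would also have to account for the 2010 change in regulations:

```python
from math import comb

# Grades reconstructed from the table: 50 inspections at 10/30/40/20 per cent.
inspector_counts = {1: 5, 2: 15, 3: 20, 4: 10}
n = sum(inspector_counts.values())  # 50

ofsted_level4_rate = 0.07  # national proportion of Level 4 grades, 2005-14

# Ratio quoted above: inspector's Level 4 rate versus the national rate.
ratio = (inspector_counts[4] / n) / ofsted_level4_rate
print(f"Level 4 ratio: {ratio:.1f}x the national rate")  # prints 2.9x

# One-sided exact binomial test: how likely are 10 or more Level 4
# grades in 50 inspections if the true underlying rate were 7%?
def binom_tail(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_value = binom_tail(inspector_counts[4], n, ofsted_level4_rate)
print(f"P(X >= 10 | n=50, p=0.07) = {p_value:.4f}")
```

Even allowing for the small sample, a tail probability this small suggests the gap is unlikely to be pure chance, which is exactly why the unanswered questions below matter.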
The problem with this is that we don’t know:
1. If this inspector is being specifically sent to schools in trouble, hence the lower grades. However, it is usually directly employed HMIs who are sent to schools in trouble, as I understand it, rather than a sub-contracted inspector, as in this case (I am sure someone will correct me if I am wrong).
2. If this inspector has become more or less reliable in terms of judgements over time, compared to the OFSTED guidelines and the opinions of inspection peers (I found many incidences where this inspector was working alone, in small primary schools).
3. How inspection grades are defended internally by inspectors to one another. And if we don’t know this, then we have no idea how accountable inspectors are for their decisions.
This is why OFSTED needs to tell us more about how its moderation processes work, or if it has none, then simply to implement some as soon as possible. Otherwise wild vacillations and inconsistencies will continue to make parents, teachers and pupils very nervous indeed. If surgeons can publish their personal outcomes, then surely so can inspectors?