Dear OFSTED, now about this Data Dashboard malarky …

Hello OFSTED,

Perhaps you can help me with a knotty little problem I am having this morning? I am trying to look at the comparative results of some local primary schools, over time. First of all, I wanted to see how they have been doing over the last ten years or so. For the period 2004-2010, this has been comparatively straightforward, as I can look up the English, Maths and Science Key Stage 2 SATS results in each case, where they exist. If I ferret about on the BBC News website, I can easily find simple, readable data derived from OFSTED’s own figures telling me what sorts of pupils attend these schools. It’s then a simple matter to plot this onto a graph so I can map trends over time. (That’s not to say that I think SATS have ever been anything other than a blunt instrument in terms of assessing learning, but for the purposes of statistical comparison we have a fairly straightforward methodology there.)

It is the period 2011-2014 that has started causing me all the problems. We appear to have had what sociologists might provocatively call a ‘rupture’, as though the data’s entrails have emerged in a disorderly fashion. I am sitting here looking at the Data Dashboard, and I am seeing the national picture but not the regional one, which is the first step in the data being decontextualised for me. I can get the regional data, but I have to know where to dig for it.

The Data Dashboard tells me that the school is ‘compared to the national picture’ on the basis of Grammar/Punctuation/Spelling, Reading, Writing and Mathematics. My eye is then drawn to the series of little boxes labelled ‘quintiles’. These are particularly baffling, as the quintiles are based on what is called ‘similar schools’.
If I click on the list of schools associated with one of these primaries, in each case there is a massive list of institutions that vary significantly according to characteristics like these, all of which may impact on school processes and outcomes, especially when they are combined:

  • School size (the larger the school, the more valid the sample)
  • Number of pupils eligible for Free School Meals/Pupil Premium/Ever 6 (social deprivation is linked to pupil under-attainment in certain circumstances, making it difficult to measure the exact impact of an individual school, especially when deprivation is linked to pupil mobility, as it often is)
  • Pupil mobility (unless we know how long a pupil has been in a school, we cannot assess the impact of the school – we may be assessing the impact of the previous school or even an education system in a different country altogether)
  • English as an Additional Language (once again, unless we have an idea about children’s prior level of English, as well as how long they have been in a particular school, we cannot usefully determine the impact a particular school has had on their reading, writing, spelling, punctuation and grammar in English).
  • Attainment of pupils prior to joining the school (see my points about deprivation and pupil mobility, above)
  • Number of pupils with special educational needs (in the present system this is often defined in terms of chronology, i.e. development lagging behind that of peers, so once again we need a nuanced method to establish school impact; otherwise small schools with many children who have developmental delays will look as though they are underperforming compared to larger schools with few such children).
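The school-size point in the list above can be given a number. Here is a minimal sketch in plain Python, using entirely invented figures and making no claim about OFSTED's actual methodology: two schools whose pupils are drawn from exactly the same score distribution differ only in cohort size, yet the small school's year-on-year average swings far more, purely by chance.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is repeatable

def simulated_cohort_means(cohort_size, n_years=200, mu=100, sigma=15):
    """Mean test score for each of n_years simulated year-groups,
    every pupil drawn from the same normal distribution."""
    return [
        statistics.mean(random.gauss(mu, sigma) for _ in range(cohort_size))
        for _ in range(n_years)
    ]

# Two hypothetical schools of identical 'quality': 15 pupils vs 90 pupils per year.
small = simulated_cohort_means(15)
large = simulated_cohort_means(90)

# The small school's headline averages are far more volatile.
print(f"small-school spread of yearly means: {statistics.stdev(small):.2f}")
print(f"large-school spread of yearly means: {statistics.stdev(large):.2f}")
```

In theory the spread shrinks with the square root of cohort size, so the 15-pupil school's averages should bounce around roughly two and a half times as much as the 90-pupil school's, with no difference in teaching at all.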

Having established that a different definition of ‘context’ is being used to determine ‘similar schools’, I looked up OFSTED’s official documentation in order to establish exactly how these similarities were calculated. I looked here:

School data dashboard guidance

and I also looked at the technical guidance:

However, I have some questions about the way you are calculating these ‘similarities’.

1. When you say ‘average’, do you mean the mode (the most common outcome), the median (the midpoint of the pupil results range) or the arithmetical mean (adding up all the results and dividing the total by the number of results)? These may tell us starkly different things about the way the Reading, Writing, Maths and Spelling/Punctuation/Grammar tests are formulated, and what type of results pupils in particular schools attain. To be honest, OFSTED, I am not even sure whether you take your calculations down to pupil level or whether you just take the arithmetical mean for the whole cohort and compare it to a crude arithmetical mean for the whole country (which would be fairly meaningless statistically and educationally, so I hope that is not your approach). Which brings me to my next point.
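To see why the choice of ‘average’ matters, consider one invented cohort of Key Stage 2 scaled scores, skewed by a couple of very low results (say, recent arrivals with little English). This is purely illustrative and not OFSTED's calculation:

```python
import statistics

# Hypothetical cohort of ten pupils; two low outliers drag the mean down.
cohort = [80, 85, 100, 101, 101, 103, 104, 105, 106, 110]

print("mode:  ", statistics.mode(cohort))    # most common outcome -> 101
print("median:", statistics.median(cohort))  # midpoint of the range -> 102.0
print("mean:  ", statistics.mean(cohort))    # arithmetical mean -> 99.5
```

Three different ‘averages’, three different stories: the mode and median say a typical pupil scored just over 100, while the mean, dragged down by two pupils, puts the whole school below 100. Which of these is the Dashboard comparing to the national picture?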

2. Do you remove outliers from your calculations? Clearly the results of smaller schools are likely to vary more from year to year, and pupil mobility and the development of local housing will be significant factors here. If you don't remove isolated results at the extremities, you are not really getting a true picture of the impact of a school’s teaching on a cohort.
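One isolated result can move a small school's headline figure substantially. The sketch below uses invented scores and one common robust trimming convention (median absolute deviation); I am not suggesting this is, or should be, OFSTED's exact rule, only showing how much the answer changes depending on whether outliers are handled at all:

```python
import statistics

# Hypothetical 12-pupil cohort: one pupil arrived mid-year from abroad
# and scored far below the rest of the class.
scores = [98, 100, 101, 102, 103, 103, 104, 105, 106, 107, 108, 45]

raw_mean = statistics.mean(scores)

# Trim any score more than 3 median-absolute-deviations from the median
# (a common robust convention, chosen here purely for illustration).
med = statistics.median(scores)
mad = statistics.median(abs(s - med) for s in scores)
trimmed = [s for s in scores if abs(s - med) <= 3 * mad]
trimmed_mean = statistics.mean(trimmed)

print(f"raw mean:     {raw_mean:.1f}")      # 98.5
print(f"trimmed mean: {trimmed_mean:.1f}")  # 103.4
```

Five scaled-score points of difference from a single pupil, in a cohort of twelve. Whether that represents the school's teaching is exactly the question the Dashboard never addresses.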

3. Finally, is it true that what you are doing here is taking past performance based on some sort of average, and then projecting it forwards on the assumption that this is a stable measure (sometimes called a ‘predict and control’ model)? And then linking up the ‘averages’ (however these are calculated, see questions 1 and 2 above) to create these groups you term ‘similar schools’? This is certainly what you seem to be saying in your guidance. If so, that makes me a bit worried.

If you look in my book ‘Teachers Under Siege’, on pages 77-78, you will see why blindly modelling forwards like this is a bad idea. I give the example of UK birth rates between 1951 and 2001. If we plotted these on a graph, we would see a downward trend over the period as a whole. However, if we went back in time to 1963/64, we would see an ongoing and quite dramatic increase that we might assume would continue indefinitely, possibly eventually resulting in 1 million births a year. With the benefit of hindsight we know that the birth rate actually started to fall as dramatically as it initially rose, resulting in falling school rolls and school closures later on.

(Another example people often use of why blind modelling on limited variables fails is the oil crisis of the 1970s. Many oil companies assumed continued exponential growth and ordered new tankers and plant accordingly. Shell, on the other hand, asked itself, “What do we do if it doesn’t continue to grow?” and positioned itself more intelligently within the market. It got to eat all the proverbial pies while other companies were left with oil tankers and plant they couldn’t use.)

Now clearly school and pupil attainment are a different kettle of fish. First of all, it is very difficult to quantify the impact of schooling precisely, particularly amongst 11 year olds of varied backgrounds. This is why I said earlier that SATS were something of a blunt instrument. We are not counting the numbers of births or barrels of oil here. Also, drops in birth rate are not an indication of failure amongst the childbearing population, just as politically-driven drops in oil production and distribution in 1973 did not automatically mean that oil executives had failed. However, even if we take test results at face value, the modelling you are using still looks odd.
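The ‘predict and control’ trap is easy to demonstrate. The figures below are invented, shaped roughly like the birth-rate story rather than taken from real ONS data: fit a straight line to the rising years a 1964 forecaster could see, extrapolate forwards, and compare with what ‘actually’ happened after the turn.

```python
# (year, births in thousands) -- illustrative values only, NOT real figures
series = [
    (1955, 790), (1958, 830), (1961, 880), (1964, 930),  # the observed rise
    (1967, 900), (1970, 850), (1973, 780), (1976, 720),  # the turn nobody modelled
]

rise = series[:4]  # all that a 1964 forecaster could see

# Ordinary least-squares slope and intercept, computed by hand.
n = len(rise)
mean_x = sum(x for x, _ in rise) / n
mean_y = sum(y for _, y in rise) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in rise) / sum(
    (x - mean_x) ** 2 for x, _ in rise
)
intercept = mean_y - slope * mean_x

for year, actual in series[4:]:
    predicted = intercept + slope * year
    print(f"{year}: trend predicts {predicted:.0f}k, 'actual' {actual}k")
```

On these made-up numbers the fitted trend sails past 1,000 thousand births a year by 1970 while the ‘actual’ series is already falling, and the gap only widens: a stable-measure assumption applied to a turning point. Grouping and grading schools on projected ‘averages’ risks exactly this failure mode.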
Why would it be helpful to group schools together on the basis of their ‘average’ results without taking into account any other variables? If you really think this is worthwhile, OFSTED, then you need to make your methodology and justification a lot clearer than they are here, I would suggest. Otherwise it is difficult for us to have confidence in your processes and outcomes.

Now OFSTED, I want you to feel free to comment below on this. Many of us are genuinely perplexed by the Data Dashboard and would welcome clarification.

With best wishes,

Dr Leaton Gray

[Image Courtesy of Stuart Miles, Free Digital Photos]


2 thoughts on “Dear OFSTED, now about this Data Dashboard malarky …”

  1. These blog posts make excellent points, and I think the most chilling one is that we need to be aware that OFSTED may or may not be using them to regrade schools. A crude >5% swing is really no justification for plummeting categories, particularly where there may be quite legitimate reasons (sample size being the most obvious one). At a time when there is a debate about moving to data-led inspections rather than observations, we should all ensure that the statistical data is valid and reliable, and not fallacious, as the Data Dashboard version seems to be.
