NAHT Education without soundbites. Here journalist Susan Young writes about the panel debate on 17th June that we organised, on the subject of “Whose Schools Are They Anyway?”
So far in my series of three linked posts we have addressed how to tell if someone has general ability, and how to tell if they are learning anything at all. This post will comment on how schools try to work out if a pupil has potential in an area of specialism directly related to what the school offers. Once again, like assessing intelligence, this is a highly controversial area. We found this during the period 1944-1979 when the 11+ examination was sat by most children in Britain. They were tested in English, Maths and Verbal Reasoning, but whether they got a coveted place at grammar school often depended on gender (there were more places for boys in many areas, and they needed a lower mark to access grammar schools), and social class (middle class children outnumbered working class ones in grammar schools).
One of the problems with all aptitude tests is judging whether they demonstrate aptitude or in fact only test achievement. A good example might be music tests – you are considerably more likely to be able to differentiate between minute musical tones if you have been learning to do so on the violin for five years. Indeed, it is possible to pay for a course that will help you prepare for such a test:
In terms of aptitude tests, there is also perceived to be a bias in favour of the middle classes, as they have access to a broader range of extra curricular activities to support the development of ability in various specialised areas, such as learning modern foreign languages. For that reason, maintained specialist schools are now only entitled to select 10% of their secondary pupils via selection bands, and this is overseen by the Schools Adjudicator.
For interest, there are some language aptitude tests from Oxford University here:
The moral of the story is that, while aptitude tests can sometimes show us talent where it might not otherwise be spotted, just as often they show us the background of the people being tested. The secret is knowing the difference.
My job is exciting at the best of times, but one particularly interesting thing happened this week. On Tuesday I found myself in the Institute archives looking at our collection of 250,000 past examination papers with our Special Collections Librarian, Nazlin Bhimani. We have been discussing what to do in research terms with this rather splendid collection recently, and I decided to go down to see it with a view to developing a blog post or two promoting its wares. Naturally I looked up the Oxford Local examination papers I had actually sat myself many years ago, as any sensible person faced with a wall of folders would. After some initial palpitations, Nazlin and I scrutinized some of the past papers for content and style.
It would be fair to say that we were shocked. Even by the standard of the times, the content of the examination syllabi was likely to present barriers to many female learners in the 1980s. My German A Level paper was depressingly plodding and replete with references to a recent (to senior examiners, but not to school pupils of the time) war, a war in which the role of women had more or less been erased in the translation texts chosen. It was as though Germany had only existed between about 1933 and 1950. The music paper emphasised a set of skills and a form of knowledge firmly rooted in the 19th century Western classical tradition with an occasional nod in the direction of earnest blokes playing the organ in church. The ability to memorise vast chunks of musical score and write them out at will was paramount, even though this was – and is – a skill nobody in a university music department, or a professional musician, would ever bother using, when it is simpler to pull the full score down from the (probably now digital) shelf, or more usually, just hum bits as required. In the French paper we had a few women appearing, but generally only as bit parts (Marcel Pagnol’s mum being an exception to this), or as unhinged literary oddities that their (male) authors dissected and examined for a salacious readership (in the case of Thérèse Desqueyroux, splayed across the pages for our curiosity and delectation).
Overall, I was struck by the sense of an intellectual world that was dry, static, and rooted firmly in a model of society two generations back, where women were allowed to look in from the sidelines but could never be central to public life. That was the world potential university students were being prepared for in the 1980s, and given how difficult it was for girls to navigate this, I am amazed any of us made it into higher education. Luckily, when we were there, things clearly started to open up for us. I recall reading lots of French novels with strong women at the centre, and doing an undergraduate dissertation on women in music, just to find out what on earth half the population was up to while men were writing the history books. Things were starting to change, which in examination terms manifested itself via the introduction of GCSEs and modular A Levels in 1988, and the revision of content to make it more inclusive, to name two examples.
Currently we are reforming examinations once again, and there are vociferous arguments about what children and young people need to be taught, and what constitutes rigour. One thing I know, and that is if rigour is going back to the type of syllabi forced on girls in the 1980s, it’s not something I see as a step forward.
This is the second in my series of three posts looking at assessing intelligence, learning and aptitude. In the first post in the series, we looked at the role of intelligence testing in education, which is trying to work out the inherent capability of an individual. This post will talk about how we assess an individual’s educational progress.
When choosing an assessment technique, teachers, lecturers and trainers consider a range of criteria. These might include the potential purpose – for example formative assessment (which would involve comments on work) or summative assessment (which would involve an examination and final mark of some kind). These are also often referred to as assessment for learning and assessment of learning. We might bear in mind the potential use, such as helping pupils to improve literacy or numeracy skills; internal tracking of progress; communicating amongst teaching team members; overall monitoring of internal standards; or overall monitoring of national standards. When creating assessment processes, we also have to think about the type of task. You will be familiar with lots of these, I imagine, but a short list of the most common ones might include essays; embedded tasks created by teachers and lecturers (online discussion forums, for example); presentations; projects; performances and demonstrations. In addition to all that, we need to consider the agent of judgement in relation to the inspection – in other words, whether the assessor is likely to be a teacher/lecturer, or a student/pupil in a form of peer assessment. For the assessment to be reliable, we need to pay attention to the basis for our judgements. The assessment might be norm referenced (in which fixed percentage bands of the cohort get a certain classification, as in the case of O Level exams) or criterion referenced (assessment whereby if you meet all the criteria, you automatically get 100%, as in the case of GCSE examinations). We might even create an assessment that is student referenced (what we might call ipsative), in which pupils are judged against their own previous performance. The form of feedback or report will also play a part in our development. This may involve a mark or score; a profile of achievement against published criteria; a statement of the overall grade achieved; a comment or piece of oral feedback; or most frightening of all, rank order.
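For readers who like to see the difference between norm referencing and criterion referencing worked through, here is a minimal sketch in Python. The pupils, marks, pass mark and pass fraction are all invented for illustration, and real examination boards use far more elaborate grading rules than this.

```python
def criterion_referenced(marks, pass_mark):
    """Everyone who reaches the fixed pass mark passes, however many that is."""
    return {name: mark >= pass_mark for name, mark in marks.items()}


def norm_referenced(marks, top_fraction):
    """Only a fixed fraction of the cohort passes, whatever marks they scored."""
    ranked = sorted(marks, key=marks.get, reverse=True)
    n_pass = round(len(ranked) * top_fraction)
    passed = set(ranked[:n_pass])
    return {name: name in passed for name in marks}


# Hypothetical cohort of five pupils with marks out of 100.
cohort = {"Asha": 82, "Ben": 75, "Cara": 71, "Dev": 64, "Ella": 58}

by_criterion = criterion_referenced(cohort, pass_mark=60)
by_norm = norm_referenced(cohort, top_fraction=0.4)

print(by_criterion)
print(by_norm)
```

In this made-up cohort, four pupils clear the fixed pass mark, but only the top two get through when the pass rate is capped at 40% of the group. Cara's mark has not changed; only the basis for the judgement has.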
Even though a lot of thought will have gone into the design of an assessment process or practice, there are always going to be problems arising. These include:
Some methods of quality assurance include:
Ultimately it is impossible to have an education system without various forms of assessment taking place at various times. Some of these assessment systems will be used to support and encourage learning, and others will be used to triage pupils or students into different ability and achievement groups. What is important is that we are thoughtful about what we are trying to do, and why.
We are holding an interesting panel debate on 17th June 5.30-7.30pm at the IOE, Committee Room 1, that you might be interested in. Space is limited, so if you would like to attend, you should contact Lucian Stephenson at firstname.lastname@example.org.
The topic will be “Whose schools are they anyway?” and we are looking to debate state school governance. Hopefully it will be a lively evening. If it goes well, our next one will probably look at contemporary issues surrounding higher education.
Our panellists will be Fiona Millar, Local Schools Network, Anastasia de Waal, Deputy Director and Director of Family and Education, Civitas, Ros McMullen, Principal of the David Young Community Academy in Leeds, and Dame Alison Peacock, Head Teacher of the Wroxham School, Potters Bar, Hertfordshire.
Over the next few weeks, I will be posting three articles up on this blog, as a short series. This series examines an important problem in education, and that is: how do we find out what individuals are capable of, and what they are learning? I will look at three aspects of this problem. First of all, I will discuss assessing intelligence, then I’ll move onto assessing learning, and finally I will look at assessing aptitude. It’s quite ambitious to try to do this in blog posts, because these subjects fill entire library sections normally, but you really should see the posts just as a short introduction to each topic, a kind of taster.
I’m going to start with a brief investigation into IQ tests. First of all I will describe the main types of intelligence test that have been in common use over the last century, in chronological order of development. I will also discuss the uses and limitations of IQ tests of western origin.
It’s widely recognised that measuring intelligence can be regarded as controversial, as it can be culturally specific. In other words, how well you do on the test can relate to how similar you are in background to the people setting the test. As most psychologists in the early part of the 20th century were white and middle class, their tests were often based on ideas and situations that they would have found familiar, in a US or Northern European context. This disadvantaged non-native speakers, those from a challenging educational background, or indeed those from outside the US or Europe. Although there have been attempts to remedy this, by making tests as culturally neutral as possible, it is still difficult in some cases to get to the absolute core of what constitutes intelligence. Today we are going to look at some of the debates surrounding that.
Starting with the Western psychological context, we tend to measure IQ as a way of measuring intellectual potential (IQ means “intelligence quotient”) and that is a figure which represents your success in doing particular timed tests in relation to your age. This concept of the test being fixed in time, both in terms of the clock, and in terms of biological age, remains central to assessing intelligence.
The first widely used intelligence test was known as the Binet-Simon test (1905), and it was used by the French government to help identify children who would need help at school. As explained above, it assessed what ‘average’ performance would be, and it calculated a child’s ‘mental age’ according to that. By 1916 it had been extended to include adults.
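To make the idea of a ‘quotient’ concrete, here is a minimal sketch of the classic ratio calculation, in which mental age is divided by chronological age and multiplied by 100. The quotient was a later refinement rather than part of the 1905 test itself, and the function name and example ages below are my own, purely for illustration.

```python
def ratio_iq(mental_age, chronological_age):
    """Classic ratio IQ: mental age relative to chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age / chronological_age


# A hypothetical eight-year-old performing at three different levels.
print(ratio_iq(10, 8))  # performs like a typical ten-year-old
print(ratio_iq(8, 8))   # performs exactly at age level
print(ratio_iq(6, 8))   # performs like a typical six-year-old
```

A child performing exactly at age level scores 100 by construction, which is why 100 is treated as ‘average’.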
The next significant development in terms of intelligence testing was the introduction of the Wechsler Adult Intelligence Scale (1939) and Wechsler Intelligence Scale for Children (WISC). This included verbal and non verbal test items, such as the manipulation of blocks, pictures, etc. It was felt that non verbal test items would help the test be more culturally neutral.
Clearly there are particular difficulties in assessing the intelligence of very young children, and to this end the Bayley Scales of Infant Development were developed in 1969 for children under two. These are used in adapted form by Health Visitors today, at children’s two-year check, for example, and a typical task might be building a tower from three cubes.
There has also been widespread use of another test, the British Ability Scales (BAS) (1979). These were designed to measure development and moral reasoning, and to be less US-centric.
A factor in the development of IQ tests is indeed that the tests have become increasingly complex over time, and the kits for psychologists are very expensive indeed – we are talking upwards from £600 now. A test does not just consist of a cheaply printed question sheet and a marking sheet. There is big money in developing tests, which are commercially produced by monopoly providers. While they are normally used properly, an important consequence of this expense is that in some cash-strapped educational or health related contexts, bits and pieces of the test get lost or worn, parts are transferred from one test to another, photocopies are made, and this all means that the test isn’t as strictly regulated as it might be, with consequences for the results. This is despite the British Psychological Society compelling testers to attend accredited courses to learn about best practice in test delivery, to ensure standardisation. Therefore real life is a factor in how well the test is given, and how accurate the results might be.
So we have considered a number of IQ test materials and practices. I’m going to move on now to examine how we judge the predictive ability of IQ tests. There are three key questions for psychologists in determining how useful a test is. First of all, is the test reliable – does it give consistent scores if repeated over time? Secondly, is the test valid – does it correlate with future academic achievement? And finally, are test scores stable – does the IQ of individuals change over time? In many cases, the answers are favourable, but as I argued above, there have been some problems with IQ tests in the past. One of these is cultural discrimination. As I explained, you are at a distinct advantage here if you happen to be a white American or European. Even within that context, the British Ability Scales tests were introduced to counter the US-centrism of tests such as the Wechsler Intelligence Scale for Children. There has been particular criticism of the use of tests with indigenous populations, for example Aboriginal Australians, as low scores were used as grounds for persistent discrimination, yet subsequently it was discovered that the tests being used did not measure intelligence sufficiently well for these groups.
Another factor in test bias was that they risked discriminating according to the educational background of the test taker. In other words, if you had attended school and could read well, and had learnt how to tackle abstract problems in a systematic manner, then you were likely to be at a distinct advantage over those who had not benefited from such experiences. The environment when people are taking the tests can also play a role. Tests are designed to be done under laboratory conditions, and if there is noise or interruptions, this can reduce the overall score. Similarly the mood and motivation of test takers play a role. I am sure that every health visitor in the land can recall a two-year-old who has refused to co-operate with testing at his or her two-year-check. (One of my own children did this, and consequently spent the entire appointment carrying out a comparison of all the different weighing scales in the room to see if they came up with the same reading. So much more interesting than building a little tunnel out of three blocks, and pushing a pencil under it!) Another phenomenon in all kinds of psychological testing is the desire to please an authority figure and give expected answers, by second guessing what they might want. This may result in the person being tested giving an erroneous answer unnecessarily. Finally, we must bear in mind the effect of coaching.
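The first of those three questions, reliability, is usually checked by giving the same people the test twice and seeing how closely the two sets of scores agree. A minimal sketch of that check, using a Pearson correlation, might look like this. The scores are invented, and real reliability studies use much larger samples and more careful designs.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same six people on two sittings of one test.
first_sitting = [95, 102, 110, 88, 120, 105]
second_sitting = [97, 100, 112, 90, 118, 107]

test_retest = pearson(first_sitting, second_sitting)
print(round(test_retest, 2))
```

A value close to 1 means people keep roughly the same position relative to each other from one sitting to the next. The same calculation, applied to test scores and later examination results, is one simple way of approaching the validity question.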
W H Smith is absolutely full of books containing IQ tests for middle class parents to buy for their children, which teach explicitly the techniques needed for success in analysing problems. Again, this is bound to affect results and the overall reliability and validity of tests. The availability of test coaching materials may be one factor in why we seem to see IQ results rise over time. This is not because of some evolutionary change making us all cleverer – it is much too quick for that. It is because we are all getting better at doing the tests. That is why test manufacturers have to keep regrading them, so the average mark is based on increased numbers of correct answers.
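The regrading described above can be sketched in a few lines. The idea is that a raw mark is reported relative to a reference group, so that the group's average always comes out at the same figure. I have assumed the common convention of an average of 100 and a spread (standard deviation) of 15, and the raw marks are invented for illustration.

```python
from statistics import mean, pstdev


def standardised_score(raw, norm_sample, centre=100, spread=15):
    """Express a raw mark relative to a reference group of raw marks."""
    mu = mean(norm_sample)
    sigma = pstdev(norm_sample)
    return centre + spread * (raw - mu) / sigma


# Hypothetical raw marks for two reference groups tested some years apart.
older_norms = [20, 24, 28, 32, 36]
newer_norms = [24, 28, 32, 36, 40]

raw_mark = 28
against_older = standardised_score(raw_mark, older_norms)
against_newer = standardised_score(raw_mark, newer_norms)

print(against_older, round(against_newer, 1))
```

The same 28 correct answers count as exactly average against the older group, but as below average once the reference group has got better at the test.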
Many tests work on the assumption that if you are intelligent in one respect, this is likely to apply in different cognitive domains. For example, British psychologist Charles Spearman (1863-1945) described a concept he referred to as ‘general intelligence’ or the g factor. After using a technique known as factor analysis to examine a number of mental aptitude tests, Spearman concluded that scores on these tests were remarkably similar. People who performed well on one cognitive test tended to perform well on other tests, while those who scored badly on one test tended to score badly on others. He concluded that intelligence is general cognitive ability that could be measured and numerically expressed.
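The pattern Spearman noticed can be illustrated with a small simulation. This is not factor analysis itself, just a sketch of the kind of data that would lead to his conclusion: each simulated person is given one shared ability plus some independent luck on each test, and we then look at how the tests correlate with each other. The test names, sample size and noise levels are all assumptions of mine.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
N_PEOPLE = 500
TESTS = ["vocabulary", "arithmetic", "spatial", "memory"]  # hypothetical tests

# One shared ability per person, plus independent noise on every test.
shared = [rng.gauss(0, 1) for _ in range(N_PEOPLE)]
scores = {test: [g + rng.gauss(0, 1) for g in shared] for test in TESTS}

correlations = {
    (a, b): pearson(scores[a], scores[b]) for a, b in combinations(TESTS, 2)
}

for pair, r in correlations.items():
    print(pair, round(r, 2))
```

Every pair of tests comes out positively correlated, because the only thing they have in common is the shared ability. Spearman reasoned in the opposite direction: he started from correlations like these in real test results and inferred that something shared must lie behind them.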
However, individuals can display varying degrees of intelligence depending on what they are trying to do. I may be able to write this blog post, for example, but try to get me to navigate around IKEA to find a remote flatpack successfully, and you will see someone who needs serious help.
In evolutionary studies, the debate is whether g evolved as a multipurpose tool or whether the mind has domains, like a ‘Swiss army knife’, with cognitive ‘tools’ evolving to answer specific challenges. This does not map exactly onto other theories of multiple intelligence, as we will see in a minute, but it does tend to overlap. This is all within the ongoing dog-fight in cognitive psychology: is the mind domain-general in function or not? Is my inability to cope with IKEA’s store layout a function of the quality of my whole brain, or just a particular part of it that isn’t quite up to the job?
When I am not worrying about flatpack retrieval, another area of 40-something personal concern is how far I am turning into my own parents. Clues as to the likelihood of this can be found in intelligence research on the heritability of IQ or other test scores. This is of course a minefield – medium to high heritabilities (what is called ‘h’ to academics in this field) are generally found from twin and adoption studies. G heritability appears to increase over the lifespan – 20% in infancy, 40% in childhood and 60% in adulthood (for more information on this, see Plomin et al, 2003, “Behavioural Genetics” in The Postgenomic Era, APA Washington DC). What is more interesting than heritability is non-shared and shared environment influence on g – this is hard to interpret as it becomes increasingly clear that there is hardly any such thing as a ‘shared environment’, even for siblings. Of course measurement error is also going to fall into this pot. As a result of all these doubts and problems with the theory, the heritability of g is no longer the focus of research. Instead, scientists are more interested in how environmental factors interact with this, as it becomes clear that there is a lot of gene-environment interaction (as in psychology generally) with the response to environmental factors depending on genetic predisposition. Shared environment appears to have less influence after adolescence as h increases (so we really do turn into our mothers!)
There are a number of alternative views of intelligence in addition to Spearman, as I mentioned previously. Perhaps the most commonly recognised view is that of Howard Gardner (1983) who argued that there was such a thing as multiple intelligences. The theory was first laid out in Gardner’s 1983 book, Frames of Mind: The Theory of Multiple Intelligences, and has been further refined in subsequent years. They are as follows:
In a sense, this idea of multiple intelligences rather resembles Thurstone’s multifactor theory with seven primary mental abilities (in the manner of personality traits), which dates from 1938, so it wasn’t entirely new. However Gardner argued that intelligence defined in the manner I have described fails to take into account comprehensively all the different aspects of ability that are present in humans. For example, a child who is able to memorise multiplication tables easily (in the manner approved of by the Victorians) is not necessarily going to be more intelligent than one who struggles. The second child may be stronger in another kind of intelligence, and may indeed have potentially higher mathematical intelligence than the one who simply memorizes tables. This suggests that schools should take pains to identify different strengths and weaknesses amongst individual pupils, and tailor the curriculum on an individual basis accordingly, using a range of approaches to teaching.
Gardner’s criteria for determining a specific area of intelligence were firstly case studies of individuals exhibiting unusual talents in a given field (such as child prodigies or autistic savants); neurological evidence of specialised areas of the brain (often including studies of people who have suffered brain damage affecting a specific capacity); the evolutionary relevance of the various capacities; psychometric studies; and the existence of a symbolic notation (e.g. written language, musical notation, choreography). His theories have been heavily criticised on the grounds that there is little empirical evidence for their existence, and also because they pertain more closely to personality types than any latent ‘intelligence’. Psychologists have also argued that in all the other intelligence tests, the different areas of intelligence have more or less correlated, which suggests that it is unlikely someone could be outstanding in one area and not in any others. Despite such criticism, the theory has been widely adopted by teachers, in the same way that schools have been very quick to catch onto the idea of the existence of aural, visual and kinaesthetic learning styles, even though much of the time many such concepts are without empirical foundation and not properly understood. They are, however, cheap, quick and easy to implement, and reinforce teachers’ self-identity as socially equitable educators.
Moving on, the latest development in terms of trying to understand and classify intelligence is most probably Sternberg’s triarchic theory of intelligence (1985). This prioritises the following aspects of the human condition:
It moves away from the idea of psychometric testing of individuals, and towards a more cognitive approach. It was based on his observations of graduate students. Again, this system of measuring intelligence has been criticised because many of the systems inherent in the criteria previously listed are merely new versions of cognitive skills tracked using existing tests, which are thought to correlate well to personal and professional success in middle age and beyond, suggesting validity.
I’ll finish this blog post with an interesting quotation about the role of IQ in creating scientific success. I hope this disabuses you of any notion that your aspirations should be limited by any idea that you need a special level of IQ to achieve anything.
‘Even within science, IQ is only weakly related to achievement among people who are smart enough to become scientists. Research has shown, for example, that a scientist who has an IQ of 130 is just as likely to win a Nobel Prize as a scientist whose IQ is 180.’
Hudson, L (1966) Contrary Imaginations: A Psychological Study of the English Schoolboy (London, Methuen), p104, cited in Sulloway, F J (1996) Born to Rebel: Birth Order, Family Dynamics and Creative Lives (New York, Pantheon), p357