Remembering stuff

Someone once said that the educational debate in the UK is light years ahead of the debate internationally. That is a shame, really, because you would hope that minds from every country engaging with the educational debate would help make it more urgent.

The modern education system is sometimes characterised as one in which kids are mindlessly forced to rote-learn, and we are told we must fight against this industrial, factory-like education. It’s anecdotal, I know, but I have worked in five schools and visited a few more and have never seen anything like this. Where are all these schools that are battery-farming their kids? In this sense, most schools are definitely more progressive than traditional in their outlook, although I would contend that good schools, and the good teachers in them, know when to adopt different techniques as necessary.

If you engage in debates about the aims and methods of education, it is common to read thinking like this, a popular view espoused by many educators and widely influenced by romanticism:

“The point I am making is that DI is very successful in a certain thing that we are measuring. Remembering stuff. For an education system that measures how well you can remember stuff sat at a table for two hours (of which the DP is really no different from any other offering) then I’m sure DI is highly effective…. but really why do we care? We all pretty much know that that such a metric is a) a terrible way for Unis and businesses to know that they have recruited an effective colleague b) it just isn’t they way to make it in the world past examinations. Once our smartphones can answer any knowledge based examinations (not far off from now) then DI will just about be a waste of everybody’s time. What I’m interested in is what type of instruction leads to creative, communicative, empathetic, collaborative, entrepreneurs and explorers? If DI does that then I’m interested. But first we have to develop a way of measuring these things to see if a certain practice achieves it. Any other research is basically past its sell-by-date, as I suspect are exam based remembering courses.”

Remembering stuff. It’s the practical equivalent of the “old, male and stale” stereotype, trotted out as an ad hominem in post-modernist education discussions. It’s uncool. It’s useless, and why would anyone who cared about kids and their futures insist on paying attention to it in their classroom or school? It’s outdated. We don’t need to remember because we have Google now. We don’t need knowledge because AI will take over our jobs, and if we make sure kids know and remember stuff then they are doomed to be jobless, on the future scrap heap, in a world where 65% of the jobs haven’t been invented yet.

Why do we care about remembering stuff? 

First, let us not conflate remembering and knowing. They are not the same thing. Technically, remembering is simply the process of retrieving from your long-term memory information that you know; knowing something is having it stored there in the first place. It is possible to know something and not remember it.

The argument above mentions remembering initially, but then refers to knowledge-based exams and questions the value of knowing stuff when our smartphones can do that for us, effectively conflating the two. Both knowing stuff and being able to remember stuff are important: it’s no good knowing stuff if you can’t remember it, and you can’t remember what you don’t know. So, in my view, education has to help students do both. Why are knowing stuff and then remembering it important, and why should we care?

Well, actually, knowing stuff is still pretty important, believe it or not. Some educators use Bloom’s Taxonomy to assert that remembering stuff is at the bottom of the pile, a low-order skill useless on its own. Setting aside the fact that the taxonomy is not informed by the cognitive psychology of how people learn and is often presented uncritically, this interpretation is not even what Bloom intended. He put knowledge at the bottom because it is the foundation on which all else is built.

You can’t do much if you don’t know anything, and in fact the more you know, the more you can do, including learn more. The Matthew Effect is a well-documented phenomenon by which the rich get richer and the poor get poorer: the more you know, the easier it becomes to learn more and so become a life-long learner. That is one reason why we should care, especially if we want to develop life-long learners.

These days it is fashionable for international educators to discount knowing stuff because the international consensus is that 21st-century skills are more important than knowledge per se. These 21st-century skills are generally recognised to be the four Cs of communication, collaboration, creativity and critical thinking. The line of reasoning is, generally, that we need to teach these skills instead of knowledge.

There are a few problems with this line of thinking. Firstly, these skills are not actually 21st-century in and of themselves, and there is no reason to think they are more important this century than they were in the time of Julius Caesar. Indeed, calls for skills-based curriculums go back at least a century.

Secondly, we can’t have people skilled in these areas who aren’t also knowledgeable. Most psychological research to date suggests that creativity requires knowledge, and it is only possible to think critically about what you already know. To be a great communicator, you need to know about the thing you are communicating. Could you imagine the BBC Earth documentaries without a knowledgeable David Attenborough, let alone the teams of knowledgeable researchers who write the scripts?

Thirdly, the idea of teaching generic skills is itself flawed. The generic-skills approach postulates that teaching should use authentic tasks that mimic real life, e.g. science teaching that gets kids to act like scientists. Authors like Daniel Willingham and Daisy Christodoulou point out that the most effective way of teaching skills is deliberate practice: just as a football team doesn’t train only by playing games but by breaking the skills needed to win (dribbling, passing) down into their component tasks and practicing those, so skills in school are best built up from their components.

In short, knowing stuff (and remembering it) is the foundation of the skills we want to instil in our kids; it is also the foundation of understanding and of life-long learning.

We all pretty much know that that such a metric is a) a terrible way for Unis and businesses to know that they have recruited an effective colleague b) it just isn’t they way to make it in the world past examinations.

Do we? How exactly do we know this? It seems hard to make that claim, as it is pretty much unmeasurable. Even if you could survey every employer and university, there are too many confounding variables, and we are all products of this system. The claim is made without any proof, and the burden of proof lies with the one making it.

Once our smartphones can answer any knowledge based examinations (not far off from now) then DI will just about be a waste of everybody’s time.

Oh no. Seriously? We still honestly think this? It is right up there with the “we can google it” claim that knowledge isn’t worth having. In addition to what I have written above, I should highlight here the distinction between working and long-term memory.

Working memory is what you can hold in your awareness at any one moment, and it is very limited. Both the environment and long-term memory are accessible from working memory, and long-term memory is effectively unlimited in its store.

If we rely on Google and not our long-term memory, we will find it very hard to make sense of the world around us, because our working memories will constantly be overwhelmed. We won’t be able to chunk information, the way a chess expert sees a whole board position as a handful of meaningful units rather than 32 separate pieces.

Knowledge isn’t just what we think about; it is what we think with. If you rely on Google on your smartphone you won’t be able to think well; you certainly won’t be able to think creatively or critically, nor communicate well.

Also, Google is blocked in China. Do we really want to give governments that much power over knowledge, over what we know and can know?

What I’m interested in is what type of instruction leads to creative, communicative, empathetic, collaborative, entrepreneurs and explorers? If DI does that then I’m interested. But first we have to develop a way of measuring these things to see if a certain practice achieves it.

Yes, it can. DI has been shown to effectively increase what people know and remember. If knowing and remembering are the foundation of being able to think well, collaborate well and create well, then we shouldn’t just throw them out.

One of the problems with international education, in my view, is that it over-emphasises inquiry learning, making ideologues hot under the collar whenever DI and other guided instruction are mentioned. We are trained to think schools are battery-farming kids when, to be honest, they really aren’t. I think we need to find out what works in what context and focus on that, and I think there is a place for guided instruction.

Anyway, DI does not always equate with rote learning, so why make out that it does?

I am also reminded of this article and this tweet. They are based on similar assumptions and outlooks, and I had wanted to write something in response to these claims.

I agree with Yuval Noah Harari when he writes that we often conflate intelligence and consciousness. I am not convinced that AI can actually know anything. I think it is intelligent and can process a lot of information quickly, but I would contend that to know anything and remember anything you need to be conscious.

If this is true, what is the real risk presented by AI? Probably the automation of tasks that rely on data processing in some form. Doctoring, for example, requires the ability to process symptoms and match them to known illnesses. But not every job is at risk of automation, because not every job relies purely on data processing. As Harari contends in his books, the highly prized human jobs of the future will be the ones that rely on the human ability to relate to other humans. Doctors are therefore at much greater risk of being automated than nurses. However, nurses still need to know an awful lot of stuff, as well as be good at relating to other people, to do their jobs.

Humans need knowledge to think well and to specialise. If we don’t ensure that people know things, they certainly will not be better placed to work with, or instead of, AI. The people who are replaced by AI will be the ones who don’t know much.

Additionally, knowledge-rich curriculums demonstrably reduce inequality, and with the way social divides are opening up in modern society, perhaps the way for international education to contribute to a peaceful world is to close those gaps. Seeing as DI has been shown to reduce social inequality (see Why Knowledge Matters by E.D. Hirsch), and as international curriculums like the IB are used in many public schools in poorer areas, I find the focus on inquiry teaching quite worrying.

I wonder whether international educators can afford to ignore this stuff just because our kids generally come from educated and affluent homes.

Notes on making good progress?: Chapter 5

In this series of posts I record my notes from Daisy Christodoulou’s book “Making Good Progress? The Future of Assessment for Learning”. It is quite excellent. You can buy a copy here.

Exam-based assessment

Exams based on the difficulty model produce a lot of information that can potentially be used formatively, because pupil performance on each question can be measured using question-level analysis, also known as gap analysis.
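As a minimal sketch of what a question-level (gap) analysis can look like in practice (the questions, topics and class marks below are entirely invented for illustration):

```python
# A hypothetical question-level (gap) analysis: given each pupil's marks per
# question and each question's topic, rank topics by whole-class performance.
from collections import defaultdict

max_marks = {"Q1": 2, "Q2": 3, "Q3": 4, "Q4": 3}  # marks available per question
topics = {"Q1": "cells", "Q2": "cells", "Q3": "genetics", "Q4": "ecology"}
class_marks = [  # one dict of marks scored per pupil
    {"Q1": 2, "Q2": 1, "Q3": 1, "Q4": 3},
    {"Q1": 1, "Q2": 2, "Q3": 0, "Q4": 2},
    {"Q1": 2, "Q2": 3, "Q3": 2, "Q4": 3},
]

scored, available = defaultdict(int), defaultdict(int)
for pupil in class_marks:
    for q, mark in pupil.items():
        scored[topics[q]] += mark
        available[topics[q]] += max_marks[q]

# Weakest topics first: these are the "gaps" a teacher might reteach.
for topic in sorted(scored, key=lambda t: scored[t] / available[t]):
    print(f"{topic}: {scored[topic] / available[topic]:.0%}")
```

Note, though, that the sampling point below still applies: the “genetics” figure here rests on a single four-mark question.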

In the quality model, by contrast, what looks like an exam is still an assessment that relies heavily on descriptors.

Exams, however, sample a domain; they are not direct measurements of it. A test that samples from a domain may allow inferences about the domain as a whole, but it does not allow inferences about the sub-domains, because the number of questions for each sub-domain will be too small: the resolution is too coarse. Careful analysis of the question responses may provide useful feedback, particularly for harder questions that rely on knowledge from multiple domains, but this more nuanced information is not captured in a question-level analysis.
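Some rough arithmetic, under a simple binomial model with invented numbers, shows how quickly sub-domain inferences fall apart: a 40-question paper spread over ten sub-domains leaves only four questions per sub-domain, and the noise in a score shrinks only with the square root of the number of questions.

```python
# Standard error of an observed score when a pupil with true success
# probability p answers n questions: whole-paper scores are fairly stable,
# but four-question sub-domain scores are far too noisy to support inferences.
from math import sqrt

p = 0.6  # assumed true probability of answering any one question correctly
for n in (40, 4):  # whole paper vs. one sub-domain of a ten-topic paper
    se = sqrt(p * (1 - p) / n)
    print(f"{n:2d} questions: observed score {p:.0%} +/- {se:.0%}")
```

A whole-paper score of 60% ± 8% supports a meaningful inference; a sub-domain score of 60% ± 24% is compatible with almost any conclusion.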

Harder questions at GCSE rely on knowledge of several concepts. When students get these questions wrong, it is hard to assess formatively from a gap analysis where their misconceptions lie. Complexity reduces formative utility.

It is not possible to measure progress fairly using summative exams [like past GCSE papers], because it is entirely possible for a student to have made significant progress in a sub-domain without it showing up in a summative exam; and because that topic may be poorly represented in the test, you cannot use just those questions to analyse their progress either.

The risk is that teachers may be incentivised to focus on activities which improve short-term performance but not long-term learning. Measuring progress with grades encourages teaching to the test, which compromises learning.

Because exam boards spend so much time and so many resources trialling and modelling assessments, it is hopeless for teachers to think they can match that rigour in the assessments they design themselves.

Often tests are pulled in two different directions and this highlights a tension between classroom teachers who want formative data and senior managers who want summative progress data.

Notes on making good progress?: Chapter 4

In this series of posts I record my notes from Daisy Christodoulou’s book “Making Good Progress? The Future of Assessment for Learning”. It is quite excellent. You can buy a copy here.

Descriptor-based assessment

It is suggested that descriptor based assessments can be used to combine the purposes of formative and summative assessments, but this chapter outlines why this isn’t really the case.

Statements given in formative assessments across many different lessons can potentially be aggregated to give a summative assessment, although problems arise with the aggregation itself: do students need to meet all of the statements at a particular level? To get round this, different systems apply different methods and algorithms.

I have direct experience of using this assessment model for grading student internal assessment in DP biology classes. The guide advocates best-fit grading.
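The guide describes best-fit grading in prose rather than as an algorithm, but a rough sketch of the general shape of such an aggregation (the descriptor statements and evidence here are invented for illustration) might be:

```python
# A hypothetical best-fit aggregation: award the level whose statements the
# pupil's evidence satisfies most completely, rather than requiring every
# statement at a level to be met.
levels = {
    1: ["states a question", "records some data"],
    2: ["states a focused question", "records data in a table",
        "comments on errors"],
    3: ["justifies a focused question", "processes data correctly",
        "evaluates errors", "suggests improvements"],
}

evidence = {"states a question", "states a focused question",
            "records data in a table", "processes data correctly"}

def best_fit(levels, evidence):
    # Score each level by the fraction of its statements met
    # (ties resolve to the lower level with this implementation).
    return max(levels, key=lambda lv: sum(s in evidence for s in levels[lv])
               / len(levels[lv]))

print(best_fit(levels, evidence))  # -> 2 (two of three level-2 statements met)
```

The choice of rule at exactly this point is where the aggregation problems above creep in.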

Although different systems give different names to the categories and levels they attach to descriptor-based assessments, they all aim to create a summative, shared meaning. Even if we don’t intend to create a summative assessment, the moment we give a grade or make a judgement about performance we are trying to create a shared meaning: we are making a summative inference. As discussed in the previous chapter, for that summative inference to be reliable the assessment has to follow certain restrictions: it has to be taken in standard conditions, distinguish between candidates and sample from a broad domain.

Every time we make a summative inference we are making a significant claim that requires a high standard of evidence to justify it.

This model does not provide valid formative data because it is based on descriptions of performance: the descriptors lack specificity about the underlying process and so cannot be used responsively in the way that true formative assessment should be. They do not analyse what causes performance; they are designed to assess final performance on complex tasks.

Descriptors are not useful for formative feedback because:

  1. They are descriptive not analytic
  2. They are generic not specific
  3. They are short term not long term

Pupils may benefit from tasks that cannot be measured by the descriptors, and the descriptors do not provide a model of progression. For example, making inferences from a text requires a large amount of background knowledge; gaining that knowledge, and assessing the gain, cannot be done using activities that these descriptors can assess.

You may argue that teachers are free to use whatever activities they want in lessons as long as they realise that only some of them are capable of contributing to the final grade.

However, some of these systems warn against using other tasks; they can be designed to be used every lesson and to require judgements against hundreds of different statements, leaving little time for anything else.

There is therefore a lot of pressure to devote class time to the type of task you can get a grade from. Quizzes and vocabulary tests get a bad reputation because they don’t provide evidence for the final grade. If every lesson is set up to allow pupils to demonstrate fully what they are capable of, then lessons start to look less like lessons and more like exams.

Generic feedback based on descriptors does not analyse where the issues are or, importantly, tell students how to improve. The descriptors only permit tasks that can be graded by descriptor, which makes it hard for the teacher to diagnose where a pupil has gone wrong. Even at more advanced levels of understanding there is still value in isolating the different components of a task to aid diagnosis.

Educators can’t even agree on what learning is; at least doctors and nurses agree on what health is!

Generic targets provide the illusion of clarity: they seem like they are providing feedback, but they are actually more like grades. If a comment is descriptive and not analytic, then it is effectively grade-like in its function: accurate but unhelpful.

Finally, these descriptors do not distinguish between short-term performance and long-term learning. This is the most difficult obstacle to establishing reliable formative inferences: if we want to know whether a student needs more teaching on a topic, a formative assessment taken right after the unit will not allow us to make that inference.

In terms of summative assessment, there are several reasons why using prose descriptors is problematic:

  1. Judgements are unreliable – underlying questions can vary in difficulty yet still match the prose description. If different teachers, both within and between schools, use different questions and tasks (all matching the prose description) to summatively assess students, this introduces uncertainty into the assessment, and the results cannot produce a shared meaning. Ensuring examiners interpret the statements in the same way is also extremely difficult: descriptors lack specificity and can be interpreted differently, making judgements unreliable, and differing judgements cannot produce a shared meaning.
  2. Differing working conditions – different tasks completed in different situations cannot produce shared meaning. Subtle differences in how tasks are presented will influence how well students do.
  3. Bias and stereotyping – humans are subject to unconscious bias and stereotyping, even more so under pressure, and this will influence the judgements they make.

The Extended Essay: The central support for teaching ATL skills?

I have reservations about the IB ATLs. I have written about this previously, mainly focussing on the approaches to teaching, and I don’t really want to go over those issues again; suffice it to say that it still concerns me that the IB, as the only truly global, non-national/international curriculum, has such a strong ideology underpinning what it requires teachers to do. In fact, the more I think about it, the more concerned I am that most teacher-training curriculums I am aware of are not balanced and do not give teachers a good education in evidence, history and philosophy. Instead they simply present one ideology, uncritically, as fact.

My previous post focussed on the approaches to teaching. In this post I want to focus on the IB’s approaches to learning, which I will refer to simply as ATLs. Hopefully this post will be a bit more positive!

There are certainly areas of the ATLs that I have come to appreciate. Before I get there, I just want to state that, from what I have read, the evidence from cognitive science is pretty clear-cut: there are no such things as general learning or thinking skills. Moreover, I don’t think the often-quoted 21st-century learning skills, the 4Cs of communication, collaboration, critical thinking and creativity, are any more important in the 21st century than they were in the 19th and 20th (they were referred to then; they are nothing new now), and I think the whole enterprise of trying to teach them outside of domains will only make our education system weaker, not stronger.

To make the ATLs work within the school context they need to be linked to, and embedded in, domain-specific content. Some of them may be more generalisable than others, and in that sense more amenable to being taught independently, but most will need to be embedded within the teaching of the specific content of a domain.

For example, elements of the self-management tranche of the ATLs may well be more stand-alone, or at least can be taught independently of subject matter. Even then, teaching students about time management still needs material to work on; in this case, the students’ own general workload at school.

Mindfulness is another self-management skill that can be taught independently and, in my opinion, to great value for the learner. However, for this to be effective it needs staff buy-in and training. While mindfulness is the trendy new idea, there is a lot of misunderstanding about what it actually is.

Thinking skills, communication skills and research skills, as identified by the IB’s ATL guide, all require teaching and embedding within content. For communication and research skills, one of the central supports for teaching them is the Extended Essay.

In most schools the Extended Essay process is placed towards the middle or end of the DP, with students perhaps beginning the process in term two of the first year and finishing sometime around Christmas of the second year. This year we have gone to the other extreme and brought it to the front of the course, because we feel it underpins the ATLs and provides so many opportunities to teach them explicitly while still linking them to specific subject knowledge.

We have introduced our students to the process this September and have planned in specific interventions that look at research skills and communication skills, while we also begin to map out how these skills are taught vertically from year 7.

Our current year 12 students are supported through the process with clear scaffolding. First they are asked to think about general topics and are clearly led through ways to identify and think about ideas. Subsequently, we introduce them to the library in a series of sessions which first look at its resources in general, before looking at the databases we have access to and how researchers search them appropriately using Boolean operators (for example, a search such as coral AND bleaching NOT aquarium).

Students are then asked to draft a proposal for their Extended Essay, which includes the research question, an outline of the sub-questions and a list of potential sources. This proposal needs to be agreed and signed off by their supervisor before Christmas of the first year, and it becomes the basis for the first formal reflection.

In the second term we show students how to critically appraise sources and continue to support them in writing the outline of their essay. This runs up until April, when they submit their outline to their supervisors and follow up with a second meeting.

Following on from this meeting, students receive feedback, and after their exams, during their core week, they are given time in the mornings to work on writing their Extended Essay, with the aim of having a first draft completed by the end of the third term and submitted to their supervisor. This draft forms the basis of their interim reflection and their third meeting.

Students can then finalise their work over the summer, submitting it and completing their viva voce at the start of the second year. In this way, this major piece of work is completed before the bulk of internal assessments and university applications begin.

By front-loading the Extended Essay process in this way, I believe the team has a greater chance of explicitly teaching the research and communication skills needed to succeed in the Extended Essay. This stops these skills being left to chance and also allows students to apply them in the internal assessments for their other subjects.

Finally, by also bringing some of the other internal assessments into the latter half of the first year, we can begin to help students develop their own time-management and organisational strategies by explicitly showing them how to balance the commitments of the Extended Essay, internal assessments and other work. Doing this early in the course allows them to apply these skills later on.

Notes on making good progress?: Chapter 3

In this series of posts I record my notes from Daisy Christodoulou’s book “Making Good Progress? The Future of Assessment for Learning”. It is quite excellent. You can buy a copy here.

Making valid inferences

If the best way to develop a skill is to practice the components that make it up, then it is hard to use the same type of task to assess both formatively and summatively.

Summative assessment tasks aim to generalise and create shared meaning beyond the context in which they are taken: a pupil given one summative judgement in one school should get a similar judgement in another.

Formative assessment aims to give teachers and students information to form the basis for successful action in improving performance.

Although a summative task may at times be repurposed as a formative one, different purposes generally pull assessments in different directions. The purpose of an assessment shapes its design, which makes it hard to simply re-purpose it.

Assessments need to be thought of in terms of their reliability and their validity. The validity of an assessment refers to the inferences we can draw from its results; the reliability is a measure of how often the assessment would produce the same results with all other factors controlled.

The timing of mock exams comes to mind: whether you want them to be summative or formative will affect when you favour setting them.

Sampling (how much of the knowledge in a particular domain an assessment actually tests) affects the validity of an assessment. Normally, in summative assessments, questions sample the domain; they do not cover it in its entirety.

Some assessments do not have to sample: if the domain being measured (say, the letters of the alphabet) is small, this isn’t a problem. Further along the educational pathway this becomes harder.

Assessments also need to be reliable. Unreliability is introduced into assessments through sampling, the marker (different markers may disagree) and the student (a student’s performance can vary from day to day).
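A toy simulation (all numbers invented) shows how the last two sources alone blur results: give a cohort fixed “true” abilities, add marker noise and day-to-day pupil noise, and assess the same pupils twice.

```python
# Toy model of unreliability: observed mark = true ability + marker noise
# + day-to-day pupil noise, for the same cohort assessed twice.
import random
import statistics

random.seed(1)
true_ability = [random.gauss(50, 10) for _ in range(500)]

def administer(abilities, marker_sd=5, pupil_sd=5):
    return [a + random.gauss(0, marker_sd) + random.gauss(0, pupil_sd)
            for a in abilities]

first, second = administer(true_ability), administer(true_ability)
# Perfect reliability would give a correlation of 1.0; these noise levels
# drag it down to roughly 0.67 (100 / (100 + 50) in variance terms).
print(f"{statistics.correlation(first, second):.2f}")
```

A test-retest correlation of about 0.67 means two administrations of the “same” assessment can rank pupils quite differently.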

Models of assessment include the quality model and the difficulty model, and the sources of unreliability affect each in different ways. The quality model requires markers to judge how well a student has performed (think figure skating); the difficulty model requires pupils to answer questions of increasing difficulty (think pole vault).

There is a trade-off between reliability and validity. A highly reliable MCQ assessment (which reduces sampling and marker error) may limit how many inferences you can make from it: you may be unable to use it as a summative assessment because it doesn’t properly match up with the final assessment.

However, reliability is a prerequisite for validity. If an assessment is not reliable, then the inferences drawn from it cannot be trusted either: an unreliable assessment cannot support valid inferences.

You may well be able to create an exciting and open assessment task which corresponds to real-world tasks; however, if a pupil can end up with a wildly different mark depending on what task they are allocated, and who marks it, the mark cannot be used to support any valid inferences.

Summative assessments are required to support large and broad inferences about how pupils will perform beyond the school and in comparison with their peers. For such inferences to be valid they must be consistent.

Shared meanings impose restrictions on the design and administration of assessments. To distinguish between test takers, assessments need items of moderate difficulty (a question that every candidate gets right, or every candidate gets wrong, tells you nothing about the differences between them), and assessments must sample, with samples carefully considered and representative.

The main inference needed from formative assessment is how to proceed next. The assessment still needs to be reliable, but the inferences do not need to be shared, even between kids in the same room; I can therefore help some kids more than others. It is about methods, and it needs to be flexible and responsive.

The nature of the inference places a restriction on the assessment: trying to make summative inferences from tasks designed for formative use is hard to do reliably without sacrificing flexibility and responsiveness.

Assessment theory triangulates with cognitive psychology: the process of acquiring a skill is different from the product, and the methods of assessing the process and the product differ too.

Formative assessments need to be developed by breaking down the skills and tasks that feature in summative assessments into tasks that will give valid feedback about how pupils are progressing towards that goal.

They can be integrated into one system, to be discussed in a later chapter.

Most schools make the mistake of summatively assessing all too frequently.