In this series of posts I record my notes from Daisy Christodoulou’s book “Making good progress? The future of Assessment for Learning”. It is quite excellent. You can buy a copy here.
Making valid inferences
If the best way to develop a skill is to practise the components that make it up, then it is hard to use the same type of task to assess both formatively and summatively.
Summative assessment tasks aim to generalise and create shared meaning beyond the context in which they are made. Pupils given one summative judgement in one school should be getting a similar judgement in another school.
Formative assessment aims to give teachers and students information to form the basis for successful action in improving performance.
Although, at times, a summative task may be able to be repurposed as a formative task, generally different purposes pull assessments in different directions. The purpose of an assessment shapes its design, which makes it harder to simplistically re-purpose it.
Assessments need to be thought of in terms of their reliability and their validity. The validity of an assessment refers to the inferences that we can draw from its results. The reliability is a measure of how often the assessment would produce the same results with all other factors controlled.
The example of timing of mocks comes to mind. Whether you want these to be a summative or a formative assessment will affect when you favour setting them.
Sampling (the amount of knowledge from a particular domain assessed by an assessment) affects the validity of an assessment. Normally in summative assessments the questions sample the domain; they do not cover it in its entirety.
Some assessments do not have to sample. If the domain they are measuring (letters of the alphabet) is small this isn’t a problem. Further along the educational pathway this becomes harder.
Assessments also need to be reliable. Unreliability is introduced into assessments through sampling, the marker (different markers may disagree) and the student (student performance can vary day to day).
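A rough way to see how these three sources add up is a small simulation. This is only a sketch under my own assumptions, not something from the book: a pupil who knows 70% of the domain, each sampled question answered correctly with that probability, and marker and day-to-day variation modelled as normal noise. The function names and all the numbers are made up for illustration.

```python
import random
import statistics


def observed_score(true_knowledge, n_questions, rng, marker_sd=0.0, day_sd=0.0):
    """Simulate one sitting and return a percentage mark (0 to 100)."""
    # Student variation: performance on the day drifts around true knowledge.
    p = min(1.0, max(0.0, true_knowledge + rng.gauss(0, day_sd)))
    # Sampling: only n_questions are drawn from the domain, each answered
    # correctly with probability p (a simplifying assumption).
    correct = sum(rng.random() < p for _ in range(n_questions))
    percent = 100 * correct / n_questions
    # Marker variation: different markers would award slightly different marks.
    percent += rng.gauss(0, marker_sd)
    return min(100.0, max(0.0, percent))


def spread(n_questions, marker_sd=0.0, day_sd=0.0, sittings=2000, seed=0):
    """Standard deviation of marks for the same pupil over many sittings."""
    rng = random.Random(seed)
    scores = [
        observed_score(0.7, n_questions, rng, marker_sd, day_sd)
        for _ in range(sittings)
    ]
    return statistics.pstdev(scores)


if __name__ == "__main__":
    print("10 questions, sampling only:", round(spread(10), 1))
    print("40 questions, sampling only:", round(spread(40), 1))
    print("40 questions, plus marker noise:", round(spread(40, marker_sd=5), 1))
    print("40 questions, plus day-to-day noise:", round(spread(40, day_sd=0.05), 1))
```

The larger the spread, the less reliable the mark: the same pupil with the same knowledge ends up with different results. In this toy model a longer test shrinks the sampling error, while marker and day-to-day variation add spread on top of it.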
Models of assessment include the quality model and the difficulty model. Sources of unreliability affect each of them in different ways. The quality model requires markers to judge how well a student has performed (think figure skating); the difficulty model requires pupils to answer questions of increasing difficulty (think pole vault).
There is a trade-off between reliability and validity. A highly reliable MCQ assessment (with reduced sampling and marker error) may limit how many inferences you can make from it. You may be unable to use it as a summative assessment as it doesn’t properly match up with the final assessment.
However, reliability is a prerequisite for validity. If an assessment is not reliable, then the inferences drawn from it are not reliable either, so it cannot support valid inferences.
You may well be able to create an exciting and open assessment task which corresponds to real-world tasks; however, if a pupil can end up with a wildly different mark depending on what task they are allocated, and who marks it, the mark cannot be used to support any valid inferences.
Summative assessments are required to support large and broad inferences about how pupils will perform beyond the school and in comparison to other peers. In order for such inferences to be valid they must be consistent.
Shared meanings impose restrictions on the design and administration of assessments. There are specific criteria needed for this. To distinguish between test takers, assessments need items of moderate difficulty and assessments must sample. Samples need to be carefully considered and representative.
The main inference needed for formative assessments is how to proceed next. Assessment still needs to be reliable but the inferences do not need to be shared even with kids in the same room. I can therefore help some kids more than others. It is about methods. It needs to be flexible and responsive.
The nature of the inference poses a restriction on assessment. Trying to make summative inferences from tasks that have been designed for formative assessment use is hard to do reliably without sacrificing flexibility and responsiveness.
Assessment theory triangulates with cognitive psychology. The process of acquiring skills is different from the product, so the methods of assessing the process and the product are different too.
Formative assessments need to be developed by breaking down the skills and tasks that feature in summative assessments into tasks that will give valid feedback about how pupils are progressing towards that goal.
Formative and summative assessments can be integrated into one system, to be discussed in a later chapter.
Most schools make the mistake of summatively assessing all too frequently.