Summary of Assessment Principles and Practices

Originally posted on February 2, 2019 @ 9:00 am

Last term the IB published a new document – “Assessment principles and practices – Quality assessments in a digital age”. Below I post my summary and notes on the document.


The aim of this document is to explain the principles that the IB has adopted to make sure that assessment is meaningful, fair and in the best interests of the students involved.

All assessments are a balance between conflicting demands, and many concerns about testing fail to take this into account. An example is the tension between reducing the assessment burden and the risk of candidates only having one opportunity to show what they can do.

The IB aims to be a holistic programme of study and this should be reflected in the assessments. Decisions should therefore focus on the impact of the overall programme, not just on one subject, discipline or assessment. The IB focusses on what it is important to assess and not what is easy to assess.

Assessment is all about balancing conflicting and competing demands. Poor quality assessments will lead to poor quality outcomes, even if assessment is “only” formative.

Assessment principles are what the IB thinks are important in creating qualifications and assessments. They come from what is considered important about an IB education. Assessments should support education, not distort it.

Assessment practices are the ways in which the principles are delivered in a meaningful and practical way.

IB exams should represent an opportunity for candidates to show what they understand, rather than being a unique experience which they need to master. Technology therefore should be driven by the assessment needs and not the other way around.

The IB is exploring eAssessment as a way to ensure that students are able to genuinely demonstrate what they know. In this way, assessments will seek to utilise technology where it may make the assessment more authentic. The IB has already moved to eMarking for many DP components and will seek to implement eAssessment in the DP, although it is aware of the risks:

  1. Burdens on schools
  2. Risk of failure
  3. Security
  4. Tech for the sake of Tech
  5. Bias against certain groups of students and device effects
  6. Changing standards
  7. Barriers to schools offering IB programs


Assessment can mean many different things but is generally divided into summative and formative, although today there is a drive towards “assessment as learning”. Assessments need to be designed carefully to meet the purposes their results are used for. Excellent formative assessments may be poor summative assessments (See here).

Assessment can mean any of the different ways in which student achievement can be gathered and evaluated, and there are different assessment models, namely the compensation model and the mastery model. The mastery model requires a basic minimum in all criteria to be met, whilst the compensation model will allow poor performance in one criterion to be balanced by very good performance in another criterion.
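The difference between the two models can be made concrete with a minimal sketch. The criterion names, scores and thresholds here are hypothetical and chosen only for illustration; they are not taken from the IB document.

```python
def compensation_pass(scores, total_required):
    """Compensation model: only the total counts, so a strong
    criterion can offset a weak one."""
    return sum(scores.values()) >= total_required


def mastery_pass(scores, minimum_per_criterion):
    """Mastery model: every criterion must reach the basic minimum."""
    return all(score >= minimum_per_criterion for score in scores.values())


# Hypothetical candidate: weak on criterion A, very strong on criterion B.
candidate = {"A": 2, "B": 8, "C": 6}

print(compensation_pass(candidate, total_required=15))    # total is 16
print(mastery_pass(candidate, minimum_per_criterion=3))   # A is below 3
```

The same candidate passes under compensation and fails under mastery, which is exactly the trade-off described above.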

Formative assessments aim to provide detailed feedback to teachers and students about the students’ strengths and weaknesses. In contrast, summative assessments focus on measuring what the student can do at a specific time. Summative assessment seeks to make a judgement about a candidate, not inform future teaching and learning.

The balance between measuring achievement and identifying correctly what still needs to develop is called assessment validity. It is important to note that the balance between quality of feedback and attainment is opposite in formative vs summative assessment.

Different national systems have adopted different approaches to assessment and these reflect the tensions between the wider aims of the society and the time and resources available. There is no one-size-fits-all or perfectly optimal assessment system. Additionally, summative assessment is increasingly being used to analyse teaching quality.

The backwash effect is the influence that an assessment has on the teaching of the content. This can be positive or negative. Snyder’s hidden curriculum is the meaning that students create about a discipline based on its assessment tasks. Assessment needs to be designed around constructivist learning theory.

Marks and grades are not the same thing. Marks refer to credit given to a candidate in line with a mark scheme, and have no other meaning. Grades describe the quality of a candidate’s work.

Generally IB assessments are not norm referenced. The IB generally uses marks as an indication of overall performance and then looks at how well candidates with x marks performed matched to the grade level descriptors. They then place boundaries based on the descriptors.
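Once boundaries have been placed, turning a mark into a grade is a simple lookup. Here is a minimal sketch, assuming a 1 to 7 grade scale and entirely hypothetical boundary values; real boundaries are set by judgement against the grade descriptors, not by any formula shown here.

```python
from bisect import bisect_right

# Hypothetical minimum marks for grades 2 to 7; anything lower is a grade 1.
BOUNDARIES = [15, 28, 41, 54, 67, 80]


def grade_for(mark, boundaries=BOUNDARIES):
    """Return the grade whose boundary the mark has reached."""
    return 1 + bisect_right(boundaries, mark)


for mark in (14, 15, 66, 80):
    print(mark, "marks -> grade", grade_for(mark))
```

The mark itself carries no meaning until the boundaries give it one, which is the distinction between marks and grades made above.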

Validity means asking if an assessment is fit for purpose. The IB’s first concern is whether the programme is valid, then whether the elements of the programme are valid and finally whether assessments are valid. Validity is not an objective concept and is a balance between competing issues. We cannot prove validity but can construct a validity argument based on evidence.

eAssessment offers new opportunities for interaction within exams and therefore improves the validity of some aspects of assessment. It also removes some of the security concerns while introducing others.

Validity chains can be used to think about validity. There are five elements to the chain:

  • Reliability
  • Construct relevance
  • Manageability
  • Fairness
  • Comparability

All of the above are necessary to achieve validity but there are also tensions between each of them. The IB places the most value in construct relevance – assessments that actually test what they intend to test, but not to the exclusion of all else.

Reliability is the extent to which a candidate would get the same test result if the testing procedure was repeated. There are several sources of unreliability. Consistent outcomes are not the same as the right outcome. The aim of marking reliability is to ensure that all examiners make the same judgement as the senior examiner.
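One simple way to look at marking reliability is to compare an examiner's marks with the senior examiner's marks on the same scripts. The sketch below is only an illustration: the function name, the sample marks and the idea of a one-mark tolerance are my assumptions, not the IB's actual procedure.

```python
def agreement_rate(examiner_marks, senior_marks, tolerance=1):
    """Share of scripts where the examiner is within `tolerance`
    marks of the senior examiner (tolerance is an assumed parameter)."""
    if not senior_marks or len(examiner_marks) != len(senior_marks):
        raise ValueError("need two non-empty mark lists of equal length")
    within = sum(
        abs(examiner - senior) <= tolerance
        for examiner, senior in zip(examiner_marks, senior_marks)
    )
    return within / len(senior_marks)


# Hypothetical marks for four scripts.
examiner = [12, 15, 9, 20]
senior = [12, 14, 12, 20]
print(agreement_rate(examiner, senior))
```

Note that a high agreement rate only shows consistency with the senior examiner; as the paragraph above says, consistent outcomes are not the same as the right outcome.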

Construct relevance is concerned with accurately measuring the thing that the assessment is attempting to measure.

Manageability can be discussed in terms of the candidate, the school and the IB.

Fairness and bias are concerned with ensuring that the test does not give an advantage to one group over another. Bias can arise from:

  • The delivery of the assessment
  • Marking
  • Assessment questions

Comparability of assessment is concerned with how the grades from assessments can be compared between years or subjects. The IB seeks to maintain three principles about comparability:

  1. The standard of work to achieve grades within a subject or discipline is comparable between years.
  2. Grades between subjects have a consistent meaning so that different routes to achieve the program award are comparable.
  3. Although the IB aims to focus on higher order skills, IB assessments are broadly comparable with similar exams offered by individual nations or other awarding bodies.

IB’s approach to validity

The IB believes that construct relevance and authentic assessment are more important than maximising reliability. The IB believes in rounded, holistic education. Its priority is for strong arguments of validity at programme level. Validity is a complex and multi-faceted balancing act and there is no single right answer; where you place the balance is ultimately a judgement based on the values of the organisation that is developing the assessments. The IB aims to do more than other curricula by developing inquiring, knowledgeable and caring young people who are motivated to succeed. We need to consider how the aims of individual subjects fit into the holistic aims of the IB.

IB assessments are weakly criterion referenced. That is, candidate performance is matched against behavioural descriptors.

Comparative marking represents an alternative to traditional marking. The basis is that the human mind is better at making comparisons than absolute judgements. In this approach, examiners make win/lose comparisons between pieces of work. For subjectively marked work, mathematics, combined with an importance statement, allows a team of examiners to compare and “mark” students’ work. However, comparative judgement requires many marking decisions because each piece of work must be looked at several times by several examiners.
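To give a feel for how win/lose decisions can become a rank order, here is a minimal sketch using the Bradley-Terry model, which is one common statistical choice for paired comparisons. The IB document does not say which method is used, so treat the model, the function name and the sample data as my assumptions.

```python
def bradley_terry(comparisons, iterations=100):
    """Estimate a relative strength for each script from a list of
    (winner, loser) pairs, using the standard iterative update."""
    scripts = sorted({script for pair in comparisons for script in pair})
    wins = {script: 0 for script in scripts}
    pair_counts = {}
    for winner, loser in comparisons:
        wins[winner] += 1
        key = frozenset((winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    strength = {script: 1.0 for script in scripts}
    for _ in range(iterations):
        updated = {}
        for script in scripts:
            denominator = 0.0
            for other in scripts:
                count = pair_counts.get(frozenset((script, other)), 0)
                # Only pairs that were actually compared contribute.
                if other != script and count:
                    denominator += count / (strength[script] + strength[other])
            updated[script] = wins[script] / denominator if denominator else strength[script]
        total = sum(updated.values())
        strength = {script: value / total for script, value in updated.items()}
    return strength


# Hypothetical judgements: each pair of scripts is compared three times.
judgements = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("A", "C"), ("A", "C"), ("C", "A"),
    ("B", "C"), ("B", "C"), ("C", "B"),
]
strengths = bradley_terry(judgements)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking)
```

Even this toy example needs nine judgements to rank three scripts, which illustrates why comparative judgement demands so many marking decisions.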

For the IB the underlying principle is to test what is important, and assessments should encourage good teaching. Comments on summative work are used to support examiner marking. Comments on formative work give feedback to learners.

IB programme-specific processes

Key elements that link all IB programmes are:

  • The learner profile
  • Approaches to teaching and learning
  • International mindedness

IB programmes are conceptual, that is, they focus on powerful organizing ideas that are relevant across subject areas and that help to integrate learning and add coherence to the curriculum. We need to consider how assessments within each program meet the broader objectives of the DP.


The IB recognises the need for schools and individual teachers to have the space to be creative and the ATLs are suggested as guidance and to help highlight good practice and enable discussion. The ATLs are not meant to be prescriptive. Skills can only be improved over time and if taught in a sustained fashion.


There is definitely some new information here that I am happy to receive. I was not aware that teachers could be observers at grade awarding meetings or at the final awarding committee, and this is something that I would definitely pursue in the future or recommend my teachers do. I was also pleased to see the section on ATLs and the nod from the IB that these are not meant to be prescriptive and that schools and teachers should be free to be creative.

Please share your thoughts.
