Moderation of Assessment in Higher Education: A Literature Review
Much of this material is derived from the Assessment Moderation Toolkit developed by the ALTC Moderation for Fair Assessment in Transnational Learning and Teaching Project (2008-2010). The toolkit provides resources including guidelines for sampling, “Quick Tips” and professional development. Although the main focus of the ALTC Project was transnational teaching, this literature review and much of the content of the ALTC Project’s resources apply equally to the moderation of assessments, marks and grades in on-shore higher education programs.
Purpose
The aim of moderation of assessment is to ensure that assessment is fair, valid and reliable (ALTC, 2012c), which requires appropriate assessment activities and accurate assessment decisions (New Zealand Qualifications Authority, 1992). A valid assessment assesses what it sets out to assess, and not something else; it is an appropriate way to assess the learning outcomes that it should be addressing. A reliable assessment activity gives results that are a consistent and accurate picture of what is measured (New Zealand Qualifications Authority, 1992).
There are two main reasons for moderation: accountability and improvement. Rigorous moderation of assessment may be categorized as a good practice improvement that lies between risk avoidance and quality enhancement as normative quality assurance (Baird & Gordon, 2009). Learning activities may be continuously enhanced through quality monitoring (such as the internal moderation of student assessments), in contrast to a compliance culture that does not lead to improvement (Horsburgh, 1997, 1998). The underlying principle of quality monitoring should be the encouragement and facilitation of continuous improvement. ECU’s approach to continuous improvement uses the Quality@ECU model, which is the suggested method for quality monitoring at ECU (Edith Cowan University, 2010).
Benefits and purposes of effective moderation are:
• improving reliability through discussion of differences in markers’ and students’ interpretations of criteria and marking schemes;
• helping to prevent individual marker bias;
• decreasing the effect of ‘hard’ and ‘soft’ marking;
• increasing student confidence in marking;
• staff development; and
• creating an assessment community within the team of markers (Bloxham, 2009).
What is moderation?
Moderation of student assessment is a process aimed at ensuring that marks and grades are as valid, reliable, and fair as possible for all students and all markers. Moderation strategies may differ depending on the number of students studying the unit and the number of teaching staff involved. However, the process usually involves collaborative decision-making about assessment criteria and expectations for grading levels before marking begins. It may also involve preliminary sample marking, as well as cross marking to check for consistency. Double marking is usually carried out when a piece of assessment has received a fail grade (Institute of Teaching and Learning, 2012).
Moderation is more than the checking of assessment marks; it is the checking of assessments from the development of each item to ensure that the whole assessment process is fair, valid and reliable (ALTC, 2012a), enabling equivalence and comparability (ALTC, 2010). There are a variety of understandings and practices nationally for moderation of assessment, including
• consistency in assessment and marking;
• process for ensuring comparability;
Moderation
Centre for Learning and Development
July 2012
• measure of quality control;
• process to look at equivalence;
• maintaining academic standards to ensure fairness; and
• part of quality assurance,
but most people view moderation as all about marking (ALTC, 2012a).
The ALTC project viewed moderation of assessment more broadly than just a quality control measure around marking, simply because marking and reviewing allocated grades does not guarantee quality assessment (ALTC, 2012a). Quality in educational programs has been described as meeting specified standards, being fit for purpose or as transformative. External quality evaluations are not particularly good at encouraging improvement, especially when they have a strong accountability, or audit, brief (Harvey & Williams, 2010). The heart of the holistic approach to moderation is continuous review (Lawson & Yorke, 2009).
Moderation of assessment is a key practice underpinning assessment equivalence for Australian universities (ALTC, 2010). Moderation is a broad label that covers activities that help to ensure that there is uniform interpretation and application of standards (New Zealand Qualifications Authority, 1992). “Moderation is a process for assuring that an assessment outcome is valid, fair and reliable and that marking criteria have been applied consistently” (Bloxham, 2009). Moderation helps to raise standards, expectations and levels of consistency (The Scottish government, 2011).
Moderation of on-shore assessments, grades and marks in Australian universities is achieved, and is important, in much the same way as moderation of transnational (off-shore) programs. The principle promoted to Australian universities to ensure quality and sustainability in the economically and educationally significant TNE market is one of ‘comparability’ or ‘equivalence’ between what happens in Australian-based programs and their TNE delivery (Castle & Kelly, 2004; Sanderson et al., 2010).
Moderation is about ‘comparability’ or ‘equivalence’ between what happens in one class, assessment or program as compared with all others by the same name. The host unit coordinator is usually accountable. In the same way, for off-shore programs, the host university that confers the qualification may be held ultimately accountable for academic quality, including setting assessments and overseeing extensive moderation exercises (Lim, 2008). Moderation of assessments, as quality monitoring, should benefit the students’ learning experience (Harvey, 2004). Any higher education program may be identical, tailored but equivalent, significantly tailored, or completely different from another version, e.g. one offered on another campus.
National protocols
Moderation of assessment in higher education is particularly important for quality assurance of transnational education with the goal of “equivalence” and “comparability” between onshore and offshore provision as required by the National Protocols for Higher Education Approval Processes (ALTC, 2010).
ECU’s moderation policy
ECU policies for Moderation of Assessment, Assessment, and Course and Unit Review guide the process of moderation for all units offered, outlining specific procedures for both multiple markers and managed courses.
The difference between benchmarking and moderation
The terms benchmarking and moderation have quite different meanings. For example, benchmarking an institution’s student grades, marks and assessments is not the same as moderating these. Moderation as a process involves the checking of assessment marking to ensure equivalence, reliability, validity, fairness and accuracy (ALTC, 2012a; Bloxham, 2009). It is necessary whenever more than one person marks assessment items in a unit and when a unit is taught on more than one campus. The same unit may be offered over different semesters, in different schools, even in different countries. Moderation of assessment checks that marking is consistent, such that an assessment item would be awarded the same mark by any marker. By contrast, the aim of benchmarking assessment processes is to make transparent the areas for improvement and areas of good practice.
Scaling of marks
Scaling refers to the adjustment of student assessment scores based on statistical analyses without reference to the quality of students’ responses (Service Learning Australia, 2012). Post-assessment scaling of marks should be avoided (ALTC, 2012c).
The need for moderation
The relationship between student assessment and grading, quality assurance and academic standards has been a major issue (James, 2003). The marks and grades given to students commonly result from decentralized, subject-specific decision-making processes that act as judgments about academic standards (Bloxham, Boyd, & Orr, 2011), often with the terms standards and criteria used interchangeably. To achieve high and consistent quality in assessment practices, investment in resources and professional development is needed, with close attention paid to the role of sessional markers (Griffith University, 2012). Ensuring consistency of assessments in a unit, and even moderation of these assessments, is a challenge when a unit is offered on more than one campus and also on-line (Kuzich, Groves, O'Hare, & Pelliccione, 2010).
A prerequisite for construct validity is that assessment is based on relevant content. However, one study of portfolio marking showed that the quality of markers’ judgement process about the content was low. Markers based their judgements mainly on personal opinion and less on evidence in the portfolios (Van der Schaaf, Baartman, & Prins, 2011).
A grade is essentially a symbolic representation of the level of achievement attained by a student (Sadler, 2009). Grade integrity is defined as the extent to which each grade (or assessment mark) is strictly commensurate with the quality, breadth and depth of a student’s performance (Sadler, 2009). Many assessment practices compromise grade integrity in higher education. Academic standards are fixed levels of quality that recognise student academic achievement by competent, mutually calibrated discipline and professional peers. These standards are constant points of reference but may shift deliberately in response to shifts in curriculum, technology and the discipline and profession (Sadler, 2009).
Marking and grading in most disciplines is inevitably subjective (Hughes, 2011), but a systematic approach to identifying significant tacit beliefs may assist in reducing their effect on grader variation (Hunter & Docherty, 2011). Conversations amongst markers assessing student performances influenced how the group of markers reached agreement (Orr, 2007). Ipsative marking for students who are known to markers, or known to have difficult circumstances, emphasizes the markers’ involvement with and commitment to the students’ development over time (Orr, 2007). If broad categories are used as the basis for grading students’ work or attainment of skills and knowledge, then grading becomes overtly judgmental and subject to many psychosocial pressures (Yorke, 2010).
After markers had participated in professional development using an integrated moderation of assessment program (IMAP), variation between markers tended to decrease (reliability increased), particularly when they were divided into novice and experienced groups (Bird & Yucel, 2010). The time taken to mark also tended to decrease, so efficiency of marking increased after participation in the professional development (Bird & Yucel, 2010).
Off-shore staff often raise the issue of academic freedom, as teaching resources from another country seem prescriptive and restrictive to some lecturers (O'Rourke & Al Bulushi, 2010). Providing teaching and learning materials that can be contextualized to the local environment in another country, that account for language levels and that are culturally sensitive is a challenge (O'Rourke & Al Bulushi, 2010). That standards of student assessment are applied equitably across multiple campuses may be assumed (O'Rourke & Al Bulushi, 2010), and as part of moderation this assumption would need to be verified.
Moderation of group work and peer assessment
Moderating the marks allocated to group work, a popular component of university curricula in which students complete assessment tasks as a team, is important (Bushell, 2006). The ability to work as part of an effective team is regarded as an important skill and often a graduate attribute. Despite the potential benefits of team work, moderation of marking is essential for students to feel confident that they will be rewarded fairly for their contributions and that any ‘free-riders’ will not benefit from the efforts of others. The usual practice in higher education is for students to be graded solely on the quality of a submitted piece of work or presentation without consideration of the effort or input into the product (Johnston & Miles, 2004). Alternatively, team members may assess individual contribution to team performance as part of assigning grades to students on an individual basis (Bushell, 2006).
Peer assessment may be done in a number of ways, but moderation is especially important when it is part of summative assessment. The peer assessment marks may be submitted in secret or the team may have to agree on contributions by its members. The peer assessment grades may form a fraction of the mark or may be used as a multiplying factor. They may be a single score or several scores may be made against marking criteria. It is the task of the tutor to manage peer assessment moderation problems, ensuring that peer evaluations are used in a fair and equitable way to distribute marks to individual team members (Bushell, 2006).
Peer assessment of group work may promote reflective learning and develop critical thinking skills. However, in one study, students showed a self-bias in self-assessment of group project work, rating their own contribution higher than that of other group members. Peer-assessment disadvantages the more able students who often award themselves a lower grade than do their peers (Johnston & Miles, 2004).
Yet, knowing that individual contributions to a group project were to be assessed resulted in a decrease in free-riding and more involvement in group learning (Johnston & Miles, 2004). Lejk and Wyvill (2001) recommend excluding self-assessment, which improves the objectivity of grading because high performing students generally tend to under-rate their own performance (cited in Bushell, 2006). The possibility of reassigning peer grades that appear to be biased, based on overall group assessment ranking, has been proposed as a simple and robust approach that acts as a powerful deterrent to dishonest assessment of peers (Bushell, 2006). Reassigning peer grades would be an open and transparent process including discussion with all team members or the whole class. This is especially important for pre-service student teachers so that they become acquainted with moderation of assessment processes that they could use later in schools.
These issues arising from the assessment of group work demonstrate that moderation starts with the design of assessment tasks. Peer marking is essentially an example of multiple markers, so strategies for moderation of group work and peer assessments are the same moderation strategies used for multiple markers.
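As an illustration of peer assessment grades used as a multiplying factor, the following is a minimal sketch under stated assumptions, not a method prescribed by the sources cited above. The function name `individual_marks`, the rating scale, the cap of 100 and the choice to scale by each member's mean peer rating relative to the team mean are all hypothetical. Self-ratings are ignored, in line with the recommendation to exclude self-assessment.

```python
def individual_marks(group_mark, ratings, cap=100.0):
    """Distribute a group mark to individuals using peer ratings as a multiplier.

    ratings[rater][ratee] is the score one team member gives another.
    Self-ratings (rater == ratee) are ignored. Hypothetical scheme:
    individual mark = group mark * (mean rating received / team mean).
    """
    members = sorted(ratings)
    if len(members) < 2:
        raise ValueError("need at least two team members")

    received = {}
    for member in members:
        # Collect only the scores given to this member by other members.
        scores = [ratings[rater][member] for rater in members
                  if rater != member and member in ratings[rater]]
        if not scores:
            raise ValueError(f"no peer ratings for {member}")
        received[member] = sum(scores) / len(scores)

    team_mean = sum(received.values()) / len(received)
    if team_mean <= 0:
        raise ValueError("peer ratings must have a positive mean")

    # Scale the group mark by each member's relative contribution, capped.
    return {m: min(cap, round(group_mark * received[m] / team_mean, 1))
            for m in members}
```

Because the multiplier is relative to the team mean, the team's average individual mark stays equal to the group mark unless the cap applies. A tutor would still need to review outlying ratings before releasing marks.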
When moderation is needed
Pedagogical thinking about assessment, firmly embedded within teaching and learning, contrasts with thinking of assessment as a summative exercise (James, 2003), implying that moderation is needed throughout the process of teaching and learning. Moderation of assessment is especially necessary for
• large units;
• multiple markers;
• assessments when teaching occurs on different campuses;
• assessments with subjective answers;
• assessment that differs across individual students or cohorts of students (ALTC, 2012e).
Moderation comprises the processes and activities that occur both before (i.e. quality assurance) and after (i.e. quality control) all assessment, so moderation of assessment encompasses all stages of all assessment (ALTC, 2010, 2012a). Moderation can be applied in three phases:
1. assessment design and development;
2. implementation, marking and grading; and
3. review and evaluation (ALTC, 2012a, 2012c).
How moderation is done
In university practice, the process of moderation commonly occurs after marking (Kuzich, et al., 2010), with moderation usually carried out by sampling, i.e. targeting selected, representative points to check the quality of the whole (New Zealand Qualifications Authority, 1992). However, the ALTC Project team suggests using marking criteria, discussions of standards and cross marking, and avoiding post-hoc adjustment of marks/grades, as good practices in moderation of assessment in higher education (ALTC, 2012b). Quality of judgments when marking portfolios improved with the use of marking criteria (Van der Schaaf, et al., 2011). Marking guides, and processes that check marking and check that all staff are familiar with standards, must be instituted for uniform standards to be maintained (Castle & Kelly, 2004). Building tutors’ expertise at the commencement of a unit increased the rigour of assessments and maintained consistent tutor expectations and marking comparability (Kuzich, et al., 2010). Identifying the papers with the greatest variances in blind reviewing, then concentrating discussion on these, focuses the socialization process (Hunter & Docherty, 2011).
There are many moderation strategies, and only a few of these need to be employed for the greatest impact after considering the likely contributors to a lack of comparability (ALTC, 2012d). The ALTC team suggests ten moderation strategies, plus statistical moderation, which is to be used only when the other ten strategies fail:
1. making assessment and marking criteria explicit and providing exemplars;
2. distributing marking keys/guides;
3. conducting comparability meetings (‘consensus moderation’);
4. monitoring markers;
5. second-marking;
6. having one person acting as ‘moderator’ for transnational contexts;
7. anonymous assessment;
8. double-blind marking;
9. panel marking;
10. external moderation; and
11. statistical moderation (ALTC, 2012d).
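The idea of concentrating discussion on the papers with the greatest variances in blind marking can be sketched in a few lines. This is only an illustrative sketch: the function name `papers_to_discuss`, the data layout and the use of population variance as the measure of disagreement are assumptions, not details taken from Hunter and Docherty (2011).

```python
from statistics import pvariance


def papers_to_discuss(blind_marks, top_n=3):
    """Return the paper ids whose blind marks disagree the most.

    blind_marks maps a paper id to the list of marks awarded independently
    by different markers. Papers with fewer than two marks are skipped,
    since no disagreement can be measured for them.
    """
    spread = {paper: pvariance(marks)
              for paper, marks in blind_marks.items()
              if len(marks) >= 2}
    # Largest variance first; ties are broken by paper id for stable output.
    ranked = sorted(spread, key=lambda paper: (-spread[paper], paper))
    return ranked[:top_n]
```

For example, with marks `{"A": [60, 62], "B": [40, 70], "C": [55, 55], "D": [80, 65]}` and `top_n=2`, the function returns `["B", "D"]`, the two papers a comparability meeting might discuss first.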
Assessment design and development before assessment is set
The first phase of moderation is to review all assessment items collaboratively with all markers before the assessment is set and make amendments as required. Identify assessment items that may advantage or disadvantage any students, potential marking biases, cultural issues and subjectivity, and amend where necessary.
1. Check that items
• match the learning outcomes;
• are as objective and fair as possible;
• take into account learning styles, English language, cultural bias, cultural and tacit knowledge; and
• are varied across the unit and course.
2. Check that there is enough time for students to complete the task well.
3. Check that the assessment does not disadvantage students whose first language is not English.
4. Check that criteria, rubrics and marking keys are clear, detailed and emphasise merit
• for students in all contexts; and
• for the entire marking team.
5. Ensure that students are familiar with assessment criteria, rubrics and marking keys.
6. Hold a real or virtual pre-marking meeting with all markers to discuss requirements, standards and possible divergent answers to assessment questions.
Implementation, marking and grading done before marks are allocated
The second phase of moderation is the implementation, marking and grading that is done before marks are allocated. Whenever more than one person marks an assessment item in a unit, a moderation process must be used to ensure consistency of marks and grades. The unit coordinator is responsible for arranging the moderation process of marks and grades. Ideally, the same person marks each question across all papers for a unit, but this is not always possible, so consider sampling items: choose a small number of items from a larger group so that you can make judgments about the larger group. A suitable sample of assessments is found by choosing a number of items equal to the square root of the cohort (with a minimum of five papers) plus all fails (Bloxham & Boyd, 2007). A summary of sampling considerations is provided by the ALTC project team in the Assessment Moderation Toolkit.
If there are multiple markers,
1. Conduct a consensus marking exercise:
• double mark a sample of anonymous items and compare marks, or have markers cross mark anonymous assessments from students they do not directly teach;
• compare marking ranges across different cohorts and markers; and
• give timely and sensitive feedback to markers.
It is important that the markers are not influenced by another marker, their comments, their marks or by knowledge of the student.
Double marking: two staff mark the same piece of work submitted for assessment. The original marks and comments may be seen by the second marker.
Blind double marking: double marking in which the second marker does not see the original comments or marks.
Cross marking: assessments from two staff are exchanged for marking, e.g. a tutor marks another tutor’s assessments, or a lecturer on one campus marks for the lecturer on another campus and vice versa.
For off-shore managed courses, see the specific required steps outlined in 4.2 of the ECU policy for Moderation of Assessment and 4.5 (v) of the Assessment policy. The ECU Unit Coordinator marks the major assessment or final examination and also re-marks samples of each assessment item (10% of enrolment; at least 8), covering all grades, then records the results on the Assessment Moderation Report online.
2. Hold a real or virtual pre-marking meeting with all markers to
• discuss student work that attains very high or very low marks;
• collaboratively negotiate the allocation of marks and grades to assessment scripts; and
• provide a spreadsheet or similar showing all finalized marks and the range of marks to all markers.
If marking large numbers over an extended period of time, review earlier items.
For off-shore managed courses, the ECU Unit Coordinator receives the marks for all assessments from the ECU International Partnership Services Officer, collates the marks and grades, then submits a Unit Moderation Report online. The Course Coordinator uses this report in the Annual Course Report as detailed in the ECU policy for Course and Unit Review.
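The two sample-size rules mentioned in this section (the square root of the cohort with a minimum of five papers plus all fails, and 10% of enrolment with at least 8 samples) can be sketched as follows. This is a sketch under stated assumptions rather than an official procedure: the function names, rounding up to a whole paper, the pass mark of 50, drawing the square-root sample at random from the non-failing papers, and capping samples at the number of papers available are all hypothetical choices not specified in the sources.

```python
import math
import random


def sqrt_sample_size(cohort_size, minimum=5):
    """Square-root rule: ceil(sqrt(cohort)), at least `minimum` papers."""
    if cohort_size <= 0:
        return 0
    ceil_sqrt = math.isqrt(cohort_size - 1) + 1  # exact integer ceiling of sqrt
    return min(cohort_size, max(minimum, ceil_sqrt))


def percentage_sample_size(enrolment, percent=10, minimum=8):
    """Percentage rule: `percent` of enrolment rounded up, at least `minimum`."""
    if enrolment <= 0:
        return 0
    rounded_up = (enrolment * percent + 99) // 100  # integer ceiling
    return min(enrolment, max(minimum, rounded_up))


def select_moderation_sample(marks, pass_mark=50, seed=None):
    """Pick all failing papers plus a random square-root sample of the rest.

    marks maps a student id to a mark. pass_mark=50 is an assumed threshold.
    """
    rng = random.Random(seed)
    fails = [sid for sid, mark in marks.items() if mark < pass_mark]
    passes = [sid for sid, mark in marks.items() if mark >= pass_mark]
    n = min(len(passes), sqrt_sample_size(len(marks)))
    return sorted(fails + rng.sample(passes, n))
```

For a cohort of 100 the square-root rule gives 10 papers plus any fails, while an enrolment of 50 under the percentage rule gives the minimum of 8. In practice a coordinator may prefer a stratified sample that covers all grades rather than a purely random one.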
Review and evaluation when marks have been allocated
The third and final phase of moderation is the review and evaluation as part of the ongoing improvement process, ready for the next time the unit is taught.
1. Avoid post-assessment scaling of marks.
2. With contributions from the teaching and marking team, identify and address
• potential marking biases;
• communication issues within the marking team;
• cultural issues in assessment and moderation; and
• areas for improvement in curriculum and assessment when next taught.
3. Complete a moderation report with contributions from the teaching and marking team on all these aspects.
Conclusions
Moderation is an integral part of good practice in assessment. Moderation is a continual process ensuring that marks and grades are as valid, reliable, equivalent and fair as possible for all students and all markers (ALTC, 2012c). Moderation of assessment is especially necessary for large units; multiple markers; assessments when teaching occurs on different campuses; assessments with subjective answers; and assessment that differs across individual students or cohorts of students.
References
ALTC. (2010). Moderation for Fair Assessment in TNE Literature Review. Retrieved from http://resource.unisa.edu.au/course/view.php?id=285&topic=1
ALTC. (2012a). Assessment Moderation Toolkit. Retrieved from http://resource.unisa.edu.au/course/view.php?id=285&topic=1
ALTC. (2012b). Good practices in moderation of assessment in transnational education. Retrieved from http://resource.unisa.edu.au/course/view.php?id=285&topic=1
ALTC. (2012c). Moderation Checklist. Retrieved from http://resource.unisa.edu.au/course/view.php?id=285&topic=1
ALTC. (2012d). Moderation Strategies. Retrieved from http://resource.unisa.edu.au/course/view.php?id=285&topic=1
ALTC. (2012e). Streamlining moderation policy and processes. Retrieved from http://resource.unisa.edu.au/course/view.php?id=285&topic=3
Baird, J., & Gordon, G. (2009). Beyond the Rhetoric: A framework for evaluating improvements to the student experience. Tertiary Education and Management, 15(3), 193-207. doi: 10.1080/13583880903072976
Bird, F., & Yucel, R. (2010). Building sustainable expertise in marking: integrating the moderation of first year assessment. Paper presented at the ATN Assessment Conference, University of Technology Sydney. http://www.iml.uts.edu.au/atnassessment/poster.html
Bloxham, S. (2009). Marking and moderation in the UK: false assumptions and wasted resources. Assessment & Evaluation in Higher Education, 34(2), 209-220. doi: 10.1080/02602930801955978
Bloxham, S., & Boyd, P. (2007). Developing effective assessment in higher education: A practical guide. Maidenhead: Open University Press.
Bloxham, S., Boyd, P., & Orr, S. (2011). Mark my words: the role of assessment criteria in UK higher education grading practices. Studies in Higher Education, 36(6), 655-670. doi: 10.1080/03075071003777716
Bushell, G. (2006). Moderation of peer assessment in group projects. Assessment & Evaluation in Higher Education, 31(1), 91-108. doi: 10.1080/02602930500262395
Castle, R., & Kelly, D. (2004). International education: quality assurance and standards in offshore teaching: exemplars and problems. Quality in Higher Education, 10(1), 51-57. doi: 10.1080/1353832042000222751
Griffith University. (2012). Service Learning. Retrieved from http://www.griffith.edu.au/gihe/resources-support/service-learning
Harvey, L. (2004). War of the Worlds: who wins in the battle for quality supremacy? Quality in Higher Education, 10(1), 65-71. doi: 10.1080/1353832242000195860
Harvey, L., & Williams, J. (2010). Fifteen Years of Quality in Higher Education. Quality in Higher Education, 16(1), 3-36. doi: 10.1080/13538321003679457
Horsburgh, M. (1997). External Quality Monitoring in New Zealand Tertiary Education. Quality in Higher Education, 3(1), 5-15. doi: 10.1080/1353832960030102
Horsburgh, M. (1998). Quality Monitoring in Two Institutions: a comparison. Quality in Higher Education, 4(2), 115-135. doi: 10.1080/1353832980040203
Hughes, G. (2011). Towards a personal best: a case for introducing ipsative assessment in higher education. Studies in Higher Education, 36(3), 353-367.
Hunter, K., & Docherty, P. (2011). Reducing variation in the assessment of student writing. Assessment & Evaluation in Higher Education, 36(1), 109-124. doi: 10.1080/02602930903215842
James, R. (2003). Academic standards and the assessment of student learning: Some current issues in Australian higher education. Tertiary Education and Management, 9(3), 187-198. doi: 10.1080/13583883.2003.9967103
Johnston, L., & Miles, L. (2004). Assessing contributions to group assignments. Assessment & Evaluation in Higher Education, 29(6), 751-768. doi: 10.1080/0260293042000227272
Kuzich, S., Groves, R., O'Hare, S., & Pelliccione, L. (2010). Building team capacity: sustaining quality in assessment and moderation practices in a fully online unit. Paper presented at the ATN Assessment Conference, University of Technology Sydney.
Lawson, K., & Yorke, J. (2009). The development of moderation across the institution: a comparison of two approaches. Paper presented at the ATN Assessment Conference, RMIT University. http://emedia.rmit.edu.au/conferences/index.php/ATNAC/ATNAC09/schedconf/presentations
Lim, F. C. B. (2008). Understanding quality assurance: a cross country case study. Quality Assurance in Education, 16(2), 126-140. doi: 10.1108/09684880810868411
New Zealand Qualifications Authority. (1992). Moderation of Assessment: An Introduction for National Standards Bodies.
O'Rourke, S., & Al Bulushi, H. A. (2010). Managing Quality from a Distance: A Case Study of Collaboration Between Oman and New Zealand. Quality in Higher Education, 16(3), 197-210. doi: 10.1080/13538322.2010.506699
Orr, S. (2007). Assessment moderation: constructing the marks and constructing the students. Assessment & Evaluation in Higher Education, 32(6), 645-656. doi: 10.1080/02602930601117068
Sadler, D. R. (2009). Grade integrity and the representation of academic achievement. Studies in Higher Education, 34(7), 807-826. doi: 10.1080/03075070802706553
Sanderson, G., Yeo, S., Thuraisingam, T., Briguglio, C., Mahmud, S., Singh, P. H., et al. (2010). Interpretations of Comparability and Equivalence around Assessment: Views of Academic Staff in Transnational Education. Paper presented at the Australian Quality Forum: Quality in Uncertain Times, Gold Coast, Australia.
Service Learning Australia. (2012). Service Learning Australia. Retrieved from http://www.servicelearning.org.au/
The Scottish government. (2011). Curriculum for excellence: Building the curriculum 5: A framework for assessment.
Van der Schaaf, M., Baartman, L., & Prins, F. (2011). Exploring the role of assessment criteria during teachers’ collaborative judgement processes of students’ portfolios. Assessment & Evaluation in Higher Education, 1-14. doi: 10.1080/02602938.2011.576312
Yorke, M. (2010). Summative assessment: dealing with the 'measurement fallacy'. Studies in Higher Education, 36(3), 251-273. doi: 10.1080/03075070903545082