
Reimagining Assessment in Practice

Published on 2025-10-04 by Stephen Wheeler.

Cover image: a retro sci-fi scene in which an analog control room fades into a small group sharing papers beneath a constellation, hinting at a shift from machine metrics to human dialogue and trust.

Introduction

The contemporary university increasingly entrusts its judgements to machines. Dashboards and predictive analytics promise objectivity, but in doing so they displace the interpretive and dialogic work that once defined teaching. The educator’s professional discernment, once exercised through conversation, experience, and trust, is now mediated through the abstractions of data. As Williamson, Bayne, and Shay (2020) observe, the datafication of teaching in higher education reconfigures academic practice by translating pedagogical activity into forms that can be monitored, compared, and governed. What appears as neutral measurement is in fact a sociotechnical process that embeds institutional and commercial priorities within everyday pedagogical practice. Data infrastructures, they argue, do not simply record practice; they reshape it, shifting attention from relational and interpretive forms of evaluation toward managerial oversight. Selwyn (2019) similarly notes that automation in education rarely neutralises bias; instead, it reproduces institutional priorities of efficiency, control, and surveillance. The result is a narrowing of educational purpose, where quality becomes synonymous with what can be counted.

This data-driven orientation is visible across higher education. Learning management systems quantify engagement, AI graders promise speed and consistency, and plagiarism detectors and remote proctoring technologies claim to safeguard integrity while eroding trust. Even formative assessment, once a dialogic process, has been reimagined as a set of metrics and dashboards. As Bayne et al. (2020) argue in The Manifesto for Teaching Online, such technologies do not merely support teaching but actively shape it, embedding assumptions about efficiency, standardisation, and surveillance into pedagogical practice. In privileging data capture and visibility, these systems reconfigure teaching as information management and learning as compliance. What emerges is an epistemology of metrics (Biesta, 2010), where measurement stands in for meaning and numerical transparency replaces interpretive judgement. The automation of assessment, in this sense, is not merely technical but epistemic. It redefines what counts as knowledge, success, and integrity in education.

Yet to resist this trend does not mean rejecting technology outright. The question animating this post is therefore not whether to use automation but how to design assessment practices that remain human, interpretive, and relational within automated systems. The challenge is ethical and political as much as pedagogical: how can educators sustain spaces for dialogue and creativity in institutions structured by audit and compliance? Feenberg (2017), through his concept of democratic rationalisation, reminds us that technologies are not destiny but sites of contestation, open to reinterpretation and redesign through collective human agency. The task, then, is to reappropriate assessment, working with institutional technologies while preserving the educator’s capacity for judgement, care, and shared responsibility.

Assessment must be reclaimed as a moral and imaginative practice, not merely a mechanism of measurement. It invites educators and students alike to participate in meaning-making: to interpret, critique, and create. In reasserting judgement as an educational virtue, we resist the quiet erosion of pedagogy by automation. This post develops that argument in three movements. First, it proposes criteria for AI-resilient assessment, forms of evaluation that retain meaning even in the presence of generative automation. Second, it presents examples from practice: dialogic assessment, project-based work, collaborative inquiry, and reflective portfolios that foreground process and interpretation. Third, it considers institutional tactics for navigating quality assurance regimes, resisting surveillance, and documenting pedagogical rationale as acts of professional agency. Together these strands suggest that assessment, when enacted as a space of ethical judgement, can become a site of tactical yet profound resistance within the automated university.

The Automation of Judgement

Digital assessment infrastructures are built on the logics of scalability, standardisation, and efficiency. In privileging throughput and consistency, they translate the interpretive work of evaluation into algorithmic classification. Automated tests, learning analytics dashboards, and similarity-detection tools render performance visible through numbers, patterns, and alerts. Interpretation and dialogue give way to pattern recognition. The human act of discerning meaning is replaced by computational correlation. This is not simply the automation of grading; it is the automation of pedagogy itself, where the capacity for human judgement is displaced by the mechanical pursuit of comparability. Such infrastructures embody what Williamson, Bayne, and Shay (2020) describe as the datafication of teaching, where judgement is reframed as data management.

The adoption of artificial intelligence has accelerated this shift. Automated essay-scoring systems promise fast, objective grading but are limited by their reliance on surface features such as word frequency, sentence length, and syntactic regularity. They reward formulaic writing while penalising ambiguity, creativity, or disciplinary nuance (Bennett, 2011; Rotou and Rupp, 2020; Perelman, 2014). As Perelman (2014) argues, when “the state of the art” is counting words, systems reproduce superficial proxies for writing quality. Even small textual alterations can mislead algorithms, exposing the fragility of machine judgement (Filighera et al., 2022). Plagiarism-detection software translates questions of originality into percentages of textual similarity, narrowing interpretation to thresholds and alerts. Remote-proctoring systems extend automation into the body: webcam feeds and behavioural analytics construct categories of “suspicious” subjects, transforming trust into surveillance. Swauger (2020) shows how such systems encode the body as data, reproducing bias and coercion under the guise of integrity, while Silverman et al. (2021) demonstrate that closing the door on remote proctoring enables more authentic, people-centred forms of assessment. Together they reveal that proctoring technologies do not merely secure integrity; they redefine it, shifting responsibility from the pedagogical relationship to the technical apparatus. The educator’s trust in students is reconstituted as the institution’s trust in code.

The consequences for students are profound. Automated feedback and rubric-driven evaluation compress learning into procedural compliance. Students learn to produce what the system recognises rather than what inquiry demands. Risk-taking, experimentation, and reflection give way to the optimisation of machine-legible performance. In this environment, integrity becomes a contractual obligation rather than a shared ethical commitment. Surveillance cultivates suspicion: students perform innocence while educators manage data, not relationships. The relational fabric of education, woven from trust, dialogue, and mutual recognition, frays; assessment becomes less a conversation than a mechanism of control.

Yet the crisis at the heart of this transformation is not technological but pedagogical. The automation of judgement reflects a loss of confidence in the educator’s capacity to interpret, decide, and justify. Institutions turn to algorithmic proxies because they promise precision where human reasoning is messy, fallible, and slow. But as Biesta (2013) reminds us, education’s beauty lies precisely in its contingency, the unpredictable, interpretive encounter through which meaning is made. By externalising judgement, we abdicate this risk and the ethical responsibility it entails. Feenberg’s (2017) notion of democratic rationalisation offers an alternative: rather than accepting technological determinism, educators can reappropriate digital systems in ways that sustain human discernment. The automation of judgement is not the end of teaching but its mirror, showing what must be reclaimed if education is to remain a human, interpretive practice.

Principles for AI-Resilient Assessment

If the automation of judgement has narrowed assessment to what can be counted, then AI-resilient assessment reclaims what cannot be reduced to prediction or replication. In a context where generative tools can produce fluent academic prose on demand, assessment must remain meaningful, distinctive, and ethically grounded. AI-resilient assessment refers to work that endures as human, where reasoning, dialogue, and interpretation are integral to what is being assessed. It is not about defeating automation, but about designing learning that resists being co-opted by it. Such assessments value the interpretive, the situated, and the relational over the standardised and the procedural. Where automation tends to standardise and predict, AI-resilient assessment recentres interpretive judgement and shared meaning.

Four pedagogical orientations support this aim.

  1. Judgement over prediction
    AI-resilient assessment privileges qualitative reasoning, contextual awareness, and justification. The focus shifts from producing a correct answer to articulating how and why a position is taken. This demands analysis, evaluation, and creation, the higher-order processes described by Anderson and Krathwohl’s (2001) revision of Bloom’s taxonomy. Boud and Molloy (2013) argue that genuine feedback and evaluative judgement arise from dialogue and self-assessment, not from algorithmic comparison. Designing for judgement asks students to interpret complexity, weigh evidence, and defend their reasoning, tasks that resist automation because they demand situated understanding rather than pattern matching.
  2. Authenticity over replication
    Authentic tasks locate assessment within real-world, disciplinary, or personally meaningful contexts. They invite students to perform knowledge rather than simply report it, drawing on professional and civic practices. Ashford-Rowe, Herrington, and Brown (2013) identify several critical elements of authentic assessment (contextualisation, complexity, collaboration, reflection, and transferability) that require students to apply ideas in ways that mirror professional practice. These elements make performance in context the evidence of learning, not mere reproduction. Designing policy briefs, composing reflective artefacts, or developing research proposals all foreground creativity and relevance, rendering automation less effective because the work depends on judgement, dialogue, and situated meaning rather than replication.
  3. Relationality over isolation
    Learning and assessment gain meaning through dialogue, collaboration, and feedback. Carless and Boud (2018) describe feedback literacy as the capacity to interpret and act on feedback within an ongoing conversation. Relational assessment foregrounds this reciprocity: drafts, peer reviews, and iterative submissions make learning visible as a process of mutual interpretation. Boud and Soler (2015) argue that cultivating students’ evaluative judgement requires collective sense-making and trust. Nieminen (2024) extends this argument by framing assessment as an inclusive practice that recognises difference and participation as conditions of learning rather than deviations from it. Such approaches decentralise authority, fostering communities of inquiry where assessment is enacted with students rather than on them, an inherently AI-resistant orientation because it relies on human presence and negotiation.
  4. Transparency over opacity
    Students should understand why assessment takes the form it does and how criteria link to disciplinary values. Transparency builds trust and enables ethical participation. Winkelmes et al. (2016), reporting on the Transparency in Learning and Teaching (TILT) initiative, found that making the purpose, task, and criteria of assignments explicit improved engagement and success, particularly for first-generation and underserved students. Their findings demonstrate that transparency is not a bureaucratic gesture but a form of educational justice: when expectations are clarified, equity increases. Balloo et al. (2018) similarly show that transparent assessment promotes self-regulation and motivation. Transparency does not mean simplification; it means revealing the reasoning behind assessment choices so that students can align their work with the epistemic values of the field. Making assessment design explicit demystifies evaluation and reinforces integrity as a shared commitment.

These principles are not technical fixes but pedagogical orientations, ways of designing for integrity rather than defending against misconduct. In practice, they encourage educators to integrate reflective rationales, scaffolded milestones, peer dialogue, and multimodal forms of expression. Such designs foreground process, not just product, and invite students into co-creation. AI-resilient assessment, then, is not defined by its immunity to technology but by its fidelity to learning as a relational and interpretive act. It endures because it asks what machines cannot: to judge, to care, and to make meaning together.

Examples from Practice

The following designs illustrate how assessment can be enacted as dialogue, inquiry, and reflexive judgement rather than as one-shot performance. Each approach foregrounds human interpretation and agency while making reasoning and process visible, features that resist the logic of automation and prediction. The aim is not to prescribe methods but to demonstrate pedagogical patterns that maintain integrity and meaning in an AI-saturated environment. The digital tools mentioned are illustrative and should be understood as contextual supports, not technical solutions.

Dialogic assessment

Structured feedback conversations (teacher-student and peer-peer), annotated drafts, and response memos make process assessable. Students submit an artefact with a short justification (“what I changed and why”), then discuss feedback and set revision goals; subsequent submissions are judged partly on how well students acted on feedback. This builds feedback literacy, the capacity to interpret and use critique, as a learning outcome in its own right (Carless and Boud, 2018; Boud and Molloy, 2013). Dialogic designs make judgement visible: students justify choices, weigh trade-offs, and rehearse disciplinary standards (Nicol and Macfarlane-Dick, 2006; Sadler, 1989). Because reasoning and revision are foregrounded, generative systems cannot easily substitute for this interpretive exchange.

Project-based work

Open-ended design or inquiry projects ask students to pose questions, gather data, iterate prototypes, and present defensible solutions. The emphasis is on transfer, applying ideas in new and messy contexts (Blumenfeld et al., 1991), and on making reasoning visible through milestones such as proposal, interim brief, and final rationale. Since outcomes emerge through collaborative iteration, generative tools are of limited use: students must justify distinctive design decisions and defend methodological coherence. Assessment criteria focus on problem framing, evidence-based reasoning, and adaptability, dimensions of work that depend on human judgement, not computational efficiency.

Collaborative inquiry

Collaborative inquiry reimagines group work as collective reasoning and accountable authorship. Students investigate a shared problem by distributing roles (literature lead, data wrangler, methods coordinator, synthesis writer) and maintaining a public research log. The focus is on sense-making within a knowledge-building community (Scardamalia and Bereiter, 2006). Assessment values both the shared outcome, such as a living review or open dataset, and the reflective commentary that evidences participation and negotiation. In this model, understanding is co-produced, and AI-generated output provides little advantage: the educational value lies in deliberation, disagreement, and interpretive consensus.

Reflective portfolios

Multimodal portfolios curate work over time (drafts, notes, feedback, artefacts), accompanied by critical commentary that narrates growth in judgement and criteria use. Rather than a final snapshot, the portfolio becomes a narrative of becoming: students analyse their development, identify patterns, and justify learning choices. Research shows that well-designed ePortfolio pedagogies enhance agency and integrative learning by connecting artefacts to purpose and audience (Eynon and Gambino, 2017; Boud, Keogh and Walker, 1985). The capacity to reflect critically on process and value decisions remains distinctively human and resistant to automation.

Low-surveillance tools and workflows (illustrative)

Tool choices should align with local ethics and data governance; the examples below prioritise privacy, versioning, and shared authorship.

  • Dialogue and annotation: course LMS discussions configured without analytics dashboards; Hypothes.is for social annotation; Discourse forums (self-hosted).
  • Drafts and versioning: GitLab CE or Gitea for project histories; Nextcloud/OnlyOffice for local collaborative writing.
  • Studio and project spaces: Jupyter Book or Quarto for research notebooks; static-site generators (e.g., Hugo) for publishable briefs.
  • Portfolios: Mahara (open-source) or locally hosted web pages that prioritise privacy and learner ownership.
  • Peer review logistics: Moodle Workshop (double-blind) or structured peer-review forms collected via shared repositories (a minimal allocation sketch follows this list).
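
Where no platform such as Moodle Workshop is available (or desirable), the allocation step itself can be handled with a few lines of scripting kept under local control. The sketch below is illustrative only: the cohort names, reviewer-code format, and CSV layout are assumptions rather than features of any particular tool. It simply pairs each artefact with anonymised reviewers so that nobody reviews their own work and every student carries the same review load.

```python
import csv
import random

def assign_double_blind_reviews(authors, reviews_per_artefact=2, seed=None):
    """Allocate peer reviewers so nobody reviews their own artefact and each
    student reviews the same number of artefacts. Reviewer identities are
    replaced with codes; only the educator keeps the code-to-name mapping."""
    rng = random.Random(seed)
    order = list(authors)
    rng.shuffle(order)
    codes = {name: f"R{i + 1:03d}" for i, name in enumerate(sorted(authors))}

    assignments = []  # (artefact_author, reviewer_code)
    n = len(order)
    for i, author in enumerate(order):
        for offset in range(1, reviews_per_artefact + 1):
            # Never the author, as long as reviews_per_artefact < cohort size.
            reviewer = order[(i + offset) % n]
            assignments.append((author, codes[reviewer]))
    return assignments, codes

if __name__ == "__main__":
    cohort = ["Asha", "Ben", "Chen", "Dana", "Femi"]  # hypothetical cohort
    pairs, codes = assign_double_blind_reviews(cohort, reviews_per_artefact=2, seed=42)

    # The allocation shared with students lists only reviewer codes;
    # the private code-to-name mapping stays with the educator.
    with open("review_allocation.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["artefact_author", "reviewer_code"])
        writer.writerows(pairs)
```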

Across these examples, the design logic is consistent: make reasoning, process, and relationships central to what is assessed. This orientation resists automation precisely because it demands situated judgement, negotiated meaning, and accountable authorship, qualities that define education as a profoundly human practice.

Institutional Tactics and Constraints

Designing assessments that foreground judgement, dialogue, and authenticity often runs counter to institutional demands for standardisation, auditability, and quality assurance. Frameworks for moderation and validation, along with external accreditation and metrics, often prioritise consistency and comparability over complexity and interpretation. As AI systems promise greater efficiency in grading and assurance, institutions may tighten procedural control, reinforcing the very logics that authentic assessment seeks to resist. Yet even within such regimes, educators can adopt tactical moves that preserve pedagogical integrity while navigating accountability structures.

First, educators can speak the institution’s language. Quality assurance frameworks such as the UK Quality Code for Higher Education (QAA, 2018) and Advance HE’s guidance on assessment emphasise constructive alignment, transparency, and student feedback. By embedding authentic and dialogic tasks within these established principles, educators can present innovation as alignment rather than deviation. For example, a reflective portfolio can be shown to meet programme learning outcomes, support feedback literacy, and align with graduate attributes. Using institutional terminology strategically transforms critique into contribution.

Second, it is crucial to document pedagogical rationale in validation, periodic review, or re-accreditation processes. Rather than concealing deviation from standard formats, articulate how authentic tasks promote deeper learning and evaluative judgement. Evidence from pilot studies, student reflections, and rubric analysis can show that dialogic and project-based assessments fulfil expectations of fairness and reliability while enhancing engagement. Framing such work through the Scholarship of Teaching and Learning (SoTL) strengthens its legitimacy: scholarly reflection becomes part of institutional knowledge, not individual dissent.

Third, educators can push back against AI proctoring and automated grading by emphasising trust, equity, and academic labour. AI proctoring tools have been shown to reproduce bias and erode privacy, transforming trust into surveillance (Selwyn et al., 2021; Coghlan, Miller and Paterson, 2021). Automated grading similarly displaces interpretation with metrics, prioritising surface correctness over meaning (Mita, 2021). When such systems are proposed, faculty can demand ethical review, request opt-outs, and foreground the relational foundations of integrity. As Logan (2021) argues, reclaiming dignity and mutual trust in assessment is an act of pedagogical justice.

Fourth, build collective capacity. Individual resistance is precarious, but collaborative action can reshape norms. Co-writing guidance documents, sharing exemplars, or co-authoring position papers situates authenticity within institutional learning rather than rebellion. Communities of practice (Wenger, 1998) can interpret policies collectively, providing a shared vocabulary for incremental change. By documenting rationales and presenting outcomes, groups can demonstrate that authenticity is not an indulgence but an expression of institutional integrity.

Finally, tactical acts accumulate. Pilot an alternative assessment within existing rubrics; include a reflective component alongside standard exams; document results in annual monitoring reports. Bearman et al. (2016) present the Assessment Design Decisions Framework, a practical tool for documenting and justifying assessment choices within institutional processes, supporting sustainable, evidence-based design under quality assurance constraints. Each successful example expands what internal reviewers and external examiners perceive as legitimate. Over time, small acts of alignment and evidence can loosen bureaucratic rigidity, making room for authentic, dialogic assessment to flourish within the system rather than despite it.

These tactics do not guarantee transformation, but they open spaces of agency within compliance. Through strategic alignment, scholarly documentation, ethical resistance, and collective imagination, educators can shift institutional norms toward forms of assessment that honour judgement, care, and shared responsibility, even under the pressures of automation.

Tools and Techniques for Implementation

Translating the principles of AI-resilient assessment into practice requires tools and workflows that sustain human interpretation, trust, and care. The goal is not to identify “AI-proof” software but to configure existing technologies in ways that foreground dialogue, reflection, and qualitative reasoning. As Selwyn (2022) observes, the central question is not whether education uses technology but how we do so, and whose values those technologies serve. In the context of generative AI, the task is to design assessment ecologies that remain meaningful even when automation is pervasive, systems that sustain interpretation rather than replace it.

Low-surveillance and open tools

Open-source or institutionally hosted platforms such as Mahara, WordPress, or Nextcloud support reflective portfolios and multimodal artefacts that foreground student authorship and ownership (Eynon and Gambino, 2017). Lambert (2019) emphasises that such open and blended environments can widen participation when they are designed across six interrelated dimensions (purpose, autonomy, social support, technology, materials, and skills), creating inclusive ecologies for learning and assessment. These systems enable learners to curate evidence of growth, embed peer feedback, and link creative work to disciplinary standards. Peer-review tools such as Moodle Workshop, Praze, or Peergrade can be configured to anonymise review and emphasise qualitative commentary over numerical scoring. For dialogic assessment, collaborative annotation environments such as Hypothes.is or self-hosted Discourse forums facilitate discussion around shared artefacts without the extractive data practices of commercial analytics suites. Caines and Silverman (2021) warn that academic surveillance technologies often enter campuses through opaque vendor relationships; low-surveillance spaces and local stewardship help preserve trust.

Human-centred digital workflows

Digital assessment does not need to replicate managerial dashboards. Simple, human-centred workflows (version control through GitLab CE or Gitea, shared repositories in Nextcloud, or collaborative authoring in OnlyOffice) make process and revision visible while maintaining educator agency. These approaches echo Boud and Molloy’s (2013) concept of sustainable assessment, where learners and educators co-create evaluative capacity over time. Git-based workflows, for instance, preserve a transparent record of change, allowing reflection on reasoning and feedback uptake rather than automated grading. Structured peer-feedback loops, using discussion boards or iterative drafts, build relational accountability and feedback literacy (Carless and Boud, 2018). Such practices centre interpretation and reciprocity, qualities that resist automation because they rely on mutual understanding, not pattern recognition.
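
The Git-based record mentioned above can also feed directly into reflective commentary. The following is a minimal sketch, assuming a student’s drafts live in a local Git repository; the repository path, file name, and Markdown prompts are placeholders rather than features of any particular platform. It uses the standard `git log` command to list a file’s revisions, oldest first, and turns them into prompts for the student to explain what each change responded to.

```python
import subprocess
from pathlib import Path

def revision_history(repo_path, file_name):
    """List (short_hash, date, subject) for one file, oldest first, read
    straight from the repository rather than from an analytics dashboard."""
    result = subprocess.run(
        ["git", "log", "--reverse", "--date=short",
         "--pretty=format:%h|%ad|%s", "--", file_name],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return [tuple(line.split("|", 2)) for line in result.stdout.splitlines() if line]

def write_reflection_template(repo_path, file_name, out_file="revision_reflection.md"):
    """Turn the revision history into prompts the student completes:
    what changed, which feedback it responded to, and why."""
    lines = ["# Revision history and feedback uptake", ""]
    for short_hash, date, subject in revision_history(repo_path, file_name):
        lines += [
            f"## {date} ({short_hash}): {subject}",
            "- Feedback this revision responded to:",
            "- Why I made the change:",
            "",
        ]
    Path(repo_path, out_file).write_text("\n".join(lines), encoding="utf-8")

if __name__ == "__main__":
    # Hypothetical repository and draft name; adjust to the local setup.
    write_reflection_template("student-essay-repo", "essay_draft.md")
```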

Designing rubrics and feedback models

Rubrics can reinforce mechanistic evaluation if overly prescriptive, yet when designed transparently they sustain interpretive judgement. Dawson (2017) argues that well-constructed rubrics can clarify evaluative standards while preserving the educator’s space for professional interpretation, promoting both transparency and consistency without reducing complexity. Nicol (2021) demonstrates that students generate valuable “internal feedback” when they compare their own work against exemplars or peers’ outputs, using natural comparison processes to evaluate quality and identify improvements. Designing rubrics that describe qualities of reasoning (clarity of argument, integration of perspectives, responsiveness to feedback) supports these comparison processes and encourages students to interpret meaning collaboratively with educators. Narrative or audio feedback tools such as Kaizena or Vocaroo can further humanise assessment by conveying tone and care, supporting what Ice et al. (2007) call “social presence” in online feedback.
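
To make the distinction concrete, here is a minimal sketch of how such a rubric might be represented as a simple data structure. The criterion names and descriptor bands are illustrative assumptions, not a recommended standard; the point of the design is that there is no numeric total to compute, and feedback is complete only when every criterion carries a narrative comment.

```python
from dataclasses import dataclass, field

@dataclass
class Criterion:
    """One quality of reasoning, described in words rather than weighted points."""
    name: str
    descriptors: dict[str, str]  # band label -> qualitative descriptor
    comment: str = ""            # the educator's interpretive judgement, written per student

@dataclass
class Rubric:
    title: str
    criteria: list[Criterion] = field(default_factory=list)

    def ready_to_release(self) -> bool:
        """There is deliberately no score to total; feedback is ready only
        when every criterion carries a narrative comment."""
        return bool(self.criteria) and all(c.comment.strip() for c in self.criteria)

# Hypothetical example for a position-paper task.
position_paper = Rubric(
    title="Position paper",
    criteria=[
        Criterion(
            name="Clarity of argument",
            descriptors={
                "emerging": "Claims are present but their relationship is left implicit.",
                "developing": "Claims are linked, though some steps are asserted rather than argued.",
                "accomplished": "The line of reasoning is explicit and anticipates counter-arguments.",
            },
        ),
        Criterion(
            name="Responsiveness to feedback",
            descriptors={
                "emerging": "Earlier feedback is acknowledged but not visibly acted on.",
                "accomplished": "Revisions show deliberate, justified responses to feedback.",
            },
        ),
    ],
)
```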

Accessibility, sustainability, and workload

AI-resilient assessment must also be viable and inclusive. Open-source systems and simple web tools reduce licensing costs and allow accessibility adjustments such as captioning, alt-text, or plain-language prompts. Accessibility also entails designing tasks that respect cognitive diversity and multiple means of expression, consistent with Universal Design for Learning (CAST, 2018). Sustainability involves not only ecological and financial factors but also educator workload. As Bearman et al. (2017) show, assessment design in higher education is shaped by complex negotiations between pedagogical intent, institutional constraints, and available tools. Designing for reuse, through shared templates, modular rubrics, and replicable peer-review processes, can help distribute effort across cohorts and contexts. When tools are interoperable and data-portable, educators can adapt practices without vendor lock-in, preserving autonomy and equity.

Open sharing and communities of practice

Authentic assessment practices gain resilience when shared as open educational resources (OERs). Publishing assignment briefs, rubrics, or feedback guides under Creative Commons licences invites adaptation and dialogue (Cronin and MacLaren, 2018). Such openness cultivates communities of practice (Wenger, 1998) where educators exchange exemplars and refine practice collectively. By treating assessment designs as living pedagogical artefacts rather than proprietary templates, institutions model integrity through openness and contribute to a shared culture of care, collaboration, and ethical engagement, values that make assessment enduringly human.

Reflection and Conclusion

Assessment is not merely a mechanism for measuring learning; it is both a mirror of what a community values and a maker of what it becomes. It reflects institutional priorities but also constructs them, shaping how learners see themselves and what kinds of knowledge are deemed legitimate. Where automation narrows judgement to what is countable, it projects a vision of education governed by efficiency rather than understanding. To reclaim assessment is therefore to reclaim pedagogy itself: to insist that evaluation remains interpretive, dialogic, and situated in relationships of trust, care, and institutional purpose.

Resisting the automation of judgement is not nostalgia for a pre-digital age but an affirmation of pedagogical integrity. Biesta (2010) reminds us that the task of education lies not in producing measurable outcomes but in subjectification, the process through which learners become responsible and discerning agents rather than compliant performers. When teachers design assessments that invite interpretation rather than prediction, they create space for this subjectivity to emerge. Such practices enact what Arendt (1958) called natality: the capacity to begin something new, to bring fresh meaning into the world through action and dialogue. Each interpretive act in assessment is a small exercise of natality, a beginning that affirms human agency within systems increasingly oriented toward prediction. In contrast to the logic of the machine, the educator’s task is not to outpace technology but to humanise it, ensuring that learning remains an encounter rather than an extraction.

Far from fragile, this human dimension forms the foundation of educational integrity. As Boud and Falchikov (2006) argue, sustainable assessment equips students with the capacity to evaluate their own work long after formal study ends. Automation may simulate evaluation, but it cannot cultivate discernment. What matters most in assessment, the ability to weigh evidence, justify a position, and recognise another’s perspective, depends on the slow, interpretive work of conversation. When institutions privilege metrics over meaning, they risk eroding precisely the judgement they claim to measure. To sustain assessment as a human practice is to defend the conditions in which discernment, dialogue, and care can take root.

Every act of assessment is also a moral and political decision. It makes visible what a community honours as achievement and what it neglects. Designing for authenticity and relationality is therefore an ethical stance: it signals trust in students’ interpretive capacity and in educators’ professional wisdom. Feenberg (2017) reminds us that technologies always embody values, yet they can also be redirected through democratic interventions: small, situated acts that bend technical systems toward human ends. Such interventions might take the form of a redesigned rubric, a feedback process that foregrounds dialogue, or a refusal to adopt intrusive proctoring. Each brief, rubric, and exchange can serve as a site of renewal, where educators make their commitments tangible in form and tone.

If assessment is a mirror, it should reflect not compliance but care; not standardisation but shared purpose. The challenge is not merely to resist automation but to design assessment that reveals the kind of learning community we wish to sustain. Each task can embody a vision of education as a shared inquiry into meaning, a counterpoint to the extractive, predictive logics of the automated university.

The next post in this series extends the argument from individual practice to collective design. It explores how courses and programmes can enact these commitments institutionally: aligning assessment strategies with cultures of dialogue, transparency, and shared responsibility. Yet these commitments also point beyond pedagogy to the infrastructures that sustain it. The move toward commons-based and federated forms of digital learning, where resources, platforms, and practices are shared rather than owned, offers a way to materialise the ethics of authenticity and care at scale. Reimagining assessment, in this sense, becomes not only a matter of pedagogy but of institutional and technological imagination.

Bibliography