Assessment in Online Courses in the Age of AI Agents
University of Georgia • Office of Online Learning • March 2026
UGA’s QEP: The university’s Quality Enhancement Plan focuses on building a community that embraces active learning. AI-era assessment reform supports that goal directly. The approaches in this handout require students to demonstrate understanding by doing, reflecting, and engaging — not by retrieving information. Responding to AI by redesigning assessment is an opportunity to advance UGA’s active learning commitments.
Before Assessment: Designing Courses That Require Engagement
These course design strategies create the conditions that make assessment more meaningful and AI circumvention less rewarding, and they are consistent with UGA’s active learning QEP.
Framework: Assessment Strategies by Course Context
Circumvention risk reflects the combination of stakes (grade/requirement pressure) and student interest. High-stakes, low-interest required courses — such as CBK requirements — represent the highest risk. Strategies in those cells address both resistance to AI circumvention and relevance-building, since motivation to circumvent decreases when students see the purpose of the work.
| Course Type → Class Size ↓ | High-Stakes / Low-Interest Gen Ed (required core; CBK-type; diverse populations) | Major / Program Course (UG and graduate/professional; discipline-specific mastery) | Exploratory / Elective Gen Ed (student-chosen; moderate intrinsic interest) |
|---|---|---|---|
| Small (<30) | ▲ High Risk. Oral discussion checkpoints tied to submitted work — confirm ownership and raise relevance through conversation. Locally-situated case studies connecting required content to students’ own programs or career goals. Design goal: make the “why” visible. Circumvention drops when students see personal stakes. | ▼ Lower Risk. Oral exams / vivas — assess depth; AI cannot substitute for the student. Iterative projects with staged deliverables and instructor feedback loops. Co-scaffolded AI tasks — direct AI to solve a disciplinary problem, then critically evaluate its output against professional standards. AI-use policy: discipline-normed; mirrors professional practice. Graduate programs should reflect field-specific AI norms. | ▬ Moderate Risk. Reflective portfolios documenting growth; authentic voice is hard to fabricate over time. Student-directed inquiry projects anchored in genuine curiosity. AI-use policy: transparent co-use with required process reflection. |
| Medium (30–100) | ▲ Highest Risk. Staged group projects with individual reflection at each stage — separates process from product; harder to fully outsource. Scenario-based application quizzes using novel, locally-relevant contexts each term. Design goal: peer accountability raises engagement in required courses. TA-supported rubrics are essential. | ▼ Lower Risk. Authentic case-based assessments with novel parameters each term. Staged groupwork — shared deliverable plus individual reflection at each stage. AI literacy documentation — students submit a log of AI interactions alongside their work, evaluated for quality of prompting and critical judgment. AI-use policy: require process documentation; AI as tool, not author. | ▬ Moderate Risk. Peer-reviewed creative or analytical projects with structured reflection. Redesigned open-note assessments emphasizing judgment and synthesis. AI-use policy: define permitted tools; assess metacognitive reflection separately. |
| Large (>100) | ▲ Highest Risk. Embedded synchronous checkpoints — brief live or recorded responses confirm engagement. AI-evaluation tasks — students critique or correct AI-generated responses using course concepts; assesses mastery through discernment, not production. Peer-assessed process portfolios using structured rubrics — scalable; shifts focus to learning over product. Design goal: anonymous large required courses are highest circumvention risk at UGA. Faculty workload is critical. | ▬ Moderate Risk. Proctored or synchronous capstone assessments at key milestones. AI-integrated problem sets where students extend or evaluate AI-generated work against disciplinary standards. AI-use policy: structured transparency; require documentation of AI interaction. Graduate programs: align with professional licensure and accreditation expectations. | ▬ Moderate Risk. Peer-assessed portfolios with structured rubrics — scales reflection without full faculty grading load. Application-focused redesigned tests with open-AI transparency requirements. AI-use policy: focus on process artifacts alongside final product. |
Assessment Approaches: Key Features
Managing the Grading Load
- Apply rubrics at scale. Paste a rubric and a batch of student submissions into an AI tool and ask it to score each against the rubric criteria. Review and adjust — do not accept scores without verification. (The first sketch after this list combines several of these grading-load ideas.)
- Generate first-draft feedback. Ask AI to draft individualized feedback for each submission based on the rubric. NotebookLM is particularly useful here — upload your rubric and student submissions as source documents and prompt it to draft feedback anchored to your rubric criteria. Because NotebookLM grounds responses in uploaded sources, the feedback stays tied to your rubric language instead of drifting into generic comments. Edit for accuracy and tone before returning to students.
- Summarize long submissions. For portfolios or reflective assignments, use AI to produce a summary of each student’s key claims before reading in full. Prioritize where to focus close reading.
- Flag outliers. Ask AI to identify submissions that are significantly stronger, weaker, or inconsistent with prior work — useful for targeting grading attention and detecting potential integrity issues.
- Check for consistency. After grading a sample manually, ask AI to apply the same standards to the remainder and compare results. Use discrepancies to recalibrate.
- Generate common feedback themes. After grading, ask AI to analyze all submissions and identify the most common errors or gaps. Use this to inform a class-wide response rather than repeating the same feedback individually.
- Peer evaluation. Have students use a UGA AI tool to evaluate peers’ products or processes, then score how well the AI did. To assess the AI’s performance, students must do the underlying evaluation themselves — making further delegation circular and self-defeating. Faculty spot-check a sample of peer grades rather than grading every submission. For this to work, rubric criteria must require discipline-specific judgment, contextual reasoning, or evaluation of process rather than product. Rubrics that read like checklists are effectively multiple-choice tests that AI can navigate by pattern-matching.
- Group work with individual reflection. Grade the group deliverable once and the individual reflections separately. One product review covers multiple students; the reflection grades are short and structured.
- Sampling strategies. In large courses, randomly select a subset of submissions to grade in depth each week. Communicate this to students — the possibility of being selected maintains engagement without requiring full coverage every time.
- TA calibration. Before TAs begin grading, have them independently score the same 3–5 submissions and compare results (a quick comparison check is sketched in the second example after this list). Invest time in calibration upfront to reduce inconsistency and rework later.
- Grade checkpoints, not just final products. Brief staged check-ins are faster to grade than full submissions and catch problems early. Feedback at stage two reduces the volume of revision at stage four.
- Use completion grading for low-stakes work. Reserve detailed rubric grading for high-stakes submissions. Low-stakes checks can be graded on completion, with AI flagging anything that appears off-task and creating reports about student response patterns.
- Build reusable rubrics. Invest time once in building a well-structured rubric that AI can apply consistently across terms. A rubric that works with AI grading tools saves significant time at scale.
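Several of the ideas above (AI-applied rubrics, random sampling, and consistency checking) can be combined in a short script. The sketch below is illustrative only: `score_with_llm` is a hypothetical stand-in for an approved AI tool, and the submissions, scores, and threshold values are invented examples, not recommendations.

```python
"""Illustrative sketch: AI-assisted rubric scoring with human spot checks.
`score_with_llm` is a hypothetical stand-in for an approved AI tool; the
submissions, scores, and threshold values are invented examples.
"""
import random

RUBRIC = "Thesis (0-3), Evidence (0-3), Analysis (0-4); total out of 10."
SUBMISSIONS = {  # stand-ins for real student files
    "student_a": "essay text ...",
    "student_b": "essay text ...",
    "student_c": "essay text ...",
    "student_d": "essay text ...",
    "student_e": "essay text ...",
}
SPOT_CHECK_RATE = 0.4  # manually regrade 40% each week (example value)
FLAG_THRESHOLD = 2     # flag AI/manual gaps of 2+ points (example value)

def score_with_llm(rubric: str, submission: str) -> int:
    """Placeholder for a call to an approved AI tool.

    A real version would send the rubric and the submission to the tool
    and parse a numeric score from its reply; this dummy value just lets
    the sketch run end to end.
    """
    return random.randint(5, 10)

# 1. First-pass AI scores for every submission (never final on their own).
ai_scores = {name: score_with_llm(RUBRIC, text) for name, text in SUBMISSIONS.items()}

# 2. Randomly sample a subset for in-depth manual grading this week.
k = max(1, round(len(SUBMISSIONS) * SPOT_CHECK_RATE))
sample = random.sample(sorted(SUBMISSIONS), k)
print("Grade these by hand:", sample)

# 3. After manual grading, flag large AI/manual gaps: recurring gaps mean
#    the rubric wording, or the AI's reading of it, needs recalibration.
manual_scores = {name: 8 for name in sample}  # replace with your real scores
for name in sample:
    gap = abs(ai_scores[name] - manual_scores[name])
    if gap >= FLAG_THRESHOLD:
        print(f"Recheck {name}: AI={ai_scores[name]}, manual={manual_scores[name]}")
```

In practice the manual scores in step 3 come from your own grading of the sampled papers; the comparison is what operationalizes the consistency check described above.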
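A second quick sketch, for the TA calibration item: given each TA’s independent scores on the same few submissions, pairwise gaps show whether the group agrees before anyone grades the full stack. All numbers here are invented examples.

```python
"""Illustrative sketch: a quick TA calibration check before grading begins.
Each TA independently scored the same four submissions on a 10-point
rubric; all numbers are invented examples.
"""
from itertools import combinations
from statistics import mean

ta_scores = {
    "ta_1": [7, 9, 5, 8],
    "ta_2": [6, 9, 4, 8],
    "ta_3": [8, 10, 7, 9],
}

# Pairwise comparison: a large mean gap means the group should revisit
# the rubric together before grading the full stack.
for a, b in combinations(ta_scores, 2):
    gaps = [abs(x - y) for x, y in zip(ta_scores[a], ta_scores[b])]
    print(f"{a} vs {b}: mean gap {mean(gaps):.1f}, max gap {max(gaps)}")

# Per-submission spread shows which sample papers to discuss first.
for i, scores in enumerate(zip(*ta_scores.values()), start=1):
    print(f"submission {i}: scores {list(scores)}, spread {max(scores) - min(scores)}")
```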
Actionable Recommendations for UGA
- Create context-specific guidance organized by class size and course type — not a single blanket policy.
- Provide worked examples from multiple disciplines, including graduate and professional programs, through the CTL.
- Expand CTL course redesign institutes to include AI-era assessment design.
- Build peer networks of faculty who have successfully redesigned assessments.
- Prioritize large-enrollment gen ed instructors — highest impact, highest vulnerability.
- Invest in TA training and rubric infrastructure so that stronger assessment approaches do not simply increase individual faculty burden. Workload needs to be addressed structurally.
- Consider course release, stipends, or other incentives for faculty undertaking significant assessment redesign.
- Portfolio and reflective assessments feed directly into UGA’s Comprehensive Learner Record (CLR).
- Align assessment redesign incentives with CLR adoption goals.
- AI-era assessment redesign advances UGA’s active learning QEP — both require students to engage, apply, and reflect rather than retrieve.
- Co-scaffolded AI literacy is a transferable competency that fits within the CLR framework. Students who can critically direct and evaluate AI in a disciplinary context are more workforce-ready.
- Build flexibility into new assessment models from the start and coordinate with Student Affairs and Disability Services before rollout.
- Assessment redesign should not disadvantage working students, caregivers, or students across time zones.
- Include students in developing AI-era assessment norms — not just in evaluating them after the fact.
- Some students may be conscientious AI objectors; plan alternative assignments for that group.
- Identify 2–3 programs to pilot redesigned assessment approaches in AY 2025–26, spanning undergraduate and graduate contexts.
- Adapt UGA’s Office of Assessment methods to evaluate impact on student learning outcomes.
- Share results. Our scale and infrastructure position us to shape national and system-wide practice.
A Constructive Vision: The Learning Agent
UGA Online has developed a prototype that uses AI as the assessment engine itself. The Learning Agent:
- Ingests the instructor’s learning outcomes for a course
- Allows the instructor to specify what mastery looks like in terms of both knowledge and observable student behavior
- Engages the student in a Socratic dialogue that probes depth of understanding rather than surface recall
- Documents the conversation and identifies evidence of mastery, flagging specific student responses that demonstrate outcome achievement
- Submits a summary, highlights, and full transcript to a course dropbox for faculty verification — the instructor makes the final determination
Example: A student selects the outcome “evaluate the quality and credibility of data-based claims.” The agent presents a headline — “Product X reduces cold duration by 50%” — and works through sample size, generalizability, methodology, and corroboration with the student. The agent follows the student’s reasoning and probes further at each step. A student delegating to another AI cannot navigate the conversation without genuine understanding of the material.
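For readers who want to see the shape of this loop in code, here is a minimal sketch under loud assumptions: it is not the prototype itself, `ask_model` is a hypothetical stand-in for whatever model backs the agent, and the keyword check is a toy placeholder for model-based judgment of mastery evidence.

```python
"""Illustrative sketch of the Learning Agent loop described above.
Not the actual prototype: `ask_model` is a hypothetical stand-in for the
model behind the agent, and the keyword check is a toy placeholder for
model-based judgment of mastery evidence.
"""
import json
from dataclasses import dataclass, field

@dataclass
class Session:
    outcome: str                  # instructor's learning outcome
    mastery_criteria: list[str]   # observable behaviors that count as evidence
    transcript: list[dict] = field(default_factory=list)
    evidence: list[dict] = field(default_factory=list)

def ask_model(prompt: str) -> str:
    """Hypothetical model call returning the agent's next Socratic probe."""
    return "What would a larger sample change about that claim?"  # canned reply

def run_turn(session: Session, student_reply: str) -> str:
    session.transcript.append({"role": "student", "text": student_reply})
    # Look for evidence of mastery in the reply (toy keyword rule; the real
    # agent would judge this with the model, not string matching).
    for criterion in session.mastery_criteria:
        if criterion.lower() in student_reply.lower():
            session.evidence.append({"criterion": criterion, "quote": student_reply})
    probe = ask_model(
        f"Outcome: {session.outcome}\nStudent said: {student_reply}\nProbe deeper."
    )
    session.transcript.append({"role": "agent", "text": probe})
    return probe

def close_session(session: Session) -> str:
    # Package the summary, highlighted evidence, and full transcript for
    # the course dropbox; the instructor makes the final determination.
    return json.dumps(
        {"outcome": session.outcome, "evidence": session.evidence,
         "transcript": session.transcript},
        indent=2,
    )

session = Session(
    outcome="Evaluate the quality and credibility of data-based claims",
    mastery_criteria=["sample size", "generalizability", "corroboration"],
)
print(run_turn(session, "A 50% claim means little without the sample size."))
print(close_session(session))
```

The design point the sketch makes concrete is the final step: the agent assembles evidence and a transcript, but the instructor, not the agent, makes the final determination.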
View the proof of concept: https://kaltura.uga.edu/media/t/1_mapn0z0w
Why this matters: The Learning Agent addresses the highest-risk cells in the matrix — particularly large, high-stakes, low-interest required courses — but works across all three course types and enrollment sizes, including graduate and professional programs. It scales assessment without proportionally increasing faculty grading time and generates verifiable evidence of student thinking that supports both the QEP and the CLR. Assessment validity and reliability will need to be established as the tool matures, including consistency across diverse student populations.
Discussion Questions
- Where on the matrix is UGA most vulnerable right now? Where are we already well-positioned?
- How should assessment design differ between undergraduate, graduate, and professional programs at UGA?
- What do faculty need — in time, support, or incentives — to redesign assessments in large gen ed online courses?
- How do we ensure that assessment redesign does not disadvantage working students, caregivers, and non-traditional students?
- How do we define AI literacy as a disciplinary competency — and should it look different in a biology course than in a business course?
- What role should students play in shaping AI-era assessment norms at UGA?
- What would faculty need to trust and adopt a tool like the Learning Agent — and where would it have the most immediate impact?