Merrill's First Principles and the PAICE Assessment: Why problem-centered instruction is what makes behavioral measurement work

Most organizations treat AI training and AI assessment as separate activities. You train people first, then you test them. The training is the instruction. The assessment is the measurement. Two budgets, two timelines, two vendors.

David Merrill's First Principles of Instruction suggest this separation is the problem. Effective instruction is not a sequence of content delivery followed by evaluation. It is a structured experience where the learner solves real problems, activates prior knowledge, observes demonstrations, applies skills with feedback, and integrates what they learned into practice. Assessment is not a separate step. It is woven into the instructional experience itself.

The PAICE (People + AI Collaboration Effectiveness) assessment was not consciously designed around Merrill's principles. But when you examine what happens during a PAICE assessment against his framework, the alignment is striking. PAICE works as both measurement and instruction because its architecture satisfies all five principles simultaneously. Understanding this alignment explains why the assessment produces behavioral change in ways that training-then-testing approaches do not.

The Five Principles

In 2002, M. David Merrill published "First Principles of Instruction," a synthesis of decades of instructional design research. Rather than advocate for a single pedagogical theory, Merrill identified the principles that recur across all effective instruction, regardless of the specific methodology used. He found five:

Problem-Centered. Learning happens when learners engage with real, whole problems.
Activation. Learning builds on what the learner already knows.
Demonstration. Learners observe new knowledge and skill applied, not just described.
Application. Learners practice with feedback and diminishing coaching.
Integration. Learners transfer new knowledge into their everyday work through reflection, discussion, and defense.

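These are not aspirational ideals. Merrill derived them from conditions that recur across established instructional theories and models. His claim is that learning improves in proportion to how fully instruction implements them, so instruction that satisfies all five should outperform instruction that satisfies fewer. The principles are prescriptive, not descriptive: they tell you what to design, not just what to hope for.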

What makes Merrill's framework particularly useful for evaluating AI training is its diagnostic power. When a training program fails to produce behavioral change, the principles tell you where it failed. Most AI training programs fail at all five.

Problem-Centered

Merrill's first and most foundational principle: learning is promoted when learners are engaged in solving real-world problems.

Not case studies about problems. Not hypothetical scenarios. Not multiple-choice questions about what you would do in a situation. Real problems, with real stakes, where the learner's response matters.

Most AI training programs violate this principle immediately. They present decontextualized content: slides about responsible AI principles, definitions of hallucination, policy documents about acceptable use. The learner absorbs information about problems other people have had. They do not encounter those problems themselves.

The PAICE assessment is a real problem. When you begin the assessment, you choose a topic from your own professional domain. You then collaborate with an AI assistant on that topic for approximately 25 minutes. The AI is helpful, knowledgeable, and conversational. It is also, at strategic points, wrong.

The errors are not flagged. They are not preceded by a warning or followed by a debrief. They are embedded in the conversation the same way they would be embedded in any real AI interaction: confidently stated, plausibly formatted, easy to accept. Your job is to do what you would do in real work: collaborate effectively, verify when something matters, and catch what needs catching.

This is not a simulation of a problem. It is the problem. The same cognitive challenge that professionals face every day when they use AI tools is recreated under observation. The difference between accepting a hallucinated statistic during a PAICE assessment and accepting one while preparing a client deliverable is context, not cognitive demand.

Activation

Merrill's second principle: learning is promoted when existing knowledge is activated as a foundation for new knowledge.

Effective instruction does not start from zero. It connects to what the learner already knows, uses that knowledge as scaffolding, and builds from there. When instruction ignores prior knowledge, learners either disengage (because the content feels irrelevant) or fail to integrate (because there is nothing to attach the new knowledge to).

Most AI training treats every learner the same regardless of their domain expertise. A compliance officer with 15 years of regulatory experience receives the same "Introduction to AI" module as a recent hire. The training does not leverage the compliance officer's deep knowledge of what correct regulatory analysis looks like. It teaches generic AI concepts and hopes the learner figures out the domain-specific implications on their own.

PAICE activates prior knowledge by design. Because you choose your own professional topic, the assessment is conducted in the domain where your expertise is strongest. When the AI makes an error about contract law, only a lawyer with contract experience will catch it naturally. When it overstates a clinical finding, only a clinician familiar with that literature will notice the inflation.

This is not a design convenience. It is the mechanism that makes failure injection work. PAICE does not test whether you can identify errors in topics you know nothing about. It tests whether you apply the verification habits that your professional expertise makes possible. The assessment activates your existing knowledge and then observes whether you use it.

The Accountability dimension (A, weighted at 30% of your total score) specifically measures this activation. Do you bring your professional judgment to bear on the AI's output, or do you defer to the AI's confident presentation? The assessment can only answer that question because it operates in the domain where your judgment is strongest.

Demonstration

Merrill's third principle: learning is promoted when new knowledge is demonstrated to the learner.

Demonstration is not telling. It is showing. A lecture about how AI can hallucinate is telling. Watching an AI hallucinate in a conversation about your own work is demonstration. The difference is not subtle, and the learning outcomes are not equivalent.

Most AI training is entirely declarative. "AI systems can produce confident but incorrect outputs." "Always verify AI-generated content before using it." "Be aware of potential biases in AI training data." These statements are accurate. They are also inert. Knowing that AI can hallucinate and experiencing an AI hallucination in a conversation where you trusted the output are fundamentally different cognitive events.

During a PAICE assessment, the AI demonstrates the behaviors that matter. It demonstrates overconfidence by presenting uncertain information with authoritative tone. It demonstrates hallucination by generating plausible but fabricated details. It demonstrates subtle error by getting most of an analysis right while embedding a critical mistake in the middle. It demonstrates a pattern loosely analogous to the Dunning-Kruger effect by producing polished, well-structured output that sounds more reliable than it is.

The professional does not read about these patterns. They encounter them. In many cases, the encounter is the first time a professional has experienced a specific failure mode in a context where they were paying close attention to the AI's output quality. This is instructionally valuable regardless of the score. The experience of watching yourself accept something you should have caught is a more powerful teacher than any slide deck.

Application

Merrill's fourth principle: learning is promoted when learners are required to use their new knowledge to solve problems, with appropriate feedback.

Application is where most AI training fails catastrophically. A training program might explain verification strategies, show examples, even walk through scenarios. But then the learner returns to their desk and the training is over. There is no structured application. There is no feedback loop. The learner either applies what they learned spontaneously or, more commonly, reverts to prior habits within days.

The PAICE assessment is entirely application. For roughly 25 minutes, the professional applies their collaboration skills in real time. Every response is an act of application: do you verify this claim, challenge that recommendation, catch this error, or let it pass? There is no passive phase. The assessment does not lecture and then test. It places you in the performance environment and observes what you do.

The feedback comes in the score report. Your 0-1000 score across five dimensions (Performance, Accountability, Integrity, Collaboration, Evolution) tells you not just how you did overall, but specifically where your application was strong and where it broke down. The Integrity dimension (I, 25%) captures whether you caught injected errors. The Accountability dimension (A, 30%) captures whether you maintained verification discipline throughout. The Collaboration dimension (C, 20%) captures whether your interaction patterns with the AI were effective.
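To make the weighting concrete, here is a minimal Python sketch of how five dimension scores could roll up into a single 0-1000 score. Only the Accountability (30%), Integrity (25%), and Collaboration (20%) weights come from this article. The Performance and Evolution weights are not stated here, so the function makes the caller supply them, and the 15%/10% split in the example is a placeholder. The linear weighted sum, the assumption that each dimension is itself scored 0-1000, and all function and variable names are illustrative assumptions, not PAICE's actual scoring method.

```python
# Weights stated in this article. The Performance and Evolution weights are
# NOT stated here, so they are left as caller-supplied parameters.
STATED_WEIGHTS = {"accountability": 0.30, "integrity": 0.25, "collaboration": 0.20}


def composite_score(dimension_scores, performance_weight, evolution_weight):
    """Weighted sum of five dimension scores (each assumed to be on a 0-1000 scale)."""
    weights = dict(
        STATED_WEIGHTS,
        performance=performance_weight,
        evolution=evolution_weight,
    )
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    missing = set(weights) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return round(sum(weights[name] * dimension_scores[name] for name in weights))


def weakest_dimension(dimension_scores):
    """Name of the lowest-scoring dimension, i.e. where application broke down."""
    return min(dimension_scores, key=dimension_scores.get)


# Hypothetical profile: productive with AI, weaker at catching injected errors.
example = {
    "performance": 720,
    "accountability": 540,
    "integrity": 480,
    "collaboration": 610,
    "evolution": 600,
}

# Placeholder split of the remaining 25% between Performance and Evolution.
total = composite_score(example, performance_weight=0.15, evolution_weight=0.10)
focus = weakest_dimension(example)
print(total, focus)
```

The point of the sketch is the shape of the feedback rather than the exact numbers: a single total tells you how you did overall, while the per-dimension view tells you where to focus.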

For professionals who upgrade to PAICE Pro, the feedback deepens. Detailed dimensional breakdowns, specific behavioral observations, and personalized development recommendations provide the coaching layer that Merrill identifies as essential during the application phase. The feedback is not generic. It is tied to your actual performance during the assessment.

This is the application-feedback loop that most AI training programs lack entirely. You applied your skills. Here is what happened. Here is what to work on. Here is how to improve.

Integration

Merrill's fifth principle: learning is promoted when learners are encouraged to integrate new knowledge into their everyday world.

Integration is the hardest principle to satisfy because it extends beyond the instructional event. It requires reflection, discussion, defense of new ideas, and transfer to novel contexts. A single training session rarely achieves integration. A single assessment rarely achieves it either.

PAICE addresses integration through three mechanisms.

Individual reflection. The score report prompts immediate reflection. A professional who scores 580 (Proficient tier) with a strong Performance dimension but a weak Integrity dimension now has specific knowledge about a specific gap. The insight is concrete: "I use AI effectively but I do not catch its errors consistently." That specificity enables targeted behavioral change in a way that a generic "be more careful" admonition cannot.

Organizational discussion. When organizations run AI Capability Baselines, cohort-level results create a shared frame of reference. An L&D team that sees its department scoring in the 55th percentile on Accountability but the 80th percentile on Performance can have a concrete, data-driven conversation. The discussion is not abstract ("should we do more AI training?") but specific ("our people are productive with AI but they are not verifying outputs at the rate we need").

Longitudinal reassessment. PAICE assessments can be repeated over time. A professional who scores 580 in April, focuses on verification habits, and scores 680 in July has measurable evidence of behavioral change. Organizations that run Baselines quarterly can track trend data at the cohort level. This reassessment cycle is the integration mechanism: new knowledge is applied, measured, refined, and reapplied.
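The reassessment cycle above amounts to simple arithmetic over dated results. The following Python sketch shows one way to compute the change between consecutive assessments. The `AssessmentResult` record, the `score_trend` function, and the specific calendar dates are hypothetical illustrations, not part of any PAICE API; only the 580 and 680 totals come from the example in the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AssessmentResult:
    """One completed assessment: when it was taken and the total score (0-1000)."""
    taken_on: date
    total: int


def score_trend(results):
    """Return (date, change) pairs comparing each assessment with the previous one."""
    ordered = sorted(results, key=lambda r: r.taken_on)
    return [
        (later.taken_on, later.total - earlier.total)
        for earlier, later in zip(ordered, ordered[1:])
    ]


# The April-to-July example from the text; year and day are placeholders.
history = [
    AssessmentResult(taken_on=date(2025, 7, 1), total=680),
    AssessmentResult(taken_on=date(2025, 4, 1), total=580),
]

for taken_on, change in score_trend(history):
    print(f"{taken_on.isoformat()}: {change:+d} points since previous assessment")
```

The same comparison could be run on cohort averages for organizations that repeat Baselines quarterly, though how PAICE aggregates cohort data is not specified here.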

Integration is where the distinction between assessment-as-measurement and assessment-as-instruction becomes most visible. A training program that ends with a knowledge test provides a single data point. An assessment that produces behavioral insight, organizational discussion, and longitudinal tracking provides a development pathway.

Why This Matters for AI Training Programs

Last week we published "Why AI Training Programs Aren't Working," which examined the gap between training completion rates and actual behavioral change. Merrill's principles explain why that gap exists.

Most AI training programs violate all five principles simultaneously:

Not problem-centered. Learners study slides, not problems. The training content is decontextualized from their actual work.
No activation. Generic modules ignore the learner's domain expertise. A lawyer and an accountant receive the same content.
No demonstration. AI failure modes are described, not experienced. Learners hear that hallucination is a risk. They do not watch it happen.
No application. There is no structured practice with feedback. Learners absorb information and are expected to apply it on their own.
No integration. The training ends. There is no reflection mechanism, no organizational discussion framework, no reassessment cycle.

PAICE satisfies all five, not because it was designed to implement Merrill's framework, but because behavioral assessment of People + AI collaboration naturally requires the same conditions that effective instruction requires. You cannot measure collaboration behavior without creating a real problem (problem-centered) in the learner's domain (activation), where AI behaviors are observable (demonstration) and the learner's responses are captured (application) and analyzed (integration).

The assessment is instruction because the conditions for valid measurement and the conditions for effective instruction are the same conditions.

Implications for L&D Leaders

If you are evaluating AI training programs for your organization, Merrill's principles provide a practical diagnostic. Ask five questions about any program you are considering:

Does it engage learners in solving real problems with AI, or does it present information about AI?
Does it leverage the learner's existing domain expertise, or does it treat all learners the same?
Does it demonstrate AI behaviors firsthand, or does it describe them?
Does it require learners to apply skills with feedback, or does it end after content delivery?
Does it support integration through reflection, discussion, and reassessment, or is it a single event?

Programs that satisfy all five are far more likely to produce behavioral change. Programs that satisfy fewer tend to produce completion certificates.

If you are building an AI governance program, consider that a measurement that also instructs is more efficient than measurement followed by separate instruction. The PAICE assessment can serve as the first instructional experience in a blended program: participants take the assessment, receive their dimensional breakdown, and then engage in targeted training that addresses the specific gaps the assessment revealed. The training is no longer generic because the measurement told you what each person needs.

For organizations running AI Capability Baselines, the Merrill lens reframes what a cohort assessment actually is. It is not just a diagnostic. It may well be the most instructionally effective event in your AI readiness program, because it is likely the only one that satisfies all five principles simultaneously.

Want to understand your own readiness profile? Take the PAICE assessment to discover your strengths and opportunities.

Get Involved:

Take the assessment (free, always)
Explore our Baseline offerings (for organizations)
Read the whitepapers (comprehensive framework)
Contact us about your specific requirements
