Free Interactive Tool · Human-AI Collaboration

Human-AI Interaction & Decision Quality

How effectively are humans and AI actually working together in your context? Benchmark your decision acceptance rates, automation bias exposure, and collaboration quality against global research from McKinsey, BCG, Stanford HAI, and MIT — adjusted for your industry, role, and decision type.

Industry

Role Level

Decision Type

Decision Acceptance Rate

Automation Bias Index — accepted without review

AI Trust Score — composite /100 · Edelman baseline 53

Augmentation Preference — prefer AI as collaborator vs autonomous

Decision Quality Lift — vs solo human baseline · McKinsey

Time-to-Decision — faster with AI assistance

Decision Acceptance Funnel

Journey from AI recommendation → meaningful review → accepted → acted on.

AI recommendations generated

Meaningfully reviewed

Accepted & acted on

Automation bias (auto-accepted)

Automation vs Augmentation Split

Optimal balance for this decision type. Centaur model (human+AI) outperforms either alone.

Full Automation

Human-AI Augmentation

Decision Quality Dimensions

Five-dimension quality profile vs industry benchmark. Scores reflect improvement vs solo-human baseline.

Your profile

Industry benchmark

Acceptance Rate by Industry (BCG / Stanford HAI 2023 · optimal zone 65–85%)

Research & Context

What each metric measures, what the research says, and how to improve your collaboration posture.

Decision Acceptance Rates & Automation Bias

What It Measures

Decision Acceptance Rate tracks the proportion of AI recommendations that humans act on. Automation Bias Index measures the share of those acceptances that occurred without meaningful human review, a silent governance failure that most organisations have not yet instrumented. The EU AI Act requires human oversight for high-risk AI systems, and the EEOC has signalled that employers remain accountable for decisions made with algorithmic tools; a high automation bias rate is evidence that oversight is nominal rather than real.

Global Benchmarks

Healthcare: 79% acceptance, 31% automation bias — radiologists accepting AI diagnostic flags without independent verification (MIT CSAIL 2023)
Financial Services: 68% acceptance, 42% automation bias — highest bias rate across sectors; credit officers approving AI-scored applications at volume without case review (BCG 2023)
Legal: 61% acceptance, 23% bias — most conservative sector; liability exposure drives genuine review (Stanford HAI)
Optimal zone: 65–85% acceptance with <25% automation bias. Below 50% = undertrust; above 85% = over-reliance
DARPA XAI finding: providing explanations with AI recommendations reduces automation bias by 18% — the single most effective intervention
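
The zone thresholds above can be written as a small classifier. This is a minimal sketch, not the tool's own logic: the function name `classify_posture` and the label strings are hypothetical, and the 50–65% acceptance band, which the benchmarks leave unnamed, is labelled "below optimal" as an assumption.

```python
def classify_posture(acceptance_pct: float, bias_pct: float) -> str:
    """Label a collaboration posture using the benchmark thresholds.

    From the text: optimal zone is 65-85% acceptance with under 25%
    automation bias; below 50% is undertrust; above 85% is over-reliance.
    """
    if not (0 <= acceptance_pct <= 100 and 0 <= bias_pct <= 100):
        raise ValueError("percentages must be between 0 and 100")
    if acceptance_pct < 50:
        return "undertrust"
    if acceptance_pct > 85:
        return "over-reliance"
    if acceptance_pct < 65:
        # Assumption: the source does not name the 50-65% band.
        return "below optimal"
    if bias_pct >= 25:
        return "optimal acceptance, high automation bias"
    return "optimal"
```

Applied to the sector figures listed above, `classify_posture(79, 31)` (healthcare) returns "optimal acceptance, high automation bias", and `classify_posture(61, 23)` (legal) returns "below optimal".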

How to Improve

Instrument your AI systems to log whether humans accessed the explanation before accepting; the share of acceptances made without opening it is your automation bias rate
Add mandatory explanation display before acceptance for P1/P0 decisions — interface friction that requires acknowledgement, not just click-through
Set a review SLA: for high-stakes decisions, require logged time-on-task of at least 60 seconds before acceptance
Report acceptance rates by team to leadership monthly — the act of measurement alone reduces automation bias by 12% (MIT Sloan 2022)
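
As an illustration of the instrumentation described above, the following sketch computes both rates from a decision log. The `DecisionEvent` record and its field names are hypothetical, and treating an acceptance as "meaningfully reviewed" only when the explanation was opened and at least 60 seconds were logged is an assumption that combines two of the steps above; adapt the rule to your own logging.

```python
from dataclasses import dataclass
from typing import Sequence

MIN_REVIEW_SECONDS = 60.0  # review SLA threshold taken from the text


@dataclass(frozen=True)
class DecisionEvent:
    """One AI recommendation shown to a human (hypothetical log schema)."""
    accepted: bool
    explanation_viewed: bool
    seconds_on_task: float


def acceptance_rate(events: Sequence[DecisionEvent]) -> float:
    """Share of all recommendations that were accepted."""
    if not events:
        return 0.0
    return sum(e.accepted for e in events) / len(events)


def automation_bias_rate(
    events: Sequence[DecisionEvent],
    min_seconds: float = MIN_REVIEW_SECONDS,
) -> float:
    """Share of acceptances made without meaningful review.

    Assumption: "meaningful review" means the explanation was viewed
    AND time-on-task met the SLA.
    """
    accepted = [e for e in events if e.accepted]
    if not accepted:
        return 0.0
    unreviewed = [
        e for e in accepted
        if not e.explanation_viewed or e.seconds_on_task < min_seconds
    ]
    return len(unreviewed) / len(accepted)
```

Note that the denominator for the bias rate is accepted recommendations, not all recommendations, which matches the definition of the Automation Bias Index given earlier.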

Automation vs Augmentation Spectrum

What It Measures

The automation vs augmentation split defines how AI is deployed across a decision portfolio. Automation means AI decides and acts without human involvement. Augmentation (the "centaur model") means AI advises, humans decide. The optimal split is not fixed — it varies critically by decision type, reversibility, regulatory context, and cognitive stakes. Getting this wrong in either direction destroys value: over-automation creates liability and error propagation; under-automation wastes the tool.

Global Benchmarks

Centaur model outperformance: Human+AI teams beat solo AI by 23% and solo humans by 31% on complex decisions — BCG 2023 study of 12,000 knowledge workers
Routine decisions: 65% automation / 35% augmentation optimal — McKinsey Global Institute 2023
Complex decisions: 25% automation / 75% augmentation — Stanford HAI recommendation
High-stakes decisions: 8% automation / 92% augmentation; fully automating a high-stakes decision with no human oversight runs counter to the governance practices recommended in the NIST AI RMF, which is a voluntary framework
Augmentation preference: 71% of knowledge workers prefer AI as thought-partner (McKinsey 2023); this rises to 84% for healthcare workers and 88% for legal professionals

How to Calibrate

Map every AI deployment to one of three tiers: Automate (routine, reversible, low-stakes), Augment (complex, consequential, regulated), Advise-only (irreversible, high-liability, safety-critical)
Calculate your Return on Employee (RoE): measure hours freed from automated tasks + decision quality improvement per person — this is the centaur dividend
Resist the automation bias in system design — the default should be augmentation, with automation requiring explicit justification and governance sign-off
Survey team augmentation preference quarterly — low preference scores predict adoption failure before it happens
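
The three-tier mapping above can be sketched as a rule function. This is an illustrative sketch only: the `Deployment` fields, the `assign_tier` name, and the exact precedence of the rules are assumptions, since the text lists the attributes of each tier but not how to resolve mixed cases. Augmentation is the fallback, in line with the default recommended above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    AUTOMATE = "Automate"
    AUGMENT = "Augment"
    ADVISE_ONLY = "Advise-only"


@dataclass(frozen=True)
class Deployment:
    """Attributes of one AI use case (hypothetical schema)."""
    name: str
    routine: bool
    reversible: bool
    high_stakes: bool  # consequential or high-liability
    regulated: bool
    safety_critical: bool


def assign_tier(d: Deployment) -> Tier:
    # Most restrictive tier is checked first (assumed precedence).
    if d.safety_critical or (not d.reversible and d.high_stakes):
        return Tier.ADVISE_ONLY
    # Automation needs every low-risk condition to hold.
    if d.routine and d.reversible and not d.high_stakes and not d.regulated:
        return Tier.AUTOMATE
    # Default: augmentation, as the text recommends.
    return Tier.AUGMENT
```

For example, a routine, reversible, unregulated email-triage deployment maps to `Tier.AUTOMATE`, while a regulated credit-scoring deployment falls through to `Tier.AUGMENT`.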

Decision Quality & Cognitive Load

What It Measures

Decision quality in human-AI systems is multi-dimensional: accuracy (correctness), speed (time-to-decision), consistency (same decision in the same context), error rate (critical failures), and cognitive load (mental effort required). The HAIS (Human-AI Integration Scale), developed by researchers at MIT and Northeastern, provides a validated 25-item instrument for measuring how well AI integration serves human cognition rather than taxing it.

Global Benchmarks

Accuracy lift: +18% average improvement in AI-assisted vs solo human decisions; up to +22% in healthcare (BCG/MIT 2023)
Error reduction: −37% critical errors in healthcare AI with human review; −22% in financial services (Stanford HAI 2023)
Time-to-decision: 28% faster on average; routine decisions 45% faster; high-stakes decisions only 12% faster (appropriate caution)
Decision consistency: +31% improvement; AI dramatically reduces "decision fatigue" variance, because humans make worse decisions in the afternoon and AI does not
Cognitive load: −24% reduction in perceived mental effort when AI provides structured options vs open-ended assistance (HAIS scale validation studies)
Trust Score (Edelman 2024): Global average 53/100; healthcare 62; financial services 49; legal 44

How to Measure

Deploy the HAIS scale as a quarterly 25-item survey — it takes 8 minutes and produces a validated composite score across trust, transparency, control, and explainability dimensions
Establish baseline accuracy and error rates before AI deployment; you cannot measure lift without a pre-AI baseline captured under comparable conditions
Track decision consistency by presenting the same case to the same person twice within a 4-week window; the variance is your "human inconsistency baseline" that AI should reduce
Monitor cognitive load as a leading indicator of adoption failure — high perceived effort predicts abandonment within 90 days, well before accuracy drops become visible
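
Two of the measurements above reduce to simple arithmetic, sketched below. The function names are hypothetical. The benchmarks do not state whether lift figures are relative or absolute, so `relative_lift` assumes a relative change against the pre-AI baseline, and `inconsistency_rate` assumes decisions are recorded as comparable labels.

```python
from typing import Hashable, Sequence, Tuple


def relative_lift(baseline: float, assisted: float) -> float:
    """Relative change of an AI-assisted metric vs the pre-AI baseline.

    Assumption: lift is relative, e.g. accuracy going from 0.70 to
    0.826 is a lift of 0.18 (18%).
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (assisted - baseline) / baseline


def inconsistency_rate(
    pairs: Sequence[Tuple[Hashable, Hashable]],
) -> float:
    """Share of repeated cases where the same person decided differently.

    Each pair holds the first and second decision on the same case.
    """
    if not pairs:
        raise ValueError("at least one repeated case is required")
    changed = sum(1 for first, second in pairs if first != second)
    return changed / len(pairs)
```

Running `inconsistency_rate` before and after deployment gives the two values needed to check whether AI assistance actually narrows the human inconsistency baseline.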

Free Download

Get the Human-AI Collaboration Assessment Template

The 38-point assessment template used to evaluate human-AI interaction quality across your organisation — covering decision acceptance protocols, automation bias audit, augmentation framework design, and the HAIS survey instrument for measuring cognitive integration.

Decision acceptance rate tracker with automation bias audit (12 items)
Automation vs augmentation decision matrix for your use-case portfolio
HAIS instrument (25-item validated survey, 8 minutes)
Return on Employee (RoE) measurement framework for AI-assisted roles
