
How to Build a QA Scorecard for a Call Center

A QA scorecard is the foundation of every call center quality program. Get it right and you have a consistent, trusted framework for measuring and improving agent performance. Get it wrong and your scores are arbitrary numbers that agents resent and managers cannot act on.

By Call Coach IQ Team · February 2026 · 9 min read

Step 1: Define the Call Types You Will Score

A single generic scorecard for all call types is a trap. What constitutes a good customer service call looks different from a good collections call, a good retention call, or a good outbound sales call. If you force them into the same scorecard, you will end up with criteria that are irrelevant to some calls and missing criteria that matter on others.

Start by listing your call types. For most contact centers, the list is 2–4 distinct types. Build a separate scorecard for each. Share criteria where they overlap (greeting, close) but weight them independently and add type-specific criteria where needed.

Step 2: Choose Your Scoring Criteria and Weights

For each call type, define the criteria you will score and how many points each is worth. The point values should sum to 100, which makes scores immediately interpretable without conversion math.

Criterion	Points	What it measures
Greeting & identification	10	Brand-compliant opening, agent identification
Active listening	20	Paraphrasing, no interruption, issue confirmation
Problem resolution	25	Accurate resolution, correct process followed
Empathy & tone	20	Appropriate emotional response throughout
Compliance	15	Required disclosures delivered, data handled correctly
Professional close	10	Confirms resolution, thanks customer, clean ending

These weights are a starting point for a customer service call. Adjust them for your call type: collections and financial services calls should weight compliance at 25–30%, retention calls should weight empathy higher, and outbound sales calls need different resolution criteria entirely. Whenever you raise one weight, lower others so the total stays at 100.
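As a minimal sketch of how the example table could be represented and totaled, the snippet below stores the point values from the table and checks that they sum to 100 before adding up a call's earned points. The dictionary keys, function names, and the sample call are hypothetical and chosen only for illustration.

```python
# Point values come from the example customer service table above.
# The keys are hypothetical identifiers chosen for this sketch.
CUSTOMER_SERVICE_WEIGHTS = {
    "greeting_identification": 10,
    "active_listening": 20,
    "problem_resolution": 25,
    "empathy_tone": 20,
    "compliance": 15,
    "professional_close": 10,
}


def validate_weights(weights):
    """Raise ValueError unless the criterion point values sum to 100."""
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"weights sum to {total}, expected 100")


def composite_score(weights, earned):
    """Add up earned points after checking each value against its maximum."""
    validate_weights(weights)
    total = 0
    for criterion, max_points in weights.items():
        points = earned[criterion]
        if not 0 <= points <= max_points:
            raise ValueError(f"{criterion}: {points} is outside 0..{max_points}")
        total += points
    return total


# Illustrative call: full marks everywhere except partial credit on empathy.
example_call = {
    "greeting_identification": 10,
    "active_listening": 20,
    "problem_resolution": 25,
    "empathy_tone": 10,
    "compliance": 15,
    "professional_close": 10,
}

print(composite_score(CUSTOMER_SERVICE_WEIGHTS, example_call))  # 90
```

Running the same validation on each call type's weights catches the common mistake of raising compliance to 25 without reducing anything else.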

Step 3: Write Behavioral Anchors

Criteria names are not enough. "Empathy & tone: 20 points" tells an evaluator nothing about what earns 20 points versus 10 points versus zero. You need behavioral anchors — specific descriptions of what each performance level looks like.

Full credit (20/20)

Agent explicitly acknowledges customer frustration at least once. Tone remains warm throughout. Apology is proportionate and sincere. No robotic or scripted delivery.

Partial credit (10/20)

Agent maintains professional tone but does not explicitly acknowledge customer's emotional state. No empathy statement delivered. Call is functional but emotionally cold.

No credit (0/20)

Agent sounds impatient, dismissive, or talks over the customer. Multiple instances of flat or robotic tone. Customer frustration escalates over the course of the call.

Write anchors for every criterion. This step takes time, but it is the difference between a scorecard that QA managers can apply consistently and one that produces 15-point scoring gaps between reviewers.
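One way to keep anchors from drifting away from the weights is to store them next to the scorecard and check coverage. The sketch below is an assumption about how that could look: the `Anchor` class, the abbreviated descriptions, and the `criteria_missing_anchors` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    points: int
    description: str


# Anchors for one criterion, abbreviated from the empathy example above.
ANCHORS = {
    "empathy_tone": [
        Anchor(20, "Explicitly acknowledges frustration at least once; warm tone throughout"),
        Anchor(10, "Professional tone, but no explicit acknowledgment of the customer's emotional state"),
        Anchor(0, "Impatient, dismissive, or talks over the customer"),
    ],
}


def criteria_missing_anchors(weights, anchors):
    """Return criteria with no anchors, or whose top anchor differs from the weight."""
    problems = []
    for criterion, max_points in weights.items():
        levels = anchors.get(criterion, [])
        if not levels or max(a.points for a in levels) != max_points:
            problems.append(criterion)
    return problems


# Illustrative weights: compliance has no anchors written yet.
weights = {"empathy_tone": 20, "compliance": 15}
print(criteria_missing_anchors(weights, ANCHORS))  # ['compliance']
```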

Step 4: Set Auto-Fail Criteria

Some behaviors should result in a failed call regardless of overall score. A 92-point call where the agent shared a customer's account information with the wrong person is not a 92-point call — it is a compliance event.

Common auto-fail criteria:

Agent confirms account details with a caller who fails verification
Required regulatory disclosure (Mini-Miranda, opt-out) not delivered
Agent makes a promise the company cannot honor
Agent records incorrect information in the system
Call is disconnected without resolution attempt or customer consent

Auto-fail items should be defined in your rubric before you start scoring. If you are using AI scoring, configure them as hard rules that override the composite score.
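A hard rule that overrides the composite score might be sketched as follows. The rule identifiers are hypothetical labels for the five items listed above, and the pass threshold of 80 is an assumption for illustration only, since this guide does not prescribe one. Recording a zero on auto-fail follows the recommendation in the Common Questions section.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the auto-fail items listed above.
AUTO_FAIL_RULES = frozenset({
    "confirmed_details_after_failed_verification",
    "required_disclosure_missing",
    "promise_company_cannot_honor",
    "incorrect_information_recorded",
    "disconnect_without_resolution_attempt",
})

PASS_THRESHOLD = 80  # Assumption: substitute your own passing score.


@dataclass(frozen=True)
class CallResult:
    composite: int      # score from the weighted criteria, kept for reference
    final_score: int    # score actually recorded for the call
    passed: bool
    triggered: tuple    # auto-fail rules that fired, if any


def evaluate_call(composite, flags):
    """Apply auto-fail rules as an override on top of the composite score."""
    unknown = set(flags) - AUTO_FAIL_RULES
    if unknown:
        raise ValueError(f"unknown auto-fail flags: {sorted(unknown)}")
    triggered = tuple(sorted(flags))
    if triggered:
        # Any auto-fail overrides the composite: record zero and fail the call.
        return CallResult(composite, 0, False, triggered)
    return CallResult(composite, composite, composite >= PASS_THRESHOLD, ())


result = evaluate_call(92, {"confirmed_details_after_failed_verification"})
print(result.final_score, result.passed)  # 0 False
```

Keeping the original composite alongside the recorded score lets a reviewer see what the call would have earned without masking the compliance event.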

Step 5: Calibrate Before You Launch

Before the scorecard goes live, run at least one calibration session. Select 5 calls that represent the range of call quality on your team. Have each evaluator score them independently, then compare results.

If two evaluators disagree by more than 5 points on a criterion, the rubric language for that criterion is too vague. Work through the disagreement, identify the ambiguity, and rewrite the behavioral anchor before launching.
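The 5-point check can be sketched as a comparison of the highest and lowest score each criterion received on a given call. The function name, the input layout, and the sample scores below are hypothetical.

```python
def flag_vague_criteria(scores_by_evaluator, max_gap=5):
    """Return {criterion: gap} for criteria where evaluators differ by more than max_gap.

    scores_by_evaluator maps each evaluator to their {criterion: points}
    for the same call. With more than two evaluators, the gap between the
    highest and lowest score covers every pairwise disagreement.
    """
    first = next(iter(scores_by_evaluator.values()))
    flagged = {}
    for criterion in first:
        values = [scores[criterion] for scores in scores_by_evaluator.values()]
        gap = max(values) - min(values)
        if gap > max_gap:
            flagged[criterion] = gap
    return flagged


# Illustrative scores for one calibration call.
calibration_call = {
    "evaluator_a": {"empathy_tone": 20, "compliance": 15, "active_listening": 15},
    "evaluator_b": {"empathy_tone": 10, "compliance": 15, "active_listening": 20},
}

print(flag_vague_criteria(calibration_call))  # {'empathy_tone': 10}
```

Here only the empathy anchor would go back for rewriting; the 5-point gap on active listening sits at the limit rather than over it.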

After launch, run calibration monthly. Criteria that consistently produce disagreement need refinement. Criteria where all evaluators consistently agree can have their anchors simplified.

Step 6: Communicate the Scorecard to Agents

Agents should see the scorecard before they are scored by it. Share the criteria, weights, and behavioral anchors in a team meeting. Walk through examples of calls that scored well and calls that scored poorly, and explain the reasoning for each criterion.

Agents who understand the scorecard before their first scored call are far more likely to trust the process. Agents who see a score sheet for the first time after a low score tend to reject it. The order of operations matters.

Automating the Scorecard with AI

Once your scorecard is defined, calibrated, and trusted, the natural next step is automation. AI call scoring applies your rubric to 100% of calls, using the same behavioral criteria you defined, with a consistency that human reviewers cannot maintain across hundreds of calls per week.

The scorecard does not change when you add AI; it is simply applied more thoroughly. Your QA managers shift from listening to calls and filling in rubric spreadsheets to reviewing AI-generated results, handling disputes, and focusing on the coaching conversations that require human judgment.

Common Questions

How should a call center QA scorecard be structured?

An effective scorecard is organized into weighted categories, each containing specific scored criteria. Categories typically include: greeting and compliance (highest weight in regulated environments), problem identification and empathy, resolution accuracy, and call closure. Each criterion should be answerable with a binary (yes/no) or scaled (1–5) response — not a subjective judgment. Category weights should sum to 100%, with auto-fail items listed separately since they override the total score regardless of other performance.
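To turn binary and scaled responses into points, one possible convention is sketched below. The linear mapping for the 1–5 scale (1 earns nothing, 5 earns the full weight) is an assumption made for this example, not a rule stated in this guide, and the function names are hypothetical.

```python
def binary_points(answer, weight):
    """Yes earns the full weight, no earns nothing."""
    return weight if answer else 0


def scaled_points(rating, weight):
    """Map a 1-5 rating linearly onto 0..weight (assumed convention)."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    return weight * (rating - 1) / 4


print(binary_points(True, 15))   # 15
print(scaled_points(3, 20))      # 10.0
```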

What is the right number of criteria on a call center QA scorecard?

Most production QA scorecards contain 15 to 30 criteria. Fewer than 15 criteria often lack the granularity to generate useful coaching insights — a low score doesn't reveal which specific behavior to address. More than 35 criteria create reviewer fatigue and scoring inconsistency. The right number is the minimum set of criteria that, when scored, tells a coaching supervisor exactly where to focus. Start with your most impactful criteria and add only when you identify a behavior gap that existing criteria don't capture.

How do you define and apply auto-fail criteria on a QA scorecard?

Auto-fail criteria are behaviors so serious that no positive performance elsewhere on the call can offset them. They typically include: required disclosure omissions with regulatory consequences, prohibited representations, identity verification failures, and security protocol breaches. Auto-fail criteria should be defined with your legal and compliance teams, not just QA management. When a call triggers an auto-fail, the scoring system should record a zero regardless of other scores — and the call should route to a compliance review path, not just a standard coaching queue.

How do you calibrate a QA scorecard across multiple reviewers?

Calibration is the process of having multiple reviewers independently score the same set of calls, then comparing results to identify scoring gaps. Monthly calibration sessions using a 10-call calibration set should produce 85%+ agreement on objective criteria and 75%+ on subjective criteria. When agreement falls below those thresholds, the criterion definition needs clarification, usually by adding specific examples of what "meets" and "does not meet" look like on a call. Calibration data should be archived to track consistency over time.
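The threshold check can be sketched as below. This guide does not define how agreement is measured, so the sketch assumes the simplest definition: the share of calibration calls on which every reviewer gave a criterion the same rating. The function names and sample ratings are hypothetical.

```python
# Thresholds from the paragraph above.
THRESHOLDS = {"objective": 0.85, "subjective": 0.75}


def agreement_rate(ratings_per_call):
    """Share of calls where all reviewers gave the same rating (assumed definition).

    ratings_per_call holds one inner list per call, containing every
    reviewer's rating for a single criterion on that call.
    """
    if not ratings_per_call:
        raise ValueError("need at least one calibration call")
    agreed = sum(1 for ratings in ratings_per_call if len(set(ratings)) == 1)
    return agreed / len(ratings_per_call)


def needs_clarification(ratings_per_call, kind):
    """True when agreement falls below the threshold for this kind of criterion."""
    return agreement_rate(ratings_per_call) < THRESHOLDS[kind]


# Illustrative 10-call set, three reviewers, yes/no criterion: one disagreement.
disclosure = [[True, True, True]] * 9 + [[True, False, True]]
print(agreement_rate(disclosure))                    # 0.9
print(needs_clarification(disclosure, "objective"))  # False
```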

See Your Scorecard Scored Automatically on Every Call

Call Coach IQ lets you configure your rubric — criteria, weights, auto-fail items, and behavioral anchors — and applies it to every call automatically. Book a demo and see it run on your call type.
