Paired Comparison Estimation Method

1. Concept and Aim

The Paired Comparison Estimation Method (PCEM) helps derive consistent size estimates of software modules or user stories by comparing them two at a time.

Instead of asking “How big is module X?”, the method asks,

“Is module X bigger or smaller than module Y — and by how much?”

From these relative judgments, the method produces:

📏 Estimated absolute magnitudes (e.g., story points, SLOC, effort)
⚖️ A measure of internal consistency among judgments

✅ The method guarantees consistency, not correctness.
🔍 Inconsistent comparisons must be reviewed until the Inconsistency Index is acceptable.

2. Core Idea

People are more accurate at comparing relative magnitudes than assigning absolute numbers (Hihn & Lum, 2004).
Each comparison expresses a ratio (e.g., “Story A is twice Story B”).
These ratios fill a Judgment Matrix, from which the relative scale is derived mathematically.

3. Basic Process

flowchart LR
  %% Supporting inputs
  A[Artifacts to be sized]
  R[Reference artifacts]
  S[Replication factor, verbal scale]
  
  %% Main flow (in parentheses)
  subgraph main_flow [Main flow]
    M1((Rank artifacts from largest to smallest))
    M2((Compare artifacts pairwise, establishing their relative size))
    M3((Compare absolute sizes and compute inconsistency index))
    M4((Review inconsistent judgments))
  end
  
  %% Output
  G[Sized Artifacts]
  
  %% Judgment matrix intermediate
  D[Judgment matrix]
  
  %% Connect supporting inputs into main flow
  A --> M1
  R --> M1
  S --> M2
  
  %% Connect main flow
  M1 --> M2
  M2 --> D
  D --> M3
  M2 --> M3
  M3 --> M4
  M4 --> G

4. Step-by-Step Explanation

Step	Description	Example
1. Rank artifacts	List the items (modules or user stories) roughly from largest to smallest.	Story D > Story B > Story C > Story A
2. Pairwise comparison	Compare each pair and record how much larger/smaller one is (e.g., 2×, 0.5×). Fill the upper triangle of a Judgment Matrix.	A vs. B = 3 means A is 3× larger than B.
3. Compute consistency	Use the geometric mean to compute each item’s relative magnitude and derive an Inconsistency Index.	Inconsistency ≤ 0.35 is acceptable (Miranda, 2009).
4. Review inconsistencies	Re-examine comparisons that violate logical consistency (if A>B and B>C, then A>C).	If not, revise A–C comparison.
5. Normalize results	Pick a reference artifact with a known size to convert relative magnitudes into actual units.	If reference = 2000 SLOC, multiply ratios to get estimates.

5. Probabilistic Extension (NASA–JPL)

Hihn & Lum (2004) expanded the method by:

Treating each pairwise judgment as a distribution (triangular: min, mode, max)
Running Monte Carlo simulations to propagate uncertainty
Allowing multiple reference modules (each with its own known size)

Result: a probability distribution for each size estimate — not just a single number.

This makes the method especially useful in early project phases with uncertainty.

6. Example (Simplified)

Suppose we have 4 modules (A, B, C, D).
An estimator judges:

Compared Modules	Relative Size
A / B	2
A / C	3
A / D	4
B / C	1.5
B / D	2
C / D	1.5

From these ratios, a Judgment Matrix is formed.
The geometric mean of each row gives relative magnitudes:

Module	Geometric Mean	Relative Scale
A	2.21	2.21
B	1.11	1.11
C	0.76	0.76
D	0.54	0.54

If C = 2000 SLOC (reference), then:

A = 5826 SLOC
B = 2913 SLOC
D = 1414 SLOC

7. Managing Inconsistency

The Inconsistency Index (II) measures how logically consistent the comparisons are.
A perfectly consistent matrix satisfies:

\[a_{ij} \times a_{jk} = a_{ik}\]

When II ≤ 0.35 → results are considered reliable.
Higher values mean the estimator should review their judgments.

8. Reduced Comparison Designs (Miranda et al., 2009)

In large projects, comparing every pair (Full Factorial) becomes overwhelming.
Miranda’s team introduced Incomplete Cyclic Designs (ICDs) to:

Limit the number of required comparisons
Still maintain balanced, connected comparisons
Maintain acceptable accuracy (correlation ≈ 0.9 with full set)

Example: For 10 user stories, instead of 45 comparisons, only ~20 may be required (r = 4).

9. Strengths

Relies on explicit, rational comparisons
Provides a mathematical consistency check
Empirically validated (Miranda, Hihn, Bozóki)
Encourages estimator commitment and reflection

10. Weaknesses

No built-in mechanism to aggregate multiple estimators’ opinions
May suffer from anchoring bias (first comparisons influence later ones)
Full designs can be time-consuming or tedious

11. Practical Takeaway

PCEM bridges expert judgment and quantitative reasoning.
It’s particularly suited for:

Early estimation when little data exist
Agile user-story sizing
Research on consistency and cognitive estimation

In essence: compare wisely, check consistency, iterate.

12. Sources

Miranda, E., Bourque, P., & Abran, A. (2009). Sizing User Stories Using Paired Comparisons. Information and Software Technology.
Hihn, J., & Lum, K. (2004). Improving Software Size Estimates by Using Probabilistic Pairwise Comparison Matrices. IEEE Int. Symposium on Software Metrics.
Miranda, E. (2001). Improving Subjective Estimates Using Paired Comparisons. IEEE Software, 18(1), 87–91.

Disclaimer: AI is used for text summarization, explaining and formatting. Authors have verified all facts and claims. In case of an error, feel free to file an issue or fix with a pull request.