Internal Methodology Reference

Interview Guide Design

How to design and annotate interview guides that produce data ready for segmentation analysis.

The Foundational Constraint

Why Guide Design Determines Segmentation Outcomes

The interview guide is not just a question list. It is the data collection plan that determines which segmentation variables will exist in the final dataset.

No amount of analysis can recover data the guide did not collect. If the guide does not ask about evaluation triggers, you cannot build an Evaluation Trigger dimension. If it does not ask about buying committee composition, you cannot build a Decision Complexity dimension.

The binding constraint: Every clustering variable must come from something you explicitly asked about. Finalize the interview guide only after mapping every question to its segmentation role.

What questions produce

  • Categorical/ordinal responses become firmographic defining variables
  • Thematic responses become composite binary defining variables
  • Screener fields become the outcome variable ground truth
  • Current-state questions become profiling variables

What cannot be fixed after

  • Missing Module A questions cannot be reconstructed from Module B data
  • An outcome variable not captured in the screener cannot be de-contaminated
  • Low-variance variables cannot become discriminating after collection
  • Three-audience gaps cannot be papered over in analysis

What guide annotations do

  • Force explicit segmentation role assignment before fieldwork
  • Provide the Dimension Architect with question context
  • Become the study_config.json that feeds Phase 6 of the pipeline
  • Create an auditable record of design decisions

Study Design

The Three-Audience Test

Before writing any question, identify which downstream audiences will use the segmentation findings. Each audience needs a different type of variable.

Marketing

Needs pain points in the participant's own words, evaluation criteria, and what messaging resonates or triggers rejection.

Questions to include:

  • What language did they use to describe the problem?
  • What triggered the evaluation?
  • What made a tool feel like a match or a mismatch?
  • What were they trying to accomplish?

Sales

Needs observable signals to classify a prospect before a full interview, and differentiated scripts for each segment type.

Questions to include:

  • What is observable about their situation from LinkedIn or a website?
  • What triggered the evaluation and who was involved?
  • Current tool situation (replacing vs. augmenting)
  • Buying committee composition

Product

Needs feature requirements, table-stakes disqualifiers, and what would accelerate or block the purchase decision.

Questions to include:

  • What must the tool do? (table-stakes requirements)
  • What blocked or disqualified alternatives?
  • What current workarounds exist?
  • What would accelerate the purchase decision?

If the guide only covers one audience, the segmentation can only make recommendations to one audience. This is the most common avoidable failure in client studies and is detectable from the interview guide before a single interview is conducted.

The Most Important Structural Decision

The Two Temporal Layers

All interview questions fall into one of two temporal layers. This determines whether a variable can be a clustering input or must be treated as profiling.

Module A: Historical Evaluation
Upstream of tool choice

Questions about the past evaluation process — what drove the decision, what criteria were applied, who was involved, why alternatives were rejected.

Defining variable candidates

  • What triggered the evaluation?
  • What criteria mattered most?
  • Why was the previous tool rejected?
  • Who was involved in the decision?
  • What problems drove the switch evaluation?

(Tool Choice: the event that separates the two layers)

Module B: Current State
Downstream of tool choice

Questions about current satisfaction, gaps, and needs — describing what the participant experiences with the tool they chose.

Profiling variables by default

  • How satisfied are you with your current tool?
  • What features do you wish it had?
  • What frustrates you about it?
  • What does your current workflow look like?

The temporal contamination risk: A participant who chose Ease-of-Use Tool A may now report ease of use as highly important — because the tool shaped their expectations, not because ease of use predicted the choice. Using Module B data as defining variables introduces reverse causality: the outcome (tool choice) partly causes the variables you are using to predict it. The resulting clusters group people by downstream effects rather than upstream drivers.

The boundary question: "What criteria did you use when evaluating tools?" is Module A (asking about a past decision). "What do you look for in a tool today?" is Module B (describing current preferences, shaped by the current tool). These questions look similar but have opposite segmentation roles. Always anchor the question in time explicitly — "Back then..." for Module A, "Currently..." for Module B.
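A rough heuristic for this temporal-anchoring check can be sketched in Python. The anchor phrase lists and the function are hypothetical, and real classification still needs human review:

```python
# Hypothetical sketch: flag a question's temporal layer from its anchor phrase.
# The anchor lists are illustrative assumptions, not an exhaustive rule set.
MODULE_A_ANCHORS = ("back then", "when you were evaluating", "at the time")
MODULE_B_ANCHORS = ("currently", "today", "right now")

def guess_temporal_layer(question: str) -> str:
    q = question.lower()
    if any(anchor in q for anchor in MODULE_A_ANCHORS):
        return "module_a"
    if any(anchor in q for anchor in MODULE_B_ANCHORS):
        return "module_b"
    # No explicit anchor: the question should be rewritten, not guessed at.
    return "unanchored"
```

An `unanchored` result is a signal to rewrite the question with an explicit "Back then..." or "Currently..." framing, not a license to guess the layer.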

How to Annotate

The Annotation System

Annotate every question before finalizing the guide. Annotations force explicit segmentation role assignment and feed directly into the coding pipeline.

Annotation Format

Place the annotation immediately after the question number, before the question text:

**Q7**
`thematic | defining | Module A`
 
Back then, what motivated your team to consider other recruiting tools?
 
> Coding note: This is one of the highest-value questions in the guide. Code all themes present. Candidate themes include: tool capability gap / team growth event / poor prior tool reliability / compliance requirement / new HR leadership / manual process pain / integration failure / contract renewal as trigger. These themes will form the Evaluation Trigger composite dimension.
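The pipe-delimited annotation line lends itself to mechanical parsing. A minimal sketch, assuming the `type | role | layer` field order shown above; the `Annotation` class and validation sets are hypothetical helpers, not pipeline code:

```python
from dataclasses import dataclass
from typing import Optional

VALID_TYPES = {"thematic", "discrete category", "ordinal category", "binary", "rank order"}
VALID_ROLES = {"defining", "outcome", "profiling"}

@dataclass
class Annotation:
    variable_type: str             # e.g. "thematic"
    role: str                      # "defining", "outcome", or "profiling"
    temporal_layer: Optional[str]  # "Module A", "Module B", or None (firmographic)

def parse_annotation(text: str) -> Annotation:
    """Parse a pipe-delimited annotation like 'thematic | defining | Module A'."""
    parts = [p.strip() for p in text.split("|")]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 2 or 3 fields, got {len(parts)}: {text!r}")
    vtype, role = parts[0], parts[1]
    if vtype not in VALID_TYPES:
        raise ValueError(f"unknown variable type: {vtype!r}")
    if role not in VALID_ROLES:
        raise ValueError(f"unknown segmentation role: {role!r}")
    return Annotation(vtype, role, parts[2] if len(parts) == 3 else None)
```

A two-field annotation such as `discrete category | outcome` leaves the temporal layer empty, which matches firmographic and outcome questions.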

Variable Types

| Type | When to use | How it encodes | Example |
| --- | --- | --- | --- |
| thematic | Open-ended qualitative question | Coded to named themes; becomes composite binary dimension | "What motivated you to evaluate new tools?" |
| discrete category | Fixed set of non-ordered values | Coded to a category label | "What industry is your company in?" |
| ordinal category | Ordered values (size, tenure, recency) | Coded to an ordered label; encoded as integer | "How many people work at your company?" |
| binary | Yes/no or forced choice | Coded as 1/0 | "Did you evaluate more than one tool?" |
| rank order | Participant ranks a list of options | Binary flags per rank position, or top-N presence | "Rank these 5 features by importance" |
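How each variable type might encode can be sketched with toy data. The themes, size bands, and features below are illustrative, not from a real study:

```python
# thematic -> composite binary: one 0/1 column per coded theme.
# Substring matching is a simplification; real coding assigns themes explicitly.
responses = ["tool capability gap; team growth event", "manual process pain"]
themes = ["tool capability gap", "team growth event", "manual process pain"]
thematic_encoded = [{t: int(t in r) for t in themes} for r in responses]

# ordinal category -> ordered integer
size_order = {"1-50": 0, "51-200": 1, "201-1000": 2, "1000+": 3}
company_sizes = ["51-200", "1000+"]
ordinal_encoded = [size_order[s] for s in company_sizes]

# binary -> 1/0
evaluated_multiple = [True, False]
binary_encoded = [int(v) for v in evaluated_multiple]

# rank order -> binary flag for top-N presence (here, top 3 of 5)
ranking = ["price", "integrations", "support", "reporting", "compliance"]
top3_flags = {feat: int(ranking.index(feat) < 3) for feat in ranking}
```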

Segmentation Roles

| Role | What it means | Where it goes |
| --- | --- | --- |
| defining | Goes into cluster analysis. Determines which cluster each participant falls into. | segmentation-ready.csv (defining columns) |
| outcome | Validates that clusters predict something useful. Never enters clustering. Must come from screener. | segmentation-profile.csv (outcome column) |
| profiling | Describes what each segment looks like after clustering. No causal role in segment formation. | segmentation-profile.csv (profiling columns) |
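A sketch of how these roles route coded columns into the two output files. The participant ids, column names, and `write_csv` helper are hypothetical:

```python
import csv
import io

# Illustrative coded dataset; column names are hypothetical.
rows = [
    {"pid": "P01", "trigger_growth": 1, "committee": "small group",
     "current_tool": "Tool A", "current_frustration": "reporting"},
    {"pid": "P02", "trigger_growth": 0, "committee": "sole decider",
     "current_tool": "Tool B", "current_frustration": "pricing"},
]
roles = {
    "trigger_growth": "defining",
    "committee": "defining",
    "current_tool": "outcome",
    "current_frustration": "profiling",
}

def write_csv(rows, cols):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=cols)
    writer.writeheader()
    for r in rows:
        writer.writerow({c: r[c] for c in cols})
    return buf.getvalue()

# defining columns go to the clustering input; outcome and profiling
# columns go to the post-clustering profile file.
defining_cols = ["pid"] + [c for c, role in roles.items() if role == "defining"]
profile_cols = ["pid"] + [c for c, role in roles.items() if role != "defining"]

segmentation_ready = write_csv(rows, defining_cols)
segmentation_profile = write_csv(rows, profile_cols)
```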

Most of your variables are profiling. Only a small, curated set — typically 6-10 for a 70-100 participant study — should be defining. The N/10 rule sets the hard cap: maximum defining dimensions = sample_size / 10. At N=80 that is 8 defining dimensions.
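The N/10 cap is simple enough to enforce mechanically. A sketch (the function names are hypothetical):

```python
def max_defining_dimensions(sample_size: int) -> int:
    """N/10 rule: at least 10 participants per defining dimension."""
    return sample_size // 10

def check_defining_count(sample_size: int, n_defining: int) -> None:
    """Raise if the planned defining set exceeds the N/10 ceiling."""
    cap = max_defining_dimensions(sample_size)
    if n_defining > cap:
        raise ValueError(
            f"{n_defining} defining dimensions exceeds the N/10 cap "
            f"of {cap} for N={sample_size}"
        )
```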

Data Collection Design

Screener vs. Interview

The screener and interview serve different roles. Mixing them up contaminates the outcome variable and weakens the segmentation.

Capture in the Screener

Structured fields captured before the interview. Consistent format, not subject to conversational bias or interviewer effect.

  • Company size (must be verified, not estimated in conversation)
  • Seniority level
  • Industry
  • Current tool (the outcome variable ground truth)
  • Budget authority level

Capture in the Interview

Qualitative depth the screener cannot capture: narrative, emotion, language, criteria, committee dynamics.

  • Evaluation triggers and their context
  • Decision criteria and their weighting
  • Rejection reasons for alternatives
  • Buying committee composition
  • Feature requirements and disqualifiers

The outcome variable must come from the screener. If you infer current tool adoption from what someone says during the interview, the outcome variable is contaminated. The interview topic primes them. Their description of their current tool is influenced by the conversation context. You cannot verify the answer independently afterward. The screener field is the ground truth — the interview version is a recall confirmation only.

Pipeline Connection

The study_config.json Connection

Every annotation you write in the interview guide becomes machine-readable context in study_config.json, which feeds Phase 6 of the coding pipeline.

Create study_config.json in the Interview Project folder alongside transcripts.json. The Dimension Architect reads it during Phase 6 to classify themes into segmentation dimensions.

{
  "study_name": "HR Leaders",
  "sample_size": 80,
  "outcome_variable": {
    "question_id": "Q4",
    "field_name": "current_tool",
    "description": "Primary recruiting tool — verified from screener"
  },
  "question_context": [
    {
      "question_id": "Q1",
      "temporal_layer": "firmographic",
      "purpose_hint": "defining",
      "note": "seniority and function"
    },
    {
      "question_id": "Q7",
      "temporal_layer": "module_a",
      "purpose_hint": "defining",
      "note": "evaluation trigger — primary defining variable"
    },
    {
      "question_id": "Q16",
      "temporal_layer": "module_b",
      "purpose_hint": "profiling",
      "note": "current tool strengths — profiling only"
    }
  ]
}

The sample_size field sets the N/10 ceiling (a maximum of 8 defining dimensions at N=80). The temporal_layer values module_a and module_b mark questions upstream and downstream of tool choice respectively; firmographic marks questions with no temporal layer. JSON does not permit comments, so all per-question context belongs in the note fields.

If study_config.json is absent, Phase 6 runs with no question context and infers temporal layers from question text alone. The Dimension Architect will do its best, but it cannot know which question holds the outcome variable without being told. Always create this file before running the pipeline.
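A pre-flight validation of study_config.json can catch a missing outcome variable or an over-budget defining set before Phase 6 runs. A sketch assuming the schema shown above; the validator itself is hypothetical, not part of the pipeline:

```python
import json

REQUIRED_KEYS = ("study_name", "sample_size", "outcome_variable", "question_context")

def validate_study_config(raw: str) -> dict:
    """Minimal pre-flight check of a study_config.json string (sketch)."""
    cfg = json.loads(raw)
    for key in REQUIRED_KEYS:
        if key not in cfg:
            raise ValueError(f"study_config.json missing required key: {key}")
    if "question_id" not in cfg["outcome_variable"]:
        raise ValueError("outcome_variable must name its question_id")
    # Enforce the N/10 ceiling against the defining purpose hints.
    cap = cfg["sample_size"] // 10
    n_defining = sum(
        1 for q in cfg["question_context"] if q.get("purpose_hint") == "defining"
    )
    if n_defining > cap:
        raise ValueError(f"{n_defining} defining hints exceed the N/10 cap of {cap}")
    return cfg
```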

N/10 Ceiling

The sample_size field drives the maximum number of defining dimensions. The N/10 rule (at least 10 participants per variable for stable clusters) sets the ceiling.

| Study size | Max defining dimensions |
| --- | --- |
| 50 participants | 5 |
| 70 participants | 7 |
| 80 participants | 8 |
| 100 participants | 10 |
| 200 participants | 20 |

Applied Example

Annotation Walkthrough: HR Leaders Guide

Six questions from the HR Leaders study showing annotation decisions and the reasoning behind each classification.

Q4 — The Outcome Variable

**Q4**
`discrete category | outcome`
 
Currently, when you or your team needs to recruit new talent, what is the main software or tool that you use for that?

Q4 is annotated outcome because it is the ground truth for cluster validation. The answer is already known from the screener. The interview version is a recall check — if screener and interview responses conflict, flag the participant. Never use this field as a clustering input.
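The recall check can be sketched as a comparison of the screener ground truth against the interview answer. The field names and the normalization rule are illustrative:

```python
def normalize(name: str) -> str:
    """Normalize a tool name for comparison (illustrative rule only)."""
    return name.strip().lower()

def conflicting_participants(screener: dict, interview: dict) -> list:
    """Return ids of participants whose interview recall disagrees with the screener."""
    flagged = []
    for pid, ground_truth in screener.items():
        recalled = interview.get(pid)
        if recalled is not None and normalize(recalled) != normalize(ground_truth):
            flagged.append(pid)
    return flagged

# Hypothetical data: P01's answers agree after normalization; P02's conflict.
screener = {"P01": "Tool A", "P02": "Tool B"}
interview = {"P01": "tool a", "P02": "Tool C"}
```

Flagged participants get reviewed, not silently overwritten; the screener value remains the ground truth.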

Q7 — The Primary Defining Variable

**Q7**
`thematic | defining | Module A`
 
Back then, what motivated your team to consider other recruiting tools?

Q7 is Module A (asking about a historical decision) and thematic. It will likely form the Evaluation Trigger composite dimension — a primary clustering input. All themes should be coded, not just the first one mentioned. This is one of the highest-value questions in the guide.

Q9 — A Module A Question That Looks Like Module B

**Q9**
`thematic | defining | Module A`
 
What specific problems or pain points drove you to look for a new tool?

This question could be mistaken for Module B because it asks about "problems." But the explicit temporal anchor ("drove you to look for a new tool") places it in the evaluation period. The problems described here caused the switch evaluation — upstream of the choice. defining candidate. If rephrased as "What problems do you have with your current tool?" it would become Module B and must be annotated profiling.

Q12 — Buying Committee Composition

**Q12**
`discrete category | defining | Module A`
 
Who else, if anyone, was involved in the decision to select a new tool?

Buying committee complexity is Module A (the decision has been made) and produces a categorical code (sole decider / small group / formal committee). It reliably predicts evaluation process length and sales motion complexity. defining.

Q6 — Reclassified to Profiling

**Q6**
`discrete category | profiling`
 
Back then, what recruiting tool were you using when you started evaluating other tools?

Q6 asks about the prior tool — Module A in temporal terms. But the specific prior tool name is competitive intelligence, not a structural segmentation variable. Segments defined partly by "was using Tool X before" cannot be activated — sales and marketing cannot target people based on their prior tool without a full research interview. Annotated profiling for competitive win/loss analysis post-clustering.

Q16 — Current Tool Strengths (Module B)

**Q16**
`thematic | profiling | Module B`
 
What do you see as the main strengths of your current recruiting tool?

Q16 is squarely Module B — the participant is describing their current experience with the tool they chose. These themes are downstream of the tool choice. They describe what the tool is good at, not what drove the selection. profiling only.

Annotation Quality

Common Annotation Errors

Five errors that produce bad segmentation variables — all detectable before fieldwork begins.

| Error | Example | Correct approach | Why it matters |
| --- | --- | --- | --- |
| Module B as defining | "What frustrates you about your current tool?" annotated defining | Annotate as profiling; move the corresponding Module A question to defining | Current frustrations are shaped by the tool chosen, not the other way around. Produces reverse-causal clusters. |
| Outcome as defining | Q4 (current tool) annotated defining | Annotate as outcome; ensure the screener captures it | Putting the outcome into clustering builds the answer into the question. Clusters become self-fulfilling. |
| Module A / B conflation | "What were your evaluation criteria?" (Module A) and "What do you look for in a tool?" (Module B) both annotated defining | Module A version is a defining candidate; Module B version is profiling | These questions look similar but have opposite segmentation roles due to temporal anchoring. |
| Near-constant variable as defining | All participants are mid-market HR leaders; company size annotated defining | Check the screener distribution first; if near-constant, annotate profiling | Near-constant variables waste a defining slot without adding discriminative power. |
| Rank-order with no encoding plan | "Rank 6 features" annotated defining with no note | Specify encoding in the coding note: binary flag for rank-1 position, or presence in top 3 | The Dimension Architect needs clear encoding guidance for rank-order data. Without it, it will guess. |

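Several of these errors are mechanically detectable from the annotations alone. A sketch of a lint pass; the question record shape, the 0.05 variance threshold, and the function itself are hypothetical:

```python
def lint_annotations(questions, outcome_qid, screener_variance=None):
    """Return (question_id, warning) pairs for detectable annotation errors.

    questions: list of dicts with keys id, layer, role, type, and optional note.
    screener_variance: optional map of question id -> variance from screener data.
    """
    warnings = []
    for q in questions:
        if q["layer"] == "module_b" and q["role"] == "defining":
            warnings.append((q["id"], "Module B question annotated defining"))
        if q["id"] == outcome_qid and q["role"] == "defining":
            warnings.append((q["id"], "outcome variable annotated defining"))
        if q["type"] == "rank order" and q["role"] == "defining" and not q.get("note"):
            warnings.append((q["id"], "rank-order defining with no encoding note"))
        if screener_variance is not None and q["role"] == "defining":
            v = screener_variance.get(q["id"])
            # 0.05 is an illustrative near-constant threshold, not a standard.
            if v is not None and v < 0.05:
                warnings.append((q["id"], "near-constant variable annotated defining"))
    return warnings
```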
Reference Card

Quick Reference

The annotation key, decision rules, and pre-finalization checklist on one page.

Variable Types

  • thematic: Open-ended qualitative; codes to themes
  • discrete category: Fixed non-ordered values
  • ordinal category: Ordered values; encodes as integer
  • binary: Yes/no or forced choice
  • rank order: Ranked list; specify encoding in note

Segmentation Roles

  • defining: Goes into cluster analysis
  • outcome: Validates clusters; never enters clustering
  • profiling: Describes segments post-clustering

Temporal Layers

  • Module A: Historical evaluation — upstream of tool choice — defining candidates
  • Module B: Current state — downstream of tool choice — profiling by default
  • firmographic: Organizational facts — no temporal layer needed

N/10 Ceiling

  • 50 participants: max 5 defining dimensions
  • 70 participants: max 7 defining dimensions
  • 80 participants: max 8 defining dimensions
  • 100 participants: max 10 defining dimensions
  • 200 participants: max 20 defining dimensions

Before Finalizing the Guide