Entity Linking & Disambiguation

> NER found "Paris." Entity linking decides: Paris, France? Paris Hilton? Paris, Texas? Paris (the Trojan prince)? Without linking, your knowledge graph stays ambiguous.

Type: Build

Languages: Python

Prerequisites: Phase 5 · 06 (NER), Phase 5 · 24 (Coreference Resolution)

Time: ~60 minutes

The Problem

A sentence reads: "Jordan beat the press." Your NER tags "Jordan" as PERSON. Good. But *which* Jordan?

Michael Jordan (basketball)?
Michael B. Jordan (actor)?
Michael I. Jordan (Berkeley ML professor — yes, this confusion is real in ML papers)?
Jordan (the country)?
Jordan (Hebrew first name)?

Entity linking (EL) resolves each mention to a unique entry in a knowledge base: Wikidata, Wikipedia, DBpedia, or your domain KB. Two subtasks:

Candidate generation. Given "Jordan," which KB entries are plausible?
Disambiguation. Given the context, which candidate is the right one?

Both steps are learnable. Both are benchmarked. The combined pipeline has been stable for a decade — what changes is the quality of the disambiguator.

The Concept

Entity linking pipeline: mention → candidates → disambiguated entity

Candidate generation. Given the mention surface form ("Jordan"), look up candidates in an alias index. Wikipedia alias dictionaries cover most named entities: "JFK" → John F. Kennedy, Jacqueline Kennedy, JFK airport, JFK (movie). A typical index returns 10-30 candidates per mention.

Disambiguation: three approaches.

Prior + context (Milne & Witten, 2008). P(entity | mention) × context-similarity(entity, text). Works well, fast, no training.
Embedding-based (ESS / REL / BLINK). Encode mention + context. Encode each candidate's description. Pick the candidate with the highest cosine similarity. The 2020-2024 default.
Generative (GENRE, 2021; LLM-based, 2023+). Decode the entity's canonical name token-by-token. Constrained to a trie of valid entity names so output is guaranteed to be a valid KB id.
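The prior + context approach can be sketched in a few lines. This is a minimal illustration, not the Milne & Witten implementation: the priors, the descriptions, and the helper names (tokenize, prior_context_score, link) are all hypothetical, and Jaccard overlap stands in for the context-similarity term.

```python
import re

def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a stand-in tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def prior_context_score(prior, context, description):
    """Score = P(entity | mention) x Jaccard overlap of context and description."""
    ctx, desc = set(tokenize(context)), set(tokenize(description))
    union = ctx | desc
    overlap = len(ctx & desc) / len(union) if union else 0.0
    return prior * overlap

# Hypothetical priors for the mention "Jordan" and one-line descriptions.
candidates = {
    "Michael Jordan": (0.6, "American basketball player for the Chicago Bulls"),
    "Jordan (country)": (0.4, "country in the Middle East on the Jordan river"),
}

def link(context):
    """Return the candidate name with the highest prior x context score."""
    scores = {
        name: prior_context_score(prior, context, desc)
        for name, (prior, desc) in candidates.items()
    }
    return max(scores, key=scores.get)

best = link("Jordan signed a water treaty with a neighbouring country in the Middle East")
```

Note that a pure product zeroes out any candidate with no context overlap, however strong its prior. Real systems usually smooth or interpolate the two terms.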

End-to-end vs pipeline. End-to-end models such as ELQ, and GENRE in its end-to-end mode, detect mentions and link them in one pass. BLINK and ExtEnD assume the mention spans are already given and handle only the linking step. Pipeline systems still dominate in production because you can swap components.

The two measurements

Mention recall (candidate gen). Fraction of gold mentions where the correct KB entry appears in the candidate list. Ceiling for the whole pipeline: the disambiguator cannot pick an entity it was never shown.
Disambiguation accuracy / F1. Given correct candidates, how often the top-1 is right.

Always report both. A system with 99% disambiguation on 80% candidate recall is a 79% pipeline at best (0.99 × 0.80 ≈ 0.79).
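To make the relationship between the two measurements concrete, here is a small sketch that computes both plus the end-to-end accuracy. The function name pipeline_metrics and the five examples are hypothetical; the only assumption is that each example records the gold id, the candidate list, and the predicted id.

```python
def pipeline_metrics(examples):
    """examples: list of (gold_id, candidate_ids, predicted_id) tuples."""
    total = len(examples)
    # Mentions where candidate generation surfaced the gold entity.
    covered = [ex for ex in examples if ex[0] in ex[1]]
    # Covered mentions where the top-1 prediction matches gold.
    correct = [ex for ex in covered if ex[2] == ex[0]]
    recall = len(covered) / total if total else 0.0
    disamb = len(correct) / len(covered) if covered else 0.0
    end_to_end = len(correct) / total if total else 0.0
    return recall, disamb, end_to_end

# Hypothetical examples: the last gold entity never reaches the candidate list.
examples = [
    ("Q810", ["Q810", "Q41421"], "Q810"),
    ("Q41421", ["Q810", "Q41421"], "Q41421"),
    ("Q90", ["Q90"], "Q90"),
    ("Q312", ["Q312", "Q89"], "Q89"),
    ("Q254110", ["Q810", "Q41421"], "Q41421"),
]
recall, disamb, end_to_end = pipeline_metrics(examples)
```

Here recall is 4/5, disambiguation accuracy is 3/4 on the covered mentions, and end-to-end accuracy is their product, 3/5.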

Build It

Step 1: build an alias index from Wikipedia redirects

alias_to_entities = {
    "jordan": ["Q41421 (Michael Jordan)", "Q810 (Jordan, country)", "Q254110 (Michael B. Jordan)"],
    "paris":  ["Q90 (Paris, France)", "Q663094 (Paris, Texas)", "Q55411 (Paris Hilton)"],
    "apple":  ["Q312 (Apple Inc.)", "Q89 (apple, fruit)"],
}

Wikipedia alias data: ~18M (alias, entity) pairs. Download from Wikidata dumps. Store as inverted index.
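One way to turn raw (alias, entity) rows into the inverted index is sketched below. The row format with a count column, the function names, and the counts themselves are assumptions for illustration; real dumps need their own parsing step.

```python
from collections import defaultdict

def build_alias_index(rows):
    """Group (alias, entity_id, count) rows into alias -> entity ids, most frequent first."""
    counts = defaultdict(lambda: defaultdict(int))
    for alias, entity_id, count in rows:
        counts[alias.lower()][entity_id] += count
    return {
        alias: sorted(ents, key=lambda e: (-ents[e], e))
        for alias, ents in counts.items()
    }

def top_candidates(index, mention, k=30):
    """Return up to k candidates for a mention, or an empty list if the alias is unknown."""
    return index.get(mention.lower(), [])[:k]

# Hypothetical rows; the counts are illustrative, not real anchor statistics.
rows = [
    ("Jordan", "Q810", 50),
    ("Jordan", "Q41421", 30),
    ("jordan", "Q810", 5),
    ("Apple", "Q312", 40),
]
index = build_alias_index(rows)
```

Sorting by count at build time means the candidate list doubles as a popularity prior: truncating to the first k entries keeps the most likely entities.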

Step 2: context-based disambiguation

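import re

def tokenize(text):
    # Minimal stand-in tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

def disambiguate(mention, context, alias_index, entity_desc):
    """Pick the candidate whose description has the highest Jaccard overlap with the context."""
    candidates = alias_index.get(mention.lower(), [])
    if not candidates:
        return None, 0.0  # no candidates: treat as NIL
    context_words = set(tokenize(context))
    best, best_score = None, -1.0
    for entity_id in candidates:
        desc_words = set(tokenize(entity_desc[entity_id]))
        union = len(context_words | desc_words)
        score = len(context_words & desc_words) / union if union else 0.0
        if score > best_score:
            best, best_score = entity_id, score
    return best, best_score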

The Jaccard overlap is a toy. Replace with cosine similarity on embeddings (see code/main.py step-2 for the transformer version).

Step 3: embedding-based (BLINK-style)

from sentence_transformers import SentenceTransformer
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def embed_mention(text, mention_span):
    # mention_span is a (start, end) character offset pair into text.
    start, end = mention_span
    # The markers are plain text to this off-the-shelf encoder, not trained special tokens.
    marked = f"{text[:start]} [MENTION] {text[start:end]} [/MENTION] {text[end:]}"
    return encoder.encode([marked], normalize_embeddings=True)[0]

def embed_entity(entity_id, description):
    # Unit-normalized output, so a dot product equals cosine similarity.
    return encoder.encode([f"{entity_id}: {description}"], normalize_embeddings=True)[0]

At index time, embed every KB entity once. At query time, embed the mention + context once, dot-product against the candidate pool, pick max.
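The query-time ranking step can be sketched without loading a model. The toy 3-dimensional unit vectors below stand in for real encoder outputs, and rank_candidates is a hypothetical helper name; with normalized vectors the dot product is the cosine similarity.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rank_candidates(mention_vec, entity_vecs):
    """Return (entity_id, score) pairs sorted by descending dot product."""
    scored = [(eid, dot(mention_vec, vec)) for eid, vec in entity_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy unit vectors standing in for precomputed entity embeddings.
entity_vecs = {
    "Q41421": [1.0, 0.0, 0.0],
    "Q810": [0.0, 1.0, 0.0],
}
# Toy unit vector standing in for the embedded mention + context.
mention_vec = [0.6, 0.8, 0.0]

ranked = rank_candidates(mention_vec, entity_vecs)
best_id, best_score = ranked[0]
```

Restricting entity_vecs to the candidate pool from the alias index keeps the comparison to a few dozen dot products per mention.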

Step 4: generative entity linking (concept)

GENRE decodes the entity's Wikipedia title token by token. Constrained decoding (see lesson 20) over a trie built from the KB's titles ensures only valid titles can be output. The modern descendant is REL-GEN and LLM-prompted EL with structured output.

prompt = f"""Text: {text}
Mention: {mention}
List the best Wikipedia title for this mention.
Respond with JSON: {{"title": "..."}}"""

Combined with a whitelist of valid titles (for example, a choice constraint in Outlines), this is the simplest EL pipeline to ship in 2026.
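If constrained decoding is not available, a weaker fallback is to validate the model's reply after the fact. The sketch below is post-hoc checking, not constrained decoding: parse_linked_title is a hypothetical helper, the response strings are made up, and no LLM is called.

```python
import json

def parse_linked_title(raw_response, allowed_titles):
    """Return the title if the reply is valid JSON naming an allowed title, else None."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    title = payload.get("title")
    # Reject non-string values and any title outside the whitelist.
    if isinstance(title, str) and title in allowed_titles:
        return title
    return None

# Hypothetical whitelist: the candidate titles for the mention being linked.
allowed = {"Michael Jordan", "Jordan", "Michael B. Jordan"}

ok = parse_linked_title('{"title": "Michael Jordan"}', allowed)
hallucinated = parse_linked_title('{"title": "Jordan Peterson"}', allowed)
malformed = parse_linked_title("Michael Jordan", allowed)
```

A None result can be treated as NIL or retried. Either way, nothing outside the whitelist reaches the knowledge graph.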

Step 5: evaluate on AIDA-CoNLL

AIDA-CoNLL is the standard EL benchmark: 1,393 Reuters articles, 34k mentions, Wikipedia entities. Report in-KB accuracy (P@1) and out-of-KB NIL-detection rate.
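The two reported numbers can be computed as below. This is a sketch under the assumption that gold and predicted links are parallel lists with None marking NIL; the function names and the five-mention sample are hypothetical, not AIDA-CoNLL data, and NIL detection is scored here as precision, recall, and F1.

```python
def in_kb_accuracy(gold, pred):
    """Top-1 accuracy over mentions whose gold entity is in the KB."""
    pairs = [(g, p) for g, p in zip(gold, pred) if g is not None]
    return sum(g == p for g, p in pairs) / len(pairs) if pairs else 0.0

def nil_prf(gold, pred):
    """Precision, recall, and F1 for predicting NIL (None)."""
    tp = sum(g is None and p is None for g, p in zip(gold, pred))
    fp = sum(g is not None and p is None for g, p in zip(gold, pred))
    fn = sum(g is None and p is not None for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical sample: None marks a mention with no KB entry.
gold = ["Q90", "Q810", None, None, "Q312"]
pred = ["Q90", "Q41421", None, "Q89", None]

accuracy = in_kb_accuracy(gold, pred)
nil_precision, nil_recall, nil_f1 = nil_prf(gold, pred)
```

Keeping the two scores separate shows whether errors come from picking the wrong entity or from a badly calibrated NIL decision.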

Pitfalls

NIL handling. Some mentions are not in the KB (emerging entities, obscure people). Systems must predict NIL instead of guessing the wrong entity. Measured separately.
Mention boundary errors. Upstream NER truncates or over-extends spans ("Bank of America" tagged as just "Bank"). The alias lookup then misses the right entity and EL recall drops.
Popularity bias. Trained systems over-predict frequent entities. A mention of "Michael I. Jordan" in an ML paper often links to basketball Jordan.
Cross-lingual EL. Mapping mentions in Chinese text to English Wikipedia entities. Requires a multilingual encoder or a translation step.
KB staleness. New companies, events, people are not in last year's Wikipedia dump. Production pipelines need a refresh loop.
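For the NIL pitfall, the simplest strategy is a threshold on the top score. The sketch below assumes the disambiguator returns (entity_id, score) pairs; link_or_nil and the 0.5 default are hypothetical, and the threshold would need tuning on held-out data.

```python
def link_or_nil(scored, threshold=0.5):
    """Return the best entity id, or None (NIL) if nothing clears the threshold."""
    if not scored:
        return None  # no candidates at all
    best_id, best_score = max(scored, key=lambda pair: pair[1])
    return best_id if best_score >= threshold else None

confident = link_or_nil([("Q810", 0.82), ("Q41421", 0.31)])
uncertain = link_or_nil([("Q810", 0.22), ("Q41421", 0.19)])
empty = link_or_nil([])
```

Raising the threshold trades missed links for fewer wrong ones. Which side to favor depends on how costly a wrong edge in the knowledge graph is.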

Use It

The 2026 stack:

Situation	Pick
General-purpose English + Wikipedia	BLINK or REL
Cross-lingual, KB = Wikipedia	mGENRE
LLM-friendly, few mentions/day	Prompt Claude/GPT-4 with candidate list + constrained JSON
Domain-specific KB (medical, legal)	Custom BERT with KB-aware retrieval + fine-tune on domain AIDA-style set
Extremely low-latency	Alias lookup + popularity prior only (the prior term of the Milne-Witten baseline, no context scoring)
Research SOTA	GENRE / ExtEnD / generative LLM-EL

Production pattern that ships in 2026: NER → coref → EL on each mention → collapse clusters to one canonical entity per cluster. Output: one KB id per entity in the document, not one per mention.
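The final collapse step can be sketched as a majority vote within each coreference cluster. The data shapes (clusters as lists of mention ids, links as a mention-to-entity dict with None for NIL) and the function name collapse_clusters are assumptions for illustration.

```python
from collections import Counter

def collapse_clusters(clusters, mention_links):
    """Map each cluster index to the entity id most of its mentions linked to."""
    result = {}
    for i, cluster in enumerate(clusters):
        # Count only mentions that received a non-NIL link.
        votes = Counter(
            mention_links.get(m) for m in cluster if mention_links.get(m) is not None
        )
        result[i] = votes.most_common(1)[0][0] if votes else None
    return result

# Hypothetical document: one three-mention cluster and one unlinked singleton.
clusters = [["m1", "m2", "m3"], ["m4"]]
mention_links = {"m1": "Q41421", "m2": "Q41421", "m3": "Q810", "m4": None}

entities = collapse_clusters(clusters, mention_links)
```

Voting lets a confident link on a full name ("Michael Jordan") override a weak link on a pronoun or a bare surname in the same cluster. Weighting votes by disambiguation score is a natural refinement.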

Ship It

Save as outputs/skill-entity-linker.md:

---
name: entity-linker
description: Design an entity linking pipeline — KB, candidate generator, disambiguator, evaluation.
version: 1.0.0
phase: 5
lesson: 25
tags: [nlp, entity-linking, knowledge-graph]
---

Given a use case (domain KB, language, volume, latency budget), output:

1. Knowledge base. Wikidata / Wikipedia / custom KB. Version date. Refresh cadence.
2. Candidate generator. Alias-index, embedding, or hybrid. Target mention recall @ K.
3. Disambiguator. Prior + context, embedding-based, generative, or LLM-prompted.
4. NIL strategy. Threshold on top score, classifier, or explicit NIL candidate.
5. Evaluation. Mention recall @ 30, top-1 accuracy, NIL-detection F1 on held-out set.

Refuse any EL pipeline without a mention-recall baseline (you cannot evaluate a disambiguator without knowing candidate gen surfaced the right entity). Refuse any pipeline using LLM-prompted EL without constrained output to valid KB ids. Flag systems where popularity bias affects minority entities (e.g. name-clashes) without domain fine-tuning.

Exercises

Easy. Implement the prior+context disambiguator in code/main.py on 10 ambiguous mentions (Paris, Jordan, Apple). Hand-label the correct entity. Measure accuracy.
Medium. Encode 50 ambiguous mentions with a sentence transformer. Embed each candidate's description. Compare embedding-based disambiguation to Jaccard context overlap.
Hard. Build a 1k-entity domain KB (e.g. employees + products in your company). Implement NER + EL end-to-end. Measure precision and recall on 100 held-out sentences.

Key Terms

Term	What people say	What it actually means
Entity linking (EL)	Link to Wikipedia	Map a mention to a unique KB entry.
Candidate generation	Who could it be?	Return a shortlist of plausible KB entries for a mention.
Disambiguation	Pick the right one	Score candidates using context, pick the winner.
Alias index	The lookup table	Map from surface form → candidate entities.
NIL	Not in KB	Explicit prediction that no KB entry matches.
KB	Knowledge base	Wikidata, Wikipedia, DBpedia, or your domain KB.
AIDA-CoNLL	The benchmark	1,393 Reuters articles with gold entity links.