
Relation Extraction & Knowledge Graph Construction

> NER found the entities. Entity linking anchored them. Relation extraction finds the edges between them. A knowledge graph is the sum of nodes, edges, and their provenance.

Type: Build

Languages: Python

Prerequisites: Phase 5 · 06 (NER), Phase 5 · 25 (Entity Linking)

Time: ~60 minutes

The Problem

An analyst reads: "Tim Cook became CEO of Apple in 2011." Four facts:

(Tim Cook, role, CEO)
(Tim Cook, employer, Apple)
(Tim Cook, start_date, 2011)
(Apple, type, Organization)

Relation Extraction (RE) turns free text into structured triples (subject, relation, object). Aggregate across a corpus and you have a knowledge graph. Aggregate and query and you have a reasoning substrate for RAG, analytics, or compliance audits.
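A minimal sketch of "aggregate across a corpus", assuming triples arrive per document as plain tuples. The `Triple` type, the document ids, and the sample data are hypothetical, chosen only to show how repeated facts collapse into one entry that remembers its sources.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


# Hypothetical corpus: (doc_id, triples extracted from that document).
extracted = [
    ("doc-1", [Triple("Tim Cook", "employer", "Apple"),
               Triple("Tim Cook", "role", "CEO")]),
    ("doc-2", [Triple("Tim Cook", "employer", "Apple")]),
]

# One entry per distinct triple, mapped to the set of documents that support it.
support = defaultdict(set)
for doc_id, triples in extracted:
    for t in triples:
        support[t].add(doc_id)

for t, docs in support.items():
    print(t.subject, t.relation, t.obj, sorted(docs))
```

Keeping the supporting document ids next to each triple is the cheapest form of provenance. Character spans, covered in Step 3, make it finer-grained.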

The 2026 problem: LLMs extract relations enthusiastically. Too enthusiastically. They hallucinate triples that the source text does not support. Without provenance, you cannot tell real triples from plausible fiction. The 2026 answer is AEVS-style anchor-and-verify pipelines.

The Concept

Text → triples → knowledge graph

Triple form. (subject_entity, relation_type, object_entity). Relations come from a closed ontology (Wikidata properties, FIBO, UMLS) or an open set (OpenIE-style, anything goes).

Three extraction approaches.

Rule / pattern-based. Hearst patterns: "X such as Y" → (Y, isA, X). Plus hand-crafted regex. Brittle, precise, explainable.
Supervised classifier. Given two entity mentions in a sentence, predict the relation from a fixed set. Trained on TACRED, ACE, KBP. Standard 2015–2022.
Generative LLM. Prompt the model to emit triples. Works out of the box. Needs provenance, or hallucinates plausible-looking junk.
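The rule-based approach can be sketched with a single Hearst pattern. This is an illustrative regex, not a production grammar: it assumes a one-word hypernym and capitalized one-word hyponyms, and it does no lemmatization (so the hypernym stays plural). The function name `hearst_such_as` is hypothetical.

```python
import re

# "X such as Y1, Y2 and Y3" -> (Yi, isA, X) for each Yi.
SUCH_AS = re.compile(
    r"(?P<hyper>\w+) such as "
    r"(?P<hypos>[A-Z]\w+(?:(?:, and |, | and )[A-Z]\w+)*)"
)


def hearst_such_as(sentence):
    triples = []
    for m in SUCH_AS.finditer(sentence):
        # Split the matched hyponym list on its separators.
        hyponyms = re.split(r", and |, | and ", m.group("hypos"))
        triples.extend((h, "isA", m.group("hyper")) for h in hyponyms)
    return triples


result = hearst_such_as("She has worked at companies such as Apple, Google and Microsoft.")
print(result)
# [('Apple', 'isA', 'companies'), ('Google', 'isA', 'companies'), ('Microsoft', 'isA', 'companies')]
```

The brittleness is visible immediately: multi-word names, lowercase hyponyms, and nested lists all fall outside the pattern. The upside is that every triple can be traced to one regex.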

AEVS (Anchor-Extraction-Verification-Supplement, 2026). The current hallucination-mitigation framework:

Anchor. Identify every entity span and relation-phrase span with exact positions.
Extract. Generate triples linked to anchor spans.
Verify. Match each triple element back to the source text; reject anything unsupported.
Supplement. A coverage pass ensures no anchored span is dropped.

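Because every triple must trace back to an anchored span, unsupported triples are rejected instead of stored. The extra passes cost more compute, but every surviving triple is auditable.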
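The supplement step is the least obvious of the four, so here is a rough sketch of a coverage check. It assumes anchors are `(start, end)` character spans and that each triple records its subject and object spans. The function name and the dict keys are hypothetical, not part of any published AEVS interface.

```python
def uncovered_anchors(anchors, triples):
    """Return anchored spans that no extracted triple references."""
    used = set()
    for t in triples:
        used.add(tuple(t["subject_span"]))
        used.add(tuple(t["object_span"]))
    return [a for a in anchors if tuple(a) not in used]


text = "Tim Cook became CEO of Apple in 2011."
# Anchor step output (hand-written here): Tim Cook, CEO, Apple, 2011.
anchors = [(0, 8), (16, 19), (23, 28), (32, 36)]
# Extract step output: only the employer triple came back.
triples = [{"subject_span": (0, 8), "relation": "employer", "object_span": (23, 28)}]

missing = uncovered_anchors(anchors, triples)
print([text[s:e] for s, e in missing])  # ['CEO', '2011']
```

A supplement pass would send the uncovered spans back to the extractor with a targeted prompt, then verify whatever comes back the same way as the first batch.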

The open-vs-closed tradeoff.

Closed ontology. Fixed property list (e.g., Wikidata's 11,000+ properties). Predictable. Queryable. Hard for an extractor to invent relations outside the list.
Open IE. Any verbal phrase becomes a relation. High recall. Low precision. Messy to query.

Production KGs usually mix: open IE for discovery, then canonicalize relations onto a closed ontology before merging into the main graph.

Build It

Step 1: pattern-based extraction

PATTERNS = [
    (r"(?P<s>[A-Z]\w+) (?:is|was) (?:a|an|the) (?P<o>[A-Z]?\w+)", "isA"),
    (r"(?P<s>[A-Z]\w+) (?:is|was) born in (?P<o>\w+)", "bornIn"),
    (r"(?P<s>[A-Z]\w+) works? (?:at|for) (?P<o>[A-Z]\w+)", "worksAt"),
    (r"(?P<s>[A-Z]\w+) founded (?P<o>[A-Z]\w+)", "founded"),
]

See code/main.py for the full toy extractor. Hearst patterns still ship in domain-specific pipelines because they are debuggable.
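For readers without the repository at hand, a minimal way to apply the pattern list might look like this. It is a sketch and not necessarily what code/main.py does; the function name `extract_patterns` is hypothetical. Note that the patterns capture single-token entities only, so "Tim Cook" would come back as "Cook".

```python
import re

PATTERNS = [
    (r"(?P<s>[A-Z]\w+) (?:is|was) (?:a|an|the) (?P<o>[A-Z]?\w+)", "isA"),
    (r"(?P<s>[A-Z]\w+) (?:is|was) born in (?P<o>\w+)", "bornIn"),
    (r"(?P<s>[A-Z]\w+) works? (?:at|for) (?P<o>[A-Z]\w+)", "worksAt"),
    (r"(?P<s>[A-Z]\w+) founded (?P<o>[A-Z]\w+)", "founded"),
]


def extract_patterns(text):
    """Run every pattern over the text and collect (subject, relation, object)."""
    triples = []
    for pattern, relation in PATTERNS:
        for m in re.finditer(pattern, text):
            triples.append((m.group("s"), relation, m.group("o")))
    return triples


found = extract_patterns("Jobs founded Apple. Cook works at Apple.")
print(found)
# [('Cook', 'worksAt', 'Apple'), ('Jobs', 'founded', 'Apple')]
```

Results come out in pattern order, not text order, because the outer loop iterates over patterns.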

Step 2: supervised relation classification

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# REBEL is a BART-based seq2seq model, so it needs the seq2seq head to call generate().
tok = AutoTokenizer.from_pretrained("Babelscape/rebel-large")
model = AutoModelForSeq2SeqLM.from_pretrained("Babelscape/rebel-large")

text = "Tim Cook was born in Alabama. He later became CEO of Apple."
encoded = tok(text, return_tensors="pt", truncation=True)
output = model.generate(**encoded, max_length=200)
# Keep special tokens: the <triplet>, <subj>, and <obj> markers delimit each triple.
triples = tok.batch_decode(output, skip_special_tokens=False)

REBEL is a seq2seq relation extractor: text in, linearized triples out, with relations named by Wikidata property labels (for example "place of birth") rather than property ids. Fine-tuned on distant-supervision data. Standard open-weights baseline.
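The decoded string still has to be split into triples. The sketch below assumes the linearization described on the REBEL model card, `<triplet> head <subj> tail <obj> relation`, with further `<subj> tail <obj> relation` groups reusing the same head. The function name `parse_rebel` is hypothetical, and the sample string is hand-written to illustrate the format; it is not captured model output.

```python
def parse_rebel(decoded):
    """Split a REBEL-style linearized string into (head, relation, tail) tuples."""
    for special in ("<s>", "</s>", "<pad>"):
        decoded = decoded.replace(special, " ")
    triples = []
    parts = {"head": [], "tail": [], "rel": []}
    slot = None
    # A trailing sentinel marker flushes the final triple.
    for token in decoded.split() + ["<triplet>"]:
        if token in ("<triplet>", "<subj>") and all(parts.values()):
            triples.append((" ".join(parts["head"]),
                            " ".join(parts["rel"]),
                            " ".join(parts["tail"])))
        if token == "<triplet>":
            parts = {"head": [], "tail": [], "rel": []}
            slot = "head"
        elif token == "<subj>":
            parts["tail"], parts["rel"] = [], []
            slot = "tail"
        elif token == "<obj>":
            parts["rel"] = []
            slot = "rel"
        elif slot is not None:
            parts[slot].append(token)
    return triples


# Hand-written example of the assumed format.
sample = ("<s><triplet> Tim Cook <subj> Alabama <obj> place of birth "
          "<triplet> Tim Cook <subj> Apple <obj> employer</s>")
parsed = parse_rebel(sample)
print(parsed)
# [('Tim Cook', 'place of birth', 'Alabama'), ('Tim Cook', 'employer', 'Apple')]
```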

Step 3: LLM-prompted extraction with anchoring

prompt = f"""Extract (subject, relation, object) triples from the text.
For each triple, include the exact character span in the source text.

Text: {text}

Output JSON:
[{{"subject": {{"text": "...", "span": [start, end]}},
   "relation": "...",
   "object": {{"text": "...", "span": [start, end]}}}}, ...]

Only include triples fully supported by the text. No inference beyond what is stated.
"""

Verify every returned span against the source. Reject any triple where text[start:end] does not equal the "text" field returned for its subject or object. This is the AEVS "verify" step in its minimal form.
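The span check can be sketched as follows, assuming the model returns JSON in the shape the prompt asks for. The function names are hypothetical, and the response string is hand-written to include one supported triple and one fabricated one. It checks entity spans only; whether the relation itself is supported needs a separate check.

```python
import json


def span_matches(text, claimed, span):
    """True only if span is a valid [start, end] pair whose slice equals claimed."""
    if not (isinstance(span, list) and len(span) == 2):
        return False
    start, end = span
    if not (isinstance(start, int) and isinstance(end, int)):
        return False
    return 0 <= start < end <= len(text) and text[start:end] == claimed


def verify_triples(text, llm_json):
    """Split parsed triples into (kept, rejected) by subject/object span match."""
    kept, rejected = [], []
    for t in json.loads(llm_json):
        ok = all(
            span_matches(text, t[role].get("text"), t[role].get("span"))
            for role in ("subject", "object")
        )
        (kept if ok else rejected).append(t)
    return kept, rejected


text = "Tim Cook was born in Alabama. He later became CEO of Apple."
# Hand-written stand-in for an LLM response; the second triple is not in the text.
response = json.dumps([
    {"subject": {"text": "Tim Cook", "span": [0, 8]},
     "relation": "was born in",
     "object": {"text": "Alabama", "span": [21, 28]}},
    {"subject": {"text": "Tim Cook", "span": [0, 8]},
     "relation": "founded",
     "object": {"text": "Apple Inc.", "span": [53, 63]}},
])

kept, rejected = verify_triples(text, response)
print(len(kept), len(rejected))  # 1 1
```

Exact string equality is deliberately strict. Relaxing it (case folding, whitespace normalization) trades auditability for recall, so make that choice explicitly.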

Step 4: canonicalize onto a closed ontology

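# Keys are lowercase because canonicalize() lowercases its input before lookup.
RELATION_MAP = {
    "is the ceo of": "P169",       # "chief executive officer" (inverted subject/object)
    "was born in":   "P19",         # "place of birth"
    "founded":        "P112",       # "founded by" (inverted subject/object)
    "works at":       "P108",       # "employer"
}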


def canonicalize(relation):
    rel_low = relation.lower().strip()
    if rel_low in RELATION_MAP:
        return RELATION_MAP[rel_low]
    return None   # drop unmapped open relations or route to manual review

Canonicalization is often 60-80% of the engineering work. Budget for it.
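One reason the work piles up is direction. Wikidata's P112 ("founded by") and P169 ("chief executive officer") both point from the organization to the person, so a surface triple such as (Steve Jobs, founded, Apple) must be flipped before loading. A sketch of one way to carry that in the map, using a hypothetical `(property_id, swap)` tuple:

```python
# Surface phrase -> (Wikidata property id, swap subject and object?).
DIRECTED_RELATION_MAP = {
    "is the ceo of": ("P169", True),   # chief executive officer: org -> person
    "was born in":   ("P19", False),   # place of birth: person -> place
    "founded":       ("P112", True),   # founded by: org -> founder
    "works at":      ("P108", False),  # employer: person -> org
}


def canonicalize_triple(subject, relation, obj):
    """Map an open-IE triple onto a property id, flipping direction if needed."""
    entry = DIRECTED_RELATION_MAP.get(relation.lower().strip())
    if entry is None:
        return None  # unmapped: drop or route to manual review
    pid, swap = entry
    return (obj, pid, subject) if swap else (subject, pid, obj)


print(canonicalize_triple("Steve Jobs", "founded", "Apple"))
# ('Apple', 'P112', 'Steve Jobs')
print(canonicalize_triple("Tim Cook", "is the CEO of", "Apple"))
# ('Apple', 'P169', 'Tim Cook')
```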

Step 5: build a small graph and query

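# extract() stands in for any extractor from Steps 1-4 that returns canonicalized (s, r, o) tuples.
triples = extract(text)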
graph = {}
for s, r, o in triples:
    graph.setdefault(s, []).append((r, o))


def neighbors(node, relation=None):
    return [(r, o) for r, o in graph.get(node, []) if relation is None or r == relation]


# Assuming extract() yielded ("Tim Cook", "P108", "Apple"):
print(neighbors("Tim Cook", relation="P108"))    # -> [('P108', 'Apple')]

This is the atom of every RAG-over-KG system. Scale it with RDF triple stores (Blazegraph, Virtuoso), property graphs (Neo4j), or vector-augmented graph stores.

Pitfalls

Coreference before RE. "He founded Apple" — RE needs to know who "he" is. Run coref first (lesson 24).
Entity canonicalization. "Apple Inc" and "Apple" must resolve to the same node. Entity linking first (lesson 25).
Hallucinated triples. LLMs emit triples the text does not support. Enforce span verification.
Relation canonicalization drift. Open IE relations are inconsistent ("was born in," "came from," "is a native of"). Collapse to canonical ids or the graph is unqueryable.
Temporal errors. "Tim Cook is CEO of Apple" — true now, false in 2005. Many relations are time-bounded. Use qualifiers (P580 start time, P582 end time in Wikidata).
Domain mismatch. REBEL trained on Wikipedia. Legal, medical, and scientific text often need domain-fine-tuned RE models.
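The temporal pitfall can be made concrete with a small edge type that carries start and end qualifiers. This is a year-granular simplification with hypothetical names (`TimedEdge`, `holds_in`); Wikidata's P580 and P582 store full time values with precision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimedEdge:
    subject: str
    relation: str
    obj: str
    start_year: Optional[int] = None  # corresponds to P580 (start time)
    end_year: Optional[int] = None    # corresponds to P582 (end time)

    def holds_in(self, year):
        """True if the edge is valid in the given year; open bounds always pass."""
        after_start = self.start_year is None or year >= self.start_year
        before_end = self.end_year is None or year <= self.end_year
        return after_start and before_end


edge = TimedEdge("Apple", "P169", "Tim Cook", start_year=2011)
print(edge.holds_in(2005), edge.holds_in(2020))  # False True
```

Queries then filter on `holds_in` instead of treating every stored edge as currently true.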

Use It

The 2026 stack:

Situation	Pick
Fast production, general domain	REBEL or LlamaPred with Wikidata canonicalization
Domain-specific (biomed, legal)	SciREX-style domain fine-tune + custom ontology
LLM-prompted, audited output	AEVS pipeline: anchor → extract → verify → supplement
High-volume news IE	Pattern-based + supervised hybrid
Building a KG from scratch	Open IE + manual canonicalization pass
Temporal KG	Extract with qualifiers (start/end time, point in time)

The integration pattern: NER → coref → entity linking → relation extraction → ontology mapping → graph load. Every stage is a potential quality gate.

Ship It

Save as outputs/skill-re-designer.md:

---
name: re-designer
description: Design a relation extraction pipeline with provenance and canonicalization.
version: 1.0.0
phase: 5
lesson: 26
tags: [nlp, relation-extraction, knowledge-graph]
---

Given a corpus (domain, language, volume) and downstream use (KG-RAG, analytics, compliance), output:

1. Extractor. Pattern-based / supervised / LLM / AEVS hybrid. Reason tied to precision vs recall target.
2. Ontology. Closed property list (Wikidata / domain) or open IE with canonicalization pass.
3. Provenance. Every triple carries source char-span + doc id. Non-negotiable for audit.
4. Merge strategy. Canonical entity id + relation id + temporal qualifiers; dedup policy.
5. Evaluation. Precision / recall on 200 hand-labelled triples + hallucination-rate on LLM-extracted sample.

Refuse any LLM-based RE pipeline without span verification (source provenance). Refuse open-IE output flowing into a production graph without canonicalization. Flag pipelines with no temporal qualifier on time-bounded relations (employer, spouse, position).

Exercises

Easy. Run the pattern extractor in code/main.py on 5 news-article sentences. Hand-check precision.
Medium. Use REBEL (or a small LLM) on the same sentences. Compare triples. Which extractor has higher precision? Higher recall?
Hard. Build the AEVS pipeline: extract with LLM + verify spans against source. Measure hallucination rate before vs after the verify step on 50 Wikipedia-style sentences.
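For scoring the exercises, a minimal sketch of the two metrics, assuming triples are hashable tuples and that a human has labelled which extracted triples the source actually supports. The function names and the toy data are hypothetical.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over sets of triples."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


def hallucination_rate(triples, supported):
    """Fraction of extracted triples a human marked as unsupported by the text."""
    triples = list(triples)
    if not triples:
        return 0.0
    return sum(1 for t in triples if t not in supported) / len(triples)


# Toy data: letters stand in for full triples.
gold = {"a", "b", "c", "d"}
extracted = ["a", "b", "e"]   # "e" is not supported by the text
after_verify = ["a", "b"]     # what survived span verification

p, r = precision_recall(extracted, gold)
before = hallucination_rate(extracted, supported=gold)
after = hallucination_rate(after_verify, supported=gold)
print(round(p, 3), r, round(before, 3), after)  # 0.667 0.5 0.333 0.0
```

Exact matching is harsh on near-misses such as "Apple" versus "Apple Inc", so run entity canonicalization before scoring or report both strict and relaxed numbers.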

Key Terms

Term	What people say	What it actually means
Triple	Subject-relation-object	`(s, r, o)` tuple that is the atomic unit of a KG.
Open IE	Extract anything	Open-vocabulary relation phrases; high recall, low precision.
Closed ontology	Fixed schema	Bounded set of relation types (Wikidata, UMLS, FIBO).
Canonicalization	Normalize everything	Map surface names / relations to canonical ids.
AEVS	Grounded extraction	Anchor-Extraction-Verification-Supplement pipeline (2026).
Provenance	Source-of-truth link	Every triple carries a doc id + char-span to its source.
Distant supervision	Cheap labels	Align text with an existing KG to create training data.