NLP Engineering Skills
The 2026 Skills Guide
NLP engineering underpins virtually every AI product — from chatbots and search to contract analysis and content moderation. This guide covers tokenisation, core NLP tasks, spaCy and HuggingFace, evaluation metrics, and the text preprocessing practices that matter in production.
NLP Tasks and Tooling
| Task | Key Tools | Primary Metric |
|---|---|---|
| Text Classification | sklearn, HuggingFace AutoModelForSequenceClassification | F1, Accuracy, AUC-ROC |
| Named Entity Recognition | spaCy, HuggingFace token-classification pipeline | Entity-level F1 (strict) |
| Semantic Similarity | Sentence Transformers, text-embedding-3-small | Spearman correlation (STS-B benchmark) |
| Text Summarisation | BART, PEGASUS, T5 via HuggingFace | ROUGE-1, ROUGE-2, ROUGE-L |
| Machine Translation | MarianMT, Helsinki-NLP models | BLEU, COMET |
| Question Answering | HuggingFace question-answering pipeline | Exact Match, F1 |
| Relation Extraction | GLiREL, REBEL (end-to-end) | Micro F1 on relation types |
| Text Generation | GPT-family, Llama, Mistral via HuggingFace | Perplexity, task-specific, human eval |
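Most of these tasks are exposed through HuggingFace's pipeline API. A minimal NER sketch using the pipeline's default English checkpoint (the example sentence is arbitrary):

```python
from transformers import pipeline

# aggregation_strategy="simple" merges subword pieces into whole-entity spans.
ner = pipeline("token-classification", aggregation_strategy="simple")

print(ner("Apple is opening a new office in London."))
# e.g. [{'entity_group': 'ORG', 'word': 'Apple', ...},
#       {'entity_group': 'LOC', 'word': 'London', ...}]
```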
Tokenisation in Depth
Tokenisation converts raw text into a sequence of tokens (integers) that a model processes. The tokeniser is as important as the model — using the wrong tokeniser, mishandling special tokens, or ignoring maximum sequence lengths are among the most common bugs in NLP systems.
Subword tokenisation algorithms (a comparison sketch follows the list):
- BPE (Byte Pair Encoding) — Used by the GPT family and RoBERTa. Bottom-up: start with characters, merge the most frequent pairs. tiktoken's cl100k_base encoding (GPT-4) operates at byte level, handling any Unicode or binary content without unknown tokens.
- WordPiece — Used by BERT. Similar to BPE but merges the pair that maximises the language model likelihood of the training data rather than the raw frequency. Produces ## prefix for continuation subwords (e.g., ['token', '##isation']).
- SentencePiece / Unigram — SentencePiece with the Unigram algorithm is used by T5; SentencePiece with a BPE variant by Llama 2, Mistral, and Gemma. Language-agnostic: treats the input as a raw sequence of Unicode characters without pre-tokenisation, and handles spaces explicitly as part of tokens (▁ prefix for word-initial subwords). Better suited to multilingual models and languages without clear word boundaries.
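To see these differences concretely, run the same string through one tokeniser from each family via HuggingFace's AutoTokenizer. A minimal sketch, assuming the public checkpoints roberta-base, bert-base-uncased, and t5-small are available from the Hub:

```python
from transformers import AutoTokenizer

text = "Tokenisation handles unseen words."

# One representative checkpoint per tokeniser family.
for name in ["roberta-base", "bert-base-uncased", "t5-small"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(f"{name:20} {tok.tokenize(text)}")

# roberta-base (byte-level BPE) marks word-initial tokens with 'Ġ',
# bert-base-uncased (WordPiece) marks continuations with '##',
# t5-small (SentencePiece/Unigram) marks word-initial pieces with '▁'.
```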
Critical tokenisation pitfalls:
- Never truncate raw text before applying the chat template — apply the template first, tokenise, then check the token count against the model's maximum sequence length (see the sketch after this list).
- BOS (beginning-of-sequence) and EOS tokens must match what the model was trained with — omitting them degrades quality.
- Whitespace handling differs between models: GPT tokenisers treat leading spaces as part of tokens; BERT does not. This affects how you format multi-document inputs.
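A defensive pattern for the first two pitfalls, sketched with HuggingFace's apply_chat_template. The model name is an illustrative choice, and note that model_max_length is a large sentinel value on some checkpoints, in which case you should substitute the real context limit:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [{"role": "user", "content": "Summarise this contract clause: ..."}]

# Apply the chat template first (it inserts BOS/EOS and role markers),
# then count tokens -- never truncate the raw string by characters.
ids = tok.apply_chat_template(messages, add_generation_prompt=True)
if len(ids) > tok.model_max_length:
    raise ValueError(f"{len(ids)} tokens exceeds limit {tok.model_max_length}")
```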
Production NLP with spaCy
spaCy remains the production standard for rule-based and hybrid NLP pipelines that need to process large volumes of text quickly on CPU. Key features (a combined sketch follows the list):
- Pipeline architecture — `nlp = spacy.load("en_core_web_trf")` (transformer-based) or `en_core_web_sm` (statistical, fast). The pipeline chains tokeniser → tagger → parser → NER. `nlp.pipe(texts, batch_size=64)` processes multiple documents efficiently with batching.
- EntityRuler — Adds rule-based NER patterns that fire before or after the statistical NER component. Patterns can be token-based (matching on token attributes: text, POS, shape) or phrase-based via `PhraseMatcher`. Used to guarantee high-precision recognition of domain-specific entities (product codes, company names) that the statistical model misses.
- Custom components — The `@Language.component` and `@Language.factory` decorators add custom logic to the pipeline (custom attribute assignment, business rules, integration with external APIs). Components receive and return a `Doc` object, enabling clean composition.
- Span categorisation — spaCy 3's `spancat` component handles overlapping spans (unlike NER, which requires non-overlapping entities), important for biomedical text and complex legal document analysis.
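A combined sketch, assuming en_core_web_sm is installed. The flag_money_mentions component, the SKU product-code format, and Acme Corp are hypothetical examples:

```python
import spacy
from spacy.language import Language

@Language.component("flag_money_mentions")
def flag_money_mentions(doc):
    # Hypothetical business rule: flag documents mentioning MONEY entities.
    doc.user_data["has_money"] = any(ent.label_ == "MONEY" for ent in doc.ents)
    return doc

nlp = spacy.load("en_core_web_sm")

# Rule-based patterns fire before the statistical NER component,
# which preserves entities that are already set.
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([
    # Token-based pattern for a hypothetical product-code format (SKU + digits)
    {"label": "PRODUCT", "pattern": [{"TEXT": {"REGEX": r"^SKU\d+$"}}]},
    # Phrase pattern matched verbatim
    {"label": "ORG", "pattern": "Acme Corp"},
])
nlp.add_pipe("flag_money_mentions", last=True)

for doc in nlp.pipe(["Acme Corp shipped SKU12345 for $5,000."], batch_size=64):
    print([(ent.text, ent.label_) for ent in doc.ents], doc.user_data["has_money"])
```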
Frequently Asked Questions
What is BPE tokenisation and why do LLMs use subword tokenisation?
BPE iteratively merges the most frequent adjacent symbol pairs, building a subword vocabulary. Subword tokenisation solves two problems: (1) the unknown-word problem — any word can be represented as known subwords, down to individual characters or bytes; (2) the trade-off between vocabulary size and sequence length. GPT-4 uses cl100k_base BPE with 100,277 tokens; Llama 3 uses a 128,256-token vocabulary. WordPiece (BERT) and SentencePiece (T5, Llama 2) are related subword algorithms.
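A round-trip sketch with OpenAI's tiktoken library (the long word is an arbitrary out-of-vocabulary example):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4's encoding
print(enc.n_vocab)  # 100277

ids = enc.encode("Rare words like floccinaucinihilipilification still tokenise.")
print(ids)
# Byte-level BPE decodes losslessly -- no unknown tokens.
assert enc.decode(ids) == "Rare words like floccinaucinihilipilification still tokenise."
```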
What is the difference between BLEU and ROUGE?
BLEU: precision-based — modified n-gram precision (the fraction of output n-grams found in the reference) with a brevity penalty to punish overly short outputs. Standard for translation. ROUGE: recall-based — the fraction of reference n-grams found in the output; ROUGE-1/2/L are standard for summarisation. Both measure lexical overlap only. BERTScore uses contextual embeddings for semantic similarity and is preferred for more nuanced evaluation.
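Both metrics are available through HuggingFace's evaluate library; a sketch with an illustrative prediction/reference pair:

```python
import evaluate

preds = ["the cat sat on the mat"]
refs = ["a cat was sitting on the mat"]

rouge = evaluate.load("rouge")  # recall-oriented; reports rouge1/rouge2/rougeL
print(rouge.compute(predictions=preds, references=refs))

bleu = evaluate.load("sacrebleu")  # precision-oriented; expects a list of reference lists
print(bleu.compute(predictions=preds, references=[[r] for r in refs]))
```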
When should you use spaCy vs HuggingFace Transformers?
spaCy: high-speed production NLP on CPU (tokenisation, NER, POS tagging, dependency parsing), rule-based components, thousands of docs/second. HuggingFace: state-of-the-art accuracy on challenging NER, semantic similarity, embeddings, text generation. Common pattern: spaCy for fast preprocessing, HuggingFace for contextual understanding.
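The hybrid pattern in a minimal sketch (the sentiment checkpoint and input text are illustrative choices):

```python
import spacy
from transformers import pipeline

nlp = spacy.load("en_core_web_sm")  # fast CPU preprocessing and sentence splitting
clf = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

doc = nlp("Onboarding was painless. Support never answered my ticket.")
sentences = [sent.text for sent in doc.sents]
print(clf(sentences))  # one POSITIVE/NEGATIVE label per sentence
```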
What is Named Entity Recognition (NER) and how is it evaluated?
NER identifies and classifies named entities in text (PERSON, ORGANISATION, LOCATION, DATE, MONEY, etc.). It is evaluated with entity-level F1: the harmonic mean of precision (the fraction of predicted entities that are correct) and recall (the fraction of gold entities found). Strict evaluation requires both span and label to match; partial-credit evaluation uses IoU-based span matching. CoNLL-2003 (newswire) and OntoNotes are standard benchmarks.
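Entity-level F1 is commonly computed with seqeval over BIO-tagged sequences, here via HuggingFace's evaluate wrapper (the tag sequences are illustrative):

```python
import evaluate

seqeval = evaluate.load("seqeval")

references  = [["B-PER", "I-PER", "O", "B-ORG", "O"]]
predictions = [["B-PER", "I-PER", "O", "B-LOC", "O"]]  # wrong label -> strict miss

results = seqeval.compute(predictions=predictions, references=references)
print(results["overall_precision"], results["overall_recall"], results["overall_f1"])
```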
What is semantic similarity and how is it implemented?
Semantic similarity measures how close two texts are in meaning. Implementation: embed both texts with an embedding model (a Sentence Transformer such as all-mpnet-base-v2 or paraphrase-MiniLM-L6-v2, or an API model such as text-embedding-3-small), then compute the cosine similarity between the embedding vectors. Sentence Transformer models (SBERT) are trained specifically for this task using siamese networks on paraphrase and NLI datasets; they produce embeddings where semantic similarity maps to vector similarity, unlike general-purpose LLM embeddings.
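A minimal sketch with Sentence Transformers (the model and sentence pair are illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")

emb = model.encode(
    ["The contract terminates in June.", "The agreement ends mid-year."],
    normalize_embeddings=True,
)
print(util.cos_sim(emb[0], emb[1]).item())  # cosine similarity in [-1, 1]
```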