Why Unicode 16/17 matter to typing tests
If your test counts code points or UTF‑16 code units, it miscounts for users typing emoji, Indic scripts, or characters built from combining marks. Unicode 16.0 (released September 10, 2024) added 5,185 characters and seven new scripts; Unicode 17.0 (released September 9, 2025) added 4,803 characters and four new scripts, plus updates to annexes and synchronized standards. Those dates and numbers aren’t trivia: they change segmentation, display, and what users can enter or even see. (unicode.org)
Highlights you’ll feel in a typing app:
- New scripts (16.0: Garay, Gurung Khema, Kirat Rai, Ol Onal, Sunuwar, Todhri, Tulu‑Tigalari; 17.0: Sidetic, Tolong Siki, Beria Erfe, Tai Yo) expand multilingual prompts and require cluster‑aware selection and counting. (unicode.org)
- Emoji sets moved forward: Emoji 16.0 introduced eight new emoji (including Flag of Sark); Emoji 17.0 introduces 163 additions when you include gender and skin‑tone variants (seven new base characters like Distorted Face, Orca, Trombone, Landslide, Treasure Chest, Hairy Creature, Fight Cloud). (unicode.org)
Grapheme clusters 101 (and why your counter is wrong)
Users perceive a “character” as a grapheme, which is often made of multiple code points. Unicode Standard Annex #29 (UAX #29) defines default grapheme cluster boundaries and recommends extended grapheme clusters for editing, selection, and counting. Practically: one backspace should delete one grapheme cluster, not just the last code point. (unicode.org)
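To make the gap concrete, here is a minimal sketch that counts the same string three ways. It assumes a runtime that ships `Intl.Segmenter` (recent browsers and Node.js); the helper name `countUnits` and the `"en"` locale are illustrative choices, not part of any standard.

```typescript
// Grapheme segmenter from the ECMAScript Internationalization API.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

// Count one string three ways (hypothetical helper for illustration).
function countUnits(text: string) {
  return {
    utf16Units: text.length, // what String.prototype.length reports
    codePoints: Array.from(text).length, // string iteration yields code points
    graphemes: Array.from(segmenter.segment(text)).length, // user-perceived characters
  };
}

// Thumbs up + medium skin tone modifier (U+1F44D U+1F3FD)
console.log(countUnits("\u{1F44D}\u{1F3FD}"));
// -> { utf16Units: 4, codePoints: 2, graphemes: 1 }

// "e" + combining acute accent (U+0065 U+0301)
console.log(countUnits("e\u0301"));
// -> { utf16Units: 2, codePoints: 2, graphemes: 1 }
```

Only the last number matches what the typist believes they entered.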
What changed or matters now:
- Extended clusters include spacing marks common in Indic scripts; editing by cluster matches user expectations better than legacy clusters. (unicode.org)
- Specific rules protect emoji sequences: GB11 prevents breaks within ZWJ emoji sequences; GB12/GB13 keep regional‑indicator pairs together (flags), so a single flag is one cluster. UAX #29 explicitly notes that “each emoji sequence is a single grapheme cluster.” (unicode.org)
- Unicode 17.0’s UAX #29 edition (Aug 17, 2025) is the version to align with if you claim “Unicode‑17 compatible” typing behavior. (unicode.org)
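The rules above can be observed directly with a standards-based segmenter. The sketch below assumes `Intl.Segmenter` is available; `clusters` is a hypothetical helper name. Exact results depend on the Unicode version bundled with the runtime, so the examples stick to cases that have been stable for several Unicode versions.

```typescript
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

// Split text into extended grapheme clusters (hypothetical helper).
function clusters(text: string): string[] {
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const FR = "\u{1F1EB}\u{1F1F7}"; // regional indicators F + R
const DE = "\u{1F1E9}\u{1F1EA}"; // regional indicators D + E

// GB12/GB13: regional indicators pair up, so two adjacent flags are two clusters.
const twoFlags = clusters(FR + DE);

// GB11: woman + ZWJ + laptop (woman technologist) stays whole.
const technologist = clusters("\u{1F469}\u200D\u{1F4BB}");

// Extended clusters: Devanagari KA + vowel sign I (a spacing mark) is one cluster.
const ki = clusters("\u0915\u093F");

console.log(twoFlags.length, technologist.length, ki.length); // -> 2 1 1
```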
Emoji 16.0/17.0: more than just pictures
- Emoji 16.0 (synchronized with Unicode 16.0) added eight emoji: seven new characters (face with bags under eyes, fingerprint, root vegetable, harp, shovel, splatter, leafless tree) plus the Flag of Sark, which is a regional‑indicator sequence rather than a new character. These are listed in the Emoji 16.0 “Recently Added” chart. (unicode.org)
- Emoji 17.0 ramps up combinations: 163 total additions come from seven new base characters plus many new recommended gendered and skin‑tone sequences (for ballet dancer, people with bunny ears, wrestlers, and more). The “Recently Added v17.0” chart is your ground truth for what gets counted. (unicode.org)
- UTS #51 (Unicode Emoji) defines emoji modifier sequences (skin tones), flag sequences, and ZWJ sequences, and it’s synchronized with the Unicode Standard. If your test splits those apart, you’ll miscount, mis‑backspace, and mis‑score. (unicode.org)
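For logging or debugging, it can help to label which kind of sequence a cluster is. The following is a rough heuristic sketch, not an implementation of UTS #51: it does not validate against the RGI set, it ignores tag sequences such as subdivision flags, and a ZWJ can also appear in non-emoji text. The type and function names are hypothetical.

```typescript
type SequenceKind = "flag" | "zwj" | "modifier" | "other";

// Heuristic classification of one grapheme cluster by its code points.
function classifyCluster(cluster: string): SequenceKind {
  const cps = Array.from(cluster, (ch) => ch.codePointAt(0)!);
  const isRegionalIndicator = (cp: number) => cp >= 0x1f1e6 && cp <= 0x1f1ff;
  const isSkinToneModifier = (cp: number) => cp >= 0x1f3fb && cp <= 0x1f3ff;

  // Exactly two regional indicators: a flag sequence.
  if (cps.length === 2 && cps.every(isRegionalIndicator)) return "flag";
  // Any ZERO WIDTH JOINER (U+200D): treat as a ZWJ sequence.
  if (cps.indexOf(0x200d) !== -1) return "zwj";
  // A skin tone modifier without a ZWJ: an emoji modifier sequence.
  if (cps.some(isSkinToneModifier)) return "modifier";
  return "other";
}

console.log(classifyCluster("\u{1F1EB}\u{1F1F7}")); // -> "flag"
console.log(classifyCluster("\u{1F469}\u{1F3FD}\u200D\u{1F4BB}")); // -> "zwj"
console.log(classifyCluster("\u{1F44D}\u{1F3FD}")); // -> "modifier"
```

The input is expected to be a single cluster produced by a grapheme segmenter; passing whole sentences will give meaningless labels.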
Staggered OS rollouts: what your users can input or see
Even if Unicode approves a character, users get it when platforms ship fonts, keyboards, and UI updates.
- Apple shipped Emoji 16.0 to end users with iOS 18.4 (released March 31, 2025), listing “8 new emojis” in Apple’s release notes. If a participant’s iPhone wasn’t updated, they might see tofu boxes or fallback glyphs. (support.apple.com)
- Microsoft began rolling out Emoji 16.0 in Windows 11 with the September 2025 Patch Tuesday. Support initially reached some apps but not all, and Microsoft omitted the Flag of Sark. Expect mixed support mid‑rollout. (windowscentral.com)
- Android: Google’s support for Emoji 16.0 rolled out across 2025 via Noto Color Emoji and product updates; Emoji 17.0 test‑drives appeared for Android 16 beta users ahead of broad 2026 availability. Your cross‑platform test should not assume every device shows the same glyphs the same week. (blog.emojipedia.org)
Backspace semantics that feel right
Users expect backspace to remove what they see as one character. UAX #29 is explicit that grapheme clusters are the atomic unit for selection, cursor movement, and editing operations; it even notes that some systems delete by code point while others delete by cluster. Your test should be explicit and consistent about which it does. (unicode.org)
Practical rules to adopt:
- Treat any extended grapheme cluster as one “character” for input, deletion, and scoring. That means: one backspace removes the whole flag (two regional indicators), a woman‑technologist with skin tone (base + modifier + ZWJ + object), or a Devanagari consonant + vowel sign cluster. (unicode.org)
- Never split emoji ZWJ sequences (GB11) or regional‑indicator pairs (GB12/GB13). If your keystroke logger shows partial deletions, fix your editor widget or the DOM hooks. (unicode.org)
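A cluster-aware backspace can be sketched in a few lines. This assumes the test keeps its own text buffer as a string and that `Intl.Segmenter` is available; the function name `backspace` is illustrative, and a real editor widget would also need to handle selections and caret positions other than the end.

```typescript
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

// Remove the last extended grapheme cluster from the end of the buffer.
function backspace(buffer: string): string {
  const segments = Array.from(segmenter.segment(buffer));
  if (segments.length === 0) return buffer; // nothing to delete
  // Cut at the UTF-16 offset where the final cluster starts.
  return buffer.slice(0, segments[segments.length - 1].index);
}

const flag = "\u{1F1EB}\u{1F1F7}"; // two regional indicators
const technologist = "\u{1F469}\u{1F3FD}\u200D\u{1F4BB}"; // base + modifier + ZWJ + object
const ki = "\u0915\u093F"; // Devanagari consonant + vowel sign

console.log(backspace("hi" + flag) === "hi"); // -> true
console.log(backspace("hi" + technologist) === "hi"); // -> true
console.log(backspace("hi" + ki) === "hi"); // -> true
```

Re-segmenting the whole buffer on every keystroke is fine for short typing prompts; for long documents you would likely segment only a window near the caret.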
Concrete test‑design fixes (do these now)
1) Base your counting and scoring on UAX #29 extended grapheme clusters
- Use a standards‑based segmenter rather than hand‑rolled regex. Options include ICU/ICU4X segmenters, JavaScript’s Intl.Segmenter("…", { granularity: "grapheme" }), Swift’s `Character` (already an extended grapheme cluster), or proven libraries in your stack (e.g., Rust `unicode-segmentation`). (home.unicode.org)
- Document your behavior (“backspace deletes one grapheme cluster”) in the test’s help panel.
2) Normalize your emoji baseline
- Align your test content to Emoji 16.0/17.0 RGI sets. Keep a capability matrix per platform to avoid serving prompts users can’t render. Start with the official “Recently Added” charts for 16.0 and 17.0 and the master v17.0 charts. (unicode.org)
- Provide a fallback: if a glyph is unsupported, substitute a visually similar prompt (or skip emoji items) without penalizing the user’s accuracy.
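One way to wire a capability matrix to prompt selection is sketched below. Everything here is hypothetical: the `Prompt` shape, the platform keys, and the table values are placeholders to be replaced with what you actually measure (only the iOS 18.4 entry reflects the rollout described earlier). Unknown platforms are assumed to support no emoji prompts, which is a deliberately conservative choice.

```typescript
interface Prompt {
  text: string;
  emojiVersion?: number; // newest Emoji version the prompt needs, if any
  fallbackText?: string; // emoji-free or older-emoji substitute
}

// Illustrative capability matrix: newest Emoji version each platform renders.
const maxEmojiVersion: Record<string, number> = {
  "ios-18.4": 16, // per Apple's release notes cited above
  "example-older-device": 15, // placeholder value
};

// Returns the text to show, or null when the item should be skipped (and not scored).
function choosePrompt(prompt: Prompt, platform: string): string | null {
  const supported = maxEmojiVersion[platform] ?? 0; // unknown platform: assume none
  if (prompt.emojiVersion === undefined || prompt.emojiVersion <= supported) {
    return prompt.text;
  }
  return prompt.fallbackText ?? null;
}

// Flag of Sark: regional indicators C + Q (Emoji 16.0)
const sarkPrompt: Prompt = {
  text: "Sark \u{1F1E8}\u{1F1F6}",
  emojiVersion: 16,
  fallbackText: "Sark",
};

console.log(choosePrompt(sarkPrompt, "ios-18.4") === sarkPrompt.text); // -> true
console.log(choosePrompt(sarkPrompt, "example-older-device")); // -> "Sark"
```

Returning `null` instead of a broken prompt keeps the scoring code simple: skipped items never enter the accuracy calculation.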
3) Make prompts locale‑ and script‑aware
- Use CLDR data (for example, its per‑locale exemplar character sets, where available) to check that prompt words and sentences are representative and well‑formed for each language, especially for scripts added in 16.0 and 17.0. CLDR also carries segmentation tailorings that improve cursoring/selection for some languages. (cldr.unicode.org)
- For right‑to‑left and complex scripts, test arrow‑key navigation and selection by cluster before deploying new passages.
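Cluster-based caret movement can be checked with a small helper like the one below. It is a sketch under two assumptions: caret positions are UTF-16 offsets into the buffer, and movement is in logical (storage) order. Mapping logical movement to visual left/right in right-to-left text is a separate bidi concern that this example does not address. The function names are hypothetical.

```typescript
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

// Valid caret positions: the start of every cluster, plus the end of the text.
function caretStops(text: string): number[] {
  const stops = Array.from(segmenter.segment(text), (s) => s.index);
  stops.push(text.length);
  return stops;
}

// Move the caret one cluster forward or backward in logical order.
function moveCaret(text: string, caret: number, direction: "forward" | "backward"): number {
  const stops = caretStops(text);
  if (direction === "forward") {
    const next = stops.find((s) => s > caret);
    return next === undefined ? text.length : next;
  }
  const previous = stops.filter((s) => s < caret);
  return previous.length > 0 ? previous[previous.length - 1] : 0;
}

// "a", a flag (4 UTF-16 units), then "b": stops are 0, 1, 5, 6.
const sample = "a\u{1F1EB}\u{1F1F7}b";
console.log(caretStops(sample)); // -> [0, 1, 5, 6]
console.log(moveCaret(sample, 1, "forward")); // -> 5 (skips the whole flag)
console.log(moveCaret(sample, 5, "backward")); // -> 1
```

A caret that somehow lands inside a cluster (offset 3 here) snaps to the nearest boundary in the direction of travel.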
4) Fair scoring for emoji and complex sequences
- Accuracy should compare cluster sequences, not raw scalars. For example, 👩🏽‍💻 (woman technologist with medium skin tone: U+1F469 U+1F3FD U+200D U+1F4BB) is one cluster; if the user types the correct cluster, it’s a full match even though it’s four code points. UTS #51’s definitions for emoji modifier sequences and flag sequences are your definition of “one symbol.” (unicode.org)
- WPM: derive speed from cluster counts rather than code points or UTF‑16 units. Consider reporting both CPM (clusters per minute) and WPM, and state which unit each number is based on so users know what you measured. (unicode.org)
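Cluster-based scoring might look like the sketch below. Several choices are assumptions rather than anything the standards prescribe: comparison is position by position (no alignment for inserted or dropped clusters), speed is based on correct clusters only, and WPM uses the common typing-test convention of five characters per word. The names `Score` and `scoreAttempt` are hypothetical.

```typescript
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const toClusters = (text: string): string[] =>
  Array.from(segmenter.segment(text), (s) => s.segment);

interface Score {
  correct: number; // clusters typed correctly, compared by position
  total: number; // clusters in the expected prompt
  accuracy: number; // correct / total
  cpm: number; // correct clusters per minute
  wpm: number; // cpm / 5 (assumed five-clusters-per-word convention)
}

function scoreAttempt(expected: string, typed: string, elapsedMs: number): Score {
  const want = toClusters(expected);
  const got = toClusters(typed);
  let correct = 0;
  for (let i = 0; i < want.length; i++) {
    if (got[i] === want[i]) correct++; // whole-cluster equality
  }
  const minutes = elapsedMs / 60000;
  const cpm = minutes > 0 ? correct / minutes : 0;
  return {
    correct,
    total: want.length,
    accuracy: want.length > 0 ? correct / want.length : 0,
    cpm,
    wpm: cpm / 5,
  };
}

// Prompt: "hi ", woman technologist with medium skin tone, space, flag (6 clusters).
const prompt = "hi \u{1F469}\u{1F3FD}\u200D\u{1F4BB} \u{1F1EB}\u{1F1F7}";
const result = scoreAttempt(prompt, prompt, 60000);
console.log(result.total, result.correct, result.cpm); // -> 6 6 6
```

Typing the technologist without the skin tone counts as one wrong cluster, not as three correct code points out of four.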
5) Instrument for the real world
- Capture platform/OS and font info (with consent) so you can skip prompts a user’s device can’t render yet (e.g., mid‑rollout on Windows 11). Keep a small A/B set to detect unsupported glyphs cheaply. (windowscentral.com)
A quick checklist
- Adopt UAX #29 (Unicode 17.0) extended grapheme clusters for counting, caret movement, and backspace. (unicode.org)
- Align prompts to Unicode 16.0/17.0 and Emoji 16.0/17.0 charts; avoid serving unsupported emoji to older systems. (unicode.org)
- Use standard libraries: ICU/ICU4X, Intl.Segmenter, Swift `Character`, or battle‑tested segmentation libs. (home.unicode.org)
- Treat flags, skin‑tone variants, and ZWJ families as single units for scoring and backspace. (unicode.org)
- Monitor OS rollout notes so your test stays fair as support lands across platforms. (support.apple.com)