Made by Antonio Ramirez

unicode-to-plain-text

2.2.0

Convert fancy Unicode text to plain ASCII with smart language preservation

Install

npm i unicode-to-plain-text

Usage

Basic usage:

import { toPlainText } from 'unicode-to-plain-text'

// Mathematical styles
toPlainText('𝐇𝐞𝐥𝐥𝐨 𝐖𝐨𝐫𝐥𝐝') // => 'Hello World'

// Enclosed characters
toPlainText('🅣🅔🅢🅣') // => 'TEST'

// Fullwidth forms
toPlainText('ＨＥＬＬＯ') // => 'HELLO'
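
The conversions above can be approximated with built-in Unicode normalization. The following is a minimal sketch, not the library's implementation: `foldStyled` and `foldNegativeCircled` are hypothetical helpers. It assumes that NFKC handles the mathematical and fullwidth forms, and that enclosed letters, which NFKC leaves unchanged, need a small manual mapping.

```javascript
// Hypothetical helper, not part of unicode-to-plain-text.
// Maps Negative Circled Latin Capital Letters (U+1F150..U+1F169) to A..Z,
// because NFKC normalization leaves them as they are.
function foldNegativeCircled(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0)
    return cp >= 0x1f150 && cp <= 0x1f169
      ? String.fromCharCode(0x41 + (cp - 0x1f150))
      : ch
  }).join('')
}

// NFKC folds compatibility characters (math bold, fullwidth) to base letters;
// the manual mapping then handles the enclosed letters.
function foldStyled(text) {
  return foldNegativeCircled(text.normalize('NFKC'))
}

const mathBold = '\u{1D407}\u{1D41E}\u{1D425}\u{1D425}\u{1D428}' // math bold "Hello"
const fullwidth = '\uFF28\uFF25\uFF2C\uFF2C\uFF2F'               // fullwidth "HELLO"
const enclosed = '\u{1F163}\u{1F154}\u{1F162}\u{1F163}'          // negative circled "TEST"

console.log(foldStyled(mathBold))  // Hello
console.log(foldStyled(fullwidth)) // HELLO
console.log(foldStyled(enclosed))  // TEST
```

A real converter needs far more mappings than this (flipped text, decorations, lookalikes), which is presumably what the package's own tables cover.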

Language preservation:

// Preserve specific writing systems
toPlainText('Привет 𝐖𝐨𝐫𝐥𝐝', { preserve: ['cyrillic'] }) // => 'Привет World'
toPlainText('你好 𝐖𝐨𝐫𝐥𝐝', { preserve: ['cjk'] })       // => '你好 World'

// But lookalike characters are converted when not preserved
toPlainText('Α test')  // => 'A test' (Greek Alpha → Latin A)

Strip characters that render as boxes in iOS notifications:

import { stripBoxChars } from 'unicode-to-plain-text'

stripBoxChars('Hello 𝇨 World')   // => 'Hello  World'   (musical symbol stripped)
stripBoxChars('Hi مرحبا 🎉')     // => 'Hi مرحبا 🎉'    (Arabic and emoji kept)
stripBoxChars('𝀀𝈀 test')        // => ' test'           (Byzantine/Greek musical stripped)

Sanitize user input:

import { sanitize } from 'unicode-to-plain-text'

// Clean and validate display names
sanitize('𝐉𝐨𝐡𝐧 𝐃𝐨𝐞')
// => { text: 'John Doe', valid: true, length: 8 }

sanitize('   ')
// => { text: '', valid: false, length: 0, error: 'whitespace_only' }

sanitize('AB', { minLength: 3 })
// => { text: 'AB', valid: false, length: 2, error: 'too_short' }

Custom pipelines:

import {
  pipe,
  normalizeFlipped,
  convertCharacters,
  normalizeDiacritics,
  normalizeDecorations,
  normalizeCasing
} from 'unicode-to-plain-text'

// Create a custom pipeline
const customTransform = pipe(
  normalizeFlipped,
  convertCharacters,
  normalizeDiacritics,
  normalizeDecorations
)

const result = customTransform('𝐓𝐄𝐒𝐓')
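
To make the composition model concrete, here is a minimal sketch of how a left-to-right `pipe` could work. It is an assumption about the general technique, not the package's source: this `pipe` is a local re-implementation, and `foldCompat`, `collapseSpaces`, and `lower` are hypothetical stand-ins for the library's steps.

```javascript
// Hypothetical re-implementation: each function receives the previous result.
const pipe = (...fns) => (input) => fns.reduce((acc, fn) => fn(acc), input)

// Stand-in steps (not the library's functions).
const foldCompat = (text) => text.normalize('NFKC')
const collapseSpaces = (text) => text.replace(/\s+/g, ' ').trim()
const lower = (text) => text.toLowerCase()

// Steps run in the order listed: fold, then collapse, then lowercase.
const tidy = pipe(foldCompat, collapseSpaces, lower)

// Input is math bold "TEST" surrounded by irregular spacing.
console.log(tidy('  \u{1D413}\u{1D404}\u{1D412}\u{1D413}   ok ')) // test ok
```

Because every step is a plain string-to-string function, steps can be reordered or dropped without touching the others.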

API

toPlainText(text, options?)

Converts fancy Unicode text to plain ASCII

| Property | Type | Description |
| --- | --- | --- |
| text | string | Input text with Unicode characters |
| options | object | Optional configuration object |

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| normalizeSpaces | boolean | true | Collapse spaces, convert whitespace, strip invisible |
| skipEmoji | boolean | false | Preserve emoji characters |
| preserve | PreserveOption | [] | Writing systems to preserve: 'all' or array |
| trim | TrimOption | 'all' | Trim mode: 'all', 'start', 'end', or 'none' |
| asciiOnly | boolean | false | Strip all non-ASCII characters after normalization |

PreserveOption: 'all' | WritingSystem[]

WritingSystem: 'latin' | 'greek' | 'cyrillic' | 'arabic' | 'hebrew' | 'cjk' | 'ethiopic' | 'thai' | 'devanagari'

Note: 'latin' preserves all European language diacritics (French, German, Spanish, Portuguese, Polish, Czech, Hungarian, Romanian, Turkish, Nordic, etc.)

Examples

// Default behavior - emojis removed
toPlainText('Hello 🎉 World') // => 'Hello World'

// Preserve emojis
toPlainText('Hello 🎉 World', { skipEmoji: true }) // => 'Hello 🎉 World'

// Preserve spacing
toPlainText('Hello   World', { normalizeSpaces: false }) // => 'Hello   World'

// Preserve specific writing systems
toPlainText('שלום 𝐖𝐨𝐫𝐥𝐝', { preserve: ['hebrew'] }) // => 'שלום World'

// Preserve all writing systems
toPlainText('καλημέρί', { preserve: 'all' }) // => 'καλημέρί'

// Preserve European diacritics (French, German, Turkish, Polish, etc.)
toPlainText('Ömer Şahin', { preserve: ['latin'] }) // => 'Ömer Şahin'
toPlainText('François Müller', { preserve: ['latin'] }) // => 'François Müller'

// Control trimming
toPlainText('  hello  ', { trim: 'start' }) // => 'hello '
toPlainText('  hello  ', { trim: 'none' })  // => ' hello '
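
The trimming results above are consistent with collapsing whitespace first and trimming afterwards. The sketch below illustrates that order under stated assumptions: `normalizeWhitespace` is a hypothetical helper, not the library's `normalizeSpaces`, and it ignores the invisible-character stripping that the real option also performs.

```javascript
// Hypothetical helper: collapse whitespace runs to one space,
// then trim according to the requested mode.
function normalizeWhitespace(text, trim = 'all') {
  const collapsed = text.replace(/\s+/g, ' ')
  switch (trim) {
    case 'start':
      return collapsed.trimStart()
    case 'end':
      return collapsed.trimEnd()
    case 'none':
      return collapsed
    default: // 'all'
      return collapsed.trim()
  }
}

console.log(JSON.stringify(normalizeWhitespace('  hello  ', 'start'))) // "hello "
console.log(JSON.stringify(normalizeWhitespace('  hello  ', 'end')))   // " hello"
console.log(JSON.stringify(normalizeWhitespace('  hello  ', 'none')))  // " hello "
console.log(JSON.stringify(normalizeWhitespace('  hello  ')))          // "hello"
```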

sanitize(text, options?)

Sanitizes and validates text for use as display names

| Property | Type | Description |
| --- | --- | --- |
| text | string | Input text to sanitize |
| options | object | Optional configuration object |

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| minLength | number | 1 | Minimum length count |
| maxLength | number | 64 | Maximum length count |
| preserve | WritingSystem[] | - | Writing systems to preserve |
| skipEmoji | boolean | false | Preserve emoji characters |
| trim | TrimOption | 'all' | Trim mode |
| truncate | boolean | false | Auto-truncate to maxLength instead of error |
| allowedWritingSystems | WritingSystem[] | - | Strict whitelist of allowed writing systems |
| rejectHomoglyphs | boolean | false | Fail validation if homoglyphs detected |
| asciiOnly | boolean | false | Ensure output contains only ASCII |

Returns

{
  text: string        // Sanitized text
  valid: boolean      // Whether validation passed
  length: number
  error?: 'empty' | 'too_short' | 'too_long' | 'whitespace_only' | 'homoglyphs' | 'disallowed_writing_system'
}
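
As an illustration of the result shape, the sketch below implements only the length-related checks. `validateDisplayName` is a hypothetical function, not the library's `sanitize`. The order of the checks and the choice to count length in code points are assumptions, since the README does not specify either.

```javascript
// Hypothetical validator that mirrors the documented result shape.
// Assumptions: length is counted in code points, and 'empty' is reported
// only for a zero-length input, before the whitespace check.
function validateDisplayName(raw, { minLength = 1, maxLength = 64 } = {}) {
  const text = raw.trim()
  const length = Array.from(text).length
  const fail = (error) => ({ text, valid: false, length, error })

  if (raw.length === 0) return fail('empty')
  if (length === 0) return fail('whitespace_only')
  if (length < minLength) return fail('too_short')
  if (length > maxLength) return fail('too_long')
  return { text, valid: true, length }
}

console.log(validateDisplayName('John Doe'))
// { text: 'John Doe', valid: true, length: 8 }
console.log(validateDisplayName('   '))
// { text: '', valid: false, length: 0, error: 'whitespace_only' }
console.log(validateDisplayName('AB', { minLength: 3 }))
// { text: 'AB', valid: false, length: 2, error: 'too_short' }
```

Returning an error code instead of throwing lets callers branch on `valid` and map `error` to their own user-facing messages.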

Detection Utilities

  • detectWritingSystems(text) - Returns { writingSystems: WritingSystem[], mixed: boolean, primary: WritingSystem | null }
  • getWritingSystem(char) - Returns the writing system for a single character
  • hasHomoglyphs(text) - Checks if text contains cross-script visually confusable characters
  • analyzeHomoglyphs(text) - Returns detailed analysis of homoglyph matches
  • isSuspiciousMix(text) - Checks for Latin + Cyrillic/Greek mixing (common spoofing)
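
One way to approximate this kind of detection is with Unicode script property escapes in regular expressions. The sketch below is an assumption about the general approach, not the package's code: `scriptsIn` and `looksMixed` are hypothetical names, and only three scripts are covered.

```javascript
// Hypothetical helpers built on Unicode script property escapes.
const SCRIPTS = {
  latin: /\p{Script=Latin}/u,
  greek: /\p{Script=Greek}/u,
  cyrillic: /\p{Script=Cyrillic}/u,
}

// Collect the scripts that occur in the text, in order of first appearance.
function scriptsIn(text) {
  const found = new Set()
  for (const ch of text) {
    for (const [name, re] of Object.entries(SCRIPTS)) {
      if (re.test(ch)) found.add(name)
    }
  }
  return [...found]
}

// Flag Latin mixed with Cyrillic or Greek, a common spoofing pattern.
function looksMixed(text) {
  const found = scriptsIn(text)
  return found.includes('latin') &&
    (found.includes('cyrillic') || found.includes('greek'))
}

const spoofed = 'p\u0430ypal' // second letter is Cyrillic U+0430, not Latin "a"
console.log(scriptsIn(spoofed))   // [ 'latin', 'cyrillic' ]
console.log(looksMixed(spoofed))  // true
console.log(looksMixed('paypal')) // false
```

Script mixing alone produces false positives for legitimately bilingual text, so a check like this is usually one signal among several rather than a verdict.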

stripBoxChars(text)

Strips Unicode characters that render as rectangle boxes in iOS notifications. Safe for multilingual content: preserves Arabic, Thai, CJK, Cyrillic, Hebrew, Armenian, Hiragana, Hangul, emoji, and math alphanumeric symbols.

stripBoxChars('Hello 𝇨 World')       // => 'Hello  World'
stripBoxChars('مرحبا 🎉')            // => 'مرحبا 🎉'    (untouched)
stripBoxChars('𝐇𝐞𝐥𝐥𝐨')             // => '𝐇𝐞𝐥𝐥𝐨'       (math bold kept)
stripBoxChars('🟠🟡 status')         // => '🟠🟡 status'  (colored circles kept)

Stripped ranges include: Private Use Area, Byzantine/Ancient Greek Musical Symbols, Tangut, Khitan Small Script, historical Japanese kana (Kana Supplement/Extended), Sutton SignWriting, and 40+ other blocks confirmed unrenderable on iOS.
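
Range-based stripping of this kind can be sketched with a single character class over code point ranges. The example below is illustrative only: `stripRanges` is a hypothetical function, and it covers just three of the blocks named above (Private Use Area, Byzantine Musical Symbols, Ancient Greek Musical Notation), not the library's full list.

```javascript
// Hypothetical subset of blocks; the library's actual list is much longer.
//   U+E000..U+F8FF   Private Use Area
//   U+1D000..U+1D0FF Byzantine Musical Symbols
//   U+1D200..U+1D24F Ancient Greek Musical Notation
const UNRENDERABLE = /[\u{E000}-\u{F8FF}\u{1D000}-\u{1D0FF}\u{1D200}-\u{1D24F}]/gu

function stripRanges(text) {
  return text.replace(UNRENDERABLE, '')
}

// Byzantine and Ancient Greek musical symbols are removed.
console.log(JSON.stringify(stripRanges('\u{1D000}\u{1D200} test'))) // " test"

// Math bold (U+1D400 and up) falls outside the listed ranges and is kept.
const bold = '\u{1D407}\u{1D41E}\u{1D425}\u{1D425}\u{1D428}'
console.log(stripRanges(bold) === bold) // true
```

The `u` flag matters here: without it, the regular expression would treat astral code points as pairs of surrogates and the ranges would not match as intended.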

Individual Functions

  • normalizeFlipped(text) - Handles upside-down and mirrored text
  • convertCharacters(text, options?) - Maps Unicode to ASCII equivalents
  • normalizeDiacritics(text) - Removes diacritics from Latin text
  • normalizeDecorations(text, options?) - Removes emojis and decorations
  • decodeUnicodeId(text) - Converts comma-separated code points to string
  • normalizeSpaces(text, options?) - Normalizes whitespace with trim control
  • normalizeCasing(text) - Normalizes inconsistent casing
  • pipe(...fns) - Composes functions into a pipeline
  • pipeWith(...fns) - Pipe with shared context
  • when(fn, condition) - Conditional pipeline step
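
As an example of what one such step might do internally, diacritic removal is commonly implemented by decomposing to NFD and dropping combining marks. The sketch below shows that technique under stated assumptions: `stripDiacritics` is a hypothetical function, and it is not claimed to match the behavior of the library's `normalizeDiacritics`.

```javascript
// Hypothetical helper: decompose letters into base + combining marks (NFD),
// then remove every combining mark (Unicode general category M).
function stripDiacritics(text) {
  return text.normalize('NFD').replace(/\p{M}/gu, '')
}

console.log(stripDiacritics('Fran\u00E7ois M\u00FCller')) // Francois Muller
console.log(stripDiacritics('\u00D6mer \u015Eahin'))      // Omer Sahin

// Letters without a canonical decomposition are left unchanged,
// for example the Polish l with stroke (U+0142).
console.log(stripDiacritics('\u0142') === '\u0142') // true
```

Letters such as U+0142 or U+00F8 would need an explicit mapping table on top of this, which is one reason a dedicated library can be preferable to the two-line version.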

License

Apache-2.0