$ npm install unicode-to-plain-textConvert fancy Unicode text to plain ASCII with smart language preservation
npm i unicode-to-plain-text
Basic usage:
import { toPlainText } from 'unicode-to-plain-text'
// Mathematical styles
toPlainText('𝐇𝐞𝐥𝐥𝐨 𝐖𝐨𝐫𝐥𝐝') // => 'Hello World'
// Enclosed characters
toPlainText('🅣🅔🅢🅣') // => 'TEST'
// Fullwidth forms
toPlainText('HELLO') // => 'HELLO'
Language preservation:
// Preserve specific writing systems
toPlainText('Привет 𝐖𝐨𝐫𝐥𝐝', { preserve: ['cyrillic'] }) // => 'Привет World'
toPlainText('你好 𝐖𝐨𝐫𝐥𝐝', { preserve: ['cjk'] }) // => '你好 World'
// But lookalike characters are converted when not preserved
toPlainText('Α test') // => 'A test' (Greek Alpha → Latin A)
Sanitize user input:
import { sanitize } from 'unicode-to-plain-text'
// Clean and validate display names
sanitize('𝐉𝐨𝐡𝐧 𝐃𝐨𝐞')
// => { text: 'John Doe', valid: true, length: 8 }
sanitize(' ')
// => { text: '', valid: false, length: 0, error: 'whitespace_only' }
sanitize('AB', { minLength: 3 })
// => { text: 'AB', valid: false, length: 2, error: 'too_short' }
Custom pipelines:
import {
pipe,
normalizeFlipped,
convertCharacters,
normalizeDiacritics,
normalizeDecorations,
normalizeCasing
} from 'unicode-to-plain-text'
// Create a custom pipeline
const customTransform = pipe(
normalizeFlipped,
convertCharacters,
normalizeDiacritics,
normalizeDecorations
)
const result = customTransform('𝐓𝐄𝐒𝐓')
Converts fancy Unicode text to plain ASCII
| Property | Type | Description |
|---|---|---|
text | string | Input text with Unicode characters |
options | object | Optional configuration object |
| Option | Type | Default | Description |
|---|---|---|---|
normalizeSpaces | boolean | true | Collapse spaces, convert whitespace, strip invisible |
skipEmoji | boolean | false | Preserve emoji characters |
preserve | PreserveOption | [] | Writing systems to preserve: 'all' or array |
trim | TrimOption | 'all' | Trim mode: 'all', 'start', 'end', or 'none' |
asciiOnly | boolean | false | Strip all non-ASCII characters after normalization |
PreserveOption: 'all' | WritingSystem[]
WritingSystem: 'latin' | 'greek' | 'cyrillic' | 'arabic' | 'hebrew' | 'cjk' | 'ethiopic' | 'thai' | 'devanagari'
Note:
'latin'preserves all European language diacritics (French, German, Spanish, Portuguese, Polish, Czech, Hungarian, Romanian, Turkish, Nordic, etc.)
// Default behavior - emojis removed
toPlainText('Hello 🎉 World') // => 'Hello World'
// Preserve emojis
toPlainText('Hello 🎉 World', { skipEmoji: true }) // => 'Hello 🎉 World'
// Preserve spacing
toPlainText('Hello World', { normalizeSpaces: false }) // => 'Hello World'
// Preserve specific writing systems
toPlainText('שלום 𝐖𝐨𝐫𝐥𝐝', { preserve: ['hebrew'] }) // => 'שלום World'
// Preserve all writing systems
toPlainText('καλημέρί', { preserve: 'all' }) // => 'καλημέρί'
// Preserve European diacritics (French, German, Turkish, Polish, etc.)
toPlainText('Ömer Şahin', { preserve: ['latin'] }) // => 'Ömer Şahin'
toPlainText('François Müller', { preserve: ['latin'] }) // => 'François Müller'
// Control trimming
toPlainText(' hello ', { trim: 'start' }) // => 'hello '
toPlainText(' hello ', { trim: 'none' }) // => ' hello '
Sanitizes and validates text for use as display names
| Property | Type | Description |
|---|---|---|
text | string | Input text to sanitize |
options | object | Optional configuration object |
| Option | Type | Default | Description |
|---|---|---|---|
minLength | number | 1 | Minimum length count |
maxLength | number | 64 | Maximum length count |
preserve | WritingSystem[] | - | Writing systems to preserve |
skipEmoji | boolean | false | Preserve emoji characters |
trim | TrimOption | 'all' | Trim mode |
truncate | boolean | false | Auto-truncate to maxLength instead of error |
allowedWritingSystems | WritingSystem[] | - | Strict whitelist of allowed writing systems |
rejectHomoglyphs | boolean | false | Fail validation if homoglyphs detected |
asciiOnly | boolean | false | Ensure output contains only ASCII |
{
text: string // Sanitized text
valid: boolean // Whether validation passed
length: number
error?: 'empty' | 'too_short' | 'too_long' | 'whitespace_only' | 'homoglyphs' | 'disallowed_writing_system'
}
detectWritingSystems(text) - Returns { writingSystems: WritingSystem[], mixed: boolean, primary: WritingSystem | null }getWritingSystem(char) - Returns the writing system for a single characterhasHomoglyphs(text) - Checks if text contains cross-script visually confusable charactersanalyzeHomoglyphs(text) - Returns detailed analysis of homoglyph matchesisSuspiciousMix(text) - Checks for Latin + Cyrillic/Greek mixing (common spoofing)normalizeFlipped(text) - Handles upside-down and mirrored textconvertCharacters(text, options?) - Maps Unicode to ASCII equivalentsnormalizeDiacritics(text) - Removes diacritics from Latin textnormalizeDecorations(text, options?) - Removes emojis and decorationsdecodeUnicodeId(text) - Converts comma-separated code points to stringnormalizeSpaces(text, options?) - Normalizes whitespace with trim controlnormalizeCasing(text) - Normalizes inconsistent casingpipe(...fns) - Composes functions into a pipelinepipeWith(...fns) - Pipe with shared contextwhen(fn, condition) - Conditional pipeline stepApache-2.0