unicode-to-plain-text

Version 2.1.0 · @chocky335

Convert fancy Unicode text (mathematical styles, enclosed characters, fullwidth forms, flipped text) to plain ASCII, with optional preservation of native writing systems

Install

npm i unicode-to-plain-text

Usage

Basic usage:

import { toPlainText } from 'unicode-to-plain-text'

// Mathematical styles
toPlainText('𝐇𝐞𝐥𝐥𝐨 𝐖𝐨𝐫𝐥𝐝') // => 'Hello World'

// Enclosed characters
toPlainText('🅣🅔🅢🅣') // => 'TEST'

// Fullwidth forms
toPlainText('ＨＥＬＬＯ') // => 'HELLO'
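For intuition, the three conversions above are largely fixed code-point offsets. The sketch below is a hypothetical helper (`foldStyledLatin` is not part of the package's API) that handles only Mathematical Bold letters, negative circled capitals and fullwidth ASCII; the package covers many more styles and may not be implemented this way.

```typescript
// Hypothetical sketch: fold three styled Unicode ranges back to ASCII
// using fixed code-point offsets. Not the package's implementation.
const MATH_BOLD_CAPITAL_A = 0x1d400; // U+1D400 MATHEMATICAL BOLD CAPITAL A
const MATH_BOLD_SMALL_A = 0x1d41a; // U+1D41A MATHEMATICAL BOLD SMALL A
const NEG_CIRCLED_CAPITAL_A = 0x1f150; // U+1F150 NEGATIVE CIRCLED LATIN CAPITAL LETTER A

function foldStyledLatin(text: string): string {
  let out = "";
  // Array.from iterates by code point, so astral characters stay intact
  for (const ch of Array.from(text)) {
    const cp = ch.codePointAt(0)!;
    if (cp >= MATH_BOLD_CAPITAL_A && cp < MATH_BOLD_CAPITAL_A + 26) {
      out += String.fromCharCode(0x41 + cp - MATH_BOLD_CAPITAL_A);
    } else if (cp >= MATH_BOLD_SMALL_A && cp < MATH_BOLD_SMALL_A + 26) {
      out += String.fromCharCode(0x61 + cp - MATH_BOLD_SMALL_A);
    } else if (cp >= NEG_CIRCLED_CAPITAL_A && cp < NEG_CIRCLED_CAPITAL_A + 26) {
      out += String.fromCharCode(0x41 + cp - NEG_CIRCLED_CAPITAL_A);
    } else if (cp >= 0xff01 && cp <= 0xff5e) {
      // Fullwidth U+FF01..U+FF5E sit exactly 0xFEE0 above ASCII U+0021..U+007E
      out += String.fromCharCode(cp - 0xfee0);
    } else {
      out += ch; // anything else passes through unchanged
    }
  }
  return out;
}

console.log(foldStyledLatin("𝐇𝐞𝐥𝐥𝐨 🅣🅔🅢🅣 ＨＥＬＬＯ")); // Hello TEST HELLO
```

The bold alphabet is contiguous, but some other mathematical alphabets (italic, script) have holes filled by older Letterlike Symbols such as ℎ (U+210E), so a pure offset scheme is not enough for every style.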

Language preservation:

// Preserve specific writing systems
toPlainText('Привет 𝐖𝐨𝐫𝐥𝐝', { preserve: ['cyrillic'] }) // => 'Привет World'
toPlainText('你好 𝐖𝐨𝐫𝐥𝐝', { preserve: ['cjk'] })       // => '你好 World'

// But lookalike characters are converted when not preserved
toPlainText('Α test')  // => 'A test' (Greek Alpha → Latin A)
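The `preserve` option can be pictured as a per-character script check that runs before lookalike folding. In this hypothetical sketch, `foldLookalikes` and its tiny `LOOKALIKES` table are illustrative assumptions; real confusable tables are far larger.

```typescript
// Hypothetical sketch: fold a few Greek/Cyrillic lookalikes to Latin
// unless the character's script is listed in `preserve`.
type Script = "greek" | "cyrillic";

// Tiny illustrative table; not the package's data.
const LOOKALIKES: Record<string, string> = {
  "\u0391": "A", // Greek capital Alpha
  "\u0392": "B", // Greek capital Beta
  "\u039f": "O", // Greek capital Omicron
  "\u0410": "A", // Cyrillic capital A
  "\u0430": "a", // Cyrillic small a
  "\u0435": "e", // Cyrillic small ie
  "\u043e": "o", // Cyrillic small o
};

// Built with the RegExp constructor so older TS targets accept the `u` flag.
const SCRIPT_PATTERNS: Record<Script, RegExp> = {
  greek: new RegExp("\\p{Script=Greek}", "u"),
  cyrillic: new RegExp("\\p{Script=Cyrillic}", "u"),
};

function foldLookalikes(text: string, preserve: Script[] = []): string {
  return Array.from(text)
    .map((ch) => {
      // Leave the character alone if its script is preserved
      if (preserve.some((s) => SCRIPT_PATTERNS[s].test(ch))) return ch;
      return LOOKALIKES[ch] ?? ch;
    })
    .join("");
}
```

Note the trade-off: preserving Cyrillic also keeps a spoofed Cyrillic `а` inside an otherwise Latin word unchanged.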

Sanitize user input:

import { sanitize } from 'unicode-to-plain-text'

// Clean and validate display names
sanitize('𝐉𝐨𝐡𝐧 𝐃𝐨𝐞')
// => { text: 'John Doe', valid: true, length: 8 }

sanitize('   ')
// => { text: '', valid: false, length: 0, error: 'whitespace_only' }

sanitize('AB', { minLength: 3 })
// => { text: 'AB', valid: false, length: 2, error: 'too_short' }

Custom pipelines:

import {
  pipe,
  normalizeFlipped,
  convertCharacters,
  normalizeDiacritics,
  normalizeDecorations
} from 'unicode-to-plain-text'

// Create a custom pipeline
const customTransform = pipe(
  normalizeFlipped,
  convertCharacters,
  normalizeDiacritics,
  normalizeDecorations
)

const result = customTransform('𝐓𝐄𝐒𝐓')
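`pipe` composes the steps left to right, so each function receives the previous one's output. A minimal stand-in with the same shape is sketched below; it is hypothetical, and the package's own `pipe` may add typing or option handling.

```typescript
// Hypothetical stand-in for `pipe`: left-to-right composition of string transforms.
type Step = (text: string) => string;

const pipe =
  (...fns: Step[]): Step =>
  (text) =>
    fns.reduce((acc, fn) => fn(acc), text);

// Illustrative steps only; these are not the package's normalizers.
const collapseSpaces: Step = (t) => t.replace(/\s+/g, " ");
const trimEnds: Step = (t) => t.trim();
const lower: Step = (t) => t.toLowerCase();

const tidy = pipe(collapseSpaces, trimEnds, lower);
console.log(tidy("  Hello   World ")); // "hello world"
```

Order matters: each step only sees what the earlier steps produced, and an empty pipeline returns its input unchanged.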

API

toPlainText(text, options?)

Converts fancy Unicode text to plain ASCII.

Parameters

  • text (string) - Input text containing Unicode characters
  • options (object, optional) - Configuration object; see Options below

Options

  • normalizeSpaces (boolean, default true) - Collapse runs of spaces, convert other whitespace to regular spaces, and strip invisible characters
  • skipEmoji (boolean, default false) - Preserve emoji characters instead of removing them
  • preserve (PreserveOption, default []) - Writing systems to preserve: 'all' or an array of writing systems
  • trim (TrimOption, default 'all') - Trim mode: 'all', 'start', 'end', or 'none'
  • asciiOnly (boolean, default false) - Strip all remaining non-ASCII characters after normalization
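The `normalizeSpaces` and `trim` options interact: runs of whitespace are collapsed first, then trimmed according to the mode. The sketch below reproduces the `trim` examples later in this section under that assumed order of operations; `tidySpaces` and its list of invisible characters are hypothetical.

```typescript
// Hypothetical sketch of whitespace normalization with a trim mode.
type TrimOption = "all" | "start" | "end" | "none";

// A few zero-width characters; the package's list of invisibles may differ.
const INVISIBLES = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

function tidySpaces(text: string, trim: TrimOption = "all"): string {
  // Drop zero-width characters, then collapse every whitespace run
  // (JS \s also matches NBSP, ideographic space, etc.) into one space.
  const collapsed = text.replace(INVISIBLES, "").replace(/\s+/g, " ");
  switch (trim) {
    case "all":
      return collapsed.trim();
    case "start":
      return collapsed.trimStart();
    case "end":
      return collapsed.trimEnd();
    default:
      return collapsed; // "none"
  }
}
```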

PreserveOption: 'all' | WritingSystem[]

WritingSystem: 'latin' | 'greek' | 'cyrillic' | 'arabic' | 'hebrew' | 'cjk' | 'ethiopic' | 'thai' | 'devanagari'

Note: 'latin' preserves all European language diacritics (French, German, Spanish, Portuguese, Polish, Czech, Hungarian, Romanian, Turkish, Nordic, etc.)
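When 'latin' is not preserved, accented letters are reduced to their base letters. A common way to do this, shown here as an assumption rather than the package's documented method, is canonical decomposition (NFD) followed by removal of combining marks; `stripDiacritics` is a hypothetical name.

```typescript
// Hypothetical sketch: strip Latin diacritics via NFD + combining-mark removal.
const COMBINING_MARKS = /[\u0300-\u036f]/g; // Combining Diacritical Marks block

function stripDiacritics(text: string): string {
  // NFD splits "ü" into "u" + U+0308; removing the mark leaves the base letter.
  return text.normalize("NFD").replace(COMBINING_MARKS, "").normalize("NFC");
}

console.log(stripDiacritics("François Müller")); // "Francois Muller"
console.log(stripDiacritics("Ömer Şahin")); // "Omer Sahin"
```

Letters with no canonical decomposition, such as ø, ł and ß, pass through unchanged, so a full implementation also needs an explicit lookup table.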

Examples

// Default behavior - emojis removed
toPlainText('Hello 🎉 World') // => 'Hello World'

// Preserve emojis
toPlainText('Hello 🎉 World', { skipEmoji: true }) // => 'Hello 🎉 World'

// Preserve spacing
toPlainText('Hello   World', { normalizeSpaces: false }) // => 'Hello   World'

// Preserve specific writing systems
toPlainText('שלום 𝐖𝐨𝐫𝐥𝐝', { preserve: ['hebrew'] }) // => 'שלום World'

// Preserve all writing systems
toPlainText('καλημέρα', { preserve: 'all' }) // => 'καλημέρα'

// Preserve European diacritics (French, German, Turkish, Polish, etc.)
toPlainText('Ömer Şahin', { preserve: ['latin'] }) // => 'Ömer Şahin'
toPlainText('François Müller', { preserve: ['latin'] }) // => 'François Müller'

// Control trimming
toPlainText('  hello  ', { trim: 'start' }) // => 'hello '
toPlainText('  hello  ', { trim: 'none' })  // => ' hello '
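The default emoji removal in the first example can be approximated with the `Extended_Pictographic` Unicode property, plus the joiners, variation selectors and skin-tone modifiers that glue emoji sequences together. This is a hypothetical sketch (`removeEmoji` is not a package export); the package's decoration handling likely covers more cases.

```typescript
// Hypothetical sketch: remove emoji, then collapse the gap they leave behind.
// Matches pictographs, ZWJ (U+200D), VS16 (U+FE0F) and skin-tone modifiers.
const EMOJI_PARTS = new RegExp(
  "[\\p{Extended_Pictographic}\\u200D\\uFE0F\\u{1F3FB}-\\u{1F3FF}]",
  "gu"
);

function removeEmoji(text: string): string {
  return text.replace(EMOJI_PARTS, "").replace(/\s+/g, " ").trim();
}

console.log(removeEmoji("Hello 🎉 World")); // "Hello World"
```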

sanitize(text, options?)

Sanitizes and validates text for use as a display name.

Parameters

  • text (string) - Input text to sanitize
  • options (object, optional) - Configuration object; see Options below

Options

  • minLength (number, default 1) - Minimum allowed length of the sanitized text
  • maxLength (number, default 64) - Maximum allowed length of the sanitized text
  • preserve (WritingSystem[], no default) - Writing systems to preserve
  • skipEmoji (boolean, default false) - Preserve emoji characters
  • trim (TrimOption, default 'all') - Trim mode
  • truncate (boolean, default false) - Truncate to maxLength instead of failing with 'too_long'
  • allowedWritingSystems (WritingSystem[], no default) - Strict whitelist of allowed writing systems
  • rejectHomoglyphs (boolean, default false) - Fail validation if homoglyphs are detected
  • asciiOnly (boolean, default false) - Ensure the output contains only ASCII characters

Returns

{
  text: string        // Sanitized text
  valid: boolean      // Whether validation passed
  length: number      // Length of the sanitized text
  error?: 'empty' | 'too_short' | 'too_long' | 'whitespace_only' | 'homoglyphs' | 'disallowed_writing_system'
}
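The validation half of `sanitize` can be pictured as a few ordered checks on already-normalized text. In this hypothetical sketch, `validateName` is illustrative, the check order is an assumption, length is counted in code points, and the defaults mirror the documented minLength and maxLength defaults. It reproduces the three `sanitize` results shown under Usage.

```typescript
// Hypothetical sketch of sanitize-style validation on already-normalized text.
type SanitizeError = "empty" | "too_short" | "too_long" | "whitespace_only";

interface SanitizeResult {
  text: string;
  valid: boolean;
  length: number;
  error?: SanitizeError;
}

interface ValidateOptions {
  minLength?: number;
  maxLength?: number;
  truncate?: boolean;
}

function validateName(input: string, opts: ValidateOptions = {}): SanitizeResult {
  const { minLength = 1, maxLength = 64, truncate = false } = opts;
  if (input.length === 0) {
    return { text: "", valid: false, length: 0, error: "empty" };
  }
  let chars = Array.from(input.trim()); // count code points, not UTF-16 units
  if (chars.length === 0) {
    return { text: "", valid: false, length: 0, error: "whitespace_only" };
  }
  if (chars.length > maxLength) {
    if (!truncate) {
      return { text: chars.join(""), valid: false, length: chars.length, error: "too_long" };
    }
    chars = chars.slice(0, maxLength); // truncate instead of failing
  }
  const text = chars.join("");
  if (chars.length < minLength) {
    return { text, valid: false, length: chars.length, error: "too_short" };
  }
  return { text, valid: true, length: chars.length };
}
```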

Detection Utilities

  • detectWritingSystems(text) - Returns { writingSystems: WritingSystem[], mixed: boolean, primary: WritingSystem | null }
  • getWritingSystem(char) - Returns the writing system for a single character
  • hasHomoglyphs(text) - Checks if text contains cross-script visually confusable characters
  • analyzeHomoglyphs(text) - Returns detailed analysis of homoglyph matches
  • isSuspiciousMix(text) - Checks for Latin + Cyrillic/Greek mixing (a common spoofing technique)
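A rough approximation of `getWritingSystem` and `isSuspiciousMix` can be built on Unicode script properties. This sketch is hypothetical: `writingSystemOf` and `looksSuspicious` only distinguish Latin, Greek and Cyrillic, and checking each word separately is a design choice assumed here, so that a sentence like 'Привет World' is not flagged while a word mixing Latin letters with a Cyrillic а (U+0430) is.

```typescript
// Hypothetical sketch of script detection and Latin/Cyrillic/Greek mixing checks.
type WritingSystem = "latin" | "greek" | "cyrillic";

const SCRIPTS: [WritingSystem, RegExp][] = [
  ["latin", new RegExp("\\p{Script=Latin}", "u")],
  ["greek", new RegExp("\\p{Script=Greek}", "u")],
  ["cyrillic", new RegExp("\\p{Script=Cyrillic}", "u")],
];

// Returns the script of one character, or null for digits, punctuation, etc.
function writingSystemOf(char: string): WritingSystem | null {
  for (const [name, re] of SCRIPTS) {
    if (re.test(char)) return name;
  }
  return null;
}

// Flags any single word that mixes Latin with Greek or Cyrillic letters.
function looksSuspicious(text: string): boolean {
  return text.split(/\s+/).some((word) => {
    const found = new Set(Array.from(word).map((ch) => writingSystemOf(ch)));
    return found.has("latin") && (found.has("greek") || found.has("cyrillic"));
  });
}
```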

Individual Functions

  • normalizeFlipped(text) - Handles upside-down and mirrored text
  • convertCharacters(text, options?) - Maps Unicode to ASCII equivalents
  • normalizeDiacritics(text) - Removes diacritics from Latin text
  • normalizeDecorations(text, options?) - Removes emojis and decorations
  • decodeUnicodeId(text) - Converts a comma-separated list of code points to a string
  • normalizeSpaces(text, options?) - Normalizes whitespace with trim control
  • normalizeCasing(text) - Normalizes inconsistent casing
  • pipe(...fns) - Composes functions into a pipeline
  • pipeWith(...fns) - Pipe with shared context
  • when(fn, condition) - Conditional pipeline step
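Upside-down text is usually produced by reversing the string and swapping each letter for a rotated lookalike, so undoing it means reversing again and mapping the lookalikes back. The sketch below is hypothetical: `unflip` and its partial `FLIPPED` map are illustrative, and its detection rule (only unflip when a rotated-only letter appears) is an assumed heuristic, not necessarily what `normalizeFlipped` does.

```typescript
// Hypothetical sketch of undoing upside-down text. Partial map for illustration.
const FLIPPED: Record<string, string> = {
  "\u0250": "a", // ɐ
  q: "b",
  "\u0254": "c", // ɔ
  p: "d",
  "\u01dd": "e", // ǝ
  "\u025f": "f", // ɟ
  "\u0183": "g", // ƃ
  "\u0265": "h", // ɥ
  "\u1d09": "i", // ᴉ
  "\u029e": "k", // ʞ
  "\u026f": "m", // ɯ
  u: "n",
  d: "p",
  b: "q",
  "\u0279": "r", // ɹ
  "\u0287": "t", // ʇ
  n: "u",
  "\u028c": "v", // ʌ
  "\u028d": "w", // ʍ
  "\u028e": "y", // ʎ
  M: "W",
  W: "M",
};

// Letters that only occur in flipped text; used to decide whether to unflip.
const FLIP_MARKERS =
  /[\u0250\u0254\u01dd\u025f\u0183\u0265\u1d09\u029e\u026f\u0279\u0287\u028c\u028d\u028e]/;

function unflip(text: string): string {
  // Heuristic: leave ordinary text alone so "bad dog" is not rewritten.
  if (!FLIP_MARKERS.test(text)) return text;
  return Array.from(text)
    .reverse()
    .map((ch) => FLIPPED[ch] ?? ch)
    .join("");
}

console.log(unflip("plɹoM ollǝH")); // "Hello World"
```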

License

Apache-2.0