
Text standardization components

tJapaneseNumberNormalize Normalizes Japanese numbers written in kanji (kansūji) to Arabic numerals.
tJapaneseTokenize Splits Japanese text into tokens.
tJapaneseTransliterate Converts textual data in Japanese to kana and Latin scripts.
tStem Standardizes data in columns before this data is matched.
tTransliterate Converts strings written in many of the world's languages to a standard set of characters (Universal Coded Character Set, UCS).
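To show what normalizing kansūji involves, here is a minimal Python sketch of the idea behind tJapaneseNumberNormalize. It is not the component's implementation. The functions `kansuji_to_int` and `normalize_kansuji` are hypothetical names, and the sketch assumes plain kanji digits, the small units 十, 百, 千, and the large units 万, 億, 兆. It also accepts positional forms such as 二〇二四.

```python
import re

# Hypothetical lookup tables for a simplified kansuji grammar (assumption).
DIGITS = {"〇": 0, "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}

# A run of characters that this sketch treats as one number.
KANSUJI_RUN = re.compile("[〇零一二三四五六七八九十百千万億兆]+")


def kansuji_to_int(text: str) -> int:
    """Convert one kansuji string (e.g. 二十五万三千) to an int (hypothetical helper)."""
    total = 0      # sum of groups already closed by 万, 億 or 兆
    section = 0    # value of the current group, below the next large unit
    digit = None   # pending digit(s) not yet scaled by a unit
    for ch in text:
        if ch in DIGITS:
            d = DIGITS[ch]
            # Positional style (二〇二四): consecutive digits build a decimal number.
            digit = d if digit is None else digit * 10 + d
        elif ch in SMALL_UNITS:
            # A bare unit such as 十 means 1 x 10.
            section += (1 if digit is None else digit) * SMALL_UNITS[ch]
            digit = None
        elif ch in LARGE_UNITS:
            section += 0 if digit is None else digit
            total += (1 if section == 0 else section) * LARGE_UNITS[ch]
            section, digit = 0, None
        else:
            raise ValueError(f"not a kansuji character: {ch!r}")
    return total + section + (0 if digit is None else digit)


def normalize_kansuji(text: str) -> str:
    """Replace every kansuji run in text with Arabic numerals (hypothetical helper)."""
    return KANSUJI_RUN.sub(lambda m: str(kansuji_to_int(m.group())), text)


print(normalize_kansuji("価格は三千二百円"))  # 価格は3200円
print(kansuji_to_int("一億二千万"))          # 120000000
```

Because this regex-based replacement has no word context, it would also rewrite 一 inside ordinary words such as 一緒. A production normalizer would typically look at tokenized text, such as the output of a tokenizer like tJapaneseTokenize, before converting numbers.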
