Transliterate

This is the help file for the Transliterate option under the Tool menu.
This feature transliterates selected text between Hmong RPA (used in Southeast Asia) and Danashan Miao (used in China), in either direction.

Definition: According to the Cambridge Dictionary, transliterate means to write words using a different alphabet. Webster’s Dictionary defines it as representing or spelling words using the characters of another alphabet.

Note: For simplicity in this help file, Danashan refers to the Danashan Miao script, and RPA refers to the Hmong RPA script used in Southeast Asia.

Due to limited available reference materials, this transliteration may not be fully accurate or complete. Only words containing vowels and consonants common to both scripts can be transliterated. For large documents, some words may fail and require manual editing. Failed words are clearly marked, as described below.

To simplify updates to the Danashan transliteration rules, the software loads its dictionary data from external files instead of hard-coding it. These files are stored in the Dictionary folder:
  1. DanashanConsonant.dic
  2. DanashanVowel.dic
  3. DanashanTone.dic

Why These Dictionary Files?

Using external dictionary files allows advanced users to update transliteration rules without modifying the software itself. If these mappings were hard-coded, any change would require recompiling, redistributing, and reinstalling the software, which is time-consuming.

However, users must understand the structure of these files. Incorrect modifications may result in unexpected transliteration output.

Descriptions

Each dictionary file contains paired values separated by a colon (:). The left side represents Danashan, and the right side represents RPA.
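
The pair format described above can be read with a few lines of code. The following is a minimal sketch, not the software's actual loader. It assumes one pair per line and that blank or malformed lines are skipped; the function name parse_dictionary and the sample entries are placeholders invented for illustration, not real Danashan or RPA mappings.

```python
def parse_dictionary(text):
    """Parse 'Danashan:RPA' lines into a list of (danashan, rpa) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and lines without a colon are ignored.
        if not line or ":" not in line:
            continue
        # Split on the first colon only: left is Danashan, right is RPA.
        danashan, rpa = line.split(":", 1)
        pairs.append((danashan.strip(), rpa.strip()))
    return pairs


# Placeholder content, not taken from the real .dic files.
sample = "aa:xx\nbb:yy\n\nmalformed line\n"
pairs = parse_dictionary(sample)
print(pairs)
```

A stray space or a missing colon in a real file is the kind of mistake that would silently drop or distort a rule, which is why the file structure matters when editing.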

Each time the software launches, it reads these files and builds internal dictionaries for consonants, vowels, tone markers, and all possible monosyllabic words.

The following internal dictionaries are used during transliteration:
  1. Consonant pairs
  2. Vowel pairs
  3. Tone marker pairs
  4. Monosyllabic word pairs (up to ~6,700 words each)
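
One way the monosyllabic word pairs could be derived from the three smaller dictionaries is sketched below. This is an illustration only: it assumes a syllable is written as consonant + vowel + tone marker in both scripts and that every combination is generated, which may not match the software's actual rules. The function name and all entries are placeholders, not real mappings.

```python
from itertools import product


def build_syllable_pairs(consonants, vowels, tones):
    """Combine every consonant, vowel, and tone pair into syllable pairs.

    Each argument is a list of (danashan, rpa) tuples.
    """
    pairs = []
    for (dc, rc), (dv, rv), (dt, rt) in product(consonants, vowels, tones):
        # Assumption: parts are simply concatenated in the same order
        # in both scripts.
        pairs.append((dc + dv + dt, rc + rv + rt))
    return pairs


# Placeholder entries, not real Danashan or RPA values.
consonants = [("C1", "K1"), ("C2", "K2")]
vowels = [("V1", "W1")]
tones = [("T1", "U1"), ("T2", "U2")]

syllables = build_syllable_pairs(consonants, vowels, tones)
print(len(syllables))
print(syllables[0])
```

The number of generated pairs is the product of the three list sizes, so a few dozen consonants, vowels, and tone markers are enough to produce thousands of syllables.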

Script Commands

A document may contain a mixture of Danashan and RPA text by using script commands. These commands instruct the software which dictionaries to use during transliteration. Without them, many words may fail to transliterate correctly.

Available script commands:
  1. <Dialect:Dawb> - White Hmong: number reading and transliteration
  2. <Dialect:Leeg> - Blue Hmong: number reading and transliteration
  3. <Dialect:Shib> - Blue Hmong number reading and Danashan transliteration

Example:

<Dialect:Leeg>:
Noav yog kawm txug Moob Soav cov ntawv hab cov lug roa cov toabNeeg nyob saab nub poob.
Cov ntawv noav moab lug ntawm 'tshav ntuj kaj nrig' nyob huv Youtube.

<Dialect:Shib>:
Uat gaox zhous.
Zhit yongf uat zhous.
Uat zhous ndout ndout.
Nyaob rongt (Gaox rongt).
Sheud nzod nyaob rongt.
Gaox duax lak?
Gaox nyaob rongt let jangl?
God yat mol del.
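
To illustrate how script commands divide a mixed document, the sketch below splits text into segments, each tagged with the command that precedes it. It is a hypothetical illustration, not the software's parser. The function name split_by_dialect is invented, and the optional colon after the command is accepted only because the example above shows one.

```python
import re

# Matches the three documented commands; the trailing colon is optional
# (an assumption based on the example above).
COMMAND = re.compile(r"<Dialect:(Dawb|Leeg|Shib)>:?")


def split_by_dialect(text, default=None):
    """Return a list of (dialect, segment_text) in document order."""
    segments = []
    current = default
    pos = 0
    for match in COMMAND.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append((current, chunk))
        current = match.group(1)  # dialect for the text that follows
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((current, tail))
    return segments


# Shortened sample lines based on the example above.
doc = "<Dialect:Leeg>:\nNoav yog kawm.\n\n<Dialect:Shib>:\nUat gaox zhous.\n"
segments = split_by_dialect(doc)
print(segments)
```

Text that appears before any command is tagged with the default value, which mirrors the situation where a selection omits its script command and words may fail.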

Transliteration

Transliteration is performed using fast search-and-replace operations. Processing time depends on the size of the selected text.

Binary Search: The software uses binary search on sorted dictionaries, repeatedly dividing the search range in half until a match is found or the range is empty, in which case the search fails.

Linear Search: Linear search scans from the beginning of a list and is significantly slower for large datasets. This method is avoided whenever possible.
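
The sketch below shows a binary search lookup over a sorted list of source words, with the target words held in a second list at matching positions. It uses Python's standard bisect module and is only an illustration of the technique, not the software's code. The function names are invented and the target values are placeholders, not real transliterations.

```python
from bisect import bisect_left


def build_lookup(pairs):
    """Sort (source, target) pairs by source and split into parallel lists."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    keys = [source for source, _ in ordered]
    values = [target for _, target in ordered]
    return keys, values


def lookup(keys, values, word):
    """Binary-search keys for word; return the paired value or None."""
    index = bisect_left(keys, word)  # halves the search range each step
    if index < len(keys) and keys[index] == word:
        return values[index]
    return None  # not found: the word fails to transliterate


# Source words from the example text; targets are placeholders.
keys, values = build_lookup([("gaox", "X2"), ("uat", "X3"), ("duax", "X1")])
print(lookup(keys, values, "gaox"))
print(lookup(keys, values, "zhous"))
```

For a dictionary of about 6,700 entries, binary search needs at most 13 comparisons per word, whereas a linear scan may need thousands.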

Processing Steps:
  1. The software reads one word at a time, separated by whitespace.
  2. The word type is identified (web link, email, number, valid word, etc.).
  3. The word is searched in the appropriate sorted dictionary.
  4. If found, the index is used to retrieve the matching word from the paired unsorted dictionary.
  5. The original word is replaced with the match, and the process repeats until every word has been processed.

Notes:
  1. Web links, emails, currencies, numbers, and similar items are not transliterated.
  2. When reading Danashan text aloud, the software uses recorded RPA audio.
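
The processing steps above can be sketched as a simple loop. This is a simplified illustration under several assumptions: the patterns used to recognize links, emails, and numbers are rough guesses, punctuation attached to a word is preserved around the replacement, a plain mapping stands in for the sorted dictionaries, and the exact form of the failure prefix is assumed. All names and target values are placeholders.

```python
import re

URL = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
NUMBER = re.compile(r"^[$€£¥]?\d[\d.,]*%?$")
WORD = re.compile(r"^(\W*)(\w+)(\W*)$")  # optional punctuation around a word


def classify(token):
    """Identify the token type (step 2)."""
    if URL.match(token):
        return "link"
    if EMAIL.match(token):
        return "email"
    if NUMBER.match(token):
        return "number"
    if WORD.match(token):
        return "word"
    return "other"


def transliterate(text, mapping, failed_prefix="Failed:"):
    """Return (output_text, failed_words) for whitespace-separated tokens."""
    output, failed = [], []
    for token in text.split():  # step 1: one word at a time
        if classify(token) != "word":
            output.append(token)  # links, emails, numbers are left as-is
            continue
        lead, core, trail = WORD.match(token).groups()
        target = mapping.get(core)  # steps 3 and 4: look up the pair
        if target is None:
            failed.append(core)
            output.append(f"{lead}{failed_prefix}{core}{trail}")
        else:
            output.append(f"{lead}{target}{trail}")  # step 5: replace
    return " ".join(output), failed


# Placeholder targets, not real transliterations.
mapping = {"uat": "X1", "zhous": "X2"}
result, failed = transliterate("uat gaox zhous. 12 https://example.com", mapping)
print(result)
print(failed)
```

Note that this sketch joins the output with single spaces, so original line breaks are not preserved; a fuller version would keep the whitespace between tokens.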

To use this feature, select Tool from the menu bar, then choose Transliterate, as shown below.



Procedure

This process does not modify the text in the main document.
Text can be selected using the keyboard or mouse. Use Ctrl+A to select all text.
Be sure to include the appropriate script command in your selection to minimize failed words.

  1. Click outside this popup window to focus on the main document.
  2. Select the desired text.
  3. Click the Get Text button to load the selection into this window.
  4. Select the target script using the Transliterate to: radio buttons (Southeast or Danashan).
  5. Click the Process button to transliterate the text.
    Note: Failed words are prefixed with the word Failed and are also listed in a separate popup.
  6. Click Save to store the transliterated text in a file. This file may be loaded into the main document and read using the appropriate script command.
  7. Repeat the process to transliterate additional text.
  8. Click Help to view this help file.
  9. Click Close to exit when finished.


File name: Transliterate.html
Date: 12/15/2025