transliterate¶
Bi-directional transliterator for Python. Transliterates (unicode) strings according to the rules specified in the language packs (source script <-> target script).
Comes with language packs for the following languages (listed in alphabetical order):
- Armenian
- Bulgarian (beta)
- Georgian
- Greek
- Macedonian (alpha)
- Mongolian (alpha)
- Russian
- Serbian (alpha)
- Ukrainian (beta)
There are also a number of useful tools included, such as:
- Simple lorem ipsum generator, which allows lorem ipsum generation in the language chosen.
- Language detection for the text (if an appropriate language pack is available).
- Slugify function for non-Latin texts.
Prerequisites¶
- Python 2.7, Python 3.4 or later, or PyPy
Installation¶
Install the latest stable version from PyPI:
pip install transliterate
or install the latest stable version from BitBucket:
pip install https://bitbucket.org/barseghyanartur/transliterate/get/stable.tar.gz
or install the latest stable version from GitHub:
pip install https://github.com/barseghyanartur/transliterate/archive/stable.tar.gz
That’s all. See the Usage and examples section for more.
Usage and examples¶
Simple usage¶
Required imports
from transliterate import translit, get_available_language_codes
Original text
text = "Lorem ipsum dolor sit amet"
Transliteration to Armenian
print(translit(text, 'hy'))
# Լօրեմ իպսում դօլօր սիտ ամետ
Transliteration to Georgian
print(translit(text, 'ka'))
# ლორემ იპსუმ დოლორ სით ამეთ
Transliteration to Greek
print(translit(text, 'el'))
# Λορεμ ιψθμ δολορ σιτ αμετ
Transliteration to Russian
print(translit(text, 'ru'))
# Лорем ипсум долор сит амет
List of available (registered) languages
print(get_available_language_codes())
# ['el', 'hy', 'ka', 'ru']
Reversed transliterations go from the target script back to the source script (as both are defined in the language pack). For reversed transliterations you may leave out the language_code argument, although if you know it beforehand, specify it, since that is faster.
Reversed transliteration from Armenian
print(translit(u"Լօրեմ իպսում դօլօր սիտ ամետ", 'hy', reversed=True))
# Lorem ipsum dolor sit amet
Reversed transliteration from Armenian with the language_code argument left out
print(translit(u"Լօրեմ իպսում դօլօր սիտ ամետ", reversed=True))
# Lorem ipsum dolor sit amet
Reversed transliteration from Georgian
print(translit(u"ლორემ იპსუმ დოლორ სით ამეთ", 'ka', reversed=True))
# Lorem ipsum dolor sit amet
Reversed transliteration from Georgian with the language_code argument left out
print(translit(u"ლორემ იპსუმ დოლორ სით ამეთ", reversed=True))
# Lorem ipsum dolor sit amet
Reversed transliteration from Greek
print(translit(u"Λορεμ ιψθμ δολορ σιτ αμετ", 'el', reversed=True))
# Lorem ipsum dolor sit amet
Reversed transliteration from Greek with the language_code argument left out
print(translit(u"Λορεμ ιψθμ δολορ σιτ αμετ", reversed=True))
# Lorem ipsum dolor sit amet
Reversed transliteration from Russian (Cyrillic)
print(translit(u"Лорем ипсум долор сит амет", 'ru', reversed=True))
# Lorem ipsum dolor sit amet
Reversed transliteration from Russian (Cyrillic) with the language_code argument left out
print(translit(u"Лорем ипсум долор сит амет", reversed=True))
# Lorem ipsum dolor sit amet
Testing the decorator
from transliterate.decorators import transliterate_function
@transliterate_function(language_code='hy')
def decorator_test(text):
    return text
print(decorator_test(u"Lorem ipsum dolor sit amet"))
# Լօրեմ իպսում դօլօր սիտ ամետ
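To make the decorator's behaviour concrete, here is a minimal sketch of how a decorator of this kind could be written. It is not the library's implementation: `TOY_TABLE`, `toy_translit` and `transliterate_result` are hypothetical names, the table covers only three Latin letters, and Python 3 is assumed.

```python
import functools

# Hypothetical three-letter table (Latin -> Armenian), for illustration only.
TOY_TABLE = str.maketrans("abg", "աբգ")


def toy_translit(text):
    """Stand-in for a real transliteration function."""
    return text.translate(TOY_TABLE)


def transliterate_result(func):
    """Transliterate whatever string the wrapped function returns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return toy_translit(func(*args, **kwargs))
    return wrapper


@transliterate_result
def echo(text):
    return text


print(echo("gab"))
```

The wrapped function itself stays unaware of transliteration; only its return value is converted.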
Working with large amounts of data¶
If you know which language pack shall be used for transliteration, especially when working with large amounts of data, it makes sense to get the transliteration function in the following way:
from transliterate import get_translit_function
translit_hy = get_translit_function('hy')
print(translit_hy(u"Լօրեմ իպսում դօլօր սիտ ամետ", reversed=True))
# Lorem ipsum dolor sit amet
print(translit_hy(u"Lorem ipsum dolor sit amet"))
# Լօրեմ իպսում դօլօր սիտ ամետ
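The benefit of fetching the function once can be illustrated with a small sketch: any per-language setup happens a single time, and the returned callable is reused for every string. This is not the library's code; `TOY_PACKS` and `make_translit_function` are hypothetical names, and Python 3 is assumed.

```python
# Hypothetical per-language tables; a real language pack holds far more.
TOY_PACKS = {
    "example": ("abcdefghij", "1234567890"),
}


def make_translit_function(language_code):
    """Build the translation tables once and return a reusable callable."""
    source, target = TOY_PACKS[language_code]
    forward = str.maketrans(source, target)
    backward = str.maketrans(target, source)

    # The argument name mirrors the one used in this document.
    def translit(value, reversed=False):
        return value.translate(backward if reversed else forward)

    return translit


translit_example = make_translit_function("example")
rows = ["bad", "cab", "high"]
converted = [translit_example(row) for row in rows]
print(converted)
# ['214', '312', '8978']
```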
Registering a custom language pack¶
Basics¶
Make sure to call the autodiscover function before registering your own language packs if you want to use the bundled language packs along with your own custom ones.
from transliterate.discover import autodiscover
autodiscover()
Then define and register the custom language pack.
from transliterate.base import TranslitLanguagePack, registry
class ExampleLanguagePack(TranslitLanguagePack):
    language_code = "example"
    language_name = "Example"
    mapping = (
        u"abcdefghij",
        u"1234567890",
    )
registry.register(ExampleLanguagePack)
print(get_available_language_codes())
# ['el', 'hy', 'ka', 'ru', 'example']
print(translit(text, 'example'))
# Lor5m 9psum 4olor s9t 1m5t
It’s possible to replace existing language packs with your own. By default, an already registered language pack is not overwritten.
To force-install a language pack, set the force argument to True when registering it. In that case, if a language pack with the same language code has already been registered, it is replaced; otherwise it is simply registered.
registry.register(ExampleLanguagePack, force=True)
Forced language packs can’t be replaced or unregistered.
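The registration rules above can be summarised in a toy registry. This is only a sketch of the described semantics, not the library's registry: `ToyRegistry`, its method signatures and its boolean return values are assumptions made for illustration.

```python
class ToyRegistry:
    """Hypothetical registry mimicking the force semantics described above."""

    def __init__(self):
        self._packs = {}
        self._forced = set()

    def register(self, code, pack, force=False):
        """Return True if the pack was stored, False if it was refused."""
        if code in self._forced:
            return False  # a forced pack cannot be replaced
        if code in self._packs and not force:
            return False  # existing packs are kept unless force=True
        self._packs[code] = pack
        if force:
            self._forced.add(code)
        return True

    def unregister(self, code):
        """Return True if the pack was removed, False otherwise."""
        if code in self._forced or code not in self._packs:
            return False
        del self._packs[code]
        return True


toy = ToyRegistry()
print(toy.register("example", "pack A"))              # True
print(toy.register("example", "pack B"))              # False
print(toy.register("example", "pack C", force=True))  # True
print(toy.register("example", "pack D", force=True))  # False
print(toy.unregister("example"))                      # False
```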
API in depth¶
There are 7 class properties that you could/should be using in your language pack, of which 4 are various sorts of mappings.
Mappings¶
mapping (tuple): A tuple of two strings that represent the mapping of characters from the source language to the target language. For example, if your source language is Latin and you want to convert the “a”, “b”, “c”, “d” and “e” characters to the appropriate characters in Russian Cyrillic, your mapping would look as follows:
mapping = (u"abcde", u"абцде")
Example (taken from the Greek language pack).
mapping = (
    u"abgdezhiklmnxoprstyfwuABGDEZHIKLMNXOPRSTYFWU",
    u"αβγδεζηικλμνξοπρστυφωθΑΒΓΔΕΖΗΙΚΛΜΝΞΟΠΡΣΤΥΦΩΘ",
)
reversed_specific_mapping (tuple): When making reversed transliterations, the mapping property is still used, but in some cases you need to provide additional rules. This property is meant for such cases. Otherwise it has the same form as mapping.
Example (taken from the Greek language pack).
reversed_specific_mapping = (
    u"θΘ",
    u"uU",
)
pre_processor_mapping (dict): A dictionary mapping from source language to target language. Use this only when a character in one script has to be represented by more than one character in the other.
Example (taken from the Greek language pack).
pre_processor_mapping = {
    u"th": u"θ",
    u"ch": u"χ",
    u"ps": u"ψ",
    u"TH": u"Θ",
    u"CH": u"Χ",
    u"PS": u"Ψ",
}
reversed_specific_pre_processor_mapping (dict): Same as pre_processor_mapping, but used in reversed transliterations.
Example (taken from the Armenian language pack).
reversed_specific_pre_processor_mapping = {
    u"ու": u"u",
    u"Ու": u"U",
}
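The mapping properties can be read as a small pipeline. The sketch below shows one plausible way they could be combined, using a handful of Russian-style rules; it illustrates the idea under that assumption and is not the library's actual algorithm. The names `to_target` and `to_source` are hypothetical, and Python 3 is assumed.

```python
MAPPING = ("abcde", "абцде")
PRE_PROCESSOR_MAPPING = {"zh": "ж", "sh": "ш"}
REVERSED_SPECIFIC_MAPPING = ("э", "e")


def to_target(text):
    """Source script -> target script."""
    # Multi-character sequences first (longest keys first), then single characters.
    for key in sorted(PRE_PROCESSOR_MAPPING, key=len, reverse=True):
        text = text.replace(key, PRE_PROCESSOR_MAPPING[key])
    return text.translate(str.maketrans(*MAPPING))


def to_source(text):
    """Target script -> source script (reversed transliteration)."""
    # Extra one-way rules, then the inverted pre-processor, then the inverted mapping.
    text = text.translate(str.maketrans(*REVERSED_SPECIFIC_MAPPING))
    for key, value in PRE_PROCESSOR_MAPPING.items():
        text = text.replace(value, key)
    return text.translate(str.maketrans(MAPPING[1], MAPPING[0]))


print(to_target("shade"))
print(to_source(to_target("shade")))
```

The reversed-specific rule handles a character ("э") that the plain mapping never produces, which is why it only applies in one direction.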
Additional¶
character_ranges (tuple): A tuple of character ranges (from the Unicode table). Used in language detection. Works only if the detectable property is set to True. Be aware that language (or, more precisely, script) detection is very basic and is based on characters only.
detectable (bool): If set to True, the language pack is used for automatic language detection.
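Character-based detection of this kind can be sketched in a few lines. The ranges are the ones listed for the Greek and Armenian packs in the API reference of this document; the counting strategy and the names `TOY_RANGES` and `detect_script` are assumptions made for illustration, not the library's detection code.

```python
# (low, high) code point ranges, inclusive on both ends.
TOY_RANGES = {
    "el": ((880, 1023), (7936, 8191)),
    "hy": ((1328, 1423), (64272, 64287)),
}


def detect_script(text):
    """Return the code whose ranges cover the most characters, or None."""
    counts = dict.fromkeys(TOY_RANGES, 0)
    for char in text:
        for code, ranges in TOY_RANGES.items():
            if any(low <= ord(char) <= high for low, high in ranges):
                counts[code] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] else None


print(detect_script("Λορεμ ιψθμ"))   # el
print(detect_script("Lorem ipsum"))  # None
```

Because only code points are inspected, languages that share a script (for example several Cyrillic-based ones) cannot be told apart this way.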
Using the lorem ipsum generator¶
Note that, due to the incompatibility of the original lorem-ipsum-generator package with Python 3, transliterate uses its own simplified fallback lorem ipsum generator when run on Python 3 (which still does the job).
Required imports
from transliterate.contrib.apps.translipsum import TranslipsumGenerator
Generating paragraphs in Armenian
g_am = TranslipsumGenerator(language_code='hy')
print(g_am.generate_paragraph())
# Մագնա տրիստիքուե ֆաուցիբուս ֆամես նետուս նետուս օրցի մաուրիս,
# սուսցիպիտ. Դապիբուս րիսուս սեդ ադիպիսցինգ դիցտում. Ֆերմենտում ուրնա
# նատօքուե ատ. Uլտրիցես եգետ, տացիտի. Լիտօրա ցլասս ցօնուբիա պօսուերե
# մալեսուադա ին իպսում իդ պեր վե.
Generating a sentence in Georgian
g_ka = TranslipsumGenerator(language_code='ka')
print(g_ka.generate_sentence())
# გგეთ ყუამ არსუ ვულფუთათე რუთრუმ აუთორ.
Generating a sentence in Greek
g_el = TranslipsumGenerator(language_code='el')
print(g_el.generate_sentence())
# Νεc cρασ αμετ, ελιτ vεστιβθλθμ εθ, αενεαν ναμ, τελλθσ vαριθσ.
Generating a sentence in Russian (Cyrillic)
g_ru = TranslipsumGenerator(language_code='ru')
print(g_ru.generate_sentence())
# Рисус cонсеcтетуер, фусcе qуис лаореет ат ерос пэдэ фелис магна.
Language detection¶
Required imports
from transliterate import detect_language
Detect Armenian text
detect_language(u'Լօրեմ իպսում դօլօր սիտ ամետ')
# hy
Detect Georgian text
detect_language(u'ლორემ იპსუმ დოლორ სით ამეთ')
# ka
Detect Greek text
detect_language(u'Λορεμ ιψθμ δολορ σιτ αμετ')
# el
Detect Russian (Cyrillic) text
detect_language(u'Лорем ипсум долор сит амет')
# ru
Slugify¶
Required imports
from transliterate import slugify
Slugify Armenian text
slugify(u'Լօրեմ իպսում դօլօր սիտ ամետ')
# lorem-ipsum-dolor-sit-amet
Slugify Georgian text
slugify(u'ლორემ იპსუმ დოლორ სით ამეთ')
# lorem-ipsum-dolor-sit-amet
Slugify Greek text
slugify(u'Λορεμ ιψθμ δολορ σιτ αμετ')
# lorem-ipsum-dolor-sit-amet
Slugify Russian (Cyrillic) text
slugify(u'Лорем ипсум долор сит амет')
# lorem-ipsum-dolor-sit-amet
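Conceptually, slugifying non-Latin text amounts to transliterating it to Latin and then normalising the result. The sketch below illustrates that idea with a deliberately tiny reverse table; `TO_LATIN` and `toy_slugify` are hypothetical, the normalisation rules are an assumption, and the library's own slugify may behave differently. Python 3 is assumed.

```python
import re

# Hypothetical reverse table covering only the letters of the sample word.
TO_LATIN = str.maketrans("Лорем", "Lorem")


def toy_slugify(text):
    """Transliterate to Latin, lowercase, and join word runs with hyphens."""
    latin = text.translate(TO_LATIN).lower()
    return re.sub(r"[^a-z0-9]+", "-", latin).strip("-")


print(toy_slugify("Лорем, Лорем!"))
# lorem-lorem
```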
Missing a language pack?¶
Missing a language pack for your own language? Contribute to the project by making one and it will appear in a new version (which will be released very quickly).
Writing documentation¶
Keep the following hierarchy.
=====
title
=====
header
======
sub-header
----------
sub-sub-header
~~~~~~~~~~~~~~
sub-sub-sub-header
^^^^^^^^^^^^^^^^^^
sub-sub-sub-sub-header
++++++++++++++++++++++
sub-sub-sub-sub-sub-header
**************************
License¶
GPL-2.0-only OR LGPL-2.1-or-later
Author¶
Artur Barseghyan <artur.barseghyan@gmail.com>
Documentation¶
Contents:
transliterate package¶
Subpackages¶
transliterate.contrib package¶
Subpackages¶
-
class
transliterate.contrib.languages.bg.translit_language_pack.
BulgarianLanguagePack
[source]¶ Bases:
transliterate.base.TranslitLanguagePack
Language pack for Bulgarian language.
See http://en.wikipedia.org/wiki/Romanization_of_Bulgarian for details.
-
character_ranges
= ((1024, 1279), (1280, 1327))¶
-
detectable
= False¶
-
language_code
= 'bg'¶
-
language_name
= 'Bulgarian'¶
-
mapping
= (u'abvgdeziyklmnoprstufhABVGDEZIYKLMNOPRSTUFH', u'\u0430\u0431\u0432\u0433\u0434\u0435\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0410\u0411\u0412\u0413\u0414\u0415\u0417\u0418\u0419\u041a\u041b\u041c\u041d\u041e\u041f\u0420\u0421\u0422\u0423\u0424\u0425')¶
-
pre_processor_mapping
= {u'Ch': u'\u0427', u'Q': u'\u042f', u'Sh': u'\u0428', u'Sht': u'\u0429', u'Ts': u'\u0426', u'Ya': u'\u042f', u'Yu': u'\u042e', u'Zh': u'\u0416', u'ch': u'\u0447', u'q': u'\u042f', u'sh': u'\u0448', u'sht': u'\u0449', u'ts': u'\u0446', u'ya': u'\u044f', u'yu': u'\u044e', u'zh': u'\u0436'}¶
-
reversed_specific_mapping
= (u'\u044c\u044a\u042a', u'yaA')¶
-
-
class
transliterate.contrib.languages.el.translit_language_pack.
GreekLanguagePack
[source]¶ Bases:
transliterate.base.TranslitLanguagePack
Language pack for Greek language.
See http://en.wikipedia.org/wiki/Greek_alphabet and https://en.wikipedia.org/wiki/Romanization_of_Greek#Modern_Greek for details.
-
character_ranges
= ((880, 1023), (7936, 8191))¶
-
detectable
= True¶
-
language_code
= 'el'¶
-
language_name
= 'Greek'¶
-
mapping
= (u'avgdeziklmnxoprstyfAVGDEZIKLMNXOPRSTYF', u'\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u0391\u0392\u0393\u0394\u0395\u0396\u0399\u039a\u039b\u039c\u039d\u039e\u039f\u03a0\u03a1\u03a3\u03a4\u03a5\u03a6')¶
-
pre_processor_mapping
= {u'Au': u'\u0391\u03c5', u'B': u'\u039c\u03c0', u'Ch': u'\u03a7', u'Ef': u'\u0395\u03c5', u'Eu': u'\u0395\u03c5', u'Ev': u'\u0395\u03c5', u'Ey': u'\u0395\u03c5', u'Gk': u'\u0393\u03ba', u'If': u'\u0397\u03c5', u'Iv': u'\u0397\u03c5', u'Iy': u'\u0397\u03c5', u'Oi': u'\u039f\u03b9', u'Ou': u'\u039f\u03c5', u'Oy': u'\u039f\u03c5', u'Ps': u'\u03a8', u'Th': u'\u0398', u'U': u'\u03a5', u'Yi': u'\u03a5\u03b9', u'au': u'\u03b1\u03c5', u'b': u'\u03bc\u03c0', u'ch': u'\u03c7', u'ef': u'\u03b5\u03c5', u'eu': u'\u03b5\u03c5', u'ev': u'\u03b5\u03c5', u'ey': u'\u03b5\u03c5', u'gch': u'\u03b3\u03be', u'gk': u'\u03b3\u03ba', u'gx': u'\u03b3\u03be', u'if': u'\u03b7\u03c5', u'iv': u'\u03b7\u03c5', u'iy': u'\u03b7\u03c5', u'nch': u'\u03b3\u03be', u'ng': u'\u03b3\u03b3', u'nx': u'\u03b3\u03be', u'oi': u'\u03bf\u03b9', u'ou': u'\u03bf\u03c5', u'oy': u'\u03bf\u03c5', u'ps': u'\u03c8', u'th': u'\u03b8', u'u': u'\u03c5', u'yi': u'\u03c5\u03b9'}¶
-
reversed_specific_mapping
= (u'\u03c2\u03ac\u03ad\u03ae\u03af\u03cd\u03cc\u03ce\u03ca\u03cb\u0390\u03b0\u03c9\u03a9\u03b7\u0397\u03f5\u03f1', u'saeiiyooiyiyooiier')¶
-
-
class
transliterate.contrib.languages.he.translit_language_pack.
HebrewLanguagePack
[source]¶ Bases:
transliterate.base.TranslitLanguagePack
Language pack for Hebrew language.
See http://en.wikipedia.org/wiki/Hebrew_alphabet for details and http://en.wikipedia.org/wiki/Romanization_of_Hebrew#When_to_transliterate for transliteration details. Note that this language pack implements the new standards (2006) of the Hebrew Academy.
- Confirmed
- a אּ v ב b בּּּ ּg ג gg ג ּd ד dd דּ h ה h הּ v ו vv וּ z ז zz זּ
-
character_ranges
= ((1328, 1423), (64272, 64287))¶
-
detectable
= True¶
-
language_code
= 'he'¶
-
language_name
= 'Hebrew'¶
-
mapping
= (u'abgdvzhilmnsfckrt', u'\u05d0\u05d1\u05d2\u05d3\u05d5\u05d6\u05d7\u05d9\u05dc\u05de\u05e0\u05e1\u05e4\u05e6\u05e7\u05e8\u05ea')¶
-
pre_processor_mapping
= {u'aa': u'\u05e2', u'cs': u'\u05e5', u'fs': u'\u05e3', u"ha'": u'\u05d4', u'ka': u'\u05db', u'ks': u'\u05da', u'ms': u'\u05dd', u'ns': u'\u05df', u'sh': u'\u05e9', u'tt': u'\u05d8'}¶
-
reversed_specific_mapping
= (u'\u05e4', u'p')¶
-
class
transliterate.contrib.languages.hi.translit_language_pack.
HindiLanguagePack
[source]¶ Bases:
transliterate.base.TranslitLanguagePack
Language pack for Hindi language.
See http://en.wikipedia.org/wiki/Hindi for details.
-
character_ranges
= ((2304, 2431),)¶
-
detectable
= True¶
-
language_code
= 'hi'¶
-
language_name
= 'Hindi'¶
-
mapping
= (u'aeof', u'\u0905\u0907\u0913\u092b')¶
-
pre_processor_mapping
= {u'b': u'\u092c\u0940', u'c': u'\u0938\u0940', u'd': u'\u0921\u0940', u'g': u'\u091c\u0940', u'h': u'\u090f\u091a', u'i': u'\u0906\u0908', u'j': u'\u091c\u0947', u'k': u'\u0915\u0947', u'l': u'\u0905\u0932', u'm': u'\u090d\u092e', u'n': u'\u0905\u0928', u'p': u'\u092a\u0940', u'q': u'\u0915\u094d\u092f\u0942', u'r': u'\u0906\u0930', u's': u'\u090f\u0938', u't': u'\u091f\u0940', u'u': u'\u092f\u0942', u'w': u'\u0921\u092c\u094d\u0932\u0942', u'x': u'\u0905\u0915\u094d\u0938', u'y': u'\u0935\u093e\u092f', u'z': u'\u091c\u095c'}¶
-
-
class
transliterate.contrib.languages.hy.translit_language_pack.
ArmenianLanguagePack
[source]¶ Bases:
transliterate.base.TranslitLanguagePack
Language pack for Armenian language.
See https://en.wikipedia.org/wiki/Armenian_alphabet for details.
-
character_ranges
= ((1328, 1423), (64272, 64287))¶
-
detectable
= True¶
-
language_code
= 'hy'¶
-
language_name
= 'Armenian'¶
-
mapping
= (u'abgdezilxkhmjnpsvtrcq&ofABGDEZILXKHMJNPSVTRCQOF', u'\u0561\u0562\u0563\u0564\u0565\u0566\u056b\u056c\u056d\u056f\u0570\u0574\u0575\u0576\u057a\u057d\u057e\u057f\u0580\u0581\u0584\u0587\u0585\u0586\u0531\u0532\u0533\u0534\u0535\u0536\u053b\u053c\u053d\u053f\u0540\u0544\u0545\u0546\u054a\u054d\u054e\u054f\u0550\u0551\u0554\u0555\u0556')¶
-
pre_processor_mapping
= {u'Ch': u'\u0549', u'Dj': u'\u054b', u'Dz': u'\u0541', u"E'": u'\u0537', u'Gh': u'\u0542', u'Jh': u'\u053a', u'Ph': u'\u0553', u'Sh': u'\u0547', u'Tch': u'\u0543', u'Th': u'\u0539', u'Ts': u'\u053e', u'U': u'\u0548\u0582', u'Vo': u'\u0548', u'Y': u'\u0538', u'ch': u'\u0579', u'dj': u'\u057b', u'dz': u'\u0571', u"e'": u'\u0567', u'gh': u'\u0572', u'jh': u'\u056a', u'ph': u'\u0583', u'sh': u'\u0577', u'tch': u'\u0573', u'th': u'\u0569', u'ts': u'\u056e', u'u': u'\u0578\u0582', u'vo': u'\u0578', u'y': u'\u0568'}¶
-
reversed_specific_mapping
= (u'\u057c\u054c', u'rR')¶
-
reversed_specific_pre_processor_mapping
= {u'\u0548\u0582': u'U', u'\u0578\u0582': u'u'}¶
-
-
class
transliterate.contrib.languages.ka.translit_language_pack.
GeorgianLanguagePack
[source]¶ Bases:
transliterate.base.TranslitLanguagePack
Language pack for Georgian language.
See http://en.wikipedia.org/wiki/Georgian_alphabet for details.
-
character_ranges
= ((4256, 4293), (4304, 4348), (11520, 11557))¶
-
detectable
= True¶
-
language_code
= 'ka'¶
-
language_name
= 'Georgian'¶
-
mapping
= (u'ABGDEVZTIKLMNOPJRSTUFQYCXHabgdevztiklmnoprsufqycxjhw', u'\u10d0\u10d1\u10d2\u10d3\u10d4\u10d5\u10d6\u10d7\u10d8\u10d9\u10da\u10db\u10dc\u10dd\u10de\u10df\u10e0\u10e1\u10e2\u10e3\u10e4\u10e5\u10e7\u10ea\u10ee\u10f0\u10d0\u10d1\u10d2\u10d3\u10d4\u10d5\u10d6\u10d7\u10d8\u10d9\u10da\u10db\u10dc\u10dd\u10de\u10e0\u10e1\u10e3\u10e4\u10e5\u10e7\u10ea\u10ee\u10ef\u10f0\u10ec')¶
-
pre_processor_mapping
= {u'ch': u'\u10e9', u"ch'": u'\u10ed', u'dz': u'\u10eb', u'gh': u'\u10e6', u'kh': u'\u10ee', u'sh': u'\u10e8', u'ts': u'\u10ec', u'zh': u'\u10df'}¶
-
-
class
transliterate.contrib.languages.l1.translit_language_pack.
Latin1SupplementLanguagePack
[source]¶ Bases:
transliterate.base.TranslitLanguagePack
Language pack for Latin1 Supplement.
Though not exactly a language, it’s a set of commonly found unicode characters. See http://en.wikipedia.org/wiki/Latin-1_Supplement_%28Unicode_block%29 for details.
-
character_ranges
= ((192, 214), (216, 246), (248, 255))¶
-
detectable
= True¶
-
language_code
= 'l1'¶
-
language_name
= 'Latin1 Supplement'¶
-
mapping
= (u'abcdefghijklmnopqrstuvwxyzABCDEFGHILJKMNOPQRSTUVWXYZ', u'abcdefghijklmnopqrstuvwxyzABCDEFGHILJKMNOPQRSTUVWXYZ')¶
-
reversed_specific_mapping
= (u'\xe0\xc0\xe1\xc1\xe2\xc2\xe3\xc3\xe8\xc8\xe9\xc9\xea\xca\xeb\xcb\xec\xcc\xed\xcd\xee\xce\xef\xcf\xf0\xd0\xf1\xd1\xf2\xd2\xf3\xd3\xf4\xd4\xf5\xd5\xf9\xd9\xfa\xda\xfb\xdb\xfd\xdd\xff\u0178', u'aAaAaAaAeEeEeEeEiIiIiIiIdDnNoOoOoOaOuUuUuUyYyY')¶
-
reversed_specific_pre_processor_mapping
= {u'\xc4': u'Ae', u'\xc5': u'Aa', u'\xc6': u'Ae', u'\xc7': u'Ts', u'\xd6': u'Oe', u'\xd8': u'Oe', u'\xdc': u'Ue', u'\xde': u'Th', u'\xdf': u'ss', u'\xe4': u'ae', u'\xe5': u'aa', u'\xe6': u'ae', u'\xe7': u'ts', u'\xf0': u'dh', u'\xf6': u'oe', u'\xf8': u'oe', u'\xfc': u'ue', u'\xfe': u'th'}¶
-
-
class
transliterate.contrib.languages.mk.translit_language_pack.
MacedonianLanguagePack
[source]¶ Bases:
transliterate.base.TranslitLanguagePack
Language pack for Macedonian language.
See http://en.wikipedia.org/wiki/Romanization_of_Macedonian for details.
-
character_ranges
= ((1024, 1279), (1280, 1327))¶
-
detectable
= False¶
-
language_code
= 'mk'¶
-
language_name
= 'Macedonian'¶
-
mapping
= (u'abvgdezijklmnoprstufhcABVGDEZIJKLMNOPRSTUFHC', u'\u0430\u0431\u0432\u0433\u0434\u0435\u0437\u0438\u0458\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0410\u0411\u0412\u0413\u0414\u0415\u0417\u0418\u0408\u041a\u041b\u041c\u041d\u041e\u041f\u0420\u0421\u0422\u0423\u0424\u0425\u0426')¶
-
pre_processor_mapping
= {u'Ch': u'\u0427', u'Dz': u'\u0405', u'Dzh': u'\u040f', u'Gj': u'\u0403', u'Kj': u'\u040c', u'Lj': u'\u0409', u'Nj': u'\u040a', u'Sh': u'\u0428', u'Zh': u'\u0416', u'ch': u'\u0447', u'dz': u'\u0455', u'dzh': u'\u045f', u'gj': u'\u0453', u'lj': u'\u0459', u'nj': u'\u045a', u'sh': u'\u0448', u'zh': u'\u0436', u'\u043a\u0458': u'\u045c'}¶
-
reversed_specific_mapping
= (u'', u'')¶
-
-
class
transliterate.contrib.languages.mn.translit_language_pack.
MongolianLanguagePack
[source]¶ Bases:
transliterate.base.TranslitLanguagePack
Language pack for Mongolian language.
See https://en.wikipedia.org/wiki/Mongolian_Cyrillic_alphabet for details.
-
character_ranges
= ((1024, 1279), (1280, 1327))¶
-
detectable
= False¶
-
language_code
= 'mn'¶
-
language_name
= 'Mongolian'¶
-
mapping
= (u'abvgdjziklmnoprstuufhewABVGDJZIKLMNOPRSTUUFHEW', u'\u0430\u0431\u0432\u0433\u0434\u0436\u0437\u0438\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u04af\u0444\u0445\u044d\u0432\u0410\u0411\u0412\u0413\u0414\u0416\u0417\u0418\u041a\u041b\u041c\u041d\u041e\u041f\u0420\u0421\u0422\u0423\u04ae\u0424\u0425\u042d\u0412')¶
-
pre_processor_mapping
= {u'AI': u'\u0410\u0419', u'Ai': u'\u0410\u0439', u'CH': u'\u0427', u'EI': u'\u042d\u0419', u'Ei': u'\u042d\u0439', u'II': u'\u0418\u0419', u'Ii': u'\u0418\u0439', u'KH': u'\u0425', u'OI': u'\u041e\u0419', u'Oi': u'\u041e\u0439', u'SH': u'\u0428', u'TS': u'\u0426', u'UI': u'\u0423\u0419', u'Ui': u'\u0423\u0439', u'YA': u'\u042f', u'YE': u'\u0415', u'YO': u'\u0401', u'YU': u'\u042e', u'Yo': u'\u0401', u'Yu': u'\u042e\u0443', u'ai': u'\u0430\u0439', u'ch': u'\u0447', u'ei': u'\u044d\u0439', u'ii': u'\u0438\u0439', u'kh': u'\u0445', u'oi': u'\u043e\u0439', u'sh': u'\u0448', u'ts': u'\u0446', u'ui': u'\u0443\u0439', u'ya': u'\u044f', u'ye': u'\u0435', u'yo': u'\u0451', u'yu': u'\u044e'}¶
-
reversed_specific_mapping
= (u'\u044a\u044c\u042a\u042c\u0439\u0419\u04e9\u04e8\u0443\u0423\u04af\u04ae', u'iiIIiIoOuUuU')¶
-
-
class
transliterate.contrib.languages.ru.translit_language_pack.
RussianLanguagePack
[source]¶ Bases:
transliterate.base.TranslitLanguagePack
Language pack for Russian language.
See http://en.wikipedia.org/wiki/Russian_alphabet for details.
-
character_ranges
= ((1024, 1279), (1280, 1327))¶
-
detectable
= True¶
-
language_code
= 'ru'¶
-
language_name
= 'Russian'¶
-
mapping
= (u"abvgdezijklmnoprstufhcC'y'ABVGDEZIJKLMNOPRSTUFH'Y'", u'\u0430\u0431\u0432\u0433\u0434\u0435\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0426\u044a\u044b\u044c\u0410\u0411\u0412\u0413\u0414\u0415\u0417\u0418\u0419\u041a\u041b\u041c\u041d\u041e\u041f\u0420\u0421\u0422\u0423\u0424\u0425\u042a\u042b\u042c')¶
-
pre_processor_mapping
= {u'Ch': u'\u0427', u'Ja': u'\u042f', u'Ju': u'\u042e', u'Sch': u'\u0429', u'Sh': u'\u0428', u'Ts': u'\u0426', u'Zh': u'\u0416', u'ch': u'\u0447', u'ja': u'\u044f', u'ju': u'\u044e', u'sch': u'\u0449', u'sh': u'\u0448', u'ts': u'\u0446', u'zh': u'\u0436'}¶
-
reversed_specific_mapping
= (u'\u0451\u044d\u0401\u042d\u044a\u044c\u042a\u042c', u"eeEE''''")¶
-
-
class
transliterate.contrib.languages.sr.translit_language_pack.
SerbianLanguagePack
[source]¶ Bases:
transliterate.base.TranslitLanguagePack
Language pack for Serbian language.
See https://en.wikipedia.org/wiki/Romanization_of_Serbian for details.
-
character_ranges
= ((1032, 1264), (0, 383))¶
-
detectable
= False¶
-
language_code
= 'sr'¶
-
language_name
= 'Serbian'¶
-
mapping
= (u'abvgd\u0111e\u017ezijklmnoprst\u0107ufhc\u010d\u0161ABVGD\u0110E\u017dZIJKLMNOPRST\u0106UFHC\u010c\u0160', u'\u0430\u0431\u0432\u0433\u0434\u0452\u0435\u0436\u0437\u0438\u0458\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u045b\u0443\u0444\u0445\u0446\u0447\u0448\u0410\u0411\u0412\u0413\u0414\u0402\u0415\u0416\u0417\u0418\u0408\u041a\u041b\u041c\u041d\u041e\u041f\u0420\u0421\u0422\u040b\u0423\u0424\u0425\u0426\u0427\u0428')¶
-
pre_processor_mapping
= {u'D\u017e': u'\u040f', u'Lj': u'\u0409', u'Nj': u'\u040a', u'd\u017e': u'\u045f', u'lj': u'\u0459', u'nj': u'\u045a'}¶
-
reversed_specific_mapping
= (u'',)¶
-
-
class
transliterate.contrib.languages.uk.translit_language_pack.
UkrainianLanguagePack
[source]¶ Bases:
transliterate.base.TranslitLanguagePack
Language pack for Ukrainian language.
See http://en.wikipedia.org/wiki/Ukrainian_alphabet for details.
-
character_ranges
= ((1024, 1279), (1280, 1327))¶
-
language_code
= 'uk'¶
-
language_name
= 'Ukrainian'¶
-
mapping
= (u"abvhgdezyijklmnoprstuf'ABVHGDEZYIJKLMNOPRSTUF'", u'\u0430\u0431\u0432\u0433\u0491\u0434\u0435\u0437\u0438\u0456\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u044c\u0410\u0411\u0412\u0413\u0490\u0414\u0415\u0417\u0418\u0406\u0419\u041a\u041b\u041c\u041d\u041e\u041f\u0420\u0421\u0422\u0423\u0424\u042c')¶
-
pre_processor_mapping
= {u'Ch': u'\u0427', u'Ja': u'\u042f', u'Ju': u'\u042e', u'Kh': u'\u0425', u'Sh': u'\u0428', u'Shch': u'\u0429', u'Ts': u'\u0426', u'Ye': u'\u0404', u'Yi': u'\u0407', u'Zh': u'\u0416', u'ch': u'\u0447', u'ja': u'\u044f', u'ju': u'\u044e', u'kh': u'\u0445', u'sh': u'\u0448', u'shch': u'\u0449', u'ts': u'\u0446', u'ye': u'\u0454', u'yi': u'\u0457', u'zh': u'\u0436'}¶
-
reversed_specific_mapping
= (u'\u044c\u042c', u"''")¶
-
Module contents¶
transliterate.tests package¶
Subpackages¶
Submodules¶
transliterate.tests.base module¶
transliterate.tests.defaults module¶
transliterate.tests.helpers module¶
transliterate.tests.test_transliterate module¶
-
class
transliterate.tests.test_transliterate.
TransliterateTest
(methodName='runTest')[source]¶ Bases:
unittest.case.TestCase
Test transliterate.utils.translit.
-
test_01_get_available_language_codes
(*args, **kwargs)¶
-
test_02_translit_latin_to_armenian
(*args, **kwargs)¶
-
test_03_translit_latin_to_georgian
(*args, **kwargs)¶
-
test_04_translit_latin_to_greek
(*args, **kwargs)¶
-
test_06_translit_latin_to_bulgarian_cyrillic
(*args, **kwargs)¶
-
test_06_translit_latin_to_cyrillic
(*args, **kwargs)¶
-
test_06_translit_latin_to_mongolian_cyrillic
(*args, **kwargs)¶
-
test_06_translit_latin_to_serbian_cyrillic
(*args, **kwargs)¶
-
test_06_translit_latin_to_ukrainian_cyrillic
(*args, **kwargs)¶
-
test_07_translit_armenian_to_latin
(*args, **kwargs)¶
-
test_08_translit_georgian_to_latin
(*args, **kwargs)¶
-
test_09_translit_greek_to_latin
(*args, **kwargs)¶
-
test_11_translit_bulgarian_cyrillic_to_latin
(*args, **kwargs)¶
-
test_11_translit_cyrillic_to_latin
(*args, **kwargs)¶
-
test_11_translit_mongolian_cyrillic_to_latin
(*args, **kwargs)¶
-
test_11_translit_serbian_cyrillic_to_latin
(*args, **kwargs)¶
-
test_11_translit_ukrainian_cyrillic_to_latin
(*args, **kwargs)¶
-
test_12_function_decorator
(*args, **kwargs)¶
-
test_13_method_decorator
(*args, **kwargs)¶
-
test_14_function_decorator
(*args, **kwargs)¶
-
test_15_register_custom_language_pack
(*args, **kwargs)¶
-
test_16_translipsum_generator_armenian
(*args, **kwargs)¶
-
test_17_translipsum_generator_georgian
(*args, **kwargs)¶
-
test_18_translipsum_generator_greek
(*args, **kwargs)¶
-
test_20_translipsum_generator_bulgarian_cyrillic
(*args, **kwargs)¶
-
test_20_translipsum_generator_cyrillic
(*args, **kwargs)¶
-
test_20_translipsum_generator_mongolian_cyrillic
(*args, **kwargs)¶
-
test_20_translipsum_generator_serbian_cyrillic
(*args, **kwargs)¶
-
test_20_translipsum_generator_ukrainian_cyrillic
(*args, **kwargs)¶
-
test_21_language_detection_armenian
(*args, **kwargs)¶
-
test_22_language_detection_georgian
(*args, **kwargs)¶
-
test_23_language_detection_greek
(*args, **kwargs)¶
-
test_25_false_language_detection_cyrillic
(*args, **kwargs)¶
-
test_25_language_detection_cyrillic
(*args, **kwargs)¶
-
test_26_slugify_armenian
(*args, **kwargs)¶
-
test_27_slugify_georgian
(*args, **kwargs)¶
-
test_28_slugify_greek
(*args, **kwargs)¶
-
test_30_slugify_bulgarian_cyrillic
(*args, **kwargs)¶
-
test_30_slugify_cyrillic
(*args, **kwargs)¶
-
test_30_slugify_mongolian_cyrillic
(*args, **kwargs)¶
-
test_30_slugify_serbian_cyrillic
(*args, **kwargs)¶
-
test_30_slugify_ukrainian_cyrillic
(*args, **kwargs)¶
-
test_31_override_settings
(*args, **kwargs)¶
-
test_31b_get_translit_function
(*args, **kwargs)¶
-
test_32_auto_translit_reversed
(*args, **kwargs)¶
-
test_33_register_unregister
(*args, **kwargs)¶
-
test_35_translit_serbian_cyrillic_to_serbian_latin
(*args, **kwargs)¶
-
test_35_translit_serbian_latin_to_serbian_cyrillic
(*args, **kwargs)¶
-
Module contents¶
Submodules¶
transliterate.base module¶
-
class
transliterate.base.
TranslitLanguagePack
[source]¶ Bases:
object
Base language pack.
The attributes below shall be defined in every language pack.
- language_code: Language code (obligatory). Example value: ‘hy’, ‘ru’.
- language_name: Language name (obligatory). Example value: ‘Armenian’, ‘Russian’.
- character_ranges: Character ranges that are specific to the language. When making a pack, check this page for the ranges.
- mapping: Mapping (obligatory). A tuple consisting of two strings (source and target). Example value: (u’abc’, u’աբց’).
- reversed_specific_mapping: Specific mapping (one direction only) used when transliterating from target script to source script (reversed transliteration).
- pre_processor_mapping: Pre-processor mapping (optional). A dictionary mapping for letters that can’t be represented by a single Latin letter.
- reversed_specific_pre_processor_mapping: Pre-processor mapping (optional). A dictionary mapping for letters that can’t be represented by a single Latin letter (reversed transliteration).
Example:
class ArmenianLanguagePack(TranslitLanguagePack):
    language_code = "hy"
    language_name = "Armenian"
    character_ranges = ((0x0530, 0x058F), (0xFB10, 0xFB1F))
    mapping = (
        u"abgdezilxkhmjnpsvtrcq&ofABGDEZILXKHMJNPSVTRCQOF",  # Source script
        u"աբգդեզիլխկհմյնպսվտրցքևօֆԱԲԳԴԵԶԻԼԽԿՀՄՅՆՊՍՎՏՐՑՔՕՖ",  # Target script
    )
    reversed_specific_mapping = (
        u"ռՌ",
        u"rR"
    )
    pre_processor_mapping = {
        # lowercase
        u"e'": u"է",
        u"y": u"ը",
        u"th": u"թ",
        u"jh": u"ժ",
        u"ts": u"ծ",
        u"dz": u"ձ",
        u"gh": u"ղ",
        u"tch": u"ճ",
        u"sh": u"շ",
        u"vo": u"ո",
        u"ch": u"չ",
        u"dj": u"ջ",
        u"ph": u"փ",
        u"u": u"ու",

        # uppercase
        u"E'": u"Է",
        u"Y": u"Ը",
        u"Th": u"Թ",
        u"Jh": u"Ժ",
        u"Ts": u"Ծ",
        u"Dz": u"Ձ",
        u"Gh": u"Ղ",
        u"Tch": u"Ճ",
        u"Sh": u"Շ",
        u"Vo": u"Ո",
        u"Ch": u"Չ",
        u"Dj": u"Ջ",
        u"Ph": u"Փ",
        u"U": u"Ու"
    }
    reversed_specific_pre_processor_mapping = {
        u"ու": u"u",
        u"Ու": u"U"
    }
Note that in Python 3 you won't be using the u prefix before the strings.
-
character_ranges
= None¶
-
characters
= None¶
-
classmethod
contains
(character)[source]¶ Check if given character belongs to the language pack.
Return bool:
-
classmethod
detect
(num_words=None)[source]¶ Detect the language.
Heavy language detection, which is activated for languages that are harder to detect (like Russian Cyrillic and Ukrainian Cyrillic).
Parameters: - value (unicode) – Input string.
- num_words (int) – Number of words to base decision on.
Return bool: True if detected and False otherwise.
-
detectable
= False¶
-
language_code
= None¶
-
language_name
= None¶
-
make_strict
(value, reversed=False)[source]¶ Strip out unnecessary characters from the string.
Parameters: - value (string) –
- reversed (bool) –
Return string:
-
mapping
= None¶
-
pre_processor_mapping
= None¶
-
pre_processor_mapping_keys
= []¶
-
reversed_characters
= None¶
-
reversed_pre_processor_mapping
= None¶
-
reversed_pre_processor_mapping_keys
= []¶
-
reversed_specific_mapping
= None¶
-
reversed_specific_pre_processor_mapping
= None¶
-
reversed_specific_pre_processor_mapping_keys
= []¶
transliterate.conf module¶
transliterate.decorators module¶
-
transliterate.decorators.
transliterate_function
¶ alias of
transliterate.decorators.TransliterateFunction
-
transliterate.decorators.
transliterate_method
¶ alias of
transliterate.decorators.TransliterateMethod
transliterate.defaults module¶
transliterate.discover module¶
transliterate.exceptions module¶
-
exception
transliterate.exceptions.
ImproperlyConfigured
[source]¶ Bases:
exceptions.Exception
Exception raised when developer didn’t configure the code properly.
-
exception
transliterate.exceptions.
InvalidRegistryItemType
[source]¶ Bases:
exceptions.ValueError
Raised when an attempt is made to register an item in the registry which does not have a proper type.
-
exception
transliterate.exceptions.
LanguageCodeError
[source]¶ Bases:
exceptions.Exception
Exception raised when language code is empty or has incorrect value.
transliterate.utils module¶
-
transliterate.utils.
detect_language
(text, num_words=None, fail_silently=True, heavy_check=False)[source]¶ Detect the language from the value given.
Detect the language from the value given based on ranges defined in active language packs.
Parameters: - text (unicode) – Input string.
- num_words (int) – Number of words to base decision on.
- fail_silently (bool) –
- heavy_check (bool) – If True, heavy checks are applied when simple checks give no result. Heavy checks are language-specific and do not follow a common logic. Heavy language detection is defined in the detect method of each language pack.
Return str: Language code.
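The range-based idea can be pictured with a small stand-alone sketch. It is not the library's implementation: the function name detect_by_ranges is hypothetical, and the ranges are whole Unicode blocks used as stand-ins for whatever ranges the language packs actually declare.

```python
from collections import Counter

# Unicode block ranges used as illustrative stand-ins (an assumption);
# real language packs may declare narrower or different ranges.
SCRIPT_RANGES = {
    "el": (0x0370, 0x03FF),  # Greek and Coptic
    "hy": (0x0530, 0x058F),  # Armenian
    "ka": (0x10A0, 0x10FF),  # Georgian
    "ru": (0x0400, 0x04FF),  # Cyrillic
}


def detect_by_ranges(text, num_words=None):
    """Return the code whose range matches the most characters, or None."""
    words = text.split()
    if num_words is not None:
        words = words[:num_words]
    hits = Counter()
    for char in "".join(words):
        for code, (low, high) in SCRIPT_RANGES.items():
            if low <= ord(char) <= high:
                hits[code] += 1
    return hits.most_common(1)[0][0] if hits else None
```

A single Cyrillic range cannot separate Russian from Ukrainian, which is the kind of case the heavy checks are meant to handle.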
-
transliterate.utils.
get_available_language_codes
()[source]¶ Get list of language codes for registered language packs.
Return list:
-
transliterate.utils.
get_available_language_packs
()[source]¶ Get list of registered language packs.
Return list:
-
transliterate.utils.
get_translit_function
(language_code)[source]¶ Return translit function for the language given.
Parameters: language_code (str) –
Return callable:
-
transliterate.utils.
slugify
(text, language_code=None)[source]¶ Slugify the given text.
If no language_code is given, auto-detect the language code from the text given.
Parameters: - text (str) –
- language_code (str) –
Return str:
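The general idea of slugifying non-Latin text (transliterate to Latin, then normalise) can be sketched without the library. The function slugify_sketch and its tiny Cyrillic-to-Latin table are assumptions for illustration only; the real function relies on the registered language packs and may normalise differently.

```python
import re

# Illustrative Cyrillic-to-Latin table (an assumption, far from complete).
TO_LATIN = str.maketrans("абдеилмопрсту", "abdeilmoprstu")


def slugify_sketch(text):
    """Transliterate to Latin, lowercase, and join word runs with hyphens."""
    latin = text.lower().translate(TO_LATIN)
    # Collapse every run of characters other than a-z and 0-9 into a hyphen,
    # then trim hyphens left over at either end.
    return re.sub(r"[^a-z0-9]+", "-", latin).strip("-")
```

In this sketch, characters missing from the table are treated like punctuation and end up as hyphens.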
-
transliterate.utils.
suggest
(value, language_code=None, reversed=False, limit=None)[source]¶ Suggest possible variants.
Parameters: - value (str) –
- language_code (str) –
- reversed (bool) – If set to True, reversed transliteration is made.
- limit (int) – Limit number of suggested variants.
Return list:
-
transliterate.utils.
translit
(value, language_code=None, reversed=False, strict=False)[source]¶ Transliterate the text for the language given.
Language code is optional in case of reversed transliterations (from some script to Latin).
Parameters: - value (str) –
- language_code (str) –
- reversed (bool) – If set to True, reversed transliteration is made.
- strict (bool) – If True, all characters that are not found in the transliteration pack are stripped out.
Return str:
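How the reversed and strict flags interact can be pictured with a simplified, stand-alone sketch. The function translit_sketch and its character table are hypothetical; the library's language packs use richer mappings and pre-processing, so this only mirrors the documented behaviour in spirit.

```python
# Illustrative Latin-to-Cyrillic pairs (an assumption, far from complete).
MAPPING = dict(zip("abdeilmoprstu", "абдеилмопрсту"))
REVERSED_MAPPING = {target: source for source, target in MAPPING.items()}


def translit_sketch(value, reversed=False, strict=False):
    """Map characters one by one; in strict mode drop unmapped characters."""
    table = REVERSED_MAPPING if reversed else MAPPING
    result = []
    for char in value:
        if char in table:
            result.append(table[char])
        elif not strict:
            # Non-strict: pass unknown characters through unchanged.
            result.append(char)
    return "".join(result)
```

Because the table here has no entry for spaces or punctuation, strict mode in this sketch strips them as well.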
Module contents¶
-
transliterate.
detect_language
(text, num_words=None, fail_silently=True, heavy_check=False)[source]¶ Detect the language from the value given.
Detect the language from the value given based on ranges defined in active language packs.
Parameters: - text (unicode) – Input string.
- num_words (int) – Number of words to base decision on.
- fail_silently (bool) –
- heavy_check (bool) – If True, heavy checks are applied when simple checks give no result. Heavy checks are language-specific and do not follow a common logic. Heavy language detection is defined in the detect method of each language pack.
Return str: Language code.
-
transliterate.
get_available_language_codes
()[source]¶ Get list of language codes for registered language packs.
Return list:
-
transliterate.
get_available_language_packs
()[source]¶ Get list of registered language packs.
Return list:
-
transliterate.
get_translit_function
(language_code)[source]¶ Return translit function for the language given.
Parameters: language_code (str) –
Return callable:
-
transliterate.
slugify
(text, language_code=None)[source]¶ Slugify the given text.
If no language_code is given, auto-detect the language code from the text given.
Parameters: - text (str) –
- language_code (str) –
Return str:
-
transliterate.
translit
(value, language_code=None, reversed=False, strict=False)[source]¶ Transliterate the text for the language given.
Language code is optional in case of reversed transliterations (from some script to Latin).
Parameters: - value (str) –
- language_code (str) –
- reversed (bool) – If set to True, reversed transliteration is made.
- strict (bool) – If True, all characters that are not found in the transliteration pack are stripped out.
Return str:
Release history¶
1.10.2¶
2018-09-17
- Add get_translit_function for speed-ups when looping through a large set of strings.
1.10.1¶
2018-05-02
- Fixes and improvements in Georgian language pack. Removed historical asomtavruli (an old Georgian script which is no longer used) support.
- Improvements in Serbian language pack.
1.9¶
2016-12-27
- Dropping Python 3.2 and Python 3.3 support.
- Clean up.
- pep8 fixes.
- Minor fixes in Greek language pack.
- Dedicated shell in example project.
- Tested against PyPy.
1.8¶
2016-07-09
- Added Macedonian language pack.
- Added Mongolian language pack.
- Drop support for Python 3.2.
1.6¶
2014-03-12
- Ukrainian language pack added.
- Each language pack got an extra property, detectable, which is set to False by default. Language packs with that property set to False are excluded from language auto-detection.
- Improved tests.
1.3¶
2013-10-01
- Fixed reversed transliteration of some characters in Russian language pack.
- Improved tests.
- Minor API improvements.
1.1¶
2013-09-08
- Allow language packs to be unregistered when not forced.
- Minor documentation improvements.
0.9¶
2013-08-03
- Greek language pack status changed to beta.
- Improvements of slugify and language detection of Greek language.
0.5¶
2013-07-31
- Configurable settings added.
- Minor fixes.
- Better debugging.
- Minor documentation improvements.