The Oslo Series: oslo.i18n (Part 2)

No one can bear your pain for you, and no one can take away your strength.

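The previous post covered how to use this package and how it is applied in OpenStack. This post looks at how it works and walks through the source code.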

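Introduction

i18n is used to translate messages for internationalization. As far as I can tell, there are currently two main ways to translate.

Calling gettext directly and explicitly

Example: internationalization in horizon: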

scope.operationlogi18n = {
  'Create Instance': gettext('Create Instance'),
  'Shutdown Instance': gettext('Shutdown Instance'),
}

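Analysis:
gettext is called directly and returns the translated string.

Using the _ function with lazy translation

The _ function is actually defined as: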

import oslo_i18n
# ref: https://docs.openstack.org/oslo.i18n/ocata/usage.html
DOMAIN = "myproject"
_translators = oslo_i18n.TranslatorFactory(domain=DOMAIN)
_ = _translators.primary

This approach is covered in detail in the source code analysis below.

Lazy translation mode in i18n

Lazy Translation
Lazy translation delays converting a message string to the translated form as long as possible, including possibly never if the message is not logged or delivered to the user in some other way. It also supports logging translated messages in multiple languages, by configuring separate log handlers.

Lazy translation is implemented by returning a special object from the translation function, instead of a unicode string. That special message object supports some, but not all, string manipulation APIs. For example, concatenation with addition is not supported, but interpolation of variables is supported. Depending on how translated strings are used in an application, these restrictions may mean that lazy translation cannot be used, and so it is not enabled by default.

To enable lazy translation, call enable_lazy().

import oslo_i18n

oslo_i18n.enable_lazy()
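To make the idea concrete, here is a minimal sketch of a lazy message object. It is not the oslo.i18n implementation: LazyMessage, CATALOGS, and translate_to are hypothetical names, and an in-memory dict stands in for the gettext catalogs. It only illustrates the behaviour described above: the object acts like a string, supports interpolation, rejects concatenation, and is translated when a locale is finally chosen.

```python
# Hypothetical in-memory catalogs standing in for compiled .mo files.
CATALOGS = {"zh_CN": {"enter %s": "进入 %s"}}


class LazyMessage(str):
    """Illustrative lazy message: a str that remembers its msgid."""

    def __new__(cls, msgid, msgtext=None, params=None):
        # The base text is what a plain string would have been.
        msg = super().__new__(cls, msgid if msgtext is None else msgtext)
        msg.msgid = msgid
        msg.params = params
        return msg

    def __mod__(self, params):
        # Interpolation is supported: keep the msgid and remember the params.
        return LazyMessage(self.msgid, str.__mod__(self, params), params)

    def __add__(self, other):
        # Concatenation would lose track of the msgid, so it is rejected.
        raise TypeError("lazy messages do not support concatenation")

    def translate_to(self, locale):
        # Translation happens only now, once the target locale is known.
        text = CATALOGS.get(locale, {}).get(self.msgid, self.msgid)
        return text if self.params is None else text % self.params


msg = LazyMessage("enter %s") % "app_factory"
print(msg)                        # behaves like an ordinary string
print(msg.translate_to("zh_CN"))  # translated on demand
```

Because the msgid and parameters travel with the object, the same message could be rendered in several languages, which is what makes per-handler log translation possible.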

Take aodh as an example: aodh/service.py contains the line oslo_i18n.enable_lazy(). Let's go straight to the source.

Source code analysis

Start by writing a simple i18n.py of our own:

import oslo_i18n

# ref: https://docs.openstack.org/oslo.i18n/ocata/usage.html

DOMAIN = "myproject"

_translators = oslo_i18n.TranslatorFactory(domain=DOMAIN)

_ = _translators.primary

_LI = _translators.log_info
_LW = _translators.log_warning
_LE = _translators.log_error
_LC = _translators.log_critical

def get_available_languages():
    return oslo_i18n.get_available_languages(DOMAIN)


def translate(value, user_locale):
    return oslo_i18n.translate(value, user_locale)

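Analysis:
1) The code above first sets up the translation functions.
2) Wherever a message needs to be translated, import _, _LI, and so on, and use them like this:

from i18n import _, _LI
LOG.info(_LI('enter app_factory'))

Alternatively, use the form below. Using _ is recommended over _LI:

result = _('enter app_factory')
LOG.info(result)

3) The _ function is built by the factory's _make_translation_func method. It is shown below, together with the Message class it returns in lazy mode: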

def _make_translation_func(self, domain=None):
    """Return a translation function ready for use with messages.

    The returned function takes a single value, the unicode string
    to be translated.  The return type varies depending on whether
    lazy translation is being done. When lazy translation is
    enabled, :class:`Message` objects are returned instead of
    regular :class:`unicode` strings.

    The domain argument can be specified to override the default
    from the factory, but the localedir from the factory is always
    used because we assume the log-level translation catalogs are
    installed in the same directory as the main application
    catalog.

    """
    if domain is None:
        domain = self.domain
    t = gettext.translation(domain,
                            localedir=self.localedir,
                            fallback=True)
    # Use the appropriate method of the translation object based
    # on the python version.
    m = t.gettext if six.PY3 else t.ugettext

    def f(msg):
        """oslo_i18n.gettextutils translation function."""
        if _lazy.USE_LAZY:
            return _message.Message(msg, domain=domain)
        return m(msg)
    return f

class Message(six.text_type):
    """A Message object is a unicode object that can be translated.

    Translation of Message is done explicitly using the translate() method.
    For all non-translation intents and purposes, a Message is simply unicode,
    and can be treated as such.
    """

    def __new__(cls, msgid, msgtext=None, params=None,
                domain='oslo', has_contextual_form=False,
                has_plural_form=False, *args):
        """Create a new Message object.

        In order for translation to work gettext requires a message ID, this
        msgid will be used as the base unicode text. It is also possible
        for the msgid and the base unicode text to be different by passing
        the msgtext parameter.
        """
        # If the base msgtext is not given, we use the default translation
        # of the msgid (which is in English) just in case the system locale is
        # not English, so that the base text will be in that locale by default.
        if not msgtext:
            msgtext = Message._translate_msgid(msgid, domain)
        # We want to initialize the parent unicode with the actual object that
        # would have been plain unicode if 'Message' was not enabled.
        msg = super(Message, cls).__new__(cls, msgtext)
        msg.msgid = msgid
        msg.domain = domain
        msg.params = params
        msg.has_contextual_form = has_contextual_form
        msg.has_plural_form = has_plural_form
        return msg
      
    @staticmethod
    def _translate_msgid(msgid, domain, desired_locale=None,
                         has_contextual_form=False, has_plural_form=False):
        if not desired_locale:
            system_locale = locale.getdefaultlocale() # ('zh_CN', 'UTF-8')
            # If the system locale is not available to the runtime use English
            if not system_locale or not system_locale[0]:
                desired_locale = 'en_US'
            else:
                desired_locale = system_locale[0] # zh_CN

        # get_locale_dir_variable_name returns
        # domain.upper().replace('.', '_').replace('-', '_') + '_LOCALEDIR',
        # i.e. AODH_LOCALEDIR for aodh.
        locale_dir = os.environ.get(
            _locale.get_locale_dir_variable_name(domain)
        )
        # <gettext.GNUTranslations instance at 0x7f8ad9e3e200>
        lang = gettext.translation(domain,
                                   localedir=locale_dir,
                                   languages=[desired_locale],
                                   fallback=True)

        if not has_contextual_form and not has_plural_form:
            # This is the most common case, so check it first.
            translator = lang.gettext if six.PY3 else lang.ugettext
            translated_message = translator(msgid) # u'\u8fdb\u5165app\u5de5\u5382'

        elif has_contextual_form and has_plural_form:
            # Reserved for contextual and plural translation function,
            # which is not yet implemented.
            raise ValueError("Unimplemented.")

        elif has_contextual_form:
            (msgctx, msgtxt) = msgid
            translator = lang.gettext if six.PY3 else lang.ugettext

            msg_with_ctx = "%s%s%s" % (msgctx, CONTEXT_SEPARATOR, msgtxt)
            translated_message = translator(msg_with_ctx)

            if CONTEXT_SEPARATOR in translated_message:
                # Translation not found, use the original text
                translated_message = msgtxt

        elif has_plural_form:
            (msgsingle, msgplural, msgcount) = msgid
            translator = lang.ngettext if six.PY3 else lang.ungettext
            translated_message = translator(msgsingle, msgplural, msgcount)

        return translated_message

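As noted above, aodh has lazy translation enabled, so a Message object is instantiated here, and creating it calls _translate_msgid.

Debugging shows that the so-called lazy mode of oslo_i18n comes down to this: get the localedir, read the language from the current environment variables, find the matching translation function, and translate the msgid into its msgstr. That is how the message ends up in the right language. In principle, whether lazy mode is enabled should not change the translated result.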
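The lookups in _translate_msgid can be tried in isolation. The sketch below reimplements the environment variable naming rule quoted in the source comment and shows what fallback=True does when no catalog is found. locale_dir_variable_name and translate_with_context are illustrative helpers, not oslo.i18n functions, and the separator value "\x04" is an assumption based on the convention GNU gettext uses for message contexts.

```python
import gettext
import tempfile

CONTEXT_SEPARATOR = "\x04"  # assumption: the GNU gettext msgctxt separator


def locale_dir_variable_name(domain):
    # Same rule as quoted in the source comment: aodh -> AODH_LOCALEDIR.
    return domain.upper().replace('.', '_').replace('-', '_') + '_LOCALEDIR'


def translate_with_context(lang, msgctx, msgtxt):
    # Mirrors the has_contextual_form branch: look up "ctx\x04text" and
    # fall back to the plain text when the key comes back unchanged.
    translated = lang.gettext("%s%s%s" % (msgctx, CONTEXT_SEPARATOR, msgtxt))
    return msgtxt if CONTEXT_SEPARATOR in translated else translated


# An empty directory guarantees that no catalog exists for this domain.
empty_dir = tempfile.mkdtemp()
lang = gettext.translation("myproject", localedir=empty_dir,
                           languages=["zh_CN"], fallback=True)

print(locale_dir_variable_name("oslo.i18n"))
print(lang.gettext("enter app_factory"))
print(translate_with_context(lang, "menu", "Open"))
```

With fallback=True a missing catalog does not raise an error; the translator simply returns the msgid, which is why an unset localedir shows up as untranslated English text rather than as a failure.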

Now look at how the environment variables are read, in getdefaultlocale:

def getdefaultlocale(envvars=('LC_ALL', 'LC_CTYPE', 'LANG', 'LANGUAGE')):

    try:
        # check if it's supported by the _locale module
        import _locale
        code, encoding = _locale._getdefaultlocale()
    except (ImportError, AttributeError):
        pass
    else:
        # make sure the code/encoding values are valid
        if sys.platform == "win32" and code and code[:2] == "0x":
            # map windows language identifier to language name
            code = windows_locale.get(int(code, 0))
        # ...add other platform-specific processing here, if
        # necessary...
        return code, encoding

    # fall back on POSIX behaviour
    import os
    lookup = os.environ.get
    for variable in envvars:
        localename = lookup(variable,None)
        if localename:
            if variable == 'LANGUAGE':
                localename = localename.split(':')[0]
            break
    else:
        localename = 'C'
    return _parse_localename(localename)

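The overall logic: when the platform-specific _locale._getdefaultlocale is not available, the function falls back to the environment and checks the variables in this order by default: ('LC_ALL', 'LC_CTYPE', 'LANG', 'LANGUAGE'). As soon as a non-empty variable is found, its value is taken

and passed to the function below, _parse_localename: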

def _parse_localename(localename):

    """ Parses the locale code for localename and returns the
        result as tuple (language code, encoding).

        The localename is normalized and passed through the locale
        alias engine. A ValueError is raised in case the locale name
        cannot be parsed.

        The language code corresponds to RFC 1766.  code and encoding
        can be None in case the values cannot be determined or are
        unknown to this implementation.

    """
    code = normalize(localename)
    if '@' in code:
        # Deal with locale modifiers
        code, modifier = code.split('@', 1)
        if modifier == 'euro' and '.' not in code:
            # Assume Latin-9 for @euro locales. This is bogus,
            # since some systems may use other encodings for these
            # locales. Also, we ignore other modifiers.
            return code, 'iso-8859-15'

    if '.' in code:
        return tuple(code.split('.')[:2])  # splitting on the dot gives e.g. ('zh_CN', 'UTF-8')
    elif code == 'C':
        return None, None
    raise ValueError, 'unknown locale: %s' % localename

Splitting on the dot gives ('zh_CN', 'UTF-8').
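The lookup order and the split can be condensed into a few lines. This is a simplified sketch, not the standard library code: default_locale_from is a hypothetical helper that takes a plain dict instead of os.environ, and it skips the normalize() and alias handling that the real function performs.

```python
LOOKUP_ORDER = ('LC_ALL', 'LC_CTYPE', 'LANG', 'LANGUAGE')


def default_locale_from(env):
    """Simplified POSIX fallback: the first non-empty variable wins."""
    for variable in LOOKUP_ORDER:
        localename = env.get(variable)
        if localename:
            if variable == 'LANGUAGE':
                # LANGUAGE may hold a colon-separated priority list.
                localename = localename.split(':')[0]
            break
    else:
        localename = 'C'
    if '.' in localename:
        code, encoding = localename.split('.')[:2]
        return code, encoding
    # Unlike the real function, unknown names do not raise here.
    return (None, None) if localename == 'C' else (localename, None)


print(default_locale_from({'LANG': 'en_US.UTF-8', 'LC_ALL': 'zh_CN.UTF-8'}))
print(default_locale_from({}))
```

In the first call LC_ALL wins over LANG, which matches the priority order described above.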

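Summary

In lazy translation mode, messages are translated through oslo_i18n.TranslatorFactory(domain=DOMAIN).primary. Internally, _make_translation_func returns a Message object defined by oslo.i18n, which holds the text to be translated. _translate_msgid is then called to look up the msgstr for the msgid. Two things need to be set up. The first is the project's localedir, so that the compiled .mo catalogs (built from the .po files) can be found. The second is the language environment variables, checked in this order of priority from high to low: 'LC_ALL', 'LC_CTYPE', 'LANG', 'LANGUAGE'. Setting LC_ALL alone is enough. Also note that after changing the language environment variables, the component's service must be restarted for the change to take effect.

Appendix: .po and .mo files

Background: PO stands for Portable Object, and MO stands for Machine Object. A PO file is a resource file extracted from the source code and aimed at translators. When the software is upgraded, processing the PO files with the gettext tools lets existing translations be carried over to some extent, which reduces the translators' workload. An MO file is a binary file aimed at the computer, compiled from a PO file by the gettext tools. The program reads the MO file to display its interface in the user's language.

PO and MO files are produced with pygettext and msgfmt: pygettext extracts a template from the source code, and msgfmt compiles a PO file into an MO file.

Creating the PO file: find pygettext.py under ./Tools/i18n/ in the Python installation directory and run it to generate the translation template messages.pot. Its contents look like this: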

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2018-03-13 11:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=cp936\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: pygettext.py 1.5\n"

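Change charset to charset=UTF-8; the rest can be left as is. Each msgid is the key and corresponds to the text written in your program, e.g. _("New File"), while msgstr is the translated value. Add the translation entries: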

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2018-03-13 11:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: pygettext.py 1.5\n"
msgid "Hello world!"
msgstr "世界你好!"

msgid "Python is a good Language."
msgstr "Python 是门好语言."

Save the file and rename it to messages.po.
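To see how the msgid and msgstr pairs relate to the strings in a program, the sketch below reads simple entries from PO text into a dict. read_simple_po is a hypothetical helper written for illustration only: it handles single-line entries, does not unescape sequences such as \n, and is no substitute for the real gettext tools.

```python
import re

# Matches a single-line msgid immediately followed by a single-line msgstr.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def read_simple_po(text):
    """Return {msgid: msgstr} for single-line entries, skipping the header."""
    return {msgid: msgstr for msgid, msgstr in ENTRY.findall(text) if msgid}


SAMPLE = '''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

msgid "Hello world!"
msgstr "世界你好!"

msgid "Python is a good Language."
msgstr "Python 是门好语言."
'''

catalog = read_simple_po(SAMPLE)
print(catalog["Hello world!"])
```

The msgid is matched character for character, so whitespace inside the quotes is significant: an entry keyed " Hello world!" would never be found by _("Hello world!").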

Creating the MO file: find msgfmt.py under ./Tools/i18n/ in the Python installation directory and run it with Python, taking care to pass the correct path to messages.po:

python msgfmt.py messages.po

Alternatively, use the msgfmt command:

msgfmt aodh/aodh/locale/zh_CN/LC_MESSAGES/messages.po -o aodh/aodh/locale/zh_CN/LC_MESSAGES/messages.mo

This produces a messages.mo file.
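For a sense of what the compiled file contains, here is a rough sketch that writes the binary MO layout by hand and loads it back with the standard gettext module. build_mo is a hypothetical helper, not how msgfmt is implemented. It assumes little-endian output, UTF-8 text, no plural or context entries, and it omits the optional hash table.

```python
import gettext
import io
import struct


def build_mo(catalog):
    """Serialize {msgid: msgstr} into the binary .mo layout (no plurals)."""
    # The empty msgid holds the header; its charset tells readers how to decode.
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n"}
    entries.update(catalog)
    keys = sorted(entries)  # the format expects msgids in sorted order
    ids = [key.encode("utf-8") for key in keys]
    strs = [entries[key].encode("utf-8") for key in keys]

    count = len(keys)
    ids_table = 7 * 4                   # right after the 7-word header
    strs_table = ids_table + count * 8  # each table row is (length, offset)
    offset = strs_table + count * 8     # string data starts after both tables

    out = struct.pack("<7I", 0x950412DE, 0, count, ids_table, strs_table, 0, 0)
    data = b""
    for blob in ids + strs:
        out += struct.pack("<2I", len(blob), offset + len(data))
        data += blob + b"\x00"          # strings are NUL-terminated
    return out + data


mo_bytes = build_mo({"Hello world!": "世界你好!"})
translations = gettext.GNUTranslations(io.BytesIO(mo_bytes))
print(translations.gettext("Hello world!"))
```

The file is essentially two index tables plus the string data, which is why a program can look up a msgstr without parsing the PO text at run time.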

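Setting up the translation directory: create locale/zh_CN/LC_MESSAGES/ under the src directory and copy messages.po and messages.mo into it.

That is: ./src/locale/zh_CN/LC_MESSAGES/messages.po

./src/locale/zh_CN/LC_MESSAGES/messages.mo

Create demo.py. Python supports internationalization (i18n) through the gettext module, which makes multilingual interfaces possible. Use the gettext module as follows: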

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import gettext

# Load ./locale/zh_CN/LC_MESSAGES/messages.mo and install _() as a builtin.
gettext.translation('messages', './locale', languages=['zh_CN']).install()
print(_("Hello world!"))
print(_("Python is a good Language."))

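Everything is now in place. Run demo.py and check that it prints the Chinese text: 世界你好! Python是门好语言.

The .po and .mo files can also be generated with tools such as Poedit or Zanata. Poedit is described below:

Poedit

Download and install Poedit, then open it. From File in the top toolbar, choose New, enter the name messages in the dialog that appears, confirm, and press Ctrl+S. This creates the messages.po template file. Enter the translations, as follows: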

# Copyright (C) 2018 THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# Automatically generated, 2018.
#
msgid ""
msgstr ""
"Project-Id-Version: \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-02-26 12:20+0800\n"
"PO-Revision-Date: 2018-02-26 15:58+0800\n"
"Last-Translator: Automatically generated\n"
"Language-Team: none\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: Poedit 2.0.6\n"
 
 
msgid "Hello world!"
msgstr "世界你好!"
 
 
msgid "Python is a good Language."
msgstr "Python是门好语言."

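After saving, open the po file in Poedit and choose "Compile to MO" from the File menu to generate messages.mo. Once both files exist, copy them into /locale/zh_CN/LC_MESSAGES/.

Re-implementing i18n ourselves. i18n.py: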

#!/usr/bin/env python
# -*- coding:utf-8 -*-
 
import os
import gettext
import threading
 
localedir = os.path.join(os.path.dirname(__file__), 'locale')
domain = 'messages'
threadLocalData = threading.local()
threadLocalData.locale = 'en_US'
 
# find out all supported locales in locale directory
locales = []
for dirpath, dirnames, filenames in os.walk(localedir):
  for dirname in dirnames:
    locales.append(dirname)
  break
 
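AllTranslations = {}
for locale in locales:
  AllTranslations[locale] = gettext.translation(domain, localedir, [locale])

# No-op translator used when the current locale has no catalog (e.g. en_US).
# Created here because the gettext() function below shadows the module name.
NoTranslation = gettext.NullTranslations()

def currentTranslation():
  # threadLocalData.locale is only set in the importing thread, so default it.
  locale = getattr(threadLocalData, 'locale', 'en_US')
  return AllTranslations.get(locale, NoTranslation)

def gettext(message):
  return currentTranslation().gettext(message)

def ugettext(message):
  # Python 2 only; on Python 3 use gettext() instead.
  return currentTranslation().ugettext(message)

def ngettext(singular, plural, n):
  return currentTranslation().ngettext(singular, plural, n)

def ungettext(singular, plural, n):
  # Python 2 only; on Python 3 use ngettext() instead.
  return currentTranslation().ungettext(singular, plural, n)

def setLocale(locale):
  # Unknown locales are ignored and the previous locale stays active.
  if locale in locales:
    threadLocalData.locale = locale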
 
 
if __name__ == '__main__':
  # for test purposes: list the locale directories that were found
  for dirpath, dirnames, filenames in os.walk(localedir):
    for dirname in dirnames:
      print(dirname)
    break

demo2.py:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import i18n
 
if __name__ == '__main__':
    i18n.setLocale("zh_CN")
    print(i18n.gettext("Hello world!"))
    print(i18n.gettext("Python is a good Language."))

Save the files and run demo2.py to see the result.
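The per-thread locale idea can be tried without any catalog files. The sketch below is an illustration under stated assumptions: CATALOGS is a made-up in-memory dict, and LocaleState, set_locale, translate, and worker are hypothetical names. It subclasses threading.local so that every thread starts from the same default locale.

```python
import threading

# Hypothetical in-memory catalogs in place of the .mo files under locale/.
CATALOGS = {'zh_CN': {'Hello world!': '世界你好!'}}


class LocaleState(threading.local):
    def __init__(self):
        # Runs once in every thread that touches the object,
        # so each thread starts from the same default.
        self.locale = 'en_US'


state = LocaleState()


def set_locale(locale):
    state.locale = locale


def translate(message):
    # Unknown locales and unknown messages fall back to the original text.
    return CATALOGS.get(state.locale, {}).get(message, message)


def worker(locale, results):
    set_locale(locale)
    results[locale] = translate('Hello world!')


results = {}
threads = [threading.Thread(target=worker, args=(name, results))
           for name in ('zh_CN', 'en_US')]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(results)
print(translate('Hello world!'))  # the main thread still uses its own locale
```

Keeping the locale in thread-local storage means a request handled in one thread can switch languages without affecting requests running in other threads.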
