Translation Syntax Support

Last modified by Vincent Massol on 2024/11/19 16:15

Description

Motivation

At the moment, translations are expected to be plain text. However, the need frequently arises to highlight part of a message or to insert a link into it, where the dynamic part varies (e.g., only the target URL, the whole link, or nothing at all when the link points to documentation on XWiki.org). The documented best practice is to add placeholders for the syntax, but this has the drawback that the whole translation is then interpreted as the desired syntax (usually XWiki syntax or HTML). This proposal therefore aims to add support for syntax in translations.

Goals

  • Support simple formatting (like bold and italic) and links in translations.
  • Provide a new scripting API that is easy to use, in particular:
    • Separate clearly between parameters that contain untrusted input and parameters that contain syntax (if supported, see below)
    • Don't provide plain output without escaping by default, as this is a sure way to introduce escaping issues. Instead, a syntax should always be specified when getting the result as a string.
  • Whatever we introduce should also work in JavaScript.
  • Make it as easy as possible to keep existing translations. This spares developers the work of deprecating existing translations and introducing new ones, and avoids additional work for translators.
  • During parsing, don't combine the different parts (such as parameters and the translation string) into a single string; instead, develop a parser that is aware of the different parts. This avoids issues caused by bad escaping in the combined string.

Non-Goals

  • Full XWiki syntax or HTML support, translations should be intentionally simple.
  • Support for referencing other translations or any other kind of variable or scripting support (apart from the existing support for choices, dates, times and numbers).

SotA

Reminder of known translation constraints:

Variants

The main questions for this feature are the following:

  • Where should syntax be supported? Just in parameters, just in the translation message itself or both?
  • Which syntax elements from XWiki syntax should be supported?
  • Should we also change other things, like support named placeholders?
  • If we extend what can be done in a translation, should there be a new syntax type for translations and if yes, how is it indicated and what do we do with existing translations that already contain syntax?

Where Should Syntax Be Supported?

  1. In parameters. This matches our current best practice and therefore some existing translations. The disadvantage is that for translators, it may be hard to understand why there are placeholders in the message and they might not understand how to order them, in particular in languages that differ in word order or even writing direction.
  2. In the translation itself. This matches many existing translations. The advantage is that translators see clearly the syntax but at the same time this is also a disadvantage as now translators need to know the syntax. It might be possible to use a WYSIWYG editor in Weblate, though. Further, the question is if all existing translations should be treated as being written with the new syntax (as some of them actually do contain syntax) or if some way to specify the syntax should be introduced. In the code, there actually exists the concept of a syntax in which translations are written.
  3. No direct support for syntax. Instead support arbitrary blocks as parameters! The idea is as follows:
    • Provide script APIs that allow creating common blocks (like links, emphasis, bold, italics) easily without writing syntax (that would need escaping again). In most cases, the content of, e.g., a bold block will be dynamic anyway, so it makes sense to insert the whole block as a parameter.
    • For links, introduce a new best practice, which is to have the link label in a separate translation message, with comments on both translations that mention the connection. The same technique can be used for emphasized or bold text that is not dynamic. This is the same as what Vue.js suggests, and Vue.js also supports the same parameter syntax, so this should be easily usable in Vue.js.
    • Block support would require implementing our own parser for MessageFormat to be able to cleanly insert blocks at the place of parameters.
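The block-aware parsing described in variant 3 can be illustrated with a minimal JavaScript sketch. It is only an assumption of how such a parser might look: the function names `parsePattern` and `toBlocks` and the block shapes (`kind: 'text'`, `kind: 'link'`) are hypothetical, and MessageFormat quoting, choice, date and number formats are deliberately not handled. The point is that parameters are never concatenated into the message string, so a string parameter cannot be re-parsed as syntax.

```javascript
// Hypothetical sketch: split a pattern such as "Hello {0}!" into text and
// parameter segments. Only simple numbered placeholders are recognized.
function parsePattern(pattern) {
  const segments = [];
  const placeholder = /\{(\d+)\}/g;
  let last = 0;
  let match;
  while ((match = placeholder.exec(pattern)) !== null) {
    if (match.index > last) {
      segments.push({ type: 'text', value: pattern.slice(last, match.index) });
    }
    segments.push({ type: 'param', index: Number(match[1]) });
    last = placeholder.lastIndex;
  }
  if (last < pattern.length) {
    segments.push({ type: 'text', value: pattern.slice(last) });
  }
  return segments;
}

// Build a flat list of blocks. A parameter that is already a block is
// inserted as is; any other value becomes a text block.
function toBlocks(segments, params) {
  return segments.map((segment) => {
    if (segment.type === 'text') {
      return { kind: 'text', content: segment.value };
    }
    const param = params[segment.index];
    return typeof param === 'object' && param !== null
      ? param
      : { kind: 'text', content: String(param) };
  });
}
```

With this shape, a link block passed for `{0}` stays a link block, while a string such as `**messages**` passed for `{1}` remains literal text and is escaped by whichever renderer processes the block list.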

Which Syntax Elements Should Be Supported?

  1. Links. Support for links is a must-have, but it is not clear which parameters should be supported - arbitrary parameters or just "anchor", "queryString" and maybe "target"?
    • Idea: Use "look-alike" XWiki 2.0 syntax for links without parameters, only support the whole reference as placeholder.
  2. Bold text.
  3. Italics.
  4. Underline?
  5. Strikethrough?
  6. Monospace?
  7. Superscript?
  8. Subscript?
  9. Icons
  10. Images
  11. CSS classes, e.g., on links
  12. Plural (currently with the choice syntax)
  13. Anything else?

Other Changes

It is quite likely we'll need to rewrite the parser for format strings. This provides the opportunity to change other things.

  1. Support for named placeholders, i.e., instead of "Hello {0}", we could have "Hello {first_name}". This would give translators more information about the content of the placeholder.
  2. Add a Weblate checker to ensure parameters are not removed
  3. Syntax highlighting could make it easier to distinguish between actual text, identifier and syntax tokens
  4. Choice syntax is hard to understand. Can it be replaced by something easier? For different languages, different pluralization rules need to be supported, see, e.g., custom pluralization in Vue I18n.
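As an illustration of items 1 and 2 above, the following JavaScript sketch checks that a translation keeps every placeholder of its source message, for both numbered and named placeholders. It is not a Weblate check (those are implemented differently); the helper names `placeholdersOf` and `missingPlaceholders` are hypothetical, and MessageFormat quoting and nested braces are ignored.

```javascript
// Hypothetical helper: collect numbered ({0}) and named ({first_name})
// placeholders, including ones with a format such as {0,number}.
function placeholdersOf(message) {
  const names = new Set();
  const pattern = /\{([A-Za-z_][A-Za-z0-9_]*|\d+)(?:,[^{}]*)?\}/g;
  for (const match of message.matchAll(pattern)) {
    names.add(match[1]);
  }
  return names;
}

// Return the placeholders of the source message that the translation lacks.
function missingPlaceholders(source, translation) {
  const translated = placeholdersOf(translation);
  return [...placeholdersOf(source)]
    .filter((name) => !translated.has(name))
    .sort();
}
```

Reordering placeholders is accepted, since word order differs between languages; only removal is reported.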

All of these general improvements would need a new translation syntax version. For now, it seems better to implement the new features (blocks as parameters) with the existing translation syntax in order to be able to use it with existing translations and to make the scope of the changes smaller. General improvements of translation syntax could be implemented later and should be independent of the (script service) API.

There are open issues about translation problems in non-English languages. We need to check whether they can be addressed as part of this work.

Translation Type

At the moment, all translations have the syntax messagetool/1.0. When translations contain other syntax, they should have a different syntax. However, at the moment, there is no way to indicate the syntax of a translation/a translation file.

API

Script Service

Below is an example usage of a possible script service API:

$services.localization.translate($key)
  .withLocale($locale)
  .withParameter($firstParam)
  .withLinkParameter($reference, $label)
  .withFormattedParameter($format, $text)
##  .withTranslationSyntaxParameter($secondParam)
  .render($syntax)

The key points are:

  • A builder-like API is used to avoid parameter position and type confusion.
  • Parameters are escaped by default.
  • Special methods allow directly specifying link label and reference or the format such that no escaping is required by the user.
  • If syntax in parameters is supported, these parameters must be specified using a special method (withTranslationSyntaxParameter).
  • The output syntax is a mandatory parameter of the render()-method, making it clear for which syntax the translation is rendered.
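The key points above can be sketched in JavaScript as follows. This is an illustration under stated assumptions, not the proposed implementation: the class `TranslationBuilder`, the in-memory `MESSAGES` bundle and the key `demo.greeting` are hypothetical, the link parameter takes a URL rather than a resolved reference, and only the `html/5.0` and `plain/1.0` output syntaxes are handled.

```javascript
// Hypothetical in-memory message bundle, used only for this sketch.
const MESSAGES = { 'demo.greeting': 'Hello {0}, see {1}.' };

const escapeHtml = (text) => String(text)
  .replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
  .replace(/"/g, '&quot;').replace(/'/g, '&#39;');

class TranslationBuilder {
  constructor(key) {
    this.key = key;
    this.params = [];
  }

  // Untrusted text: always escaped for the output syntax.
  withParameter(value) {
    this.params.push({ kind: 'text', value });
    return this;
  }

  // Link given as structured data, so the caller writes no syntax.
  withLinkParameter(url, label) {
    this.params.push({ kind: 'link', url, label });
    return this;
  }

  // The output syntax is mandatory; there is no unescaped default.
  render(syntax) {
    if (syntax !== 'html/5.0' && syntax !== 'plain/1.0') {
      throw new Error('Unsupported or missing syntax: ' + syntax);
    }
    const html = syntax === 'html/5.0';
    const renderParam = (param) => {
      if (param.kind === 'link') {
        return html
          ? `<a href="${escapeHtml(param.url)}">${escapeHtml(param.label)}</a>`
          : String(param.label);
      }
      return html ? escapeHtml(param.value) : String(param.value);
    };
    // Split into static text and {n} placeholders; parts are rendered
    // separately and only joined once each is safe for the output syntax.
    return MESSAGES[this.key].split(/(\{\d+\})/).map((part) => {
      const match = /^\{(\d+)\}$/.exec(part);
      if (!match) return html ? escapeHtml(part) : part;
      const param = this.params[Number(match[1])];
      return param === undefined ? part : renderParam(param);
    }).join('');
  }
}
```

Because each `with…` method records the type of its parameter, position and type cannot be confused, and calling `render()` without a syntax fails instead of silently returning unescaped output.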

As an alternative to this explicit support for different content types, we could also make it easier to construct blocks in Velocity. This could make it easier to provide, e.g., an image in a link, a raw HTML block or a link that is emphasized. In general, the simple withParameter method would then also accept a Block instead of a string.

Translation Macro

Syntax support in the translations themselves should be transparent to the translation macro. Parameters with syntax are difficult to support; ideas are welcome.

JavaScript

For JavaScript, we would also need to support syntax. At the moment, the JavaScript API just returns plain strings. Ideally, we would support an API similar to the script API, though probably with fewer options. Also, in JavaScript, it could make sense to let the API return DOM nodes instead of strings, though for some cases we probably still need HTML (e.g., when a library expects a string and not a node).

For getting the translation output, there is also the question of how the syntax is implemented in JavaScript. As escaping HTML is simpler, we could get away with server-side rendering to HTML with placeholders and then replacing the placeholders in JavaScript. If syntax in parameters should be supported, this becomes more difficult. In theory, it might still be possible to provide these parameters to the REST API, as most of the syntax should be known beforehand, but this seems quite ugly. The cleanest solution would certainly be to re-implement the parser in JavaScript, ideally by using a grammar that can also be compiled to JavaScript.
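The first option (server-side rendering to HTML with placeholders, then client-side replacement) could look roughly like the JavaScript sketch below. It assumes the server has already rendered and escaped the static part of the message and left `__name__` style markers, the convention used in the client-side examples further down; the function name `fillPlaceholders` is hypothetical.

```javascript
const escapeHtml = (text) => String(text)
  .replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
  .replace(/"/g, '&quot;').replace(/'/g, '&#39;');

// Replace __name__ markers in server-rendered HTML with escaped values.
// A single pass is used, so a value that itself contains a marker is not
// expanded again. Unknown markers are left untouched.
function fillPlaceholders(html, params) {
  return html.replace(/__([A-Za-z0-9]+)__/g, (marker, name) =>
    Object.prototype.hasOwnProperty.call(params, name)
      ? escapeHtml(params[name])
      : marker);
}
```

Unlike a plain `String.replace` with the raw value, this escapes every parameter before it reaches the HTML string, which is the main risk of the placeholder approach.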

It might also be a good solution to just encourage using an existing client-side framework for translations like vue-i18n and delegate parameter handling to that framework. In general, it seems okay to have differences for client-side translations; this is also currently the case (e.g., no support for choice syntax).

Examples of Existing Complex Translation Values

Example 1

https://l10n.xwiki.org/translate/xwiki-platform/xwiki-core-resources/en/?checksum=87f26e63804faa2c#comments

Value: {0}Version{1} coming from extension {2}{3} {4}{5}

Example of use:

$services.localization.render('core.viewers.history.extension.label', 
  [
    "<a href='$_versionURL'>",
    '</a>',
    "<strong>",
    $escapetool.xml($documentExtension.name), 
    $escapetool.xml($documentExtension.id.version),
    '</strong>'
  ]
)
  • Complex to understand and translate

Draft of alternative solution

Value: [[Version>>{versionUrl}]] coming from extension **{name} {version}**

Use: {{translate key='...' params='{versionUrl: '', name: '', version: ''}' /}}

Example 2

xe.activity.messages.error.loginToSendMessage=You need to [[log in>>{0}]] before sending messages.

Example 3

attachment.move.alreadyExists=An attachment with the given name ({0}) already exists on <a href="{2}">{1}</a>. Please provide a different name.

Client Side example 1

var denymessage = "$escapetool.javascript($services.localization.render('rightsmanager.denyrightforcurrentuser'))".replace('__right__', self.right);

Client Side example 2

itemCount.text(l10n['docextra.extranb'].replace("__number__", attachmentsNumber));

Client Side example 3

[Image: translation.png]

See LivedataPagination.vue for how it is rendered (spoiler: several concatenated translation values)

See https://kazupon.github.io/vue-i18n/guide/interpolation.html#basic-usage for client side support on Vue.js


 
