Bogdan Ungureanu

Random things about PHP and sometimes TypeScript

A random software engineer from Romania.

Software Engineer @ Automattic

  • PHP’s Intl extension now includes a new class, IntlListFormatter, that formats arrays as locale-aware lists.

    Its purpose is simple, since it’s just a wrapper around ICU’s ListFormatter: pass a locale, give it an array, and get back a formatted string.

    <?php
    
    $formatter = new IntlListFormatter('en', IntlListFormatter::TYPE_AND, IntlListFormatter::WIDTH_WIDE);
    
    echo $formatter->format([1, 2, 3]); //outputs "1, 2, and 3"
    
    $formatter = new IntlListFormatter('en', IntlListFormatter::TYPE_AND, IntlListFormatter::WIDTH_SHORT);
    
    echo $formatter->format([1, 2, 3]); //outputs "1, 2, & 3"

    The constructor takes two optional parameters: type and width. The accepted values are exposed as class constants and map directly to ICU’s values:

    • IntlListFormatter::TYPE_AND
    • IntlListFormatter::TYPE_OR
    • IntlListFormatter::TYPE_UNITS
    • IntlListFormatter::WIDTH_WIDE
    • IntlListFormatter::WIDTH_NARROW
    • IntlListFormatter::WIDTH_SHORT

    While this looks like a trivial feature that can be done in userland, things get complicated once you need to support non-Latin locales. For example, Japanese uses the ideographic comma (、) as a separator and no conjunction:

    $formatter = new IntlListFormatter('ja_JP');
    
    echo $formatter->format([1, 2, 3]); //outputs "1、2、3"

    You may say that’s a non-Latin locale problem, but differences show up between English locales too. For example, en_US uses the Oxford comma while en_GB does not:

    $formatter = new IntlListFormatter('en_US');
    
    echo $formatter->format([1, 2, 3]); //outputs "1, 2, and 3"
    
    $formatter = new IntlListFormatter('en_GB');
    
    echo $formatter->format([1, 2, 3]); //outputs "1, 2 and 3"

    ICU constraints

    IntlListFormatter::TYPE_AND and IntlListFormatter::WIDTH_WIDE work with every ICU version that PHP supports. The other constants require ICU 67 or newer; using them with an older ICU version triggers an exception or a fatal error.

    You can see this exception in action on 3v4l where the master branch is built with ICU 63.
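
    A minimal sketch of guarding against this constraint, assuming the intl extension’s INTL_ICU_VERSION constant reports the ICU version in use. The helper names icuSupportsAllListOptions() and formatOrList() are hypothetical, and the implode() fallback is only a placeholder, not a locale-aware replacement:

    ```php
    <?php

    // Hypothetical helper: true when the given ICU version is 67 or newer,
    // the minimum this post names for the non-default type/width options.
    function icuSupportsAllListOptions(string $icuVersion): bool
    {
        return version_compare($icuVersion, '67', '>=');
    }

    // Hypothetical helper: formats an "or" list when the class and a new enough
    // ICU are available, otherwise falls back to a plain comma-separated string.
    function formatOrList(string $locale, array $items): string
    {
        $canUseOr = class_exists('IntlListFormatter')
            && defined('INTL_ICU_VERSION')
            && icuSupportsAllListOptions(INTL_ICU_VERSION);

        if ($canUseOr) {
            $formatter = new IntlListFormatter($locale, IntlListFormatter::TYPE_OR);

            return $formatter->format($items);
        }

        // Placeholder fallback (an assumption for this sketch): not locale-aware.
        return implode(', ', $items);
    }

    echo formatOrList('en', [1, 2, 3]) . PHP_EOL;
    ```

    Checking the version up front avoids relying on whether the failure surfaces as a catchable exception or as a fatal error.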

    It’s a small feature, but it represents my first real step into contributing to PHP in a more meaningful way. I hope you find it useful!

  • I was recently working on a small feature that needed to output numbers like 1200 in a short, compact form like 1.2k. Simple, right? I can just write a function or ask ChatGPT to generate one for me:

    function formatCompactNumber($number, $decimals = 1) {
        if ($number >= 1000000) {
            return number_format($number / 1000000, $decimals) . 'M';
        } elseif ($number >= 1000) {
            return number_format($number / 1000, $decimals) . 'k';
        } else {
            return (string)$number;
        }
    }

    Task done. Except…

    It’s not so straightforward if you need the value to be locale-aware. For starters, the suffix needs to be translated, and the decimal separator also changes depending on the language.

    For example, in English the value is 1.2K, but in French it’s 1,2 k. Simple, right? Nope. If you support Spanish, 1200 becomes 1,2 mil. Hmm… What about non-Latin alphabets? Well, in Bulgarian it’s 1,2 хил. What about Chinese and Russian? You get the point, and the list goes on and on.

    If you go down that rabbit hole, you also need to be sure that you have these rules for every language your software supports.
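
    To make that rabbit hole concrete, here is a minimal sketch of the hand-rolled approach, covering only the four locales and the thousands case mentioned above. The function name formatCompactThousands() and the rules table are hypothetical, and plain spaces are assumed where real locale data may use a different space character:

    ```php
    <?php

    // Hypothetical userland formatter: each locale needs its own decimal
    // separator, spacing, and suffix, all maintained by hand.
    function formatCompactThousands(float $number, string $locale): string
    {
        // [decimal separator, text between the number and the suffix, suffix]
        $rules = [
            'en' => ['.', '', 'K'],
            'fr' => [',', ' ', 'k'],
            'es' => [',', ' ', 'mil'],
            'bg' => [',', ' ', 'хил.'],
        ];

        if (!isset($rules[$locale])) {
            throw new InvalidArgumentException("No compact rules for locale: $locale");
        }

        [$decimalSeparator, $gap, $suffix] = $rules[$locale];

        // One decimal place, no thousands separator.
        return number_format($number / 1000, 1, $decimalSeparator, '') . $gap . $suffix;
    }

    foreach (['en', 'fr', 'es', 'bg'] as $locale) {
        echo $locale . ': ' . formatCompactThousands(1200, $locale) . PHP_EOL;
    }
    ```

    Every additional locale, magnitude, and rounding rule means another entry to maintain by hand.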

    How does JavaScript solve it?

    JS has a built-in class called Intl.NumberFormat that supports several notations, including standard, scientific, and compact.

    For our code we need to use the compact form, like this:

    new Intl.NumberFormat("bg", {
      notation: "compact",
    }).format(1200);
    // '1,2 хил.'

    Simple, easy, and probably not worth writing a post about.

    In PHP it’s not so straightforward

    PHP comes bundled with the Intl extension, whose NumberFormatter class should do the same thing. It offers a bunch of number formatting styles: scientific, percent, currency, etc. Except it doesn’t have a compact form.

    Turns out, the Intl extension is just a wrapper on top of the Unicode ICU library, just like JavaScript’s Intl. It’s actually mentioned on the Intl intro page:

    Internationalization extension (further is referred as Intl) is a wrapper for ICU library, enabling PHP programmers to perform various locale-aware operations including but not limited to formatting, transliteration, encoding conversion, calendar operations, UCA-conformant collation, locating text boundaries and working with locale identifiers, timezones and graphemes.

    Well, according to the ICU docs, the library actually supports two compact forms: long and short. So, in theory, PHP should support them as well. And… it actually does! If you pass a magic number (14) as the style, it displays the number in the short compact format. Not only that, it has worked since PHP 5! PHP simply doesn’t expose it as a constant or mention it in the docs.

    <?php
    $numberFormatter = new NumberFormatter('bg', 14);
    echo $numberFormatter->format(1200) . PHP_EOL;
    // 1,2 хил.
    

    Since the change is trivial, I opened a PR in the PHP repo. It got merged, and NumberFormatter now exposes two new constants:

    • NumberFormatter::DECIMAL_COMPACT_SHORT
    • NumberFormatter::DECIMAL_COMPACT_LONG

    <?php
    $numberFormatter = new NumberFormatter('bg', NumberFormatter::DECIMAL_COMPACT_SHORT);
    echo $numberFormatter->format(1200) . PHP_EOL;

    You can see it in action on 3v4l against PHP’s master branch, and it will probably be available in PHP 8.5.
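
    Until the constant is available everywhere, a small compatibility shim can cover both cases. This is only a sketch: compactShortStyle() is a hypothetical helper name, and the fallback relies on the magic number 14 mentioned earlier:

    ```php
    <?php

    // Hypothetical helper: prefer the named constant when this PHP build exposes
    // it, otherwise fall back to the raw style value 14 described in the post.
    function compactShortStyle(): int
    {
        $constantName = 'NumberFormatter::DECIMAL_COMPACT_SHORT';

        return defined($constantName) ? constant($constantName) : 14;
    }

    // Only format when the intl extension (and thus NumberFormatter) is loaded.
    if (class_exists('NumberFormatter')) {
        $numberFormatter = new NumberFormatter('bg', compactShortStyle());
        echo $numberFormatter->format(1200) . PHP_EOL;
    }
    ```

    Looking the constant up by name keeps the code loadable on PHP versions where it doesn’t exist yet.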