Aral Balkan

Anyone know how to count Unicode glyphs in Vala?

E.g., “🙄👈” = two glyphs

This is currently what I’m banging my head against.

To wit, what I’ve already tried:

- string.length returns the length in bytes (UTF-8)
- get_char_count() returns characters, i.e. Unicode code points (e.g., a single emoji might count as 5 characters)
- I can’t seem to find a way to split a string into an array of glyphs (in JavaScript, […str].length does the trick). For example, string.split throws an error in Vala if called with an empty string as the delimiter.

I must be missing something very basic here but can’t seem to find any resources online.
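For reference, here is a minimal Vala sketch of the rough equivalent of […str]. It walks the UTF-8 string with string.get_next_char () and collects each code point as its own string. The helper name split_code_points is made up for illustration. Like the JavaScript version, it yields code points, not glyphs.

```vala
// Hypothetical helper: splits a string into one string per Unicode code
// point, roughly like JavaScript's [...str]. Code points are not glyphs.
string[] split_code_points (string str) {
    string[] result = {};
    int index = 0; // byte offset into the UTF-8 string
    unichar c;
    // get_next_char () decodes the character at `index`, advances `index`
    // past it, and returns false once the end of the string is reached.
    while (str.get_next_char (ref index, out c)) {
        result += c.to_string ();
    }
    return result;
}

void main () {
    string[] parts = split_code_points ("🙄👈");
    print ("code points: %d\n", parts.length);
    foreach (unowned string part in parts) {
        print ("%s\n", part);
    }
}
```

For “🙄👈” this reports two code points. That matches the glyph count here only because neither emoji is built from several code points.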

@aral
```vala
public static int main (string[] args) {
    string str = "🙄👈";
    // char_count () counts Unicode code points; length counts UTF-8 bytes.
    int letters = str.char_count ();
    int bytes = str.length;

    print ("letters: %d, bytes: %d\n", letters, bytes);
    return 0;
}
```
→ letters: 2, bytes: 8

Using the example found at valadoc.org/glib-2.0/string.ch (“string.char_count – glib-2.0”)
@aral The rough equivalent of your JS code would be calling https://valadoc.org/glib-2.0/string.to_utf32.html, which will return the corresponding UTF-32 characters for your string. Since UTF-32 can fit every Unicode code point into a single value, this is *effectively* a list of Unicode code points...

...not glyphs, because your JS code was also returning code points, not glyphs. (E.g. try running the JS example with a skin tone emoji, which is one glyph made of two code points.)
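The following small sketch shows the difference concretely. It prints the code points of a skin-toned thumbs-up emoji, a sample string chosen here for illustration. Both the manual loop and char_count () should report two code points, even though the emoji renders as one glyph.

```vala
// Prints each code point of a skin-toned emoji. The sample is illustrative:
// U+1F44D THUMBS UP SIGN followed by U+1F3FD EMOJI MODIFIER FITZPATRICK TYPE-4.
void main () {
    string thumbs = "👍🏽";
    int index = 0; // byte offset, advanced by get_next_char ()
    unichar c;
    int count = 0;
    while (thumbs.get_next_char (ref index, out c)) {
        print ("U+%04X\n", (uint) c);
        count++;
    }
    // Both counts are code points, so both report 2 for this single glyph.
    print ("loop: %d, char_count (): %d\n", count, thumbs.char_count ());
}
```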

@refi64 Thanks! I did try that last night but couldn’t initially find a way to get the length (and had to leave it as we’re prepping to move house).

Still, there must be a way to actually count glyphs as the text buffer’s cursor manages to properly… 🤔

@aral Unfortunately, I only know of one language that provides this out of the box (Swift), and I don't *think* there's a way to access it the way UIs do? (Text rendering libraries like Pango generally go through much more complicated processes, since the available characters also depend on what's provided by the font, not just the actual glyphs in the string...) If you're feeling adventurous, I believe you could try calling ICU from Vala land. In particular, you'd want https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/ubrk_8h.html#ae3ac488d6827b8476e2b330c3e68a906
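Here is a sketch of another possible route, assuming Pango is available (GTK apps already depend on it). pango_get_log_attrs (Pango.get_log_attrs in Vala) marks grapheme-cluster boundaries with is_cursor_position. As far as I know, GTK's text cursor movement relies on the same information. The helper count_graphemes is hypothetical. The code assumes the Pango vapi exposes the LogAttr bitfields as uint. How emoji sequences such as skin tones or ZWJ families are grouped depends on the Unicode rules implemented by the installed Pango version.

```vala
// Sketch only: compile with `valac --pkg pango graphemes.vala`.
// Hypothetical helper that counts grapheme clusters using Pango's log attrs.
int count_graphemes (string text) {
    // Pango expects one LogAttr per code point plus one for the end position.
    int n_chars = text.char_count ();
    var attrs = new Pango.LogAttr[n_chars + 1];

    // -1 = embedding level unknown; the default language selects the rules.
    Pango.get_log_attrs (text, text.length, -1,
                         Pango.Language.get_default (), attrs);

    // is_cursor_position marks grapheme-cluster boundaries (assumed to be a
    // uint bitfield in the vapi). Position 0, the start of the text, is always
    // one, so count only positions 1..n_chars, each of which ends a cluster.
    int count = 0;
    for (int i = 1; i <= n_chars; i++) {
        if (attrs[i].is_cursor_position != 0) {
            count++;
        }
    }
    return count;
}

void main () {
    string[] samples = { "🙄👈", "👍🏽" };
    foreach (unowned string s in samples) {
        print ("%s: %d code points, %d graphemes\n",
               s, s.char_count (), count_graphemes (s));
    }
}
```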