Jeheonpark
3 min read · Oct 8, 2023


How to handle composite emojis in Kotlin

Grapheme Cluster

A grapheme cluster is a sequence of one or more code points that a user perceives as a single character in text. This “user-perceived character” often coincides with a single Unicode code point, but not always.

For example:
- “a” is a grapheme cluster by itself.
- However, an accented character, like “á”, can be represented as the letter “a” (U+0061) followed by the combining acute accent (U+0301). Here, the two code points together form one grapheme cluster.
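The accented example can be checked on the JVM with a small sketch. It assumes `java.text.BreakIterator.getCharacterInstance()` as the boundary finder, and `countGraphemes` is a hypothetical helper name chosen for this illustration, not something from a library.

```kotlin
import java.text.BreakIterator
import java.text.Normalizer

// Hypothetical helper: counts user-perceived characters by walking
// the character boundaries reported by BreakIterator.
fun countGraphemes(text: String): Int {
    val iterator = BreakIterator.getCharacterInstance()
    iterator.setText(text)
    var count = 0
    while (iterator.next() != BreakIterator.DONE) count++
    return count
}

fun main() {
    val decomposed = "a\u0301" // 'a' followed by COMBINING ACUTE ACCENT
    val composed = Normalizer.normalize(decomposed, Normalizer.Form.NFC)

    println(decomposed.length)                               // 2 UTF-16 code units
    println(decomposed.codePointCount(0, decomposed.length)) // 2 code points
    println(countGraphemes(decomposed))                      // 1 grapheme cluster
    println(composed.length)                                 // 1, the precomposed U+00E1
}
```

The decomposed and precomposed forms look identical on screen, yet `length` reports different values for them, which is why counting by `length` alone can mislead.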

The importance of grapheme clusters becomes evident in text processing tasks, especially in rendering or text input processing. For instance, when a user navigates text using arrow keys on a keyboard, a grapheme cluster should behave as one “character”.
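The arrow-key behavior can be sketched as a function that moves a caret to the next character boundary instead of the next index. This is a minimal illustration under the assumption that `java.text.BreakIterator` is an acceptable boundary source; `nextCaretOffset` is a hypothetical name, and a real text editor would likely rely on its UI toolkit instead.

```kotlin
import java.text.BreakIterator

// Hypothetical helper: returns the caret offset after pressing the right
// arrow key once, so the caret never stops inside a grapheme cluster.
fun nextCaretOffset(text: String, offset: Int): Int {
    // Already at the end of the text: the caret stays where it is.
    if (offset >= text.length) return text.length
    val iterator = BreakIterator.getCharacterInstance()
    iterator.setText(text)
    val next = iterator.following(offset)
    return if (next == BreakIterator.DONE) text.length else next
}

fun main() {
    val text = "a\u0301b" // "á" (two code points) followed by "b"
    println(nextCaretOffset(text, 0)) // 2, the caret skips over the whole "á"
    println(nextCaretOffset(text, 2)) // 3
    println(nextCaretOffset(text, 3)) // 3, already at the end
}
```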

Therefore, when processing strings, it’s crucial to work at the level of grapheme clusters rather than individual code points, UTF-16 code units, or bytes.

Addressing Emoji with Grapheme Clusters

There are several key considerations when dealing with emojis and grapheme clusters:

1. Composite Emojis: Some emojis are encoded as sequences of several code points. For instance, a skin tone variation is a base emoji followed by a modifier code point, and a country flag is a pair of regional indicator symbols. Each such sequence should be treated as a single grapheme cluster.
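To see what a composite emoji looks like inside a Kotlin string, the following sketch prints the code points behind a flag and a skin-toned emoji. The helper names (`stringOf`, `codePointsOf`, `graphemeCount`) are hypothetical and exist only for this example. The grapheme count comes from `java.text.BreakIterator`, whose handling of emoji sequences can differ between JDK versions, so no expected value is shown for it.

```kotlin
import java.text.BreakIterator

// Hypothetical helper: builds a string from raw Unicode code points.
fun stringOf(vararg codePoints: Int): String = String(codePoints, 0, codePoints.size)

// Hypothetical helper: lists the code points of a string, joining surrogate pairs.
fun codePointsOf(text: String): List<Int> {
    val result = mutableListOf<Int>()
    var index = 0
    while (index < text.length) {
        val codePoint = text.codePointAt(index)
        result.add(codePoint)
        index += Character.charCount(codePoint)
    }
    return result
}

// Hypothetical helper: counts character boundaries as reported by the running JDK.
fun graphemeCount(text: String): Int {
    val iterator = BreakIterator.getCharacterInstance()
    iterator.setText(text)
    var count = 0
    while (iterator.next() != BreakIterator.DONE) count++
    return count
}

fun main() {
    val flag = stringOf(0x1F1F0, 0x1F1F7)     // 🇰🇷: two regional indicator symbols
    val thumbsUp = stringOf(0x1F44D, 0x1F3FD) // 👍🏽: base emoji + skin tone modifier

    for (emoji in listOf(flag, thumbsUp)) {
        val hex = codePointsOf(emoji).joinToString(" ") { "U+%04X".format(it) }
        println("$emoji: ${emoji.length} UTF-16 code units, code points: $hex")
        // JDK-dependent, so the result is printed without an expected value.
        println("  grapheme clusters reported by this JDK: ${graphemeCount(emoji)}")
    }
}
```

Both emojis occupy four UTF-16 code units and two code points, even though a reader sees one symbol in each case.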


Jeheon Park, Software Engineer at Kakao in South Korea