JavaScript Lecture
31 May 2025

A surrogate pair regex is a good alternative to split('') when splitting a string into characters

__split('')__ returns an unexpected array when the original text contains emojis.
const word = 'AB😀C'

const items = word.split('')

console.log(items)
// ['A', 'B', '\uD83D', '\uDE00', 'C']

😀 is split into two UTF-16 code units. JavaScript strings use UTF-16, and 😀 (U+1F600) is represented by a UTF-16 surrogate pair. The length property below shows this: it counts code units, not visible characters.

const word = 'AB😀C'

console.log(word.length)
// 5
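
To see where the length of 5 comes from, the sketch below lists the individual UTF-16 code units with charCodeAt and reads the full code point with codePointAt. The variable names are only illustrative.

```javascript
const text = 'AB😀C'

// Each index addresses one UTF-16 code unit, not one visible character.
const codeUnits = []
for (let i = 0; i < text.length; i++) {
  codeUnits.push(text.charCodeAt(i).toString(16).toUpperCase())
}

console.log(codeUnits)
// ['41', '42', 'D83D', 'DE00', '43']

// codePointAt reads the whole surrogate pair that starts at index 2.
const codePoint = text.codePointAt(2).toString(16).toUpperCase()
console.log(codePoint)
// 1F600
```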

An alternative to the split function

A regular expression combined with match splits the string so that an emoji counts as a single element.

const word = 'AB😀C'

const array = word.match(/([\uD800-\uDBFF][\uDC00-\uDFFF])|./g)

console.log(array)
// ['A', 'B', '😀', 'C']
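
As a minimal sketch, the same pattern can be wrapped in a helper. The name splitChars is hypothetical, and two details are assumptions that go beyond the snippet above: `.` is replaced by `[\s\S]` so that line breaks are kept, and `?? []` covers the empty string, for which match returns null. Array.from is shown only for comparison; it is a different technique that relies on string iteration instead of a regex.

```javascript
// Hypothetical helper: split a string so that each surrogate pair stays together.
function splitChars(text) {
  // [\s\S] matches any single code unit, including line breaks that `.` skips.
  // match returns null when nothing matches (empty string), so fall back to [].
  return text.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]|[\s\S]/g) ?? []
}

console.log(splitChars('AB😀C'))
// ['A', 'B', '😀', 'C']

console.log(splitChars(''))
// []

// For comparison: strings iterate by code point, so Array.from gives the same result here.
console.log(Array.from('AB😀C'))
// ['A', 'B', '😀', 'C']
```

Both approaches work per code point, so emoji built from several code points (for example flags or family emoji joined with zero-width joiners) are still split into parts.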

A surrogate pair is a high surrogate followed by a low surrogate. Their code point ranges are:

High: D800 - DBFF
Low: DC00 - DFFF
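
The ranges above follow from how UTF-16 encodes a code point above U+FFFF: 0x10000 is subtracted, then the top 10 bits are added to D800 and the bottom 10 bits to DC00. The sketch below applies this to 😀 (U+1F600); the helper name toSurrogatePair is hypothetical.

```javascript
// Hypothetical helper: compute the UTF-16 surrogate pair for a code point above U+FFFF.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000
  const high = 0xD800 + (offset >> 10)   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF)  // bottom 10 bits
  return [high, low]
}

const [high, low] = toSurrogatePair(0x1F600) // 😀
console.log(high.toString(16).toUpperCase(), low.toString(16).toUpperCase())
// D83D DE00

// Joining the two code units gives back the original emoji.
console.log(String.fromCharCode(high, low) === '😀')
// true
```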