We get a "broken" string when we incorrectly split text that contains surrogate-pair characters (like emoji) and then reuse the resulting parts.
const word = '😀'
const items = word.split('')
console.log(items) // ['\uD83D', '\uDE00']

const array = ['A', 'B', items[0], 'C']
const text = array.join('')
console.log(text) // AB�C
I often see � in the console or in log files. 😀 is a single code point (U+1F600), but UTF-16 encodes it as two code units, and split('') splits a string by code unit, so 😀 becomes two separate halves. Neither half means anything on its own.
D83D ... High surrogate
DE00 ... Low surrogate
A high or low surrogate can't be displayed as a symbol by itself, but a high surrogate followed by a low surrogate is recognized as one symbol.
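To make the two halves visible, here is a small sketch that inspects the emoji with standard string methods. The variable names are only for illustration.

```javascript
// '😀' is one code point (U+1F600) stored as two UTF-16 code units.
const emoji = '😀'

const length = emoji.length // counts UTF-16 code units, not symbols
const high = emoji.charCodeAt(0).toString(16) // first code unit (high surrogate)
const low = emoji.charCodeAt(1).toString(16) // second code unit (low surrogate)
const codePoint = emoji.codePointAt(0).toString(16) // the combined code point

console.log(length, high, low, codePoint) // 2 d83d de00 1f600
```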
The "text" in the above example contains a single surrogate without its partner (a lone surrogate), which is displayed as �.
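One way to avoid creating lone surrogates in the first place is to split by code point instead of by code unit. The sketch below uses Array.from, which relies on the string iterator; the sample string and variable names are assumptions for illustration.

```javascript
const greeting = 'Hi😀'

// split('') cuts between UTF-16 code units and breaks the pair.
const byCodeUnit = greeting.split('') // ['H', 'i', '\uD83D', '\uDE00']

// Array.from iterates by code point, so the pair stays together.
const byCodePoint = Array.from(greeting) // ['H', 'i', '😀']

const rebuilt = ['A', 'B', byCodePoint[2], 'C'].join('')
console.log(rebuilt) // AB😀C
```

Note that this keeps surrogate pairs intact but still splits symbols made of several code points (for example, emoji joined with a zero-width joiner).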
How to check if a string has meaningless surrogates
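The isWellFormed() method checks whether a string is free of lone surrogates: it returns true if there are none and false if the string contains at least one.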
const word = '😀'
const items = word.split('')
console.log(items) // ['\uD83D', '\uDE00']

const array = ['A', 'B', items[0], 'C']
const text = array.join('')
console.log(text) // AB�C

// isWellFormed() already returns a boolean, so no if/else is needed
const isGoodText = text.isWellFormed()
console.log(isGoodText) // false
This function is particularly useful when we build or format text using regular expressions and code points.
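As an illustration, here is a minimal sketch of a check that can run after any formatting step. The helper name hasNoLoneSurrogates is hypothetical, and the regex fallback is my own assumption for runtimes that do not yet provide isWellFormed(); it is not part of the original example.

```javascript
// Hypothetical helper: returns true when the string contains no lone surrogates.
function hasNoLoneSurrogates(str) {
  if (typeof str.isWellFormed === 'function') {
    return str.isWellFormed()
  }
  // Fallback (assumption): match a high surrogate not followed by a low one,
  // or a low surrogate not preceded by a high one. No "u" flag, so the
  // pattern works on UTF-16 code units.
  const loneSurrogate =
    /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/
  return !loneSurrogate.test(str)
}

const message = 'OK😀'
const truncated = message.slice(0, 3) // cuts the pair in half: 'OK\uD83D'

console.log(hasNoLoneSurrogates(message)) // true
console.log(hasNoLoneSurrogates(truncated)) // false
```

If a broken string has to be displayed anyway, toWellFormed() returns a copy in which each lone surrogate is replaced with U+FFFD (�).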