isWellFormed() checks whether a string has lone surrogates (the “broken” characters shown as �)
We get a “broken” string when we incorrectly split text that contains surrogate pairs (such as emoji) and then reuse the pieces.
```javascript
const word = '😀'
const items = word.split('')
console.log(items)
// ['\uD83D', '\uDE00']
const array = ['A', 'B', items[0], 'C']
const text = array.join('')
console.log(text)
// AB�C
```

I often see � in the console or in log files. 😀 is a single code point (U+1F600), but UTF-16, the encoding JavaScript strings use, stores it as two code units, and `split('')` splits a string by code unit. So it breaks 😀 into two halves, and neither half means anything on its own.
- `\uD83D`: high surrogate
- `\uDE00`: low surrogate
A high or low surrogate on its own can’t be displayed as a character; only the pair together is recognized as one symbol.
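To see how the pair turns into one symbol, the standard UTF-16 decoding formula combines the two surrogates into a single code point. The sketch below applies it by hand to the two units of 😀 and compares the result with the built-in `codePointAt()`. The function name `combineSurrogates` is hypothetical, used only for illustration.

```javascript
// Hypothetical helper: decode a UTF-16 surrogate pair into one code point.
// High surrogates are 0xD800..0xDBFF, low surrogates are 0xDC00..0xDFFF.
function combineSurrogates(high, low) {
  return 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
}

const word = '😀'
const high = word.charCodeAt(0) // 0xD83D
const low = word.charCodeAt(1)  // 0xDE00

console.log(combineSurrogates(high, low).toString(16))
// 1f600
console.log(word.codePointAt(0).toString(16))
// 1f600
```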
The `text` in the first example contains a single lone surrogate (the high half of 😀), which is usually displayed as � (U+FFFD, the replacement character).
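The rule behind a lone surrogate is simple: a high surrogate must be immediately followed by a low surrogate, and a low surrogate must be immediately preceded by a high one. As a sketch under that rule (not how any JavaScript engine actually implements it), the hypothetical helper `findLoneSurrogates` scans the UTF-16 code units and reports the index of every unpaired one.

```javascript
// Hypothetical helper: return the indices of unpaired surrogate code units.
function findLoneSurrogates(str) {
  const isHigh = (u) => u >= 0xD800 && u <= 0xDBFF
  const isLow = (u) => u >= 0xDC00 && u <= 0xDFFF
  const lone = []
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i)
    if (isHigh(unit)) {
      if (i + 1 < str.length && isLow(str.charCodeAt(i + 1))) {
        i++ // valid pair: skip the low surrogate as well
      } else {
        lone.push(i) // high surrogate with no low surrogate after it
      }
    } else if (isLow(unit)) {
      lone.push(i) // low surrogate with no high surrogate before it
    }
  }
  return lone
}

const text = ['A', 'B', '😀'.split('')[0], 'C'].join('')
console.log(findLoneSurrogates(text))
// [ 2 ]
console.log(findLoneSurrogates('AB😀C'))
// []
```

An empty result corresponds to a well-formed string; any index points at a character that will show up as �.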
How to check whether a string has lone surrogates
__isWellFormed()__ returns `false` if a string contains any lone surrogate and `true` otherwise. It is part of ES2024.

```javascript
const word = '😀'
const items = word.split('')
console.log(items)
// ['\uD83D', '\uDE00']
const array = ['A', 'B', items[0], 'C']
const text = array.join('')
console.log(text)
// AB�C
// isWellFormed() already returns a boolean, so no if/else is needed
const isGoodText = text.isWellFormed()
console.log(isGoodText)
// false
```

This method is particularly useful when we build or transform text with operations that work on UTF-16 code units, such as `split('')`, `slice()`, or regular expressions without the `u` flag, because they can cut a surrogate pair in half.