JavaScript Lecture
31 May 2025

isWellFormed() checks whether a string contains a lone ("broken") surrogate, which is usually displayed as �.

We get a “broken” string when we split text that contains surrogate pairs (such as emoji) in the wrong place and then reuse one of the pieces.

const word = '😀'
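const items = word.split('') // split('') splits by UTF-16 code unit, not by character

console.log(items)
// ['\uD83D', '\uDE00']

// items[0] is only the high surrogate of 😀
const array = ['A', 'B', items[0], 'C']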

const text = array.join('')

console.log(text)
// AB�C

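I often see � in the console or in log files. 😀 is a single code point (U+1F600), but UTF-16, the encoding JavaScript strings use, stores it as two code units called a surrogate pair. split('') splits by code unit, so it cuts 😀 into two halves, and neither half means anything on its own.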
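To see the difference between code units and code points, we can compare length, charCodeAt(), codePointAt() and the spread operator. This is a minimal sketch using only standard String features; the spread operator uses the string iterator, which walks a string by code point rather than by code unit, so it does not cut the emoji in half.

```javascript
const word = '😀'

// length counts UTF-16 code units, not characters
console.log(word.length)
// 2

// charCodeAt() reads one code unit at a time
console.log(word.charCodeAt(0).toString(16), word.charCodeAt(1).toString(16))
// d83d de00

// codePointAt(0) combines the surrogate pair into one code point
console.log(word.codePointAt(0).toString(16))
// 1f600

// The string iterator yields whole code points, so the emoji stays intact
const chars = [...'AB😀C']
console.log(chars.length)
// 4
```

If we split with [...text] (or Array.from(text)) instead of split(''), each piece is a complete character and joining them back never produces a lone surrogate.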

D83D … high surrogate (range D800 to DBFF)
DE00 … low surrogate (range DC00 to DFFF)

A high or low surrogate on its own does not represent any character; only the pair, a high surrogate followed by a low surrogate, is recognized as one character.
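How a pair turns into one character is fixed by UTF-16: the high surrogate carries the upper 10 bits and the low surrogate the lower 10 bits of (code point minus 0x10000). The helper codePointFromPair below is hypothetical and only illustrates the arithmetic; in real code, String.prototype.codePointAt() already does this decoding.

```javascript
// Hypothetical helper: decode a UTF-16 surrogate pair into a code point.
// high must be between 0xD800 and 0xDBFF, low between 0xDC00 and 0xDFFF.
function codePointFromPair(high, low) {
  if (high < 0xD800 || high > 0xDBFF || low < 0xDC00 || low > 0xDFFF) {
    throw new RangeError('Not a valid surrogate pair')
  }
  // Each surrogate contributes 10 bits; 0x10000 is added back at the end
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000
}

console.log(codePointFromPair(0xD83D, 0xDE00).toString(16))
// 1f600

console.log(String.fromCodePoint(codePointFromPair(0xD83D, 0xDE00)))
// 😀
```

For 😀: (0xD83D - 0xD800) * 0x400 = 0xF400, (0xDE00 - 0xDC00) = 0x200, and 0xF400 + 0x200 + 0x10000 = 0x1F600. A single surrogate has no partner to supply the other 10 bits, which is why it cannot stand for a character by itself.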

The “text” in the example above contains a single lone high surrogate (\uD83D), which is shown as �.

How to check if a string has lone surrogates

The __isWellFormed()__ method returns true if a string contains no lone surrogates, and false otherwise.

const word = '😀'
const items = word.split('')

console.log(items)
// ['\uD83D', '\uDE00']

const array = ['A', 'B', items[0], 'C']

const text = array.join('')

console.log(text)
// AB�C

// isWellFormed() returns false here because text contains the lone surrogate \uD83D
const isGoodText = text.isWellFormed()

console.log(isGoodText)
// false
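isWellFormed() was added in ES2024, so older runtimes may not have it. As a sketch of what the check means, the hypothetical function hasLoneSurrogate below walks the string one code unit at a time and reports a high surrogate that is not followed by a low one, or a low surrogate that is not preceded by a high one. It is an illustration under that definition, not the specification's exact algorithm.

```javascript
// Hypothetical helper: true if str contains a lone (unpaired) surrogate
function hasLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i)
    const isHigh = unit >= 0xD800 && unit <= 0xDBFF
    const isLow = unit >= 0xDC00 && unit <= 0xDFFF
    if (isHigh) {
      const next = str.charCodeAt(i + 1) // NaN past the end of the string
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++ // valid pair: skip the low surrogate
      } else {
        return true // high surrogate with no low surrogate after it
      }
    } else if (isLow) {
      return true // low surrogate with no high surrogate before it
    }
  }
  return false
}

console.log(hasLoneSurrogate('AB\uD83DC'))
// true
console.log(hasLoneSurrogate('AB😀C'))
// false
console.log(hasLoneSurrogate('\uDE00\uD83D'))
// true
```

hasLoneSurrogate(str) is meant to be the opposite of str.isWellFormed(): the last example shows that the order matters, because a low surrogate followed by a high one is not a valid pair.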

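This method is particularly useful when we build or format text with regular expressions or index-based operations such as split(''), slice() or substring(), because these work on UTF-16 code units and can cut a surrogate pair in half.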