isWellFormed checks whether a string contains lone surrogates (the "broken" code points shown as �)

We get a “broken” string when we incorrectly split text that contains surrogate pairs (such as emoji) and then reuse only part of a pair.

const word = '😀'
const items = word.split('')

console.log(items)
// ['\uD83D', '\uDE00']

const array = ['A', 'B', items[0], 'C']

const text = array.join('')

console.log(text)
// AB�C

I often see � in console output or log files. 😀 is a single code point (U+1F600), but UTF-16 encodes it as two code units, and split('') splits the string by code unit. Neither unit means anything on its own.

D83D … High surrogate
DE00 … Low surrogate

A high or low surrogate on its own is not a valid symbol, but a high surrogate followed by a low surrogate is recognized as a single symbol.

The “text” in the above example contains a single lone surrogate, which is displayed as �.
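
As a small sketch of the difference between code units and code points, the snippet below compares split('') with the string iterator (the spread syntax here). The variable names are only illustrative. Iterating by code point keeps surrogate pairs intact, although emoji built from several code points can still be separated this way.

```javascript
const word = '😀'

// length counts UTF-16 code units, not code points
console.log(word.length)
// 2

// codePointAt reads the whole surrogate pair as one code point
console.log(word.codePointAt(0).toString(16))
// 1f600

// split('') cuts between code units and breaks the pair
const units = word.split('')

// the string iterator (spread, Array.from, for...of) walks code points
const points = [...word]

console.log(units.length, points.length)
// 2 1

console.log(points)
// ['😀']
```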

How to check if a string has meaningless surrogates

The isWellFormed() method checks whether a string is free of lone surrogates. It returns true if every surrogate is part of a valid pair and false otherwise.

const word = '😀'
const items = word.split('')

console.log(items)
// ['\uD83D', '\uDE00']

const array = ['A', 'B', items[0], 'C']

const text = array.join('')

console.log(text)
// AB�C

// isWellFormed() already returns a boolean, so we can store it directly
const isGoodText = text.isWellFormed()

console.log(isGoodText)
// false
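
isWellFormed() is a fairly recent addition (ECMAScript 2024), so older runtimes may not have it. As a rough sketch of what such a check involves, a manual scan could look like the following. hasLoneSurrogate is a hypothetical helper name, not a built-in, and it returns the opposite of isWellFormed().

```javascript
// Hypothetical helper, not part of the standard library.
function hasLoneSurrogate(text) {
	// for...of walks code points, so a valid pair arrives as one item
	// and only an unpaired surrogate falls in the D800-DFFF range.
	for (const symbol of text) {
		const codePoint = symbol.codePointAt(0)

		if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
			return true
		}
	}

	return false
}

const brokenText = 'AB' + '\uD83D' + 'C'
const goodText = 'AB😀C'

console.log(hasLoneSurrogate(brokenText))
// true

console.log(hasLoneSurrogate(goodText))
// false
```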

This method is particularly useful when we build or reformat text with regular expressions or code point operations, where it is easy to cut a surrogate pair in half.
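
As one possible illustration of the regex case, the sketch below removes the first "character" of a string with and without the u flag. The wellFormed wrapper is an assumption added for this example: it uses the built-in isWellFormed() when the runtime has it and otherwise falls back to a regex test.

```javascript
const word = '😀!'

// Without the u flag, "." matches a single UTF-16 code unit,
// so only the high surrogate is removed and the low one is left alone.
const broken = word.replace(/^./, '')

// With the u flag, "." matches a whole code point.
const clean = word.replace(/^./u, '')

// Illustrative wrapper: use the built-in when available, otherwise
// fall back to a regex that matches unpaired surrogates.
const wellFormed = (text) =>
	typeof text.isWellFormed === 'function'
		? text.isWellFormed()
		: !/\p{Surrogate}/u.test(text)

console.log(wellFormed(broken))
// false

console.log(wellFormed(clean))
// true
```

Adding the u flag is usually the simpler fix here. The check is mainly useful for text that arrives from code we do not control.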