I am working with JavaScript for one of the first times, and it's for a SHA-1 hash. I have found code to do this, but one of its dependencies is a method to convert the string to UTF-8; however, the server I am comparing against uses UTF-16. I have looked around and all my results keep showing UTF-8. Can anybody at least point me in the right direction? Thanks.
1 Answer
JavaScript already uses UTF-16 internally - use charCodeAt() to get the values.
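A minimal sketch of that approach, assuming the server hashes the string as UTF-16LE without a BOM (the name toUtf16leBytes is just for illustration; flip the byte order if your server expects big-endian):

    // Turn a string into UTF-16LE bytes using charCodeAt().
    function toUtf16leBytes(str) {
      var bytes = new Uint8Array(str.length * 2);
      for (var i = 0; i < str.length; i++) {
        var unit = str.charCodeAt(i);     // 16-bit UTF-16 code unit
        bytes[i * 2]     = unit & 0xFF;   // low byte first (little-endian)
        bytes[i * 2 + 1] = unit >> 8;     // then the high byte
      }
      return bytes;
    }

    // Example: "ab" -> bytes 0x61 0x00 0x62 0x00
    console.log(toUtf16leBytes("ab"));

Feed those bytes to your SHA-1 routine in place of the UTF-8 conversion step.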
6 Comments
Mike 'Pomax' Kamermans
Note: charCodeAt() will not give you the UTF-16 byte codes; it will give you the encoding-less Unicode code point number, so it's not particularly useful unless you also have the codepoint-to-UTF-16-bytecode conversion algorithm available.
Christoph
@Mike'Pomax'Kamermans: that's incorrect - charCodeAt() does return UTF-16 code units - see the linked documentation or the ECMA spec; what you describe is codePointAt(), an ES6 addition.
Mike 'Pomax' Kamermans
I read the ECMA spec fairly frequently, so here's the spec for it: "String.prototype.charCodeAt(pos) -- Returns a Number (a nonnegative integer less than 2^16) representing the code unit value of the character at position pos in the String resulting from converting this object to a String. If there is no character at that position, the result is NaN." Code unit refers to the Unicode point, not a specific encoding pattern (Unicode itself is encodingless; it's just a list of glyph-X-has-list-number-...)
Christoph
@Mike'Pomax'Kamermans: there are three levels involved: (1) codepoints (aka Unicode characters), which go up to 0x10FFFF (~21 bits), (2) what the ECMA spec calls code unit values, which you get by encoding Unicode characters via UTF-16 and where higher codepoints are encoded as surrogate pairs (21 > 16), and (3) the byte level, which is just the decision to encode the 16-bit values in little-endian or big-endian order; ECMAScript 5 only gives access to the 2nd level, but that's fine, as that's what SwiftStriker00 was looking for.
Mike 'Pomax' Kamermans
I was testing with characters that were too low; "🀀".charCodeAt() does indeed give the surrogate value. Apologies.
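For anyone reading along, a quick sketch of the levels being discussed, using that same character (codePointAt() needs an ES6-capable engine):

    // "🀀" is U+1F000, outside the Basic Multilingual Plane, so UTF-16
    // stores it as a surrogate pair of two code units.
    var s = "🀀";
    console.log(s.length);                      // 2 (two UTF-16 code units)
    console.log(s.charCodeAt(0).toString(16));  // "d83c" (high surrogate)
    console.log(s.charCodeAt(1).toString(16));  // "dc00" (low surrogate)
    console.log(s.codePointAt(0).toString(16)); // "1f000" (the code point itself)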