
I am working with JavaScript for one of the first times, and it's for a SHA-1 hash. I have found code to do this, but one of its dependencies is a method to convert the string to UTF-8; however, the server I am comparing against uses UTF-16. I have looked around, and all my results keep showing up with UTF-8. Can anybody at least point me in the right direction? Thanks.

1 Answer


JavaScript already uses UTF-16 internally. Use charCodeAt() to get the code unit values.
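For illustration, here is a minimal sketch of turning a string into UTF-16 bytes with charCodeAt(), which you could then feed to your SHA-1 routine. The function name is hypothetical, and the big-endian byte order (with no byte-order mark) is an assumption; check what the server you are comparing against actually expects:

    // Sketch: convert a string to an array of UTF-16 bytes.
    // Assumption: big-endian (UTF-16BE) order, no BOM.
    function stringToUtf16BeBytes(str) {
        var bytes = [];
        for (var i = 0; i < str.length; i++) {
            var codeUnit = str.charCodeAt(i);   // 16-bit UTF-16 code unit
            bytes.push((codeUnit >> 8) & 0xFF); // high byte first
            bytes.push(codeUnit & 0xFF);        // low byte second
        }
        return bytes;
    }

    // "A" is U+0041, so it becomes [0x00, 0x41] in UTF-16BE.
    console.log(stringToUtf16BeBytes("A")); // [0, 65]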


6 Comments

note: charCodeAt() will not give you the UTF-16 byte codes; it will give you the encoding-less Unicode code point number, so it's not particularly useful unless you also have the codepoint-to-UTF-16-bytecode conversion algorithm available.
@Mike'Pomax'Kamermans: that's incorrect - charCodeAt() does return UTF-16 code units - see the linked documentation or the ECMA spec; what you describe is codePointAt(), an ES6 addition
I read the ECMA spec fairly frequently, so here's the spec for it: "String.prototype.charCodeAt(pos) -- Returns a Number (a nonnegative integer less than 2^16) representing the code unit value of the character at position pos in the String resulting from converting this object to a String. If there is no character at that position, the result is NaN." Code unit refers to the Unicode point, not a specific encoding pattern (Unicode itself is encodingless, it's just a list of glyph-X-has-list-number-...)
@Mike'Pomax'Kamermans: there are three levels involved: (1) codepoints (aka Unicode characters), which go up to 0x10FFFF (~21 bits), (2) what the ECMA spec calls code unit values, which you get by encoding Unicode characters via UTF-16 and where higher codepoints are encoded as surrogate pairs (21 > 16), and (3) the byte level, which is just the decision to encode the 16-bit values in little-endian or big-endian order; ECMAScript 5 only gives access to the 2nd level, but that's fine, as that's what SwiftStriker00 was looking for (see the sketch after these comments).
I was testing with too-low characters. "🀀".charCodeAt() does indeed give the surrogate code unit value. Apologies.
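To make the distinction from this thread concrete, here is a small sketch contrasting charCodeAt() (UTF-16 code units, including surrogates) with the ES6 codePointAt() (full Unicode code points), using the same character as the comment above:

    var s = "🀀"; // U+1F000, stored as the surrogate pair 0xD83C 0xDC00

    console.log(s.length);                      // 2 (two UTF-16 code units)
    console.log(s.charCodeAt(0).toString(16));  // "d83c" (high surrogate)
    console.log(s.charCodeAt(1).toString(16));  // "dc00" (low surrogate)
    console.log(s.codePointAt(0).toString(16)); // "1f000" (the full code point)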
