9

I have a string like this: "Xin chào tất cả mọi người". There are some Unicode characters in the string. I want to write a function (in JS) that checks whether the string contains at least one Unicode character.

1
  • 3
    JavaScript strings do not "contain UTF-8" characters. They contain Unicode code-points (encoded as one code-point/character for Unicode in the BMP - whatever UTF-16/UCS-2 internal coding is an entirely different can of worms). So, now what is a "UTF-8 character"? Do you mean Unicode character not in the ASCII plane? Commented Feb 12, 2014 at 4:37

2 Answers

11

A string is a series of characters, each of which has a character code. ASCII defines characters from 0 to 127, so if a character in the string has a code greater than that, it is a non-ASCII Unicode character. This function checks for that. See String#charCodeAt.

function hasUnicode (str) {
    for (var i = 0; i < str.length; i++) {
        if (str.charCodeAt(i) > 127) return true;
    }
    return false;
}

Then use it like this: hasUnicode("Xin chào tất cả mọi người")
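If it also helps to see which characters trigger the check, the same comparison can be applied per code point. This is only a sketch: findNonAscii is a hypothetical name, not part of the answer above, and it assumes an ES2015+ environment where for...of and String#codePointAt are available.

```javascript
// Hypothetical helper (an assumption, not from the original answer):
// collect every character whose code point is above the ASCII range.
function findNonAscii(str) {
    const found = [];
    // for...of walks the string by code point, so characters outside
    // the BMP are seen as one character rather than two UTF-16 units.
    for (const ch of str) {
        if (ch.codePointAt(0) > 127) found.push(ch);
    }
    return found;
}

// "\u00e0" is the precomposed form of "à".
console.log(findNonAscii("Xin ch\u00e0o")); // [ 'à' ]
console.log(findNonAscii("plain ASCII").length); // 0
```

An empty result means the string is pure ASCII, which matches hasUnicode returning false.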


2

Here's a different approach using a regular expression:

function hasUnicode(s) {
    return /[^\u0000-\u007f]/.test(s);
}
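As a possible variation on the same idea, engines that support Unicode property escapes (ES2018 and later) can express "not ASCII" directly with \P{ASCII} and the u flag. The sketch below assumes such an engine, and hasNonAscii is a hypothetical name chosen for illustration.

```javascript
// Hypothetical variant (assumes ES2018 Unicode property escapes):
// \P{ASCII} matches any code point outside U+0000 to U+007F.
function hasNonAscii(s) {
    return /\P{ASCII}/u.test(s);
}

console.log(hasNonAscii("Xin ch\u00e0o")); // true
console.log(hasNonAscii("hello"));         // false
```

For this yes/no check it should behave the same as the character-class version above; the property escape mainly makes the intent easier to read.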

1 Comment

The performance result will be different.. and your code doesn't detect the à .. your regex should be /[^\u0000-\u007f]/ :)
