Detect unicode language of a string in javascript

Question

I have a string that contains a few words. I want to find all the words that contain only Tamil Unicode characters. I am new to JavaScript.

In Go, I do it like this:

            tokens := strings.Split(stringContent, delim) // split on delim, e.g. a space

            for _, token := range tokens {
                // Decode the first rune; l is its length in bytes.
                r, l := utf8.DecodeRuneInString(token)
                if l != 1 { // more than one byte, so not ASCII
                    if unicode.Is(unicode.Tamil, r) {
                        // Tamil word (judged by its first character only)
                    }
                }
            }

I found that string.split() will give me the individual words based on a delimiter in JavaScript, but I cannot find out how to check whether a word is a Tamil word. Can someone help me achieve this in JavaScript?

Diode · Accepted Answer · 2012-08-16 08:37:22Z

10

An easy way is to use a regular expression that matches runs of characters in a Unicode range.

This may help: http://kourge.net/projects/regexp-unicode-block

Here is a sample to start with:

"இந்தியா ASASAS எறத்தாழ ASSASAS குடியரசு ASWED SAASAS".match(/[\u0B80-\u0BFF]+/g);
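
To get closer to what the question asks (words that contain only Tamil characters), one option is to split the string first and test each word against an anchored version of the same range. The following is a minimal sketch, not a definitive implementation: the function name `tamilOnlyWords` is hypothetical, and it assumes words are separated by whitespace and that "Tamil" means the block U+0B80 to U+0BFF used above.

```javascript
// Hypothetical helper: returns the words of `text` that consist solely of
// characters from the Tamil block (U+0B80 to U+0BFF).
function tamilOnlyWords(text) {
  // Anchored with ^ and $ so the whole word must be Tamil, not just a part.
  const tamilWord = /^[\u0B80-\u0BFF]+$/;
  return text
    .split(/\s+/) // assumption: words are separated by whitespace
    .filter((word) => tamilWord.test(word));
}

const sample = "இந்தியா ASASAS எறத்தாழ ASSASAS குடியரசு ASWED SAASAS";
console.log(tamilOnlyWords(sample));
```

In engines that support Unicode property escapes (ES2018 and later), `/^\p{Script=Tamil}+$/u` is an alternative to the hard-coded range.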

edited Aug 16, 2012 at 8:37

answered Aug 16, 2012 at 8:07

3 Comments

Sankar Over a year ago

I tried: <button onclick="myFunction()">Try it</button> <script type="text/javascript"> function myFunction() { alert("இந்தியா".match(/[\u0B80-\u0BFF]+/g)); } </script> </body> </html> but nothing happens.

Diode Over a year ago

By the way, the code I have shown matches only runs of consecutive Tamil characters, so it finds whole words only when they are entirely Tamil. You have to modify the regexp if you want to match a word that mixes Tamil and English characters.
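
One possible way to handle mixed words is to use two patterns: an unanchored one that detects any Tamil character, and an anchored one that requires every character to be Tamil. This is only a sketch; the names `classifyWord`, `TAMIL_CHAR` and `TAMIL_ONLY` are made up for illustration, and the range is the same Tamil block as in the answer.

```javascript
// Hypothetical helper: classifies a single word by how much of it is Tamil.
const TAMIL_CHAR = /[\u0B80-\u0BFF]/;    // at least one Tamil character
const TAMIL_ONLY = /^[\u0B80-\u0BFF]+$/; // nothing but Tamil characters

function classifyWord(word) {
  if (TAMIL_ONLY.test(word)) return "tamil"; // every character is Tamil
  if (TAMIL_CHAR.test(word)) return "mixed"; // Tamil plus something else
  return "other";                            // no Tamil characters at all
}

console.log(classifyWord("தமிழ்"));    // "tamil"
console.log(classifyWord("தமிழ்abc")); // "mixed"
console.log(classifyWord("abc"));      // "other"
```

Neither pattern uses the `g` flag, so `test()` keeps no state between calls.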

Sankar Over a year ago

Got it. The problem was that my browser's default encoding was a Western one, so the alert displayed null. I changed it to UTF-8 and then the word displayed fine. Thanks.
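
If the null came from the Tamil literal being misread under the wrong encoding, one way to sidestep the problem is to write the test string with `\u` escapes, which keeps the source file pure ASCII. The sketch below assumes that cause; the variable names are illustrative. Declaring the page encoding with `<meta charset="utf-8">` is the other common fix.

```javascript
// "இந்தியா" spelled out as \u escapes, so the file's encoding no longer matters.
const india = "\u0B87\u0BA8\u0BCD\u0BA4\u0BBF\u0BAF\u0BBE";

// Same pattern as in the answer: runs of characters in the Tamil block.
const matches = india.match(/[\u0B80-\u0BFF]+/g);

// The whole string is a single Tamil run, so there is exactly one match.
console.log(matches !== null && matches[0] === india); // true
```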
