
I have a string that contains a few words. I want to find all the words that consist only of Tamil Unicode characters. I am new to JavaScript.

In Go, I do it like this:

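    import (
        "strings"
        "unicode"
    )

    // strings.Fields takes no delimiter and splits on whitespace;
    // use strings.Split(stringContent, delim) for a custom delimiter.
    tokens := strings.Fields(stringContent)

    for _, token := range tokens { // like foreach
        isTamil := true
        for _, r := range token { // ranging over a string yields runes, not bytes
            if !unicode.Is(unicode.Tamil, r) {
                isTamil = false
                break
            }
        }
        if isTamil {
            // Tamil word
        }
    }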

I found that in JavaScript, String.prototype.split() gives me the individual words based on a delimiter. But I can't figure out how to check whether a word is a Tamil word, i.e. made up only of Tamil Unicode characters. Can someone help me do this in JavaScript?

1 Answer


An easy way is to use a regular expression that matches words made of characters in a Unicode range. For Tamil, that is the block U+0B80 to U+0BFF.

Hope this helps: http://kourge.net/projects/regexp-unicode-block
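Instead of looking up the block's code point range, engines that support ES2018 Unicode property escapes (current browsers, Node.js 10 and later) can use `\p{Script=Tamil}`, which requires the `u` flag. This is a swapped-in alternative, not the range-based approach of this answer. The two patterns are close but not identical; for example, the Script property also covers the Tamil Supplement block (U+11FC0 to U+11FFF). A minimal sketch:

```javascript
// Alternative sketch: a Unicode property escape instead of a hard-coded
// code point range. The `u` flag is required for \p{...} to be recognised.
const tamilRun = /\p{Script=Tamil}+/gu;

const text = "இந்தியா ASASAS எறத்தாழ ASSASAS குடியரசு ASWED SAASAS";
// match() returns null when nothing matches, so fall back to an empty array.
const matches = text.match(tamilRun) || [];
console.log(matches); // the three Tamil words from the string
```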

Here is a sample you can start with:

    "இந்தியா ASASAS எறத்தாழ ASSASAS குடியரசு ASWED SAASAS".match(/[\u0B80-\u0BFF]+/g);
    // → ["இந்தியா", "எறத்தாழ", "குடியரசு"]
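If you would rather mirror the split-then-check loop from the Go code in the question, a minimal sketch, assuming words are separated by whitespace, is to split the string and keep only the tokens that consist entirely of Tamil-block characters. The helper name `tamilWords` is hypothetical. Anchoring the pattern with `^` and `$` is what makes a token like "abcதமிழ்" be rejected rather than partially matched.

```javascript
// Hypothetical helper: returns the whitespace-separated tokens made up
// entirely of characters from the Tamil Unicode block (U+0B80 to U+0BFF).
function tamilWords(text) {
  // ^ and $ anchor the test to the whole token, so mixed tokens are rejected.
  const onlyTamil = /^[\u0B80-\u0BFF]+$/;
  return text
    .split(/\s+/) // split on runs of whitespace, similar to Go's strings.Fields
    .filter((token) => onlyTamil.test(token)); // also drops empty tokens
}

const sample = "இந்தியா ASASAS எறத்தாழ ASSASAS குடியரசு ASWED SAASAS";
console.log(tamilWords(sample)); // the three Tamil words from the sample
```

Every code point in the Tamil block lies in the Basic Multilingual Plane, so each character is a single UTF-16 code unit in a JavaScript string and the range works without the `u` flag.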

3 Comments

I tried:

    <html>
    <body>
    <button onclick="myFunction()">Try it</button>
    <script type="text/javascript">
    function myFunction() {
        alert("இந்தியா".match(/[\u0B80-\u0BFF]+/g));
    }
    </script>
    </body>
    </html>

but nothing happens.
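By the way, the code I have shown matches runs of consecutive Tamil characters, so it returns whole words only when they are written entirely in Tamil. For a word that mixes Tamil and English characters it returns just the Tamil part, so you have to modify the regular expression if you want to match such words.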
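One possible modification, sketched under the assumption that words are delimited by whitespace, is to match any run of non-whitespace characters that contains at least one Tamil-block character. This pattern is illustrative and not part of the original answer.

```javascript
// Matches whole whitespace-delimited tokens that contain at least one
// character from the Tamil block (U+0B80 to U+0BFF), e.g. "abcதமிழ்".
const wordWithTamil = /\S*[\u0B80-\u0BFF]\S*/g;

const mixed = "hello abcதமிழ் தமிழ் world";
const found = mixed.match(wordWithTamil) || []; // null → empty array
console.log(found); // "abcதமிழ்" and "தமிழ்"; "hello" and "world" are skipped
```

Because `\S*` is greedy and cannot cross whitespace, a match that succeeds always starts at the beginning of a token and runs to its end.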
Got it. The problem was that my browser's default encoding was a Western one, so the alert displayed null. After I changed it to UTF-8, the word displayed fine. Thanks.
