1

How can I check if data submitted from a form or querystring has certain words in it?

I'm trying to look for words containing admin, drop, create etc in form [Post] data and querystring data so I can accept or reject it.

I'm converting from ASP to PHP. I used to do this using an array in ASP (keep all illegal words in a string and use ubound to check the whole string for those words), but is there a better (more efficient) way to do this in PHP?

E.g. a string like this would be rejected: "The administrator dropped a blah blah" because it has admin and drop in it.

I intend to use this to check usernames when creating accounts and for other things too.

Thanks

1

7 Answers

5

You could use stripos()

int stripos ( string $haystack , string $needle [, int $offset = 0 ] )

You could have a function like:

function checkBadWords($str, $badwords) {
    foreach ($badwords as $word) {
        if (stripos(" $str ", " $word ") !== false) {
            return false;
        }
    }
    return true;
}

And to use it:

if (!checkBadWords('something admin', array('admin'))) {
    // ...
}
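
To see what the space padding in checkBadWords() buys and what it costs, here is a small sketch comparing it with a plain substring check. The helper names containsWholeWord() and containsSubstring() are made up for this illustration and are not part of the answer above.

```php
<?php
// Hypothetical helpers for illustration only.

// True if $word appears in $str as a whole, space-delimited word (case-insensitive).
function containsWholeWord(string $str, string $word): bool {
    return stripos(" $str ", " $word ") !== false;
}

// True if $word appears anywhere in $str, even inside a longer word.
function containsSubstring(string $str, string $word): bool {
    return stripos($str, $word) !== false;
}

$input = "The administrator dropped a table";

var_dump(containsWholeWord($input, "admin"));             // bool(false): "admin" is only part of a word
var_dump(containsSubstring($input, "admin"));             // bool(true)
var_dump(containsWholeWord("something Admin", "admin"));  // bool(true)
```

If the goal is to reject "administrator" as in the question, the plain substring check is probably the one to use. The padded version also misses a word followed by punctuation, such as "admin,".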

4 Comments

Isn't there one too many = in the comparison here? In trying to test efficiency against my preg_replace method, I get a parsing error.
@NullUser Good deal, ty. +1 for beating the preg_replace method in the efficiency race by just about the same margin that preg_replace beat nested loops.
@JGB One obvious reason why this would perform better under certain conditions is that it will return as soon as it finds a match, whereas with preg_replace you need to change the whole string.
@Arte It's been fixed (sort of). It doesn't support proper word boundaries but IMO it's good enough.
3

strpos() will let you search for a substring within a larger string. It's quick and works well. It returns false if the string's not found, and a number (which could be zero, so you need to use === to check) if it finds the string.

stripos() is a case-insensitive version of the same.
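
A short sketch of why the strict comparison matters when the match is at the very start of the string (the variable names are just for illustration):

```php
<?php
$haystack = "admin panel";

$pos = strpos($haystack, "admin");  // match at offset 0
var_dump($pos);                     // int(0)
var_dump($pos == false);            // bool(true): loose comparison treats 0 as false
var_dump($pos === false);           // bool(false): strict comparison sees a real match

// The safe way to ask "is it in there?"
$found = strpos($haystack, "admin") !== false;

// Same test, ignoring case
$foundCi = stripos("ADMIN panel", "admin") !== false;
```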

I'm trying to look for words containing admin, drop, create etc in form [Post] data and querystring data so I can accept or reject it.

I suspect that you are trying to filter the string so it's suitable for including in something like a database query. If this is the case, this is probably not a good way to go about it, and you'd actually need to escape the string using mysql_real_escape_string() or equivalent.

1 Comment

+1 Yeah, I was wondering why "drop" was in the list of words to reject.
2
$badwords = array("admin", "drop",);
foreach (str_word_count($string, 1) as $word) {
    foreach ($badwords as $bw) {
        if (strpos($word, $bw) === 0) {
            //contains word $word that starts with bad word $bw
        }
    }
}

For JGB146, here is a performance comparison with regular expressions:

<?php
function has_bad_words($badwords, $string) {

    foreach (str_word_count($string, 1) as $word) {
        foreach ($badwords as $bw) {
            if (stripos($word, $bw) === 0) {
                return true;
            }
        }
    }
    // Only report "no match" after every word has been checked.
    return false;

}

function has_bad_words2($badwords, $string) {

    $regex = array_map(function ($w) {
        return "(?:\\b". preg_quote($w, "/") . ")"; }, $badwords);
    $regex = "/" . implode("|", $regex) . "/";
    return preg_match($regex, $string) != 0;

}

$badwords = array("abc", "def", "ghi", "jkl", "mnop");
$string = "The quick brown fox jumps over the lazy dog";

$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
 has_bad_words($badwords, $string);
}

echo "elapsed: ". (microtime(true) - $start);

$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
 has_bad_words2($badwords, $string);
}

echo "elapsed: ". (microtime(true) - $start);

Example output:

elapsed: 0.076514959335327
elapsed: 0.29999899864197

So regular expressions are much slower.
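
Timings like these only mean something if both functions give the same answers, so it may be worth checking that first. The following is a rough sketch, not the code that produced the numbers above. The function names and sample strings are made up, and both checks are written as case-insensitive "some word starts with a bad word" tests.

```php
<?php
// Hypothetical names for illustration only.
function startsWithBadWordLoop(array $badwords, string $string): bool {
    foreach (str_word_count($string, 1) as $word) {
        foreach ($badwords as $bw) {
            if (stripos($word, $bw) === 0) {
                return true;
            }
        }
    }
    return false;
}

function startsWithBadWordRegex(array $badwords, string $string): bool {
    // Escape each word so it is matched literally, then anchor at a word boundary.
    $parts = array_map(function ($w) { return preg_quote($w, "/"); }, $badwords);
    return preg_match("/\\b(?:" . implode("|", $parts) . ")/i", $string) === 1;
}

$badwords = array("admin", "drop");
$samples = array(
    "The quick brown fox",
    "The administrator dropped a table",
    "a quiet raindrop",
);

// Compare the two implementations on every sample before timing them.
$agree = true;
foreach ($samples as $s) {
    if (startsWithBadWordLoop($badwords, $s) !== startsWithBadWordRegex($badwords, $s)) {
        $agree = false;
    }
}
var_dump($agree);
```

The two can still disagree on other inputs, because str_word_count() and \b do not define a word in quite the same way (hyphens, digits and underscores are treated differently).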

6 Comments

Others have questioned the efficiency of using preg_replace() but the nested loop method performs 10x worse in my tests.
@Artefacto Hardly a solid comparison when you do multiple other operations and use a completely different form of checking the regex. I copied your code exactly into my tests, and did exactly the same number of extraneous operations (e.g. a single string assignment).
Note: I'm not saying your method doesn't work. Merely that it's less efficient than a single call to preg_replace as in my method. Much as my method was significantly less efficient than the single loop comparison function put forth by NullUserException.
@Artefacto I have updated my test-case in my answer. I used an exact copy of your function, functionalized my preg_replace check, and ran it over 10k instances of 4 different inputs. In the end, my method was just under twice as fast. Please compare apples to apples if you would like to run your own tests.
Note: Since this is a lot closer (both to preg_replace performance and to that of the accepted answer) over realistic input than my first test showed, I've reversed my downvote.
0

You could use a regular expression like this:

preg_match("~(admin)|(drop)|(another token)|(yet another)~",$subject);

Building the pattern string from an array:

$pattern = implode(")|(", $banned_words);
$pattern = "~(".$pattern.")~";
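
As a possible variation on the pattern building above, this sketch escapes each word with preg_quote() and bails out early on an empty array (with an empty array, the code above builds "~()~", which matches every subject). The function name matchesBannedWord() and the i flag are my own assumptions, not part of the answer.

```php
<?php
// Hypothetical helper: returns true if any banned word occurs in $subject.
function matchesBannedWord(array $bannedWords, string $subject): bool {
    if ($bannedWords === []) {
        return false; // nothing to look for, and no pattern to build
    }
    // Escape regex metacharacters and the "~" delimiter in each word.
    $quoted = array_map(function ($w) { return preg_quote($w, "~"); }, $bannedWords);
    $pattern = "~(?:" . implode("|", $quoted) . ")~i";
    return preg_match($pattern, $subject) === 1;
}

var_dump(matchesBannedWord(array("admin", "drop"), "The Administrator dropped it")); // bool(true)
var_dump(matchesBannedWord(array(), "anything at all"));                             // bool(false)
```

Note that an empty string inside the array would still match everything, so the word list itself needs to be sane.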

2 Comments

You could clean this up a bit using implode(). BTW this will have problems if your $banned_words array is empty.
Good idea, edited, but I think it still fails the empty array.
0
function check($string, $array) {
    foreach($array as $item) {
        if( preg_match("/" . preg_quote($item, "/") . "/", $string)  )
            return true;
    }
    return false;
}

Comments

0

You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement.

Originally, I was thinking you could do this with a simple preg_match() call (hence the downvote), however preg_match does not support arrays. Instead, you can do a replacement via preg_replace to have all rejected strings replaced with nothing, and then check to see if the string is changed. This is simple and avoids requiring a loop iteration for each rejected string.

$rejectedStrs = array("/admin/", "/drop/", "/create/");
if($input == preg_replace($rejectedStrs, "", $input)) {
   //do stuff
} else { 
   //reject
}

Note also that you can provide case-insensitive searches by using the i flag on the regex patterns, changing the array of patterns to $rejectedStrs = array("/admin/i", "/drop/i", "/create/i");
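
A slightly different way to get the same answer, sketched here as an alternative: preg_replace() takes an optional fifth argument that receives the number of replacements made, so the result does not have to be compared with the input. The function name hasRejectedString() is made up for this example.

```php
<?php
$rejectedStrs = array("/admin/i", "/drop/i", "/create/i");

// Hypothetical helper: true if any of the patterns matched at least once.
function hasRejectedString(array $patterns, string $input): bool {
    // The replaced string is discarded; only the replacement count is used.
    preg_replace($patterns, "", $input, -1, $count);
    return $count > 0;
}

var_dump(hasRejectedString($rejectedStrs, "The Administrator dropped it")); // bool(true)
var_dump(hasRejectedString($rejectedStrs, "hello world"));                  // bool(false)
```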

On Efficiency

There has been some debate about the efficiency of doing it this way vs the accepted nested loop method. I ran some tests and found the preg_replace method executed around twice as fast as the nested loop. Here is the code and output of those tests:

$input = "You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement. You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement.";

$input = "Short string with no matches";
$input2 = "Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. ";
$input3 = "Short string which loop will match quickly";
$input4 = "Longer string that will eventually be matches but first has a lot of words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words and then finally the word create near the end";

$start1 = microtime(true);
$rejectedStrs = array("/loop/", "/operation/", "/create/");
$p_matches = 0;
for ($i = 0; $i < 10000; $i++) {
    if (preg_check($rejectedStrs, $input)) $p_matches++;
    if (preg_check($rejectedStrs, $input2)) $p_matches++;
    if (preg_check($rejectedStrs, $input3)) $p_matches++;
    if (preg_check($rejectedStrs, $input4)) $p_matches++;
}

$start2 = microtime(true);
$rejectedStrs = array("loop", "operation", "create");
$l_matches = 0;
for ($i = 0; $i < 10000; $i++) {
    if (loop_check($rejectedStrs, $input)) $l_matches++;
    if (loop_check($rejectedStrs, $input2)) $l_matches++;
    if (loop_check($rejectedStrs, $input3)) $l_matches++;
    if (loop_check($rejectedStrs, $input4)) $l_matches++;
}

$end = microtime(true);
echo "preg_match: ".$start1." ".$start2."= ".($start2-$start1)."\nloop_match: ".$start2." ".$end."=".($end-$start2);

function preg_check($rejectedStrs, $input) {
    if($input == preg_replace($rejectedStrs, "", $input)) 
        return true;
    return false;
}

function loop_check($badwords, $string) {

    foreach (str_word_count($string, 1) as $word) {
        foreach ($badwords as $bw) {
            if (stripos($word, $bw) === 0) {
                return true;
            }
        }
    }
    // Only report "no match" after every word has been checked.
    return false;

}

Output:

preg_match: 1281908071.4032 1281908071.9947= 0.5915060043335

loop_match: 1281908071.9947 1281908073.006=1.0112948417664

9 Comments

This won't work because preg_replace expects you to give it one or more regular expressions - you could make these "/admin/", etc, but this is still a pretty inefficient way of doing this.
Indeed. I've edited accordingly. Is this really significantly less efficient than looping through every string and checking to see if each is contained? I'll run a few tests.
@thomasrutter My tests show the preg_replace method executing 10x faster than the nested loop method. Post updated to show my test and output.
Except the two functions are not comparable, they don't do the same thing... see my edit.
@Artefacto They don't have to do the same thing to reach the same results. If your concern is the same word-boundary concern you mentioned with NullUser's answer, then consider the fact that preg_replace will perform exactly the same if you add a requirement for a whitespace character to the beginning of each pattern.
0

This is actually pretty simple, use substr_count.

An example for you would be:

if (substr_count($variable_to_search, "drop"))
{
    echo "error";
}

And to make things even simpler, put your keywords (i.e. "drop", "create", "alter") in an array and use foreach to check them. That way you cover all your words. An example:

foreach ($keywordArray as $keyword)
{
    if (substr_count($variable_to_search, $keyword))
    { 
        echo "error"; // or do whatever you want to do when you find something you don't like
    }
}
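
One thing to keep in mind is that substr_count() is case-sensitive, so "DROP" would slip past a check for "drop". Here is a rough sketch of the same loop using stripos() instead, which also collects the keywords it finds. The function name findKeywords() is made up for this example.

```php
<?php
// Hypothetical helper: returns the keywords found in $text, ignoring case.
function findKeywords(string $text, array $keywords): array {
    $found = array();
    foreach ($keywords as $keyword) {
        // Skip empty keywords; an empty needle is not a meaningful search.
        if ($keyword !== "" && stripos($text, $keyword) !== false) {
            $found[] = $keyword;
        }
    }
    return $found;
}

$hits = findKeywords("Please DROP the table", array("drop", "create", "alter"));
print_r($hits); // contains only "drop"
```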

Comments
