
I'm looking to improve nginx caching by removing irrelevant query parameters (that could come from web crawlers or similar) from the request. I have come across an unwieldy solution on the internet:

set $c_uri $args; # e.g. "param1=true&param4=false"

# remove unwanted parameters one by one
if ($c_uri ~ (.*)(?:&|^)pd=[^&]*(.*)) { set $c_uri $1$2 ; }
if ($c_uri ~ (.*)(?:&|^)mid=[^&]*(.*)) { set $c_uri $1$2 ; }
if ($c_uri ~ (.*)(?:&|^)ml=[^&]*(.*)) { set $c_uri $1$2 ; }
if ($c_uri ~ (.*)(?:&|^)contact_eid=[^&]*(.*)) { set $c_uri $1$2 ; }
...

set $c_uri $scheme://$host$uri$c_uri;
...

location / {
  # set $c_uri as cache_key
  proxy_cache_key $c_uri;
  ...
}
    

It works, but it's not very concise, it takes a lot of steps, and from what I've learned, "if is evil".

I know there are maps, which support basic regex matching, but they don't work in this scenario, because any number of the parameters I need to remove can appear in any order.
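To illustrate the map approach mentioned above: a single map block can strip one known parameter with the same regex as the if version, and covering several parameters would mean chaining one map per parameter, each reading the previous map's output. The following is an untested sketch, assuming an nginx version in which a map value may combine several variables. The variable and capture names ($c_args_1, $c_args_2, pre1, post1, pre2, post2) are hypothetical and do not come from the original configuration. It is no shorter than the if chain, which is the limitation described above.

```nginx
# Sketch only: map blocks belong in the http context.
# All variable and capture names here are made up for illustration.

# Step 1: strip "pd" from the original query string.
map $args $c_args_1 {
    default $args;
    "~^(?<pre1>.*)(?:&|^)pd=[^&]*(?<post1>.*)$" $pre1$post1;
}

# Step 2: strip "mid" from the result of step 1.
map $c_args_1 $c_args_2 {
    default $c_args_1;
    "~^(?<pre2>.*)(?:&|^)mid=[^&]*(?<post2>.*)$" $pre2$post2;
}

location / {
    # Use the output of the last map in the chain as part of the cache key.
    proxy_cache_key $scheme://$host$uri$c_args_2;
}
```

As with the if version, each step removes only one occurrence of its parameter and can leave a leading "&" when the removed parameter was first in the query string.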

I also found this substitution module which can do regex replace but it's only made for specific operations and not for setting a variable.

So I have two questions:

  • Does anyone know whether there is some tooling to set a variable by doing a regex replace operation?
  • Is using if in this case really that bad? It's not inside a location context and I don't know whether many consecutive regexes are actually worse than one large regex replace.

I would be very thankful if someone with more nginx know-how could weigh in here and help me out. Thanks :)

  • Would it be an option to keep a defined set of args instead of removing unwanted ones? Commented Sep 21, 2021 at 12:57
  • "If is Evil... when used in location context" - however, you are not using if in a location context. Commented Sep 21, 2021 at 13:40
  • @slauth sadly no, it's a large application and there are many possible args Commented Sep 21, 2021 at 14:47
  • @RichardSmith you are right and thank you for answering my second question. Still, I'm not sure what the performance implications of many if statements are. Commented Sep 21, 2021 at 14:48
  • Currently you have a long list of regular expressions, that each need to be evaluated individually for every request. If any or all of the arguments may appear in any request in a random order, then your solution is probably best. Commented Sep 21, 2021 at 15:11
