We have a job that pulls messages off a Kafka topic. The job runs hourly and it's important that the job complete before the next hour arrives.

I'm trying to set up an alert that will tell me that the consumergroup_lag isn't going to hit zero before the next hour occurs. I'm not really sure how to do this in PromQL.

So far, I have

predict_linear(kafka_consumergroup_group_lag{topic="foo",group="bar"}[10m], 3600) > 0

but that always looks 60 minutes ahead instead of ahead until the next hour boundary. I've looked at the hour() function and joining, but I'm not putting two and two together. I don't actually even know if what I want to do is possible in PromQL. But basically, I want to replace the 3600 in the above query with something like the little shell bit below

$(( $(date -d "$(date -d 'next hour' +'%Y-%m-%d %H:00:00%z')" +%s) - $(date +%s) ))

I know there will be issues around daylight saving time changes, but I'm not too worried about that right now.

  • I don't think anything like what you described is possible within PromQL. OTOH, I'm not sure it is needed. How do you envision the other params of that alert? Do you need this expression to be evaluated every minute (or whatever the evaluation interval is) and to raise an alert if the condition is not met? Commented Feb 8 at 1:12
  • Or maybe you could (or would even want to) have a predefined number of checks. For example, every hour at :15, :30 and :45? Commented Feb 8 at 1:16
  • I found a Google Group post that seemed to indicate that what I'm trying to do is not possible. I think ultimately, I can live with just having the alerting rule fire if the predict_linear result is greater than zero for more than 20 minutes. If you want to answer that what I want to do isn't possible, I'll mark it as accepted. Commented Feb 10 at 21:32

1 Answer

You could try using the predict_linear function with the prediction horizon (its second argument) set to the number of seconds remaining until the next hour. This lets you check whether the lag is predicted to be gone by the time the next hour starts. Something like:

predict_linear(
  kafka_consumergroup_group_lag{topic="my-topic",consumergroup="group"}[10m],
  3600 - (time() % 3600)
)

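The expression 3600 - (time() % 3600) evaluates to the number of seconds between the evaluation time and the next hour boundary. It is the prediction horizon, while [10m] remains the window the trend is fitted over. This will not be 100% exact to the :00 mark, but it should give you a good approximation. To use it as an alert condition, append > 0 as in your original query.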
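
If you want to sanity-check the arithmetic outside Prometheus, here is a minimal shell sketch of the same calculation. The function name secs_to_next_hour is made up for illustration, and its optional argument stands in for the Unix timestamp that time() would return at evaluation.

```shell
# Hypothetical helper: seconds from a Unix timestamp to the next hour boundary.
# Mirrors the PromQL expression 3600 - (time() % 3600).
secs_to_next_hour() {
  # Use the first argument if given, otherwise the current Unix time.
  now=${1:-$(date +%s)}
  echo $(( 3600 - now % 3600 ))
}

# Example: seconds remaining in the current hour.
secs_to_next_hour
```

Note that exactly on an hour boundary the result is 3600 rather than 0, so at that instant the prediction looks a full hour ahead.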
