
Background

I'm working on processing a comma-separated username list (for an ACL whitelist optimization in my project) and need to normalize whitespace around commas, as well as trim leading/trailing whitespace from the string.

Code & Issue

I used this regex replacement to clean up the string:

const input = "a,b,c ";
const result = input.replace(/\s*,\s*|^\s*|\s*$/g, ',');
console.log(result); // Outputs ",a,b,c,," (a leading comma and two trailing commas)

"a,b,c ".replace(/\s*,\s*|^\s*|\s*$/g, ',') // outputs ",a,b,c,," (two trailing commas, plus a leading one)

"c ".replace(/(\s*$)/g, ','); // outputs "c,," (two trailing commas)

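/**
 * Simple check: wrap both strings in commas so that only whole entries match
 * (assumes there is no whitespace around the commas in commaStr)
 * @param {string} commaStr - Comma-separated string (e.g. "apple,banana,orange")
 * @param {string} target - Target item (e.g. "banana")
 * @returns {boolean} Whether the target is included as a standalone item
 */
function checkByIndexOf(commaStr, target) {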
  const wrappedStr = `,${commaStr},`;
  const wrappedTarget = `,${target},`;
  return wrappedStr.indexOf(wrappedTarget) !== -1;
}

/**
 * Regex-free check: indexOf + boundary validation (commas and spaces count as separators; dots and other special characters in the target need no escaping)
 * @param {string} commaStr - Comma-separated string (may contain spaces, dots)
 * @param {string} target - Target item (may contain dots)
 * @returns {boolean} Whether the target is included as a standalone item
 */
function checkByIndexOfWithBoundary(commaStr, target) {
  const targetLen = target.length;
  const strLen = commaStr.length;
  let pos = commaStr.indexOf(target);

  // Return false immediately if target is not found
  if (pos === -1) return false;

  // Loop through all matching positions (avoid missing matches, e.g., duplicate items)
  while (pos !== -1) {
    // Check front boundary: start of string / previous char is comma/space
    const prevOk = pos === 0 || /[, ]/.test(commaStr[pos - 1]);
    // Check rear boundary: end of string / next char is comma/space
    const nextOk = (pos + targetLen) === strLen || /[, ]/.test(commaStr[pos + targetLen]);

    // Return true if both boundaries match (target is a standalone item)
    if (prevOk && nextOk) return true;

    // Find next matching position (avoid re-matching the same position)
    pos = commaStr.indexOf(target, pos + 1);
  }

  // All matching positions fail boundary validation
  return false;
}

/**
 * Check if a comma-separated string contains a specified standalone item
 * @param {string} commaStr - Original comma-separated string (e.g. "apple,banana,orange")
 * @param {string} target - Target string to check (e.g. "banana")
 * @returns {boolean} Whether the target item is included as a standalone entry
 */
function checkCommaStrInclude(commaStr, target) {
  // Escape regex special characters in the target string (e.g. . * + ? ^ $ { } ( ) | [ ] \)
  const escapedTarget = target.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  
  // Build regex pattern: match (start of string | comma) + escaped target + (comma | end of string)
  // Ensures the target is a standalone item (avoids partial matches)
  const regex = new RegExp(`(^|,)${escapedTarget}(,|$)`); // no 'g' flag needed for a single test()
  
  // Test if the regex matches the comma-separated string
  return regex.test(commaStr);
}
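The check functions above are simplest when the list has already been cleaned. A minimal sketch of how the two steps could fit together, assuming the list is first normalized with the capture-group replace suggested in one of the answers and then checked with the comma-wrapping idea from checkByIndexOf. The names normalizeList and isWhitelisted are hypothetical, not from the question.

```javascript
// Hypothetical helpers: normalizeList and isWhitelisted are illustrative names.

// Collapse whitespace around commas and trim both ends.
// The comma is captured as $1; when ^\s+ or \s+$ matches instead,
// group 1 does not participate, so $1 is replaced with an empty string.
function normalizeList(commaStr) {
  return commaStr.replace(/\s*(,)\s*|^\s+|\s+$/g, "$1");
}

// Wrap both the normalized list and the target in commas,
// so only whole entries can match (e.g. "bob" does not match "bob.smith").
function isWhitelisted(commaStr, target) {
  const list = `,${normalizeList(commaStr)},`;
  return list.includes(`,${target.trim()},`);
}

console.log(isWhitelisted(" alice , bob.smith ,carol ", "bob.smith")); // true
console.log(isWhitelisted(" alice , bob.smith ,carol ", "bob"));       // false
```

Because the list is normalized once, the membership test needs neither regex escaping nor per-character boundary checks.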

Problem

The expected output is "a,b,c" (no trailing commas, normalized commas), but the current code produces ",a,b,c,," instead: a stray leading comma and two trailing commas. I don't understand why the regex is matching in a way that adds these extra commas.

What I've Tried

  • I checked the regex pattern /\s*,\s*|^\s*|\s*$/g and understand it's meant to match:
    • Whitespace around commas (\s*,\s*)
    • Leading whitespace (^\s*)
    • Trailing whitespace (\s*$)
  • I replace every match with ",", but the trailing space in the input seems to trigger two replacements, which results in a double comma.

Question

  1. Why does this regex produce two trailing commas for the input "a,b,c "?
  2. How can I adjust the regex (or use a better approach) to get the clean output "a,b,c" for comma-separated strings with extra whitespace/commas?
5 Comments

  • I would use \s+ instead of \s*, because it only makes sense to replace when you've matched at least one space.
  • It produces ,a,b,c,, and not a,b,c, but that's probably the LLM's fault... Reminder that ChatGPT-generated content is not allowed.
  • ",,,a, b , c , ,".split(/\s|,/).filter((part) => part.length).join()
  • I'm voting to close this question because I find it is 👎
  • Great question +1!

4 Answers

4

Why not use the built-in methods like split(), map(), trim(), filter(), and join()? Maybe they are not as fast, but the code is more readable.

const input = "a,b,c ";
const result = input.split(',')
                 .map(val => val.trim())
                 .filter(val => !!val)
                 .join(',');

console.log(result);


2 Comments

The downside is that this would potentially remove commas, namely when an entry is empty, as in "a,,b". That is an effect that is not described or asked for in the question.
I don't think it's a downside that it removes repeated commas and whitespace. I do think it's a downside that it would remove newlines, so it wouldn't be effective on multi-line text. It might also be more efficient to pass split() a regex such as [\s,]*,[\s,]* .
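Following that comment's suggestion, here is a minimal sketch that passes split() a regular expression, so each comma and its surrounding whitespace (and any repeated commas) are consumed as one separator. normalizeBySplit is a hypothetical name, and no claim is made here about its speed relative to the answer. Like the answer, it drops empty entries such as the one in "a,,b"; spaces inside an entry are kept.

```javascript
// Hypothetical helper name; uses only built-in String/Array methods.
function normalizeBySplit(input) {
  return input
    .trim()                              // drop outer whitespace, including newlines
    .split(/[\s,]*,[\s,]*/)              // separator: a comma plus any adjacent commas/whitespace
    .filter((part) => part.length > 0)   // discard empty pieces left by leading/trailing commas
    .join(",");
}

console.log(normalizeBySplit("a,b,c "));          // prints a,b,c
console.log(normalizeBySplit(",,a, b ,,c , ,"));  // prints a,b,c
console.log(normalizeBySplit("john doe , jane")); // prints john doe,jane
```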
2

Why does this regex produce two trailing commas for the input "a,b,c "?

It does so because your regex has three alternatives, and only the first one matches a comma. When the first alternative matches, the inserted comma replaces the one that was matched. When one of the other two alternatives matches (^\s* or \s*$), no comma was matched, so the inserted comma is an extra one that did not occur in the input. This is also why the output starts with a comma: ^\s* matches an empty string at the start.

Additionally, after the trailing space has been matched, \s*$ matches once more with an empty string at the very end, and that second match appends another comma to your output.
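To see these matches directly, the question's original pattern can be run through matchAll(), which reports every match of a global regex together with its index. A minimal sketch (matchAll needs ES2020 or later):

```javascript
const input = "a,b,c ";
const pattern = /\s*,\s*|^\s*|\s*$/g; // the question's original pattern

// List every match the global regex finds, with its position and text.
for (const m of input.matchAll(pattern)) {
  console.log(`index ${m.index}: ${JSON.stringify(m[0])}`);
}
// index 0: ""    <- ^\s* matches an empty string at the start
// index 1: ","
// index 3: ","
// index 5: " "   <- \s*$ consumes the trailing space
// index 6: ""    <- \s*$ matches again, empty, at the very end

console.log(input.replace(pattern, ",")); // ,a,b,c,,
```

Every match is replaced by a comma, so the two matches at indices 0 and 6 that contain no comma are where the extra commas come from.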

How can I adjust the regex?

One way to solve this is to capture the comma in a capture group (using parentheses) and reproduce it in the replacement with $1. If the second or third alternative matches, the capture group is empty, so no comma is inserted where none occurred in the match:

const input = "a,b,c ";
const result = input.replace(/\s*(,)\s*|^\s+|\s+$/g, '$1');
console.log(result); // Outputs "a,b,c"

NB: I also replaced \s* with \s+ in the second and third alternatives, as there is no need to replace an empty string.

Another way is to not match any comma, and not insert one either. For that you can use look-around assertions:

const input = "a,b,c ";
const result = input.replace(/\s+(?=,|$)|(?<=,|^)\s+/g, '');
console.log(result); // Outputs "a,b,c"
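A quick way to compare the two replacements is to run both over a few inputs. In this sketch (the inputs are made up for illustration) they agree; note that neither removes an empty entry such as the one in "a, ,b", unlike the split()/filter() approach. The lookbehind (?<=...) requires an ES2018+ engine.

```javascript
// Capture-group version: write back the matched comma, or nothing.
const captureVersion = (s) => s.replace(/\s*(,)\s*|^\s+|\s+$/g, "$1");
// Look-around version: only whitespace is matched and removed.
const lookaroundVersion = (s) => s.replace(/\s+(?=,|$)|(?<=,|^)\s+/g, "");

for (const input of ["a,b,c ", " a , b ,c ", "a, ,b"]) {
  console.log(JSON.stringify(captureVersion(input)), JSON.stringify(lookaroundVersion(input)));
}
// "a,b,c" "a,b,c"
// "a,b,c" "a,b,c"
// "a,,b" "a,,b"
```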

Comments

0

It is because your pattern matches leading and trailing whitespace with the alternatives ^\s* and \s*$; what's more, because of the * quantifier, they also match zero-length strings. That is why you get extra commas.

I would keep only the \s*,\s* alternative, which always matches something: a comma, optionally surrounded by whitespace.

To deal with whitespace at the beginning and at the end of the string, simply use trim():

const input = "a,b,c ";
const result = input.replace(/\s*,\s*/g, ',').trim();
console.log(result); // Outputs "a,b,c" without spaces at the start and at the end

Comments

0

As @trincot put it best, your end-of-string alternative \s*$ is really two constructs, and the first one is optional.
It first matches the space, advancing the current position past it, and then matches the end of the string.
It then matches the end of the string a second time, with an empty match, because \s* can match zero characters.

To get to the point: you're trying to put your string in the form X,Y,... (items separated by single commas).
I think you will keep revisiting this until you settle on a regex that puts it in this form
by removing duplicate commas and leaving no commas at the beginning or end.

This requires capturing a comma where there is one, then replacing the match with that capture.
If no comma is captured, the replacement is an empty string.

This does two things:

  1. Removes all commas and whitespace from the beginning and end.
  2. Replaces each run of commas between words, together with any surrounding whitespace, with a single comma.

If you don't mind newlines being trimmed as well, use [\s,]+ for the runs to trim.
If you want to keep newlines, use (?:[^\S\r\n]|,)+ instead.

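In this regex, the order of the alternatives matters: handle the beginning of the line (BOL) and the end of the line (EOL) first,
and the middle last. Otherwise the middle alternative would match a leading or trailing run of commas and write a comma back.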

Expanded regex (https://regex101.com/r/gRbwAf/1), replace with $1:

  ^ (?: [^\S\r\n] | , )+     # Check first
| (?: [^\S\r\n] | , )+ $     # Check second
| (?: [^\S\r\n] | , )* ( , ) (?: [^\S\r\n] | , )*   # (1) Write back comma, check last

const input =
  ",,,,a,b,,,c ,,, \n" +
  " ,a,,b,c  \n";
const result = input.replace(/^(?:[^\S\r\n]|,)+|(?:[^\S\r\n]|,)+$|(?:[^\S\r\n]|,)*(,)(?:[^\S\r\n]|,)*/mg, "$1");
console.log(result);
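For single-line input where newlines don't matter, the simpler [\s,] character class mentioned above can stand in for (?:[^\S\r\n]|,) in all three alternatives. A sketch under that assumption (the input string is made up for illustration):

```javascript
// Same three alternatives in the same order: leading run, trailing run, then the middle.
// [\s,] also matches newlines, so this variant is meant for single-line input.
const input = ",,,a, b ,,c , ,";
const result = input.replace(/^[\s,]+|[\s,]+$|[\s,]*(,)[\s,]*/g, "$1");
console.log(result); // a,b,c
```

Because the middle alternative requires a comma, whitespace inside an entry (e.g. "john doe") is left alone.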

Comments
