
I have a byte[] array read from an input file (an ASCII text file) and I'm trying to delete all bytes with a preset hex value, without converting the array to a string and without using lists.

This is the code I came up with so far:

byte[] byteLine = File.ReadAllBytes(filePath);
byte lineBreak = (byte)0x0d;
int breakIndex = Array.IndexOf(byteLine, lineBreak);

Those are the variables I thought I'd need for this to work. I found a method (RemoveRange) that does something similar to what I was trying to do, so I added the code as-is to the project and tried calling it in a loop. First, a for loop:

for (int i = 0; i < byteLine.Length; i++) {
    byte v = byteLine[i];
    if (v == lineBreak) {
        RemoveRange(byteLine, breakIndex, 2);
    }
}

I was trying to go through every byte in the array until one equal to the line break byte appears, then call the method, which deletes 2 bytes starting at the line break index (breakIndex), because line breaks in this ASCII file use 2 bytes (0d 0a). It was supposed to cycle through the whole array and repeat this operation for every line break (lineBreak). That's what I thought a for(;;) loop would do, but it didn't, so I must have made mistakes. Second try, using foreach:

int t = breakIndex;
foreach (byte b in byteLine) {
    if (b == lineBreak) {
        while (t != 0)
        {
            RemoveRange(byteLine, breakIndex, 2);
            t--;
        }
    }
}

In the second loop I added a variable (t) to use as a counter, which starts out equal to the line break index (breakIndex). I know the input file is a square ASCII image, so it has the same number of lines as line break symbols (minus one, because the last char of the last line isn't a break). The loop should therefore have cycled once per line break, or until t reached 0, since t goes down by 1 on every iteration. This loop has the same issue: the method didn't trigger. This is the third and last thing I tried:

int t = 0;
foreach (byte b in byteLine) {
    if (b == lineBreak) {
        do
        {
            RemoveRange(byteLine, breakIndex, 2);
            t++;
        }
        while (t < breakIndex);
    }
}

Similar to the other foreach loop, but it counts up rather than down; it should have kept cycling until t reached the total number of line breaks. I don't even know if I'm using do and while correctly, as it's my first time using do. That said, I tested the method outside the loop and it works perfectly; it just deletes only the first line break, since it wasn't meant to loop on its own. After some searching I found this answer.

From reading that answer I understand that there should be a way to directly edit items in a byte[] array without many workarounds. The issue is that I don't know how to apply that solution to my problem, since I'm not familiar with those C# functions.

  • Why don't you want to use a List? It would be easy to read the file byte for byte and only add the desired ones to a List. Commented Dec 7, 2024 at 13:05
  • Hi Bill, that's what RemoveRange() does more or less without cycling, I'm not an expert but I think this slows down the execution. I'm only using that temporarily until there's a better solution or until I manage to correctly edit and implement the loop from the other user's answer. I just don't know C# well enough yet. Commented Dec 7, 2024 at 13:12
  • If you don't want to use List<byte> can you still use LINQ? Because, with LINQ, you can do byteLine.Where(b => b != lineBreak).ToArray() and get a new array with the lineBreak byte removed. Or are you trying to remove lineBreak from the array in-place and return a Span<byte> with the byte removed? Commented Dec 7, 2024 at 14:13
  • @dbc I was trying that 2 days ago but something wasn't working so I gave up... yet today it works perfectly on its own, without using loops. It works with 1 byte though; do you know if it's possible to make it work for 2 bytes at once? I imagine first of all I should modify the variable or add a second one. If you want, you can write an answer, and if other tricks that might be useful for this problem come to mind, feel free to add them. I'm waiting a day to see what ideas other users have. Commented Dec 7, 2024 at 14:29
  • Note that arrays have a fixed length. So, if you want to modify the existing array, you will have to move the bytes around and fill the extra space at the end with e.g. zeroes. If you want a "clean" array, you must create a new one with a shorter length. The problem is that you don't know its length in advance. You could do two passes: one to count the bytes to be removed and one to fill a new array created with this count. Is it worth the effort? It will probably not be much faster than the other simpler alternatives people suggested in their comments. Commented Dec 7, 2024 at 14:33
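
The two-pass idea from the last comment could be sketched roughly as follows. This is a minimal illustration, not code from the thread: the class name ByteFilter and the method name RemoveLineBreaks are hypothetical, and it assumes both CR (0x0d) and LF (0x0a) should be dropped wherever they appear.

```csharp
using System;

static class ByteFilter
{
    // Hypothetical helper: returns a new array holding every byte of
    // source except CR (0x0d) and LF (0x0a).
    public static byte[] RemoveLineBreaks(byte[] source)
    {
        const byte CR = 0x0d, LF = 0x0a;

        // Pass 1: count the bytes that will be kept.
        int kept = 0;
        foreach (byte b in source)
        {
            if (b != CR && b != LF) kept++;
        }

        // Pass 2: copy the kept bytes into an array of exactly that size.
        byte[] result = new byte[kept];
        int destination = 0;
        foreach (byte b in source)
        {
            if (b != CR && b != LF) result[destination++] = b;
        }
        return result;
    }
}
```

It would be called as ByteFilter.RemoveLineBreaks(File.ReadAllBytes(filePath)). The original array is left untouched, and no List or string is involved.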

1 Answer

Here is an implementation processing the array in-place.

const byte CR = 0x0d, LF = 0x0a;

byte[] byteLine = [1, 2, 3, 4, CR, LF, 5, 6, 7, CR, LF, CR, LF, 8, 9];

int firstCR = Array.IndexOf(byteLine, CR);
if (firstCR >= 0) {
    // Jump to first CR
    int destination = firstCR, source = firstCR + 2;
    while (source < byteLine.Length) {
        if (byteLine[source] == CR) {
            source += 2;
        } else {
            byteLine[destination++] = byteLine[source++];
        }
    }
    // Fill remaining bytes with 0
    for (int i = destination; i < byteLine.Length; i++) {
        byteLine[i] = 0;
    }
}

But note that it is far from being a one-liner, and it is more likely to contain an error than a one-liner like byteLine.Where(b => b != lineBreak).ToArray().
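
The one-liner above filters a single byte value. As a rough sketch of how it might be extended to drop both bytes of a CR LF pair, the predicate can test for either value; the class name LinqFilter and the method name RemoveLineBreaks are made up for this illustration.

```csharp
using System;
using System.Linq;

static class LinqFilter
{
    // Hypothetical helper: builds a new array without any CR or LF bytes.
    public static byte[] RemoveLineBreaks(byte[] source)
    {
        const byte CR = 0x0d, LF = 0x0a;
        return source.Where(b => b != CR && b != LF).ToArray();
    }
}
```

Unlike the in-place loop, this does not assume that an LF always follows a CR: it removes every CR and every LF individually, so a stray CR or LF is dropped as well.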

Also, it does not make sense to make optimizations like this without doing benchmarks. Solutions which you think might be slow are sometimes faster than expected. Microsoft puts a lot of effort into speeding up loops and LINQ queries. In some cases, LINQ uses vectorization, which can produce faster results than a supposedly faster for-loop.
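
For a first rough comparison, something along these lines could be used. It is only a sketch built on System.Diagnostics.Stopwatch; the names QuickTiming, WithLoop, WithLinq and Measure are invented here, and a dedicated benchmarking library such as BenchmarkDotNet would give far more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class QuickTiming
{
    const byte CR = 0x0d, LF = 0x0a;

    // Hypothetical candidate 1: plain loop copying every byte except CR and LF.
    public static byte[] WithLoop(byte[] source)
    {
        byte[] buffer = new byte[source.Length];
        int count = 0;
        foreach (byte b in source)
        {
            if (b != CR && b != LF) buffer[count++] = b;
        }
        Array.Resize(ref buffer, count); // trim to the number of kept bytes
        return buffer;
    }

    // Hypothetical candidate 2: the LINQ version.
    public static byte[] WithLinq(byte[] source) =>
        source.Where(b => b != CR && b != LF).ToArray();

    // Runs the action repeatedly and returns the total elapsed wall-clock time.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action(); // warm-up call so JIT compilation is not part of the timing
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        watch.Stop();
        return watch.Elapsed;
    }
}
```

A call such as QuickTiming.Measure(() => QuickTiming.WithLoop(data), 1000), repeated with WithLinq on the same data, gives two durations to compare. The results depend on the runtime version, the build configuration, and the input size, so they are only meaningful for the machine and data they were measured on.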


3 Comments

  • You're looping every 2 bytes using source += 2; did I get it right? I never thought about that, and firstCR + 2; is because the index starts at zero, I suppose. I'm trying to mentally translate what I am reading, please don't mind it. It'll probably take me a lot of practice to be able to write this kind of loop from memory, but I see where it goes. It's definitely more functional than Where(), and I was studying how to use loops for later phases of the program but couldn't find any examples used with bytes, so this makes a good reference to get me started.
  • I increment by 2 only if the byte is 0x0d (carriage return character) in the if-part, because I am assuming that the next one will be a 0x0a (line feed character). Otherwise I increment by one with the ++ post-increment operator in the else-part. Debug this code step by step and you will see what it does.
  • Yes, I read it slooowly, but I understood everything about how it operates.
