I'm having some problems decoding what should be simple data.

I have a base64 string that represents a np.int64 followed by an array of np.float64. The size of the array is defined by the first np.int64. This pattern is then repeated for multiple arrays. So in order to decode all of the arrays, I need to be able to read the size in bytes to find the starting point of the next pair.

Here is a very simple example showing the first pair. The second pair starts straight after this, after 64 bytes or 88 base64 characters. Then rinse and repeat for the remaining arrays.

>>> import base64, struct
>>> test_data = 'OAAAAAAAAAAAAAAAAAAAAFVVVVVVVcU/VVVVVVVV1T8AAAAAAADgP1VVVVVVVeU/qqqqqqqq6j8AAAAAAADwPw=='
>>> struct.unpack('Qddddddd', base64.b64decode(test_data)) # 'Q7d' also works
(56,
 0.0,
 0.16666666666666666,
 0.3333333333333333,
 0.5,
 0.6666666666666666,
 0.8333333333333333,
 1.0)
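As a quick check on the 64-byte and 88-character figures, here is a minimal sketch of the arithmetic. The `'<Q7d'` format and the names `record_bytes` and `encoded_chars` are assumptions chosen for illustration; the `<` prefix pins little-endian, which matches the sample bytes.

```python
import base64
import struct

# One pair in the sample: an 8-byte header plus seven 8-byte floats.
record_bytes = struct.calcsize('<Q7d')

# base64 emits 4 characters for every 3 input bytes, rounding the last group up.
encoded_chars = 4 * ((record_bytes + 2) // 3)

print(record_bytes, encoded_chars)

# Cross-check against a real encoding of that many zero bytes.
padded = base64.b64encode(bytes(record_bytes))
print(len(padded), padded[-2:])
```

Because 64 is not a multiple of 3, the last group is padded with `==`. It also means that if several pairs are concatenated before encoding, byte 64 does not fall on a 4-character boundary, so the 88-character offset only holds when each pair is encoded and padded on its own.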

My problem is that I need to extract the int64 first, both to know the size of the array to unpack and to find the start of the next array, which follows immediately after this one.

I thought I could simply cut off the first 8 bytes from the base64 string using the 4/3 size relation and round to the nearest 4 to account for padding like so:

struct.unpack('Q', base64.b64decode(test_data[:12]))

But that always throws an error regardless of how big my slice is (I've tried 8 to 16 just to try and figure out what is going on):

struct.error: unpack requires a buffer of 8 bytes

There must be a simple way to extract just that first integer without knowing the length of the array it is describing?

  • @mkrieger1 well it will just be an integer followed by a certain number of other values (could be int or float) that will be used to form a numpy array Commented Mar 16 at 21:03
  • @jpmorr Use struct.unpack_from: e.g. b = base64.b64decode(test_data); struct.unpack_from('Q', b)[0] --> 56. There's also struct.calcsize if you want to get the offset into the bytes data. So you could also do struct.unpack('Q', b[:struct.calcsize('Q')])[0] (which is probably roughly equivalent to what the previous solution does). Commented Mar 16 at 21:29
  • Then you have 9 bytes. So, indeed, either you use unpack_from. Or you could just use almost your own code struct.unpack('Q', base64.b64decode(test_data[:12])[:8]) Commented Mar 17 at 8:30
  • @ekhumoro Thanks. unpack_from is the magic I was looking for and didn't read carefully enough to see. Commented Mar 17 at 9:36
  • @chrslg That's what I was currently doing: reading extra data and then slicing out the first 8 bytes, but I thought that wasn't the best way to achieve what I needed. Commented Mar 17 at 9:38
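The suggestions in the comments above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the `<` byte-order prefix is an assumption (it matches the sample data and makes the result independent of the host), and the variable names are made up for the example.

```python
import base64
import struct

test_data = 'OAAAAAAAAAAAAAAAAAAAAFVVVVVVVcU/VVVVVVVV1T8AAAAAAADgP1VVVVVVVeU/qqqqqqqq6j8AAAAAAADwPw=='

# 12 base64 characters are 3 groups of 4, and each group decodes to 3 bytes,
# so this slice yields 9 bytes: one more than 'Q' expects.
prefix = base64.b64decode(test_data[:12])
print(len(prefix))

# struct.unpack insists on an exact-length buffer; unpack_from only needs
# "at least enough" bytes and ignores whatever follows.
size_from_prefix = struct.unpack_from('<Q', prefix)[0]

# Decoding everything once and reading at an explicit offset gives the same value.
decoded = base64.b64decode(test_data)
header_len = struct.calcsize('<Q')
size_from_full = struct.unpack_from('<Q', decoded, 0)[0]

print(size_from_prefix, size_from_full, header_len)
```

No slice of the base64 text decodes to exactly 8 bytes without padding, since unpadded slices always decode to a multiple of 3 bytes. That is why decoding first and then indexing into the bytes is the simpler route.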

1 Answer

You need to decode the base64 string first to recover the original binary data. This makes the data much easier to handle: each character in a base64 string represents 6 bits, so byte boundaries do not line up with character boundaries and you cannot simply slice out a given byte. Once decoded, you can unpack the binary data directly. Here is a solution that does that for multiple arrays.

import base64, struct

test_data = 'OAAAAAAAAAAAAAAAAAAAAFVVVVVVVcU/VVVVVVVV1T8AAAAAAADgP1VVVVVVVeU/qqqqqqqq6j8AAAAAAADwPw=='

decoded_data = base64.b64decode(test_data)

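index = 0
while index < len(decoded_data):
    # The 8-byte header holds the size of the following array in bytes.
    array_size = struct.unpack('Q', decoded_data[index : index + 8])[0]
    # The array starts right after the header, relative to the current index.
    start = index + 8
    data = struct.unpack('d' * (array_size // 8), decoded_data[start : start + array_size])
    index = start + array_size
    print(f'array size: {array_size // 8}')
    print(f'array data: {data}')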
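The same loop can also be written with struct.unpack_from and an explicit offset, which avoids slicing the buffer. The following is a minimal sketch under stated assumptions: `iter_arrays` is a hypothetical helper name, the `<` prefix assumes little-endian data, and the two-pair input is built here purely for testing because the question only supplies the first pair.

```python
import base64
import struct


def iter_arrays(encoded):
    """Yield one tuple of floats per (byte count, payload) pair in a base64 string."""
    raw = base64.b64decode(encoded)
    header = struct.Struct('<Q')  # 8-byte byte count; '<' is an assumption
    offset = 0
    while offset < len(raw):
        (nbytes,) = header.unpack_from(raw, offset)
        offset += header.size
        count = nbytes // 8  # each float64 occupies 8 bytes
        yield struct.unpack_from(f'<{count}d', raw, offset)
        offset += nbytes


# Hypothetical test input: two pairs packed back to back, then encoded once.
first = struct.pack('<Q3d', 24, 0.0, 0.5, 1.0)
second = struct.pack('<Q2d', 16, 2.0, 4.0)
encoded = base64.b64encode(first + second).decode('ascii')

arrays = list(iter_arrays(encoded))
print(arrays)
```

If NumPy arrays are wanted rather than tuples, np.frombuffer accepts `dtype`, `count` and `offset` arguments and could take the place of the inner unpack_from call.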