I'm having some problems decoding what should be simple data.
I have a base64 string that encodes an np.int64 followed by an array of np.float64. The first np.int64 gives the size of the array in bytes. This pattern is then repeated for multiple arrays, so to decode all of them I need to read that size to find where the next pair starts.
Here is a very simple example showing the first pair. The second pair starts straight after this, after 64 bytes or 88 base64 characters. Then rinse and repeat for the remaining arrays.
```python
>>> import base64, struct
>>> test_data = 'OAAAAAAAAAAAAAAAAAAAAFVVVVVVVcU/VVVVVVVV1T8AAAAAAADgP1VVVVVVVeU/qqqqqqqq6j8AAAAAAADwPw=='
>>> struct.unpack('Qddddddd', base64.b64decode(test_data))  # 'Q7d' also works
(56,
0.0,
0.16666666666666666,
0.3333333333333333,
0.5,
0.6666666666666666,
0.8333333333333333,
1.0)
```
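As a sanity check on the 64-byte / 88-character figures, the sketch below packs one pair with `struct.pack` and base64-encodes it. It assumes little-endian byte order (consistent with the example, whose first decoded byte 0x38 is 56) and that the leading integer counts bytes; the array values are made up. Because 64 is not a multiple of 3, the encoded pair ends in `==` padding.

```python
import base64
import struct

# Hypothetical 7-element array; only its length matters here.
values = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5]
# One pair: little-endian uint64 byte count (7 * 8 = 56), then the doubles.
pair = struct.pack('<Q7d', 8 * len(values), *values)
encoded = base64.b64encode(pair)
print(len(pair), len(encoded), encoded[-2:])  # 64 88 b'=='
```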
My problem is that I need to extract the int64 first, to know the size of the array to unpack and where the next array starts, immediately after this one.
I thought I could simply cut the first 8 bytes off the base64 string using the 4/3 size relation, rounding up to a multiple of 4 to account for padding (8 × 4/3 ≈ 10.7, so 12 characters), like so:
```python
struct.unpack('Q', base64.b64decode(test_data[:12]))
```
But that always throws an error regardless of how big my slice is (I've tried 8 to 16 just to try and figure out what is going on):
```
struct.error: unpack requires a buffer of 8 bytes
```
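A quick check of the decoded lengths shows why: every 4 base64 characters decode to 3 bytes, so no slice that is a multiple of 4 characters yields exactly 8 bytes, while `struct.unpack` requires the buffer to be exactly `struct.calcsize(fmt)` bytes long. The slice sizes below are just a few from the range tried in the question.

```python
import base64

test_data = 'OAAAAAAAAAAAAAAAAAAAAFVVVVVVVcU/VVVVVVVV1T8AAAAAAADgP1VVVVVVVeU/qqqqqqqq6j8AAAAAAADwPw=='

# 4 base64 characters -> 3 bytes, so 8 bytes never lines up with a
# 4-character boundary: 8 chars give 6 bytes, 12 give 9, 16 give 12.
lengths = {n: len(base64.b64decode(test_data[:n])) for n in (8, 12, 16)}
print(lengths)  # {8: 6, 12: 9, 16: 12}
```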
Is there a simple way to extract just that first integer without knowing the length of the array it describes?
Decode the whole string first and use `struct.unpack_from`: `b = base64.b64decode(test_data); struct.unpack_from('Q', b)[0]` --> `56`.

There's also `struct.calcsize` if you want to get the offset into the bytes data. So you could also do `struct.unpack('Q', b[:struct.calcsize('Q')])[0]` (which is probably roughly equivalent to what `unpack_from` does). Or you could just use almost your own code: `struct.unpack('Q', base64.b64decode(test_data[:12])[:8])`.

`unpack_from` is the magic I was looking for, and I didn't read carefully enough to see it.
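For the full problem of walking every pair, the same idea extends naturally: decode the whole string once and advance an offset through the bytes, so base64 character boundaries never come into it. This is a minimal sketch, not a definitive implementation. It assumes all pairs were base64-encoded as one stream, that the data is little-endian, and that the leading integer is the array size in bytes (56 bytes = 7 doubles, as in the example); `decode_pairs` is a hypothetical helper name.

```python
import base64
import struct

# Assumption: little-endian uint64 header, as in the example (first byte 0x38 = 56).
HEADER = struct.Struct('<Q')
FLOAT64_SIZE = struct.calcsize('<d')  # 8


def decode_pairs(b64_text):
    """Return a list of float tuples from (byte count, float64 array) pairs.

    Assumes all pairs were base64-encoded as one stream and that the
    header counts bytes (56 -> 7 doubles), as in the question's example.
    """
    data = base64.b64decode(b64_text)
    arrays = []
    offset = 0
    while offset < len(data):
        # unpack_from reads only the 8 header bytes starting at `offset`.
        (nbytes,) = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        count = nbytes // FLOAT64_SIZE
        arrays.append(struct.unpack_from(f'<{count}d', data, offset))
        offset += nbytes  # the next pair starts right after this array
    return arrays


test_data = 'OAAAAAAAAAAAAAAAAAAAAFVVVVVVVcU/VVVVVVVV1T8AAAAAAADgP1VVVVVVVeU/qqqqqqqq6j8AAAAAAADwPw=='
print(decode_pairs(test_data))

# Round trip with two made-up arrays packed back to back.
first, second = [0.5, 1.5, 2.5], [-1.0, 4.0]
raw = b''.join(struct.pack(f'<Q{len(a)}d', 8 * len(a), *a) for a in (first, second))
print(decode_pairs(base64.b64encode(raw)))  # [(0.5, 1.5, 2.5), (-1.0, 4.0)]
```

If each pair was instead base64-encoded separately (each chunk carrying its own padding), you would need to decode chunk by chunk; the sketch does not handle that case.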