
I tried running the code from a Keras blog post.

The code writes to a .npy file as follows:

bottleneck_features_train = model.predict_generator(generator, nb_train_samples // batch_size)
np.save(open('bottleneck_features_train.npy', 'w'),bottleneck_features_train)

It then reads from this file:

def train_top_model():
    train_data = np.load(open('bottleneck_features_train.npy'))

Now I get an error saying:

Found 2000 images belonging to 2 classes.
Traceback (most recent call last):
  File "kerasbottleneck.py", line 103, in <module>
    save_bottlebeck_features()
  File "kerasbottleneck.py", line 69, in save_bottlebeck_features
    np.save(open('bottleneck_features_train.npy', 'w'),bottleneck_features_train)
  File "/opt/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 511, in save
    pickle_kwargs=pickle_kwargs)
  File "/opt/anaconda3/lib/python3.6/site-packages/numpy/lib/format.py", line 565, in write_array
    version)
  File "/opt/anaconda3/lib/python3.6/site-packages/numpy/lib/format.py", line 335, in _write_array_header
    fp.write(header_prefix)
TypeError: write() argument must be str, not bytes

After this, I tried changing the file mode from 'w' to 'wb'. This resulted in an error while reading the file:

Found 2000 images belonging to 2 classes.
Found 800 images belonging to 2 classes.
Traceback (most recent call last):
  File "kerasbottleneck.py", line 104, in <module>
    train_top_model()
  File "kerasbottleneck.py", line 82, in train_top_model
    train_data = np.load(open('bottleneck_features_train.npy'))
  File "/opt/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 404, in load
    magic = fid.read(N)
  File "/opt/anaconda3/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte

How can I fix this error?

  • In the second code snippet brackets are not closed Commented May 26, 2018 at 12:19
  • Sorry, just forgot to copy that. I've edited the question now. Commented May 26, 2018 at 12:21
  • It is still not correct. I have edited it now. Commented May 26, 2018 at 12:43

1 Answer


The code in the blog post is aimed at Python 2, where writing to and reading from a file works with bytestrings. In Python 3, you need to open the file in binary mode, both for writing and then reading again:

np.save(
    open('bottleneck_features_train.npy', 'wb'),
    bottleneck_features_train)

And when reading:

train_data = np.load(open('bottleneck_features_train.npy', 'rb'))

Note the b character in the mode arguments there.
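
Both errors from the question can be reproduced without NumPy or Keras. The following is a minimal sketch using only the standard library. The file name demo.npy and the payload are made up for illustration; the payload merely starts with the byte 0x93, like the data in the second traceback.

```python
import os
import tempfile

# Hypothetical payload: begins with 0x93 followed by "NUMPY",
# then a few arbitrary bytes standing in for array data.
payload = b"\x93NUMPY" + bytes(range(8))

path = os.path.join(tempfile.mkdtemp(), "demo.npy")

# 1. Text mode ('w') only accepts str, so writing bytes fails.
write_error = None
try:
    with open(path, "w") as f:
        f.write(payload)
except TypeError as exc:
    write_error = str(exc)

# 2. Binary mode ('wb') accepts the bytes as they are.
with open(path, "wb") as f:
    f.write(payload)

# 3. Reading in text mode tries to decode the bytes as UTF-8,
#    which fails on the very first byte (0x93).
read_error = None
try:
    with open(path, encoding="utf-8") as f:
        f.read()
except UnicodeDecodeError as exc:
    read_error = exc.reason

# 4. Binary mode ('rb') returns the bytes unchanged.
with open(path, "rb") as f:
    round_trip = f.read()

print("write in 'w' mode failed with:", write_error)
print("read in text mode failed with:", read_error)
print("binary round trip intact:", round_trip == payload)
```

Steps 1 and 3 correspond to the TypeError and the UnicodeDecodeError in the question; steps 2 and 4 are the 'wb' and 'rb' fixes.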

I'd use the file as a context manager to ensure it is cleanly closed:

with open('bottleneck_features_train.npy', 'wb') as features_train_file:
    np.save(features_train_file, bottleneck_features_train)

and

with open('bottleneck_features_train.npy', 'rb') as features_train_file:
    train_data = np.load(features_train_file)

The code in the blog post should open the file in binary mode for both calls anyway. In Python 2, without the b flag in the mode, text files have platform-specific newline conventions translated, and on Windows certain characters in the stream have special meaning (including making the file appear shorter than it really is if an EOF character appears). With binary data, that could be a real problem.
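
To see why newline translation is harmful for binary data, here is a rough sketch. It does not run Python 2; instead it uses Python 3's newline argument to open() to imitate Windows-style translation on any platform. The file name and the four-byte payload are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "translated.bin")

# Hypothetical binary data; the byte 0x0A happens to equal "\n".
original = b"\x01\x0a\x02\x0a"

# newline="\r\n" makes text mode rewrite every "\n" as "\r\n" on write,
# imitating what text mode does on Windows. latin-1 maps each byte to
# one character, so the encoding itself changes nothing here.
with open(path, "w", newline="\r\n", encoding="latin-1") as f:
    f.write(original.decode("latin-1"))

# Read the raw bytes back to see what actually landed on disk.
with open(path, "rb") as f:
    on_disk = f.read()

print("original:", original, len(original), "bytes")
print("on disk: ", on_disk, len(on_disk), "bytes")
```

The file on disk ends up longer than the original data because each 0x0A byte gained a 0x0D in front of it, which would silently corrupt a serialized array.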
