Improper handling of BitsPerComponent for FlateDecode/ICCBased images #3534

@mbierma

Description

When extracting FlateDecode-compressed grayscale images with one bit per component (/BitsPerComponent 1), the _handle_flate function determines the wrong image mode. As a result, Pillow's Image.frombytes raises ValueError: not enough image data when the decoded data is passed to it.

Although _get_mode_and_invert_color is called before _handle_flate and handles /BitsPerComponent correctly, the mode it returns is overwritten inside _handle_flate (around line 283 of pypdf/_xobj_image_helpers.py) by the potentially incorrect result of _get_imagemode.
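
To illustrate why a wrong mode leads to this particular error, it helps to compare byte counts. The sketch below is not pypdf code: raw_image_size is a hypothetical helper and the 100x50 dimensions are made up for illustration. It assumes each image row is padded to a whole number of bytes, which is how PDF stores sample data and how Pillow reads raw data for modes "1" and "L".

```python
import math


def raw_image_size(width: int, height: int, bits_per_component: int, components: int = 1) -> int:
    """Return the number of bytes of uncompressed sample data for an image.

    Hypothetical helper for illustration only; rows are padded to a byte boundary.
    """
    row_bytes = math.ceil(width * bits_per_component * components / 8)
    return row_bytes * height


# Illustrative dimensions, not taken from the attached PDF.
width, height = 100, 50

# Bytes actually present in a 1-bit grayscale stream after inflating it.
packed_len = raw_image_size(width, height, bits_per_component=1)

# Bytes Pillow would need if the mode were wrongly set to 8-bit grayscale ("L").
needed_for_l = raw_image_size(width, height, bits_per_component=8)

print(packed_len, needed_for_l)
if packed_len < needed_for_l:
    print("mode 'L' would fail: not enough image data")
```

With these numbers the stream holds far fewer bytes than an 8-bit mode requires, which matches the symptom reported here, assuming the overwritten mode is a wider one than "1".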

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
macOS-26.0.1-x86_64-i386-64bit

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==6.4.0, crypt_provider=('cryptography', '44.0.1'), PIL=12.0.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader

reader = PdfReader("pypdf_bug_3534_iccbased.pdf")
page = reader.pages[0]

for image in page.images:
    img = image.image  # ValueError: not enough image data

pypdf_bug_3534_iccbased.pdf (this file can be added to tests)

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "pypdf/_page.py", line 473, in __iter__
    yield self[i]
  File "pypdf/_page.py", line 469, in __getitem__
    return self.get_function(lst[index])
  File "pypdf/_page.py", line 654, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
  File "pypdf/filters.py", line 891, in _xobj_to_image
    img, image_format, extension, _ = _handle_flate(
  File "pypdf/_xobj_image_helpers.py", line 285, in _handle_flate
    img = Image.frombytes(mode2, size, data)  # reloaded as mode may have changed
  File "site-packages/PIL/Image.py", line 3144, in frombytes
    im.frombytes(data, decoder_name, decoder_args)
  File "site-packages/PIL/Image.py", line 868, in frombytes
ValueError: not enough image data

Labels

is-bug: From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF
workflow-images: From a users perspective, image handling is the affected feature/workflow
