
I am working on the following problem. Let's say I have data (for example, RGB image values as integers), one record per line in a file. I want to read 10000 of these lines, build a frame object (an image frame containing 10000 RGB values), and send it to a downstream function in the processing pipeline. Then I want to read the next 10000 lines, build another frame object, and send it down the pipeline as well.

How can I set up this function so that it keeps making frame objects until the end of the file is reached? Is the following the right way to do it? Are there other neat approaches?

class frame_object(object):
    def __init__(self):
        self.line_cnt = 0
        self.buffer = []

    def make_frame(self, line):
        if self.line_cnt < 10000:
            self.buffer.append(line)
            self.line_cnt += 1
        return self.buffer

1 Answer


You could use generators to create a data pipeline like in the following example:

FRAME_SIZE = 10000


def gen_lines(filename):
    # Yield the file one line at a time, stripping the trailing newline.
    with open(filename, "r") as fp:
        for line in fp:
            yield line.rstrip("\n")


def gen_frames(lines):
    # Group incoming lines into frames of FRAME_SIZE lines each.
    count = 0
    frame = []

    for line in lines:
        if count < FRAME_SIZE:
            frame.append(line)
            count += 1

        if count == FRAME_SIZE:
            # Frame is full: pass it downstream and start a new one.
            yield frame
            frame = []
            count = 0

    # Yield the last, possibly partial, frame.
    if count > 0:
        yield frame


def process_frames(frames):
    for frame in frames:
        # do stuff with frame
        print(len(frame))


lines = gen_lines("/path/to/input.file")
frames = gen_frames(lines)
process_frames(frames)

In this way it's easier to see the data pipeline and to hook in different processing or filtering logic. You can learn more about generators and their use in data-processing pipelines here.
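For example, a filtering step can slot between gen_lines and gen_frames without touching either of them. This is only a sketch; gen_valid_lines and its skip-blank-lines rule are placeholders for whatever validation your data actually needs:

def gen_valid_lines(lines):
    # Pass through only lines that look like valid records
    # (placeholder rule: skip blank lines).
    for line in lines:
        if line.strip():
            yield line


lines = gen_lines("/path/to/input.file")
valid_lines = gen_valid_lines(lines)
frames = gen_frames(valid_lines)
process_frames(frames)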


3 Comments

Just wondering, aren't such problems usually solved with message queues?
It really depends on the context of what you're trying to do. If you're processing a locally accessible file and the output is another file (or a set of files), then I think it would be overkill to use a message queue (like RabbitMQ). Also you probably don't want a lot of data to be in the queue because it could cause memory issues in your message broker.
If it's just single-file processing, then yes, this would be overkill :) I've ended up putting whatever data I have into SQLite (or Mongo) for processing, because it's just way more productive than doing everything by hand.
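A rough sketch of that idea using Python's built-in sqlite3 module (the database name, table layout, and one-integer-per-line assumption are made up for illustration):

import sqlite3

conn = sqlite3.connect("pixels.db")
conn.execute("CREATE TABLE IF NOT EXISTS pixels (value INTEGER)")
with open("/path/to/input.file") as fp:
    # Insert each line as one row; executemany accepts a generator of tuples.
    conn.executemany(
        "INSERT INTO pixels (value) VALUES (?)",
        ((int(line),) for line in fp),
    )
conn.commit()
conn.close()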
