Processing a huge amount of data with a small Python script

Question

Hello, I have a small Python script. Its job is to read data from a TXT file, sort it into a specific format, remove duplicates and nonsense data, and write the result to another TXT file in this format: MAC IP Number Device

import re
f = open('frame.txt', 'r')
d = open('Result1.txt', 'w')
mac=""
ip=""
phoneName=""
phoneTel=""
lmac=""
lip=""
lphoneName=""
lphoneTel=""
lines=f.readlines()
s=0
p=0
for line in lines:
    matchObj = re.search( '(?<=Src: )[0-9a-z]{2}:[0-9a-z]{2}:[0-9a-z]{2}:[0-9a-z]{2}:[0-9a-z]{2}:[0-9a-z]{2}', line, re.M|re.I)
    if(matchObj):
            mac=matchObj.group(0)+"\t"
    matchObj = re.search( '(?<=Src: )([0-9]+)(?:\.[0-9]+){3}', line, re.M|re.I)
    if(matchObj):
            ip=matchObj.group(0)+"\t"
    if(s==1):
        s=0
        matchObj = re.search( '(?<=Value: )\d+',line,re.M|re.I)
        if(matchObj):
            phoneName=matchObj.group(0)+"\t"
    if(p==1):
        p=0
        matchObj = re.search( '(?<=Value: ).+',line,re.M|re.I)
        if(matchObj):
            phoneTel=matchObj.group(0)+"\t"  
    matchObj = re.search( '(?<=Key: user \(218)', line, re.M|re.I)
    if(matchObj):
        s=1
    matchObj = re.search( '(?<=Key: resource \(165)', line, re.M|re.I)
    if(matchObj):
        p=1
    if(mac!="" and ip!="" and phoneName!="" and phoneTel!="" and mac!=lmac and ip!=lip and phoneName!=lphoneName and phoneTel!=lphoneTel):        
        d.write(mac+" " +ip+" "+ phoneName+" "+ phoneTel)
        lmac=mac
        lip=ip
        lphoneName=phoneName
        lphoneTel=phoneTel
        d.write("\n")
    matchObj = re.search( 'Frame \d+', line, re.M|re.I)
    if(matchObj):              
        mac=""
        ip=""
        phoneName=""
        phoneTel=""        
d.close()
f.close()

Here is the code. The problem is that it needs to process a HUGE amount of data, possibly up to 100 GB, and when I run it on that much data the program freezes and gets killed. Any idea how to fix this problem? Thank you so much!

Sounds like a job for awk. What does your file look like?

– Blender, Commented Jul 18, 2013 at 23:46

Sajjan Singh · Accepted Answer · 2013-07-18 23:44:11Z

5

You're reading the entire file at the start. If the file is that large, loading it all into memory will be a problem. Try iterating over the lines instead. Generally, you'd want something like:

with open(filename) as f:
    for line in f:
        # Iterates over the file one line at a time instead of reading it all at once
        pass  # handle the current line here

So for you, change your loop construction to:

for line in f:

and remove:

lines=f.readlines()
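
Put together, the change might look like the sketch below. It is a minimal rewrite under some assumptions, not a tested drop-in replacement: the names `extract_records` and `convert` are made up for illustration, the regular expressions are carried over from the question (compiled once, with the six-part MAC pattern shortened to an equivalent form), and the output is simplified to plain tab-separated fields. The duplicate check is kept as in the question, so a record is only compared against the last one written.

```python
import re

# Patterns taken from the question's script, compiled once up front.
MAC_RE = re.compile(r'(?<=Src: )[0-9a-z]{2}(?::[0-9a-z]{2}){5}', re.I)
IP_RE = re.compile(r'(?<=Src: )[0-9]+(?:\.[0-9]+){3}')
NUMBER_VALUE_RE = re.compile(r'(?<=Value: )\d+')
DEVICE_VALUE_RE = re.compile(r'(?<=Value: ).+')
USER_KEY_RE = re.compile(r'Key: user \(218', re.I)
RESOURCE_KEY_RE = re.compile(r'Key: resource \(165', re.I)
FRAME_RE = re.compile(r'Frame \d+', re.I)


def extract_records(lines):
    """Yield (mac, ip, number, device) tuples from any iterable of lines."""
    mac = ip = number = device = ""
    last = ("", "", "", "")            # last record that was emitted
    expect_number = expect_device = False
    for line in lines:
        m = MAC_RE.search(line)
        if m:
            mac = m.group(0)
        m = IP_RE.search(line)
        if m:
            ip = m.group(0)
        if expect_number:              # previous line was "Key: user (218"
            expect_number = False
            m = NUMBER_VALUE_RE.search(line)
            if m:
                number = m.group(0)
        if expect_device:              # previous line was "Key: resource (165"
            expect_device = False
            m = DEVICE_VALUE_RE.search(line)
            if m:
                device = m.group(0)
        if USER_KEY_RE.search(line):
            expect_number = True
        if RESOURCE_KEY_RE.search(line):
            expect_device = True
        record = (mac, ip, number, device)
        # Same condition as the question: all fields set, and every field
        # different from the last record written.
        if all(record) and all(cur != prev for cur, prev in zip(record, last)):
            yield record
            last = record
        if FRAME_RE.search(line):      # a new frame starts: reset the fields
            mac = ip = number = device = ""


def convert(src_path, dst_path):
    """Stream src_path line by line and write the records to dst_path."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for record in extract_records(src):   # src is iterated lazily
            dst.write("\t".join(record) + "\n")
```

Calling `convert('frame.txt', 'Result1.txt')` would then process the file while holding only one line at a time in memory, since both the file object and the generator are consumed lazily.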



user2547836 · Accepted Answer · 2013-07-18 23:47:40Z

1

Use readline() instead of readlines().

readlines() reads the whole file into memory at once, while readline() returns a single line per call.
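
A minimal sketch of a readline() loop, assuming a text-mode file object; the helper name `each_line` and the in-memory `io.StringIO` sample are only there for illustration. readline() returns an empty string at end of file, while a blank line comes back as "\n", so the two cannot be confused.

```python
import io


def each_line(f):
    """Yield lines from file object f, calling readline() once per line."""
    while True:
        line = f.readline()
        if line == "":      # empty string means end of file
            break
        yield line


# Stand-in for a real file opened with open('frame.txt').
demo = io.StringIO("first\n\nlast")
demo_lines = list(each_line(demo))
```

In practice, iterating directly with `for line in f:` does the same job with less code.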

