0

Hello i have a small python script its job is to read data from a TXT file and sort it in a specific remove duplicate and remove none sense data and put it back in another TXT file this format MAC IP Number Device

import re
f = open('frame.txt', 'r')
d = open('Result1.txt', 'w')
mac=""
ip=""
phoneName=""
phoneTel=""
lmac=""
lip=""
lphoneName=""
lphoneTel=""
lines=f.readlines()
s=0
p=0
for line in lines:
    matchObj = re.search( '(?<=Src: )[0-9a-z]{2}:[0-9a-z]{2}:[0-9a-z]{2}:[0-9a-z]{2}:[0-9a-z]{2}:[0-9a-z]{2}', line, re.M|re.I)
    if(matchObj):
            mac=matchObj.group(0)+"\t"
    matchObj = re.search( '(?<=Src: )([0-9]+)(?:\.[0-9]+){3}', line, re.M|re.I)
    if(matchObj):
            ip=matchObj.group(0)+"\t"
    if(s==1):
        s=0
        matchObj = re.search( '(?<=Value: )\d+',line,re.M|re.I)
        if(matchObj):
            phoneName=matchObj.group(0)+"\t"
    if(p==1):
        p=0
        matchObj = re.search( '(?<=Value: ).+',line,re.M|re.I)
        if(matchObj):
            phoneTel=matchObj.group(0)+"\t"  
    matchObj = re.search( '(?<=Key: user \(218)', line, re.M|re.I)
    if(matchObj):
        s=1
    matchObj = re.search( '(?<=Key: resource \(165)', line, re.M|re.I)
    if(matchObj):
        p=1
    if(mac!="" and ip!="" and phoneName!="" and phoneTel!="" and mac!=lmac and ip!=lip and phoneName!=lphoneName and phoneTel!=lphoneTel):        
        d.write(mac+" " +ip+" "+ phoneName+" "+ phoneTel)
        lmac=mac
        lip=ip
        lphoneName=phoneName
        lphoneTel=phoneTel
        d.write("\n")
    matchObj = re.search( 'Frame \d+', line, re.M|re.I)
    if(matchObj):              
        mac=""
        ip=""
        phoneName=""
        phoneTel=""        
d.close()
f.close()

here the code the problem is that the code need to process HUGE amount of data that might reach 100GB and when I'm doing so the program freezes and kill itself any idea how to fix this problem thank you so much!

1
  • Sounds like a job for awk. What does your file look like? Commented Jul 18, 2013 at 23:46

2 Answers 2

5

You're reading the entire file at the start - if the file is that large, loading it into memory will be a problem. Try iterating over the lines instead. Generally, you'd like

with open(filename) as f:
    for line in f:
        # This will iterate over the lines in the file rather than read them all at once

So for you, change your loop construction to:

for line in f:

and remove:

lines=f.readlines()
Sign up to request clarification or add additional context in comments.

Comments

1

use readline() instead of readlines()

readlines() reads whole file to memory at once.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.