
I have a bunch of log files that need to be parsed to extract some information. A sample line (which, unfortunately, looks like XML after trimming the sensitive data):

<SerialNumber>xxxxxxxxx</SerialNumber><IP>X.X.X.X</IP><UserID>[email protected]</UserID><NumOfFiles>1</NumOfFiles><LocaleID>ENU</LocaleID><Vendor>POLYCOM</Vendor><Model>VVX311</Model><Revision>Rev-A</Revision><CurrentTime>2018-03-12T02:42:59</CurrentTime><CurrentModule><FileName>cpe.nbt</FileName><FileVersion>

I want to get the IP (inside the IP tags) and the user's email (between the UserID tags).

My current "solver":

$regex = "<UserID>"

$files = Get-ChildItem -Path 'c:\path\*.log'
foreach ($infile in $files) {
    $res = Select-String -Path $infile -Pattern $regex -AllMatches
    # take the last matching line in the file
    $txt = $res[$res.Count - 1]

    # get user
    $pos1 = $txt.Line.IndexOf("<UserID>")
    $pos2 = $txt.Line.IndexOf("</UserID>")
    $Puser = $txt.Line.Substring($pos1 + 8, $pos2 - $pos1 - 8)

    ....
}

It works, but I wonder whether a different approach would be better; I'd like to see how this could be done with Select-String -Pattern ...

I tried several "GUI" regex builders, but I can't figure out how to select what's needed. Thanks.

PS:

Result after

$regex = '<IP>(.*)</IP>'
$res = select-string -Path $infile -Pattern $regex
$res

0312092535|cfg  |4|00|DevUpdt|[LyncDeviceUpdateC::prepareAndSendRequest] '<?xml version="1.0" encoding="utf-8"?><Request><DeviceType>3PIP</DeviceType><MacAddress>11-11-11-11-11-11</MacAddress><SerialNumber>111111111111</SerialNumber><IP>10.1.1.1</IP><UserID>[email protected]</UserID><NumOfFiles>1</NumOfFiles><LocaleID>ENU</LocaleID><Vendor>POLYCOM</Vendor><Model>VVX311</Model><Revision>Rev-A</Revision><CurrentTime>2018-03-12T09:25:35</CurrentTime><CurrentModule><FileName>cpe.nbt</FileName><FileVersion><Major>5</Major><M

Sample of log file (100Kb+)

0312104211|nisvc|2|00|Invoker's nCommands,CurrentKey:2,(106)Responder
0312104211|nisvc|2|00|Response(-1)nisvc,(-1),(-1)app,(22),(Expiry,TransactionId,Time,Type):(-1,-1,1520844131,1)IndicationCode:(400)
0312104211|app1 |5|00|[CWPADServiceEwsRsp::execute] PAC file failed with ''
0312104301|cfg  |4|00|DevUpdt|[LyncDeviceUpdateC::prepareAndSendRequest] '<?xml version="1.0" encoding="utf-8"?><Request><DeviceType>3PIP</DeviceType><MacAddress>11-11-11-11-11-11</MacAddress><SerialNumber>64167F2A8451</SerialNumber><IP>10.1.1.1</IP><UserID>[email protected]</UserID><NumOfFiles>1</NumOfFiles><LocaleID>ENU</LocaleID><Vendor>POLYCOM</Vendor><Model>VVX311</Model><Revision>Rev-A</Revision><CurrentTime>2018-03-12T10:43:00</CurrentTime><CurrentModule><FileName>cpe.nbt</FileName><FileVersion><Major>5</Major><Minor>
0312104301|nisvc|2|00|Request(-1)nisvc,(701)NIServiceHttpReqMsgKey,(-1)proxy,(1001)AuthRsp,(Expiry,TransactionId,Time,Type):(45000,1306758696,1520844181,0)IndicationLevel:(200)
  • Why are you parsing xml.... Commented Mar 12, 2018 at 13:23
  • Duplicate of Parsing xml using powershell Commented Mar 12, 2018 at 13:23
  • If the XML is all on one line, the regex should be pretty straightforward, something like <IP>(.*)</IP>.*<UserID>(.*)</UserID>. It is far easier to use the existing XML classes to get info from XML data ;-) Commented Mar 12, 2018 at 13:44
  • The file is NOT an XML file; it's a log file with "xml-like" lines here and there. "Sample line" means just a sample: of the thousands of lines in the log file, I pasted the one that contains the info. The question is how to extract (if possible) the values using Select-String -Pattern. As I already wrote, I did solve the task in a different way. Commented Mar 12, 2018 at 13:53
  • Updated 1st post: adding full sample of log file and current result Commented Mar 13, 2018 at 6:40

1 Answer


This code gets all the files, reads each file line by line, creates an object with a user and an IP for each matching line, and collects the objects in an array.

[regex]$ipUserReg = '(?<=<IP>)(.*)(?:<\/IP><UserID>)(.*)(?=<\/UserID>)'
$files = Get-ChildItem $path -Filter *.log
$users = @(
    foreach ($fileToSearch in $files) {
        # pass the full path so the file is found regardless of the current directory
        $file = [System.IO.File]::OpenText($fileToSearch.FullName)
        while (!$file.EndOfStream) {
            $text = $file.ReadLine()
            # run the regex once per line and reuse the match
            $match = $ipUserReg.Match($text)
            if ($match.Success) {
                New-Object psobject -Property @{
                    IP   = $match.Groups[1].Value
                    User = $match.Groups[2].Value
                }
            }
        }
        $file.Close()
    }
)

To build out my regex, I often use regexr.com, but keep in mind that PowerShell uses the .NET regex engine, which differs slightly from other flavors for certain constructs.
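
Since an online tester may not behave exactly like .NET, it can help to try a pattern in the console against one sample line first. A minimal sketch, assuming a shortened version of the question's line; the address `user@example.com` and the group names `ip` and `user` are placeholders chosen for illustration, and the pattern uses lazy `.*?` so each group stops at the first closing tag:

```powershell
# Shortened sample line based on the question (the email address is a placeholder)
$line = '0312092535|cfg  |4|00|DevUpdt|<SerialNumber>111111111111</SerialNumber><IP>10.1.1.1</IP><UserID>user@example.com</UserID><NumOfFiles>1</NumOfFiles>'

# Named groups "ip" and "user" are illustrative; .*? is lazy and stops at the first closing tag
$pattern = '<IP>(?<ip>.*?)</IP><UserID>(?<user>.*?)</UserID>'

if ($line -match $pattern) {
    # -match fills the automatic $Matches hashtable, including named groups
    $result = New-Object psobject -Property @{
        IP   = $Matches['ip']
        User = $Matches['user']
    }
    $result
}
```

If the pattern does not match, `-match` returns `$false` and nothing is emitted, which makes it easy to iterate on the pattern before running it over all files.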

Edit: Here is an example using select-string rather than reading line by line:

[regex]$ipUserReg = '(?<=<IP>)(.*)(?:<\/IP><UserID>)(.*)(?=<\/UserID>)'
$files = Get-ChildItem $path -filter *.log
$users = @(
    foreach ($fileToSearch in $files) {
        Select-String -Path $fileToSearch.FullName -Pattern $ipUserReg -AllMatches | ForEach-Object {
            $_.Matches | ForEach-Object{
                New-Object psobject -property @{
                    IP = $_.Groups[1].Value
                    User = $_.Groups[2].Value
                }
            }
        }
    }
)
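
The question's script keeps only the last matching line per file (`$res[$res.count-1]`), while the loop above emits every match. If only the latest entry is wanted, one option is `Select-Object -Last 1`. A minimal sketch; the temporary file, its contents and the addresses are made up for illustration, and the pattern uses lazy `.*?` instead of the greedy `.*` above:

```powershell
# Build a throwaway log file so the sketch is self-contained (contents are made up)
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) ("sample_{0}.log" -f [guid]::NewGuid())
@(
    "0312092535|cfg  |4|00|DevUpdt|<IP>10.1.1.1</IP><UserID>first@example.com</UserID>"
    "0312104211|app1 |5|00|[CWPADServiceEwsRsp::execute] PAC file failed with ''"
    "0312104301|cfg  |4|00|DevUpdt|<IP>10.1.1.2</IP><UserID>second@example.com</UserID>"
) | Set-Content -Path $tmp

$pattern = '<IP>(.*?)</IP><UserID>(.*?)</UserID>'

# Keep only the last matching line of the file, then read its capture groups
$last = Select-String -Path $tmp -Pattern $pattern | Select-Object -Last 1
$entry = New-Object psobject -Property @{
    File = $last.Filename
    IP   = $last.Matches[0].Groups[1].Value
    User = $last.Matches[0].Groups[2].Value
}
$entry

# Clean up the temporary file
Remove-Item $tmp
```

Inside the `foreach` over `$files`, the same two steps (pipe to `Select-Object -Last 1`, then read `Matches[0].Groups`) would give one object per file.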

5 Comments

Am I right that the regex variable is $ipUserReg everywhere in the text? Here and there it's used as $ipReg or $userReg 0-o. It works fine, with one problem: this way the program runs almost 10x slower :O
regex usage: execution time 24.0324853, Total files: 0, Total bytes: 59713119
"old style": execution time 2.0694602, Total files: 239, Total bytes: 59713119
"Old style" refers to using $txt.line.IndexOf to find the positions of <ip> and </ip>, then extracting with $txt.Line.Substring, then doing the same for the user.
Damn, I can't format my answer :( or does the editor have problems with Chrome?! Either way, it works 10x slower than using .IndexOf to find the substring position and then Substring to extract the IP/user :( I also can't mark your answer as correct :S
@ChavdarChavdarov Added another example with select-string
@ChavdarChavdarov Make sure to mark the answer to help others that find this thread
Thanks, I finally found where "mark as answer" is :D I think the bulk of the speed difference was due to reading line by line :)
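
To check the speed difference discussed in these comments on your own data, both approaches can be timed with Measure-Command. A rough sketch: `Get-Between` is a hypothetical helper that mirrors the IndexOf/Substring approach from the question, the sample line is shortened with a placeholder address, and timings vary by machine and PowerShell version, so no numbers are shown:

```powershell
# Shortened sample line (the email address is a placeholder)
$line = '0312092535|cfg  |4|00|DevUpdt|<IP>10.1.1.1</IP><UserID>user@example.com</UserID><NumOfFiles>1</NumOfFiles>'

# Hypothetical helper: text between two literal tags, via IndexOf/Substring as in the question
function Get-Between([string]$Text, [string]$Open, [string]$Close) {
    $start = $Text.IndexOf($Open)
    if ($start -lt 0) { return $null }
    $start += $Open.Length
    $end = $Text.IndexOf($Close, $start)
    if ($end -lt 0) { return $null }
    $Text.Substring($start, $end - $start)
}

[regex]$ipReg = '<IP>(.*?)</IP>'

# Both approaches should extract the same value
$viaIndexOf = Get-Between $line '<IP>' '</IP>'
$viaRegex   = $ipReg.Match($line).Groups[1].Value

# Time 10000 repetitions of each approach on the same line
$tIndexOf = Measure-Command { foreach ($i in 1..10000) { $null = Get-Between $line '<IP>' '</IP>' } }
$tRegex   = Measure-Command { foreach ($i in 1..10000) { $null = $ipReg.Match($line) } }
'IndexOf: {0:N1} ms, regex: {1:N1} ms' -f $tIndexOf.TotalMilliseconds, $tRegex.TotalMilliseconds
```

This only times the extraction step on one line; as the last comment suggests, how the file is read may matter more than which extraction method is used.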
