regexp
#1
Scooby Regular
Thread Starter
Join Date: Jan 2002
Posts: 11,581
Can anybody help? I just can't seem to get my head around regular expressions. I have about 20 GB of logfiles and want to create a single logfile with only specific information in it. All the logfiles are from one day, so I don't need to search on date, but I want to search between 2 times and for a particular website.
The time format is hh:mm:ss and assume the web server is 10.0.0.1
The 2 times are 01:00:00 and 02:00:00
I assume I'd simply do something along the lines of grep [regexp] > file.log
Any help appreciated. My poor brain hurts trying to understand escape characters and modifiers
Last edited by NotoriousREV; 01 August 2005 at 10:32 AM.
#2
Scooby Regular
Originally Posted by NotoriousREV
The time format is hh:mm:ss and assume the web address is www.foobar.com (which would include sub-directories). The 2 times are 01:00:00 and 02:00:00
I assume I'd simply do something along the lines of grep [regexp] > file.log
Assuming the time is the first field of each log line when split on whitespace, something like this would do it on a Unix-like system with perl installed:
$ perl -nle '@s = split(/\s+/, $_); $s[0] =~ tr/://d; $s[0] >= 10000 && $s[0] <= 20000 && print' yourlogfile > newlogfile
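For anyone who'd rather not use perl, the same field-based idea could be sketched in Python, with the site filter added. This is only a sketch under assumptions: the time is taken to be the first whitespace-separated field, the site (10.0.0.1 here) is taken to appear literally somewhere on the line, and the function names are made up for illustration.

```python
def looks_like_time(stamp):
    """True if stamp has the shape hh:mm:ss (digits and colons only)."""
    return (len(stamp) == 8 and stamp[2] == ":" and stamp[5] == ":"
            and (stamp[:2] + stamp[3:5] + stamp[6:]).isdigit())

def in_window(line, start="01:00:00", end="02:00:00", site="10.0.0.1"):
    """True if the first field is a time in [start, end] and the line mentions site."""
    fields = line.split()
    if not fields or not looks_like_time(fields[0]):
        return False
    # Zero-padded hh:mm:ss strings sort in the same order as the times they
    # represent, so a plain string comparison does the bounds check.
    # Simplification: a substring test would also accept e.g. 10.0.0.10.
    return start <= fields[0] <= end and site in line

def matching_lines(lines, **kwargs):
    """Yield matching lines one at a time, so a huge log is never held in memory."""
    for line in lines:
        if in_window(line, **kwargs):
            yield line
```

Fed an open file (or sys.stdin) and written out line by line, matching_lines would stream through the logs without loading them, which matters with 20 GB of input.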
#3
Scooby Regular
Thread Starter
Thanks steven. I couldn't get it to work so I ended up using a far less elegant method which worked well enough! I think I'll be doing some serious regexp swotting 'cos it would be very handy if I could get this to work.
#4
Scooby Regular
You won't be able to do it (with any kind of efficiency) with a regular expression because you need to bounds-check the time. Something like 0[12]:\d{2}:\d{2} will also match 02:34:12, for example, so you'd end up embedding logic within the regex to check that $2 and $3 are both 00 when $1 is 02, and so on.
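To make the bounds problem concrete, here is a small Python sketch comparing the naive pattern with one that spells the bounds out as an alternation. The BOUNDED pattern is my own illustration, not something from the thread, and it only stays this short because the window happens to sit on hour boundaries; for arbitrary start and end times the alternation grows quickly, which is where a numeric or string comparison is the easier route.

```python
import re

# The naive pattern: any time whose hour is 01 or 02.
NAIVE = re.compile(r"0[12]:\d{2}:\d{2}")

# Bounds written out by hand: every time in hour 01, plus exactly 02:00:00.
BOUNDED = re.compile(r"01:[0-5]\d:[0-5]\d|02:00:00")

def naive_match(stamp):
    """True if the whole string matches the naive pattern."""
    return NAIVE.fullmatch(stamp) is not None

def bounded_match(stamp):
    """True if the whole string is a time between 01:00:00 and 02:00:00 inclusive."""
    return BOUNDED.fullmatch(stamp) is not None
```

Here naive_match("02:34:12") is true while bounded_match("02:34:12") is false, and both accept 01:00:00 and 02:00:00.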