
Anyone good at "regular expressions"?

 
Old 01 April 2004, 09:42 AM
  #1  
ex-webby
Orange Club
Thread Starter
 
Join Date: Oct 1998
Posts: 13,763

If so...

How do you match a whole word in a string?

eg. word = "test"

String = "test, test testing....test;"

I need it to match all of those apart from the "test" part of "testing".

Thanks in advance

Simon
Old 01 April 2004, 09:51 AM
  #2  
JV
Scooby Regular
 
Join Date: Jul 2002
Location: West Sussex
Posts: 271

Depends what you're using, but in Perl it's /\btest\b/

(\b for word boundary)
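For anyone following along, here is a minimal sketch of the same idea in Python's `re` module (an assumption on my part: the thread itself is about Perl and PHP, but `\b` behaves the same way here). It runs the pattern against the example string from the first post.

```python
import re

text = "test, test testing....test;"

# \b matches the position between a word character and a non-word
# character (or the start/end of the string), so the "test" inside
# "testing" is skipped because it is followed by "i".
whole_word = re.compile(r"\btest\b")

matches = [m.start() for m in whole_word.finditer(text)]
print(matches)  # [0, 6, 22]
```

The three offsets are the standalone occurrences; the one at offset 11 (the start of "testing") is not reported.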

HTH,
James
Old 01 April 2004, 10:02 AM
  #3  
Fosters
Scooby Regular
 
Join Date: Jul 2000
Location: Islington
Posts: 2,145

What language, Simon?
Old 01 April 2004, 10:02 AM
  #4  
ex-webby
Orange Club
Thread Starter
 
Join Date: Oct 1998
Posts: 13,763

JV

Thank you very much. It's actually in PHP. Don't suppose you know if this will use the same format, do you? Apologies, I am useless at this!

Cheers

Simon
Old 01 April 2004, 10:05 AM
  #5  
ex-webby
Orange Club
Thread Starter
 
Join Date: Oct 1998
Posts: 13,763

The actual requirement is to replace the word with another word wherever it appears, but only where it is the full word.

Cheers
Simon
Old 01 April 2004, 10:19 AM
  #6  
JV
Scooby Regular
 
Join Date: Jul 2002
Location: West Sussex
Posts: 271

Not sure about PHP, but this might help:

http://uk2.php.net/manual/en/function.preg-replace.php
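PHP's preg_replace takes Perl-style patterns, so the `\b` form carries over. As a rough sketch of the whole-word replacement, here is the equivalent in Python (my choice of language, not the thread's; the helper name `replace_whole_word` and the replacement word "exam" are made up for illustration).

```python
import re


def replace_whole_word(text, word, replacement):
    # re.escape guards against regex metacharacters in the word itself.
    pattern = r"\b" + re.escape(word) + r"\b"
    # A callable replacement is used so that backslashes in `replacement`
    # are inserted literally rather than read as group references.
    return re.sub(pattern, lambda m: replacement, text)


result = replace_whole_word("test, test testing....test;", "test", "exam")
print(result)  # exam, exam testing....exam;
```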
Old 01 April 2004, 10:30 AM
  #7  
RallyMarshal
Scooby Regular
 
Join Date: Aug 2002
Posts: 703

Then can't you do a simple replace that looks for [space]WORD[space], rather than a regular expression that might strip out content from within another string?
Old 01 April 2004, 10:34 AM
  #8  
stevencotton
Scooby Regular
 
Join Date: Jan 2001
Location: behind twin turbos
Posts: 2,710

That's what \b does. You can't look for /\stest\s/ since that won't change something like "test, ".
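To make the difference concrete, here is a small sketch comparing the two patterns on the original example string, written in Python as an assumption (the `\s` and `\b` semantics match Perl's for this case).

```python
import re

text = "test, test testing....test;"

# Requiring whitespace on both sides only finds a word that is literally
# surrounded by whitespace, and the whitespace becomes part of the match.
with_spaces = re.findall(r"\stest\s", text)

# \b is zero-width: it consumes nothing, so punctuation and the ends of
# the string count as boundaries too.
with_boundaries = re.findall(r"\btest\b", text)

print(with_spaces)      # [' test ']
print(with_boundaries)  # ['test', 'test', 'test']
```

The `\s` version misses "test," at the start and "test;" at the end, which is exactly the case described above.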
Old 01 April 2004, 10:41 AM
  #9  
ex-webby
Orange Club
Thread Starter
 
Join Date: Oct 1998
Posts: 13,763

JV

Perfect! Thank you very much, that's done the job

All the best

Simon
Old 01 April 2004, 10:52 AM
  #10  
icantthinkofone
Scooby Regular
 
Join Date: Dec 2003
Posts: 97

I think the 'regexp' (purist!??) way would be to use a character class, e.g. {., }word{., }, which will match any of the supplied options (rather than the \b); regexp being language independent.

That said, if PHP supplies a \b style option, use it; it's a lot easier than generating a complex regexp!
Old 01 April 2004, 10:57 AM
  #11  
stevencotton
Scooby Regular
 
Join Date: Jan 2001
Location: behind twin turbos
Posts: 2,710

Character classes are done with square brackets rather than curly braces, which are used for matching repetition. The regex purist will use what's available in the implementation, rather than sticking to one way of doing things. PHP's preg_ functions (such as the preg_replace linked above) are Perl-compatible, so \b should be OK there.
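A quick illustration of the bracket/brace distinction, sketched in Python as an assumption (the syntax is the same in Perl-style regexes). The sample strings are made up.

```python
import re

# Square brackets define a character class: exactly one character that is
# '.', ',' or ' '.
delimiter = re.compile(r"[., ]")

# Curly braces are a repetition count: here, exactly two 'o' characters
# in a row.
double_o = re.compile(r"o{2}")

delimiters_found = delimiter.findall("a, b.c")
print(delimiters_found)               # [',', ' ', '.']
print(bool(double_o.search("foo")))   # True
print(bool(double_o.search("fo")))    # False
```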
Old 01 April 2004, 01:27 PM
  #12  
ex-webby
Orange Club
Thread Starter
 
Join Date: Oct 1998
Posts: 13,763

OK..

it gets more complicated....

I'm using the \b thing which works perfectly... thank you...

But.. I want it to ignore the text if it is within certain specific tags...

best example would be in html...

testing test <a href="http://www.test.com">test</a> test <img src="http://www.test.com/image.jpg">

In this example, I would want it to ignore the two "test"s between "<a href" and "</a>" and also ignore the "test" in between "<img" and ">".

is this possible, or does it get way too complex?

Cheers

Simon
Old 01 April 2004, 01:35 PM
  #13  
Fosters
Scooby Regular
 
Join Date: Jul 2000
Location: Islington
Posts: 2,145

fussy b@stard!

Old 01 April 2004, 01:37 PM
  #14  
ex-webby
Orange Club
Thread Starter
 
Join Date: Oct 1998
Posts: 13,763

LOL
Old 01 April 2004, 01:43 PM
  #15  
Fosters
Scooby Regular
 
Join Date: Jul 2000
Location: Islington
Posts: 2,145

How about a routine that scans the page and replaces everything between "<a href" and "</a>" with spaces, does the same with "<img" tags, and then does your \b thingie?

kinda like a mid(rah, rah, rah)=space$(1) in vb
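One way the masking idea might look, as a sketch in Python (my assumption; the thread is about PHP). The protected regions are blanked out with the same number of spaces in a copy, so the match offsets found in the copy still line up with the original string. The function name, the `PROTECTED` pattern and the replacement word "exam" are all hypothetical.

```python
import re

# Assumption: "protected" means whole <a ...>...</a> blocks plus any other
# single tag such as <img ...>.
PROTECTED = re.compile(r"<a\b[^>]*>.*?</a>|<[^>]*>", re.IGNORECASE | re.DOTALL)


def replace_outside_tags(html, word, replacement):
    # Mask protected regions with spaces of equal length (offsets preserved).
    masked = PROTECTED.sub(lambda m: " " * len(m.group(0)), html)
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")

    # Find matches in the masked copy, but rebuild from the original text.
    pieces = []
    last = 0
    for m in pattern.finditer(masked):
        pieces.append(html[last:m.start()])
        pieces.append(replacement)
        last = m.end()
    pieces.append(html[last:])
    return "".join(pieces)


sample = ('testing test <a href="http://www.test.com">test</a> test '
          '<img src="http://www.test.com/image.jpg">')
result = replace_outside_tags(sample, "test", "exam")
print(result)
```

This is two passes over the text (one to mask, one to match), which is the cost being traded for simplicity.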
Old 01 April 2004, 01:50 PM
  #16  
ex-webby
Orange Club
Thread Starter
 
Join Date: Oct 1998
Posts: 13,763

Was trying to use a regular expression to avoid parsing all the text each time for performance.

If it can't be done in one regular expression, I guess I'll have to run two processes on it.


Cheers
Simon
Old 01 April 2004, 02:18 PM
  #17  
stevencotton
Scooby Regular
 
Join Date: Jan 2001
Location: behind twin turbos
Posts: 2,710

Regex matching is anything but efficient, so you're already taking a performance hit by using one, even if you precompile it. You can't do what you want with one regular expression; you'd be better off with a mixture of some kind of SAX parser (so you can ignore all the HTML tags) and a regex for the substitution.
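A rough sketch of the parser-plus-regex mixture, assuming Python and its standard-library `html.parser.HTMLParser`, which is event-driven in the SAX style but is not a SAX parser proper. The class name `WordReplacer` and helper `replace_in_text` are hypothetical. Tags are re-emitted as they were read; only text outside `<a>` elements goes through the regex. Comments and doctype declarations are silently dropped in this sketch.

```python
import re
from html.parser import HTMLParser


class WordReplacer(HTMLParser):
    def __init__(self, word, replacement):
        # convert_charrefs=False so entities like &amp; reach the handlers
        # below and can be written back out unchanged.
        super().__init__(convert_charrefs=False)
        self.pattern = re.compile(r"\b" + re.escape(word) + r"\b")
        self.replacement = replacement
        self.link_depth = 0   # > 0 while inside an <a> element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through as written.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth:
            self.link_depth -= 1
        self.out.append("</" + tag + ">")

    def handle_data(self, data):
        if self.link_depth == 0:
            data = self.pattern.sub(lambda m: self.replacement, data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&" + name + ";")

    def handle_charref(self, name):
        self.out.append("&#" + name + ";")


def replace_in_text(html, word, replacement):
    parser = WordReplacer(word, replacement)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = ('testing test <a href="http://www.test.com">test</a> test '
          '<img src="http://www.test.com/image.jpg">')
result = replace_in_text(sample, "test", "exam")
print(result)
```

Because attribute values never reach `handle_data`, the "test" inside the href and src URLs is left alone without any extra work.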

Hmm, saying that, I just did this:

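my $word_to_replace = 'test';
my $new_word = 'april fools day';

# $1 puts back the text captured before the word; without it that text is deleted.
# \Q...\E quotes any regex metacharacters in the word.
# Caveat: only the first occurrence after each '>' is replaced, and text before the first tag is never matched.
$html =~ s!(>[^<]*?\b)\Q$word_to_replace\E\b!$1$new_word!gi;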
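A different single-pass approach, sketched in Python as an assumption and not what the post above does: match either a protected region or the word, and let a callback decide what to write back. The function name and the definition of "protected" (whole `<a>...</a>` blocks and any other tag) are my own choices for illustration.

```python
import re


def replace_skipping_tags(html, word, replacement):
    # Alternative 1 (captured as group 1) swallows whole <a>...</a> blocks
    # and any other tag; alternative 2 is the whole word in ordinary text.
    pattern = re.compile(
        r"(<a\b[^>]*>.*?</a>|<[^>]*>)|\b" + re.escape(word) + r"\b",
        re.IGNORECASE | re.DOTALL,
    )
    # If group 1 matched, write the protected text back unchanged;
    # otherwise the word itself matched, so substitute the replacement.
    return pattern.sub(lambda m: m.group(1) or replacement, html)


sample = ('testing test <a href="http://www.test.com">test</a> test '
          '<img src="http://www.test.com/image.jpg">')
result = replace_skipping_tags(sample, "test", "april fools day")
print(result)
```

Because the protected alternative is tried first at each position, anything inside a tag or link is consumed before the word pattern gets a chance to see it. Like any regex over HTML, it will be fooled by a literal ">" inside an attribute value.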

Last edited by stevencotton; 01 April 2004 at 02:26 PM.
Old 01 April 2004, 02:54 PM
  #18  
dsmith
Scooby Regular
 
Join Date: Mar 1999
Posts: 4,518

Or maybe I won't try to talk about things I know nowt about

Deano



All times are GMT +1.