Anyone good at "regular expressions"?

01 April 2004, 09:42 AM

ex-webby

Orange Club

Thread Starter

Join Date: Oct 1998

Posts: 13,763

Likes: 0

Received 1 Like on 1 Post

Anyone good at "regular expressions"?

If so...

How do you match a whole word in a string?

eg. word = "test"

String = "test, test testing....test;"

I need it to match all of those apart from the "test" part of "testing".

Thanks in advance

Simon

01 April 2004, 09:51 AM

Scooby Regular

Join Date: Jul 2002

Location: West Sussex

Posts: 271

Likes: 0

Received 0 Likes on 0 Posts

Depends what you're using, but in Perl it's /\btest\b/

(\b for word boundary)

HTH,
James
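For anyone who wants to try the \b idea outside Perl, here is a minimal sketch using Python's `re` module, which treats `\b` the same way for this case. The sample string is the one from the first post; the variable names are only illustrative.

```python
import re

text = "test, test testing....test;"

# \b matches the empty position between a word character and a non-word
# character (or the start/end of the string), so "test" inside "testing"
# is rejected because it is followed by the word character "i".
pattern = re.compile(r"\btest\b")

# Start offsets of every whole-word match.
matches = [m.start() for m in pattern.finditer(text)]
print(matches)  # [0, 6, 22]
```

The three hits are the standalone words; the "test" at offset 11 (the start of "testing") is skipped.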

01 April 2004, 10:02 AM

Fosters

Scooby Regular

Join Date: Jul 2000

Location: Islington

Posts: 2,145

Likes: 0

Received 0 Likes on 0 Posts

what language Simon?

01 April 2004, 10:02 AM

ex-webby

Orange Club

Thread Starter

Join Date: Oct 1998

Posts: 13,763

Likes: 0

Received 1 Like on 1 Post

JV

thank you very much. It's actually in PHP.. don't suppose you know if this will use the same format do you? Apologies, I am useless at this!

Cheers

Simon

01 April 2004, 10:05 AM

ex-webby

Orange Club

Thread Starter

Join Date: Oct 1998

Posts: 13,763

Likes: 0

Received 1 Like on 1 Post

the actual requirement is to replace the word with another word wherever it appears... but only the full word.

Cheers
Simon

01 April 2004, 10:19 AM

Scooby Regular

Join Date: Jul 2002

Location: West Sussex

Posts: 271

Likes: 0

Received 0 Likes on 0 Posts

Not sure about PHP, but this might help:

http://uk2.php.net/manual/en/function.preg-replace.php
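A rough sketch of the whole-word replacement, written in Python rather than PHP so it is easy to run. PHP's preg_replace() accepts the same Perl-style pattern (for example '/\btest\b/'). The function name `replace_whole_word` and the replacement word are made up for illustration.

```python
import re


def replace_whole_word(text, word, replacement):
    """Replace `word` only where it stands alone as a whole word."""
    # re.escape guards against regex metacharacters in the search word.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    # A callable replacement is inserted literally, so backslashes in
    # `replacement` are not treated as group references.
    return pattern.sub(lambda m: replacement, text)


print(replace_whole_word("test, test testing....test;", "test", "exam"))
# exam, exam testing....exam;
```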

01 April 2004, 10:30 AM

RallyMarshal

Scooby Regular

Join Date: Aug 2002

Posts: 703

Likes: 0

Received 0 Likes on 0 Posts

Then can't you do a simple replace function that looks for [space]WORD[space], rather than a regular expression that might strip out content from within another string?

01 April 2004, 10:34 AM

stevencotton

Scooby Regular

Join Date: Jan 2001

Location: behind twin turbos

Posts: 2,710

Likes: 0

Received 1 Like on 1 Post

That's what \b does. You can't look for /\stest\s/ since that won't change something like "test, ".
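To make the difference concrete, here is a small Python comparison of the two patterns on the string from the first post (a sketch only; the variable names are arbitrary).

```python
import re

text = "test, test testing....test;"

# \s needs a real whitespace character on both sides, so "test," at the
# start and "test;" at the end are both missed.
with_spaces = re.findall(r"\stest\s", text)

# \b only needs a word/non-word transition, so punctuation and the
# start or end of the string all count.
with_bounds = re.findall(r"\btest\b", text)

print(with_spaces)  # [' test ']
print(with_bounds)  # ['test', 'test', 'test']
```

Note that the \s version also consumes the surrounding spaces, so a replacement would have to put them back.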

01 April 2004, 10:41 AM

ex-webby

Orange Club

Thread Starter

Join Date: Oct 1998

Posts: 13,763

Likes: 0

Received 1 Like on 1 Post

JV

Perfect! Thank you very much, that's done the job

All the best

Simon

01 April 2004, 10:52 AM

#10

icantthinkofone

Scooby Regular

Join Date: Dec 2003

Posts: 97

Likes: 0

Received 0 Likes on 0 Posts

I think the 'regexp' (purist!??) way would be to use a character class, e.g. {., }word{., }, which will match any of the supplied options (rather than the \b); regexp being language independent.

That said, if PHP supplies a \b style option, use it; it's a lot easier than generating a complex regexp!

01 April 2004, 10:57 AM

#11

stevencotton

Scooby Regular

Join Date: Jan 2001

Location: behind twin turbos

Posts: 2,710

Likes: 0

Received 1 Like on 1 Post

Character classes are done with square brackets rather than curly braces, which are used for matching repetition. The regex purist will use what's available in the implementation, rather than sticking to one way of doing things

If the PHP regular expression parser is POSIX compliant it should be ok.
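A short Python sketch of the distinction, plus what a hand-rolled character-class boundary looks like. Worth noting: \b is a Perl-style escape rather than part of POSIX regex syntax, and PHP's preg_ functions are PCRE-based, which does support it. The helper name `replace_with_classes` is hypothetical.

```python
import re

# Square brackets: a character class, matching ONE character from the set.
print(re.findall(r"[.,]", "a.b,c"))    # ['.', ',']

# Curly braces: repetition of the preceding item.
print(re.findall(r"o{2}", "foo boo"))  # ['oo', 'oo']


def replace_with_classes(text, word, replacement):
    """Whole-word replace using character classes instead of \\b."""
    # A non-word character (or the string start/end) must sit on each side.
    pattern = re.compile(
        r"(^|[^A-Za-z0-9_])" + re.escape(word) + r"([^A-Za-z0-9_]|$)"
    )
    # The delimiters are part of the match, so they have to be put back.
    return pattern.sub(lambda m: m.group(1) + replacement + m.group(2), text)


print(replace_with_classes("test, test testing....test;", "test", "exam"))
# exam, exam testing....exam;

# Pitfall: the space after the first word is consumed by the first match,
# so the second word no longer has a delimiter in front of it.
print(replace_with_classes("test test", "test", "exam"))
# exam test
```

The last call shows why \b is usually the easier option where it is available: it matches a position and consumes no characters.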

01 April 2004, 01:27 PM

#12

ex-webby

Orange Club

Thread Starter

Join Date: Oct 1998

Posts: 13,763

Likes: 0

Received 1 Like on 1 Post

OK..

it gets more complicated....

I'm using the \b thing which works perfectly... thank you...

But.. I want it to ignore the text if it is within certain specific tags...

best example would be in html...

testing test <a href="http://www.test.com">test</a> test <img src="http://www.test.com/image.jpg">

in this example, I would want it to ignore the two "test"s between "<a href" and "</a>" and also ignore the test in between "<img" and ">"

is this possible, or does it get way too complex?

Cheers

Simon

01 April 2004, 01:35 PM

#13

Fosters

Scooby Regular

Join Date: Jul 2000

Location: Islington

Posts: 2,145

Likes: 0

Received 0 Likes on 0 Posts

fussy b@stard!

01 April 2004, 01:37 PM

#14

ex-webby

Orange Club

Thread Starter

Join Date: Oct 1998

Posts: 13,763

Likes: 0

Received 1 Like on 1 Post

LOL

01 April 2004, 01:43 PM

#15

Fosters

Scooby Regular

Join Date: Jul 2000

Location: Islington

Posts: 2,145

Likes: 0

Received 0 Likes on 0 Posts

how about a routine that scans the page and replaces everything between "<a href" and "</a>" with spaces, same with "<img" tags, and then do your \b thingie?

kinda like a mid(rah, rah, rah)=space$(1) in vb
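One way this masking idea could look, sketched in Python. It blanks the protected regions in a copy, finds the whole-word matches in that copy, and applies the replacements to the original at the same offsets. The `PROTECTED` pattern and the function name are assumptions for illustration, and the tag pattern is deliberately simple (it would be fooled by a ">" inside an attribute value, for instance).

```python
import re

# Whole <a ...>...</a> elements first, then any other single tag.
PROTECTED = re.compile(r"<a\b[^>]*>.*?</a>|<[^>]*>", re.IGNORECASE | re.DOTALL)


def replace_outside_tags(html, word, replacement):
    # Overwrite protected regions with spaces of the same length, so every
    # offset in `masked` still lines up with the same offset in `html`.
    masked = PROTECTED.sub(lambda m: " " * len(m.group(0)), html)
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")

    pieces, last = [], 0
    for m in pattern.finditer(masked):
        pieces.append(html[last:m.start()])  # untouched original text
        pieces.append(replacement)
        last = m.end()
    pieces.append(html[last:])
    return "".join(pieces)


html = ('testing test <a href="http://www.test.com">test</a> test '
        '<img src="http://www.test.com/image.jpg">')
print(replace_outside_tags(html, "test", "NEW"))
# testing NEW <a href="http://www.test.com">test</a> NEW <img src="http://www.test.com/image.jpg">
```

This is two passes over the text rather than one, which is the trade-off being discussed in the thread.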

01 April 2004, 01:50 PM

#16

ex-webby

Orange Club

Thread Starter

Join Date: Oct 1998

Posts: 13,763

Likes: 0

Received 1 Like on 1 Post

Was trying to use a regular expression to avoid parsing all the text each time for performance.

If it can't be done in one regular expression, I guess I'll have to run two processes on it.

Cheers
Simon

01 April 2004, 02:18 PM

#17

stevencotton

Scooby Regular

Join Date: Jan 2001

Location: behind twin turbos

Posts: 2,710

Likes: 0

Received 1 Like on 1 Post

Regex matching is anything but efficient, so you're already taking a performance hit by using them, even if you precompile. You can't do what you want with one regular expression; you'd be better off with a mixture of some kind of SAX parser (so you can ignore all the HTML tags) and a regex for the substitution.

Hmm, saying that, I just did this:

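my $word_to_replace = 'test';
my $new_word = 'april fools day';

# $1 puts back the ">" and the text captured before the word; without it they are deleted.
# Only text that follows a ">" is touched, so anything before the first tag is left alone.
$html =~ s!(>[^<]*?\b)\Q$word_to_replace\E\b!$1$new_word!gi;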

Last edited by stevencotton; 01 April 2004 at 02:26 PM.
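For comparison, a sketch of the parser-plus-regex mixture in Python. The standard-library `html.parser` is event-driven in the same spirit as SAX, though it is not a SAX parser as such. The class and function names are hypothetical, and treating everything inside `<a>` as off limits is an assumption taken from the example earlier in the thread.

```python
import re
from html.parser import HTMLParser


class WordReplacer(HTMLParser):
    """Rewrites text nodes only; tags and the contents of <a> pass through."""

    def __init__(self, word, replacement):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        self.replacement = replacement
        self.out = []
        self.link_depth = 0  # greater than 0 while inside <a>...</a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        self.out.append(self.get_starttag_text())  # raw tag, attributes untouched

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth:
            self.link_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.link_depth == 0:
            data = self.pattern.sub(lambda m: self.replacement, data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def replace_word(html, word, replacement):
    parser = WordReplacer(word, replacement)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


sample = ('testing test <a href="http://www.test.com">test</a> test '
          '<img src="http://www.test.com/image.jpg">')
print(replace_word(sample, "test", "april fools day"))
```

Unlike the one-line substitution, this also handles text before the first tag and leaves the link text alone, at the cost of a fair bit more code.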