Notices
Computer & Technology Related Post here for help and discussion of computing and related technology. Internet, TVs, phones, consoles, computers, tablets and any other gadgets.

MySQL - Keyword Frequency

Thread Tools
 
Search this Thread
 
Old 30 March 2003, 12:46 PM
  #1  
ex-webby
Orange Club
Thread Starter
 
ex-webby's Avatar
 
Join Date: Oct 1998
Posts: 13,763
Likes: 0
Received 1 Like on 1 Post
Post

Hi Chaps

Anyone have any idea how I can get a keyword frequency list?

For instance...

If I wanted to know how many times the word "Exhaust" had been posted on scoobynet over the last week?

Or even better, a list of keywords from most common down to least common (or a subset of).

Thanks in advance

Simon
Old 30 March 2003, 01:21 PM
  #2  
mj
Scooby Regular
 
mj's Avatar
 
Join Date: Apr 2002
Location: The poliotical wing of Chip Sengravy.
Posts: 6,129
Likes: 0
Received 0 Likes on 0 Posts
Post

Webamster,
Googled and googled ,and in particular this , as below, and this - see Sep. 15, 2000.



Remember, I know FA bout' puters , and would'nt know an SQL if it ran up my trouser leg.

cheers,

mj.

Old 30 March 2003, 01:49 PM
  #3  
ex-webby
Orange Club
Thread Starter
 
ex-webby's Avatar
 
Join Date: Oct 1998
Posts: 13,763
Likes: 0
Received 1 Like on 1 Post
Post

Hi mj

Thanks for your time in looking for then. I've had a good scour of google, and the above links, but there's nothing there that helps unfortunately

Cheers

Simon
Old 30 March 2003, 03:04 PM
  #4  
Andrewza
Scooby Regular
 
Andrewza's Avatar
 
Join Date: Jan 2002
Posts: 667
Likes: 0
Received 0 Likes on 0 Posts
Post

Surely just

select count(*) from posts where (posts.body LIKE '%exhaust%') and (DATE_ADD(posts.date, INTERVAL 7 DAY) > NOW());

or something like that....

Obviously you can simplify that if you have a fixed date instead of doing "the last week"

Mail me if you want anything further, unfortunately I'm working today on our dual MySQL server setup

[Edited by Andrewza - 3/30/2003 4:08:09 PM]
Old 30 March 2003, 03:11 PM
  #5  
Andrewza
Scooby Regular
 
Andrewza's Avatar
 
Join Date: Jan 2002
Posts: 667
Likes: 0
Received 0 Likes on 0 Posts
Post

Alternatively see Here for MySQL Full text search indexes
Old 30 March 2003, 03:20 PM
  #6  
ex-webby
Orange Club
Thread Starter
 
ex-webby's Avatar
 
Join Date: Oct 1998
Posts: 13,763
Likes: 0
Received 1 Like on 1 Post
Post

Hi Andrewza

Thanks for that. Unfortunately, that will only give me the number of posts that contain one or more occurance of that word, rather than the number of times that word appears.

The field already has a fulltext index, so searching for the word is no problem. The difficulty I'm having is counting the number of occurances.

All the best

Simon
Old 30 March 2003, 04:53 PM
  #7  
Andrewza
Scooby Regular
 
Andrewza's Avatar
 
Join Date: Jan 2002
Posts: 667
Likes: 0
Received 0 Likes on 0 Posts
Post

Ah, yes, I'm not sure of a good way to do it, certainly not all in MySQL itself. I mostly use PHP to talk to MySQL so I'd use a similar query to find the posts with the string in then use PHP4's substr_count on each row returned in the MySQL result to get a count for each post and add them up.
Old 31 March 2003, 08:28 AM
  #8  
Dizzy
Scooby Regular
 
Dizzy's Avatar
 
Join Date: May 2001
Posts: 2,537
Likes: 0
Received 0 Likes on 0 Posts
Post

couldn't you write a cgi script / native (linux?) binary to do it?
or
if you want to do it once per week / day
schedule a command
"Mysqldump scoobnet > dump.sql"
then
"./wordcount > wordcount.txt"
?

I know its not really what your after but MySQL isn't exactly the most advanced of databases

[-edited to add-]
ahh... re-read post, is this to imporve the searching?


[Edited by Dizzy - 3/31/2003 9:35:17 AM]
Old 31 March 2003, 10:23 AM
  #9  
ex-webby
Orange Club
Thread Starter
 
ex-webby's Avatar
 
Join Date: Oct 1998
Posts: 13,763
Likes: 0
Received 1 Like on 1 Post
Post

Hi Mate

Thanks for taking the time to do that. I could do a similar thing in asp, but there are literally GBs of text on scoobynet, so it would be incredibly slow.

It doesn't look like there is a direct method, so I'm going to have to think of another way forward.

Thanks for your time.

All the best

Simon
Old 31 March 2003, 10:29 AM
  #10  
stevencotton
Scooby Regular
 
stevencotton's Avatar
 
Join Date: Jan 2001
Location: behind twin turbos
Posts: 2,710
Likes: 0
Received 1 Like on 1 Post
Post

How you execute it depends on what it's for, what are you trying to do?

Steve.
Old 31 March 2003, 10:52 AM
  #11  
Andrewza
Scooby Regular
 
Andrewza's Avatar
 
Join Date: Jan 2002
Posts: 667
Likes: 0
Received 0 Likes on 0 Posts
Post

I'm somewhat fortunate that for my websites I have journalists who have to associate a database of keywords with their articles
Old 31 March 2003, 11:00 AM
  #12  
Andrewza
Scooby Regular
 
Andrewza's Avatar
 
Join Date: Jan 2002
Posts: 667
Likes: 0
Received 0 Likes on 0 Posts
Post

<thinking out loud>
Create a table of keywords you want, just the word and a autoincrememnt INT primary key field. Then an offline script that reads the keywords table, selects threads containing keywords, then reads each and counts the number of times the keyword is present, this then inserts into another table that contains the keyword id, the thread id and a count. You can then use these two tables to find quickly threads that have the keywords you're looking for and how 'relevant' they are by count.

The script would have to be a cron job (or equivalent), just run it once every few hours. You could also store the last thread id that was indexed so you only indexed new threads, although you'd miss out on edits you'd save yourself a tonne of cycles. Running the script for the first time would of course be the kicker on ~200,000 threads, I would if you have the hardware, make a copy of the DB on another box and run it there.

Mmm, coffee...

Or even use MySQL replication to this other box and run the script permanently on there, reading the existing state from the replicated box and updating the keyword index table on the live one.

</thinking out loud>

[Edited by Andrewza - 3/31/2003 12:04:41 PM]
Old 01 April 2003, 02:27 PM
  #13  
Dizzy
Scooby Regular
 
Dizzy's Avatar
 
Join Date: May 2001
Posts: 2,537
Likes: 0
Received 0 Likes on 0 Posts
Post

here looks like your best bet
Related Topics
Thread
Thread Starter
Forum
Replies
Last Post
TECHNOPUG
General Technical
11
21 September 2015 05:42 PM
Hos
Computer & Technology Related
11
30 May 2002 07:23 PM
WREXY
ScoobyNet General
8
07 November 2001 11:42 PM
SDB
Non Scooby Related
2
26 May 2001 11:55 PM
gregh
Non Scooby Related
6
16 May 2001 11:17 PM



Quick Reply: MySQL - Keyword Frequency



All times are GMT +1. The time now is 05:14 PM.