MySQL - Keyword Frequency
#1
Hi Chaps
Anyone have any idea how I can get a keyword frequency list?
For instance...
If I wanted to know how many times the word "Exhaust" had been posted on scoobynet over the last week?
Or even better, a list of keywords from most common down to least common (or a subset of).
Thanks in advance
Simon
Anyone have any idea how I can get a keyword frequency list?
For instance...
If I wanted to know how many times the word "Exhaust" had been posted on scoobynet over the last week?
Or even better, a list of keywords from most common down to least common (or a subset of).
Thanks in advance
Simon
#2
Scooby Regular
Join Date: Apr 2002
Location: The poliotical wing of Chip Sengravy.
Posts: 6,129
Likes: 0
Received 0 Likes
on
0 Posts
#4
Surely just
select count(*) from posts where (posts.body LIKE '%exhaust%') and (DATE_ADD(posts.date, INTERVAL 7 DAY) > NOW());
or something like that....
Obviously you can simplify that if you have a fixed date instead of doing "the last week"
Mail me if you want anything further, unfortunately I'm working today on our dual MySQL server setup
[Edited by Andrewza - 3/30/2003 4:08:09 PM]
select count(*) from posts where (posts.body LIKE '%exhaust%') and (DATE_ADD(posts.date, INTERVAL 7 DAY) > NOW());
or something like that....
Obviously you can simplify that if you have a fixed date instead of doing "the last week"
Mail me if you want anything further, unfortunately I'm working today on our dual MySQL server setup
[Edited by Andrewza - 3/30/2003 4:08:09 PM]
#6
Hi Andrewza
Thanks for that. Unfortunately, that will only give me the number of posts that contain one or more occurance of that word, rather than the number of times that word appears.
The field already has a fulltext index, so searching for the word is no problem. The difficulty I'm having is counting the number of occurances.
All the best
Simon
Thanks for that. Unfortunately, that will only give me the number of posts that contain one or more occurance of that word, rather than the number of times that word appears.
The field already has a fulltext index, so searching for the word is no problem. The difficulty I'm having is counting the number of occurances.
All the best
Simon
#7
Ah, yes, I'm not sure of a good way to do it, certainly not all in MySQL itself. I mostly use PHP to talk to MySQL so I'd use a similar query to find the posts with the string in then use PHP4's substr_count on each row returned in the MySQL result to get a count for each post and add them up.
Trending Topics
#8
couldn't you write a cgi script / native (linux?) binary to do it?
or
if you want to do it once per week / day
schedule a command
"Mysqldump scoobnet > dump.sql"
then
"./wordcount > wordcount.txt"
?
I know its not really what your after but MySQL isn't exactly the most advanced of databases
[-edited to add-]
ahh... re-read post, is this to imporve the searching?
[Edited by Dizzy - 3/31/2003 9:35:17 AM]
or
if you want to do it once per week / day
schedule a command
"Mysqldump scoobnet > dump.sql"
then
"./wordcount > wordcount.txt"
?
I know its not really what your after but MySQL isn't exactly the most advanced of databases
[-edited to add-]
ahh... re-read post, is this to imporve the searching?
[Edited by Dizzy - 3/31/2003 9:35:17 AM]
#9
Hi Mate
Thanks for taking the time to do that. I could do a similar thing in asp, but there are literally GBs of text on scoobynet, so it would be incredibly slow.
It doesn't look like there is a direct method, so I'm going to have to think of another way forward.
Thanks for your time.
All the best
Simon
Thanks for taking the time to do that. I could do a similar thing in asp, but there are literally GBs of text on scoobynet, so it would be incredibly slow.
It doesn't look like there is a direct method, so I'm going to have to think of another way forward.
Thanks for your time.
All the best
Simon
#12
<thinking out loud>
Create a table of keywords you want, just the word and a autoincrememnt INT primary key field. Then an offline script that reads the keywords table, selects threads containing keywords, then reads each and counts the number of times the keyword is present, this then inserts into another table that contains the keyword id, the thread id and a count. You can then use these two tables to find quickly threads that have the keywords you're looking for and how 'relevant' they are by count.
The script would have to be a cron job (or equivalent), just run it once every few hours. You could also store the last thread id that was indexed so you only indexed new threads, although you'd miss out on edits you'd save yourself a tonne of cycles. Running the script for the first time would of course be the kicker on ~200,000 threads, I would if you have the hardware, make a copy of the DB on another box and run it there.
Mmm, coffee...
Or even use MySQL replication to this other box and run the script permanently on there, reading the existing state from the replicated box and updating the keyword index table on the live one.
</thinking out loud>
[Edited by Andrewza - 3/31/2003 12:04:41 PM]
Create a table of keywords you want, just the word and a autoincrememnt INT primary key field. Then an offline script that reads the keywords table, selects threads containing keywords, then reads each and counts the number of times the keyword is present, this then inserts into another table that contains the keyword id, the thread id and a count. You can then use these two tables to find quickly threads that have the keywords you're looking for and how 'relevant' they are by count.
The script would have to be a cron job (or equivalent), just run it once every few hours. You could also store the last thread id that was indexed so you only indexed new threads, although you'd miss out on edits you'd save yourself a tonne of cycles. Running the script for the first time would of course be the kicker on ~200,000 threads, I would if you have the hardware, make a copy of the DB on another box and run it there.
Mmm, coffee...
Or even use MySQL replication to this other box and run the script permanently on there, reading the existing state from the replicated box and updating the keyword index table on the live one.
</thinking out loud>
[Edited by Andrewza - 3/31/2003 12:04:41 PM]
Thread
Thread Starter
Forum
Replies
Last Post