Hi Chaps
Anyone have any idea how I can get a keyword frequency list? For instance... If I wanted to know how many times the word "Exhaust" had been posted on scoobynet over the last week? Or even better, a list of keywords from most common down to least common (or a subset of). Thanks in advance Simon |
Webamster,
Googled and googled ,and in particular this , as below, and this - see Sep. 15, 2000. http://www.nehuenmultimedia.com.ar/a...es/kdamain.gif Remember, I know FA bout' puters , and would'nt know an SQL if it ran up my trouser leg. cheers, mj. |
Hi mj
Thanks for your time in looking for then. I've had a good scour of google, and the above links, but there's nothing there that helps unfortunately :( Cheers Simon |
Surely just
select count(*) from posts where (posts.body LIKE '%exhaust%') and (DATE_ADD(posts.date, INTERVAL 7 DAY) > NOW()); or something like that.... Obviously you can simplify that if you have a fixed date instead of doing "the last week" Mail me if you want anything further, unfortunately I'm working today on our dual MySQL server setup :( [Edited by Andrewza - 3/30/2003 4:08:09 PM] |
Alternatively see Here for MySQL Full text search indexes
|
Hi Andrewza
Thanks for that. Unfortunately, that will only give me the number of posts that contain one or more occurance of that word, rather than the number of times that word appears. The field already has a fulltext index, so searching for the word is no problem. The difficulty I'm having is counting the number of occurances. All the best Simon |
Ah, yes, I'm not sure of a good way to do it, certainly not all in MySQL itself. I mostly use PHP to talk to MySQL so I'd use a similar query to find the posts with the string in then use PHP4's substr_count on each row returned in the MySQL result to get a count for each post and add them up.
|
couldn't you write a cgi script / native (linux?) binary to do it?
or if you want to do it once per week / day schedule a command "Mysqldump scoobnet > dump.sql" then "./wordcount > wordcount.txt" ? I know its not really what your after but MySQL isn't exactly the most advanced of databases ;) [-edited to add-] ahh... re-read post, is this to imporve the searching? [Edited by Dizzy - 3/31/2003 9:35:17 AM] |
Hi Mate
Thanks for taking the time to do that. I could do a similar thing in asp, but there are literally GBs of text on scoobynet, so it would be incredibly slow. It doesn't look like there is a direct method, so I'm going to have to think of another way forward. Thanks for your time. All the best Simon |
How you execute it depends on what it's for, what are you trying to do?
Steve. |
I'm somewhat fortunate that for my websites I have journalists who have to associate a database of keywords with their articles :D
|
<thinking out loud>
Create a table of keywords you want, just the word and a autoincrememnt INT primary key field. Then an offline script that reads the keywords table, selects threads containing keywords, then reads each and counts the number of times the keyword is present, this then inserts into another table that contains the keyword id, the thread id and a count. You can then use these two tables to find quickly threads that have the keywords you're looking for and how 'relevant' they are by count. The script would have to be a cron job (or equivalent), just run it once every few hours. You could also store the last thread id that was indexed so you only indexed new threads, although you'd miss out on edits you'd save yourself a tonne of cycles. Running the script for the first time would of course be the kicker on ~200,000 threads, I would if you have the hardware, make a copy of the DB on another box and run it there. Mmm, coffee... Or even use MySQL replication to this other box and run the script permanently on there, reading the existing state from the replicated box and updating the keyword index table on the live one. </thinking out loud> [Edited by Andrewza - 3/31/2003 12:04:41 PM] |
here looks like your best bet :)
|
All times are GMT +1. The time now is 05:15 AM. |
© 2024 MH Sub I, LLC dba Internet Brands