ScoobyNet.com - Subaru Enthusiast Forum


NotoriousREV 22 June 2006 11:12 AM

Dealing with large web logs
 
Does anyone here have to deal with large volumes of web logs? The site I look after generates ~25GB of logs per day (in 10MB files, although this is configurable). We already use a non-log-based analytics package to do the fancy stuff (paths through the site, demographics).

I want to be able to do more belt and braces stuff. The simple stuff is browser version info (any package will do that), but we often get asked to look at specific URLs, see who viewed them in a certain time frame, and then see what else that IP looked at, etc., so whatever I use needs to have really good drill-down functionality.
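To make the drill-down requirement concrete, here is a minimal Python sketch of that kind of query. It assumes the logs are in Apache Common Log Format, which the thread does not confirm; the regular expression and the names `parse_line` and `drill_down` are hypothetical, made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: Common Log Format lines such as
# 10.0.0.1 - - [22/Jun/2006:11:00:00 +0100] "GET /page.html HTTP/1.1" 200 512
# The real load balancer format may differ, so adjust the pattern to suit.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return (ip, timestamp, url), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts, m.group("url")


def drill_down(lines, target_url, start, end):
    """Find IPs that requested target_url between start and end (inclusive),
    then return every request each of those IPs made, in time order."""
    by_ip = defaultdict(list)
    viewers = set()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip lines that are not in the assumed format
        ip, ts, url = rec
        by_ip[ip].append((ts, url))
        if url == target_url and start <= ts <= end:
            viewers.add(ip)
    return {ip: sorted(by_ip[ip]) for ip in viewers}
```

This keeps every request in memory, so it only suits a slice of the logs (one file, or one hour). At ~25GB a day the same query would more realistically run against an indexed database table, or as two passes over the files: one to collect the matching IPs, one to pull their other requests.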

We also use a load balancing system that distributes a number of different domains across clusters of individual servers. All the web logs come from the load balancing equipment, so it would be vital to be able to assign the individual servers to a specific domain so I could see visitor numbers per domain and then break down how many people got directed to which server.
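The per-domain breakdown described above could be sketched roughly as follows in Python. It assumes each log record has already been reduced to a (client IP, back-end server) pair and that a "visitor" means a unique client IP. The server names, domains, `SERVER_TO_DOMAIN` and `visitors_by_domain` are all hypothetical placeholders, not anything taken from the real setup.

```python
from collections import defaultdict

# Hypothetical mapping of back-end servers to the domain they serve.
SERVER_TO_DOMAIN = {
    "web01": "www.example.com",
    "web02": "www.example.com",
    "web03": "shop.example.com",
}


def visitors_by_domain(records, server_to_domain):
    """Count unique client IPs per domain and per server within each domain.

    records is an iterable of (client_ip, server) pairs.
    Servers missing from the mapping are grouped under "unassigned".
    """
    per_domain = defaultdict(set)
    per_server = defaultdict(lambda: defaultdict(set))
    for client_ip, server in records:
        domain = server_to_domain.get(server, "unassigned")
        per_domain[domain].add(client_ip)
        per_server[domain][server].add(client_ip)
    return {
        domain: {
            "visitors": len(ips),
            "servers": {
                name: len(clients)
                for name, clients in sorted(per_server[domain].items())
            },
        }
        for domain, ips in per_domain.items()
    }
```

Note that a client sent to two servers in the same cluster counts once for the domain but once for each server, so the per-server figures can add up to more than the domain total.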

Obviously, this is going to be DB driven. So far the only package I've looked at that comes close is Sawmill, but it's very slow and locks users out of the reports while it's updating. I could update overnight, but it takes 12 hours to process 24 hours of logs, or I could update once an hour, but it still takes almost half an hour to process the new logs.

So, the question is: what are you guys using for this kind of thing?

stevencotton 22 June 2006 12:27 PM

I use Sawmill :) It's definitely the best. There isn't a lot you can do about it, you have a hell of a lot of data there and you're probably limited by CPU and potentially disk/memory and IO.

I would move logging backwards and take the load balancers out of the equation altogether. Can the back-end servers not log themselves?

What infrastructure are you using? Is it Apache based? Perhaps mod_log_spread can help you out if so. Sounds like you've hit a critical point where your logging scenario needs to be standardised, so you will probably benefit from working out a Nice Way to do it.

