
Dealing with large web logs

22 June 2006, 11:12 AM  #1
NotoriousREV
Scooby Regular
Thread Starter
Join Date: Jan 2002
Posts: 11,581

Does anyone here have to deal with large volumes of web logs? The site I look after generates ~25 GB of logs per day (in 10 MB files, although the file size is configurable). We already use a non-log-based analytics package for the fancy stuff (paths through the site, demographics).

I want to be able to do more belt-and-braces stuff. The simple stuff is browser version info (any package will do that), but we are often asked to look at a specific URL, see who viewed it in a certain time frame, and then see what else those IPs looked at, so whatever I use needs really good drill-down functionality.
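A minimal Python sketch of that drill-down, assuming the logs are in Apache Combined Log Format (the load balancer's real format isn't stated and may differ). `visitors_of` and `pages_viewed_by` are hypothetical helper names; at ~25 GB a day you would run the same queries against an indexed database rather than rescanning files, but the logic is the same.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the start of an Apache Common/Combined Log Format line.
# Assumption: the load balancer's actual log format may differ.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)

def parse(line):
    """Return (ip, timestamp, path), or None if the line does not match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts, m.group("path")

def visitors_of(lines, url, start, end):
    """IPs that requested `url` with start <= time < end (hypothetical helper)."""
    ips = set()
    for line in lines:
        rec = parse(line)
        if rec and rec[2] == url and start <= rec[1] < end:
            ips.add(rec[0])
    return ips

def pages_viewed_by(lines, ips):
    """Map each IP in `ips` to the set of paths it requested (hypothetical helper)."""
    seen = defaultdict(set)
    for line in lines:
        rec = parse(line)
        if rec and rec[0] in ips:
            seen[rec[0]].add(rec[2])
    return dict(seen)
```

`lines` is scanned twice (once per question), so pass a list or reopen the files rather than handing over a one-shot iterator.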

We also use a load-balancing system that distributes a number of different domains across clusters of individual servers. All the web logs come from the load-balancing equipment, so it is vital that I can assign each individual server to a specific domain, so that I can see visitor numbers per domain and then break down how many visitors were directed to which server.
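Assuming each log record can be reduced to a client IP plus the name of the back-end server that handled it, the per-domain breakdown is a lookup table and two groupings. The server names and the `SERVER_TO_DOMAIN` mapping below are made up for illustration, and unique IPs are only a rough proxy for visitors (proxies and NAT put many people behind one IP).

```python
from collections import defaultdict

# Hypothetical mapping from back-end server name to the domain it serves;
# the real names would come from the load balancer configuration.
SERVER_TO_DOMAIN = {
    "web01": "www.example.com",
    "web02": "www.example.com",
    "web03": "shop.example.com",
}

def visitor_breakdown(records, server_to_domain=SERVER_TO_DOMAIN):
    """records: iterable of (client_ip, server_name) pairs.

    Returns (unique IPs per domain, unique IPs per domain and server).
    Servers missing from the mapping are grouped under "unknown".
    """
    per_domain = defaultdict(set)
    per_server = defaultdict(lambda: defaultdict(set))
    for ip, server in records:
        domain = server_to_domain.get(server, "unknown")
        per_domain[domain].add(ip)
        per_server[domain][server].add(ip)
    domain_counts = {d: len(ips) for d, ips in per_domain.items()}
    server_counts = {
        d: {s: len(ips) for s, ips in servers.items()}
        for d, servers in per_server.items()
    }
    return domain_counts, server_counts
```

An IP that hits two servers in the same domain counts once for the domain but once for each server, which matches "visitors per domain, then how they were spread across servers".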

Obviously, this is going to be DB-driven. So far the only package I've looked at that comes close is Sawmill, but it's very slow and locks users out of the reports while it's updating. I could update overnight, but it takes 12 hours to process 24 hours of logs; or I could update once an hour, but it still takes almost half an hour to process the new logs.
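One way to keep reports readable while new logs are loaded, sketched here with SQLite purely as an assumption (no database is named in the thread): write-ahead-log (WAL) mode lets readers keep querying while a writer appends, and a `loaded_files` table means each hourly run only imports files it hasn't seen before. `open_db`, `load_new_files` and the `parse_line` callback are hypothetical names.

```python
import os
import sqlite3

def open_db(path):
    conn = sqlite3.connect(path)
    # WAL mode: report queries can keep reading while a load is in progress.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits (ip TEXT, ts TEXT, path TEXT, server TEXT)"
    )
    # Indexes for the "who viewed URL X between T1 and T2" and "what else did IP Y see" queries.
    conn.execute("CREATE INDEX IF NOT EXISTS hits_path_ts ON hits(path, ts)")
    conn.execute("CREATE INDEX IF NOT EXISTS hits_ip ON hits(ip)")
    conn.execute("CREATE TABLE IF NOT EXISTS loaded_files (name TEXT PRIMARY KEY)")
    conn.commit()
    return conn

def load_new_files(conn, log_dir, parse_line):
    """Import every file in log_dir not loaded before; return how many were loaded.

    parse_line(line) -> (ip, ts, path, server) or None  (hypothetical parser)
    """
    done = {row[0] for row in conn.execute("SELECT name FROM loaded_files")}
    loaded = 0
    for name in sorted(os.listdir(log_dir)):
        if name in done:
            continue
        with open(os.path.join(log_dir, name), encoding="utf-8", errors="replace") as f:
            rows = [r for r in map(parse_line, f) if r is not None]
        # One transaction per file: the rows and the "loaded" marker commit together.
        with conn:
            conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)
            conn.execute("INSERT INTO loaded_files VALUES (?)", (name,))
        loaded += 1
    return loaded
```

Committing each file in its own transaction means a crash mid-run never leaves a half-imported file marked as done; the next run simply picks it up again.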

So, the question is: what are you guys using for this kind of thing?
22 June 2006, 12:27 PM  #2
stevencotton
Scooby Regular
Join Date: Jan 2001
Location: behind twin turbos
Posts: 2,710

I use Sawmill. It's definitely the best. There isn't a lot you can do about the speed: you have a hell of a lot of data there, and you're probably limited by CPU and potentially by disk, memory and I/O.

I would move logging back to the servers and take the load balancers out of the equation altogether. Can the back-end servers not log themselves?
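If each back-end server writes its own log, the per-server files need to be interleaved back into one time-ordered stream before analysis. A minimal Python sketch using the standard library's `heapq.merge`, assuming every file is already in time order and uses the Common/Combined Log Format timestamp `[dd/Mon/yyyy:HH:MM:SS +zzzz]`; `merge_server_logs` and `log_time` are hypothetical names.

```python
import heapq
from datetime import datetime

def log_time(line):
    """Extract the [dd/Mon/yyyy:HH:MM:SS +zzzz] timestamp from a CLF line.

    Raises ValueError if the line has no bracketed timestamp.
    """
    start = line.index("[") + 1
    end = line.index("]", start)
    return datetime.strptime(line[start:end], "%d/%b/%Y:%H:%M:%S %z")

def merge_server_logs(*streams):
    """Lazily merge several per-server logs, each already in time order."""
    return heapq.merge(*streams, key=log_time)
```

Because `heapq.merge` is lazy, it only holds one pending line per server in memory, so open file objects can be passed in directly.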

What infrastructure are you using? Is it Apache-based? If so, perhaps mod_log_spread can help you out. It sounds like you've hit a critical point where your logging needs to be standardised, so you will probably benefit from working out a Nice Way to do it.