[AWFFULL] mangle urls?
Anthony J. Biacco
abiacco at formatdynamics.com
Tue Jul 22 09:02:58 EST 2008
On Sun, July 13, 2008, 3:50 pm, Anthony J. Biacco wrote:
> Will I have to do an ignoreurl then on the same expression?
>I've never tried it but I suppose GroupAndHideUrl could work. Is this
>correct, Steve?
That won't work, as it'll still keep the data in the incremental file and just not show it in the report.
I also just noticed I have a bigger problem: I've got about 10M IP addresses/sites in my incremental file for the month so far.
I can't use "ignoresite *" because that would ignore every line in my log file. How would I go about not adding any sites to my incremental file, i.e. not doing any processing on site data?
I suppose I could use sed/awk to change every IP address in the log to a single IP address, leaving just one IP in the incremental file (the same approach Steve recommended for my /pt/t/1* problem; I don't care a whole lot about site data). Any sed/awk experts out there who can help a brother out with the replacement command for these two fields?
Given a log line to the effect of:
199.231.48.128 - - [21/Jul/2008:16:49:41 -0600] "GET /pt/t/1216680581236?&d=2008&a=Microsoft%20Internet%20Explorer%20Mozilla/4.0%20%28compatible%3B%20MSIE%206.0%3B%20Windows%20NT%205.1%3B%20SV1%29&s=7%2C4852%2COther%2C3430&u=http%3A//reviews.cnet.com/car-gps-navigation/garmin-nuvi-660/4852-3430_7-32078943.html%3Ford%3DcreationDate+desc&p=0&q=1.1 HTTP/1.1" 200 49 "http://reviews.cnet.com/car-gps-navigation/garmin-nuvi-660/4852-3430_7-32078943.html?ord=creationDate+desc" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" -
I'd want to change it to this (given that I'd have to pipe the log file to awffull on stdin instead of using the LogFile config directive):
1.1.1.1 - - [21/Jul/2008:16:49:41 -0600] "GET /pt/t/1 HTTP/1.1" 200 49 "http://reviews.cnet.com/car-gps-navigation/garmin-nuvi-660/4852-3430_7-32078943.html?ord=creationDate+desc" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" -
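A minimal sketch of such a pre-filter, written in Python rather than sed/awk purely for readability. It assumes Combined Log Format with the client host as the first field, and that every tracking request begins with /pt/t/1. The script name mangle_log.py and the placeholder address 1.1.1.1 are illustrative only. It reads the log files named on the command line and writes the rewritten lines to stdout, so the output can be piped into awffull on stdin.

```python
#!/usr/bin/env python3
"""Hypothetical pre-filter (mangle_log.py): collapse hosts and /pt/t/1... URLs."""
import re
import sys

# Assumption: the first whitespace-delimited field is the client host/IP,
# as in Common/Combined Log Format.
HOST_RE = re.compile(rb"^\S+")

# Assumption: tracking hits look like "GET /pt/t/1<digits>?<query>" inside
# the quoted request; everything after /pt/t/1 up to the next space is dropped.
# The leading quote and method keep the Referer field from matching.
PT_RE = re.compile(rb'("[A-Z]+ /pt/t/1)\S*')

PLACEHOLDER_HOST = b"1.1.1.1"  # arbitrary single address, as in the thread


def mangle(line: bytes) -> bytes:
    """Replace the host field and trim /pt/t/1... requests to /pt/t/1."""
    line = HOST_RE.sub(PLACEHOLDER_HOST, line, count=1)
    return PT_RE.sub(rb"\1", line, count=1)


def main(paths: list) -> None:
    """Stream each named log file through mangle() to stdout as raw bytes."""
    for path in paths:
        with open(path, "rb") as log:
            for line in log:
                sys.stdout.buffer.write(mangle(line))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would be along the lines of `python3 mangle_log.py access_log`, with stdout piped into awffull. Working on bytes leaves any non-UTF-8 characters in the log untouched. With every host collapsed to one address, awffull's site statistics no longer carry any meaning, which is the trade-off accepted above.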
-Tony
---------------------------
Manager, IT Operations
Format Dynamics, Inc.
303-573-1800x27
abiacco at formatdynamics.com
http://www.formatdynamics.com
-----Original Message-----
From: javier wilson [mailto:javier at guegue.net]
Sent: Monday, July 14, 2008 10:25 AM
To: Anthony J. Biacco
Subject: RE: [AWFFULL] mangle urls?
On Sun, July 13, 2008, 3:50 pm, Anthony J. Biacco wrote:
> Will I have to do an ignoreurl then on the same expression?
I've never tried it but I suppose GroupAndHideUrl could work. Is this
correct, Steve?
javier
>
> -----Original Message-----
> From: javier wilson <javier at guegue.net>
> Sent: Sunday, July 13, 2008 12:50 PM
> To: Steve McInerney <steve at stedee.id.au>
> Cc: Anthony J. Biacco <abiacco at formatdynamics.com>; awffull at stedee.id.au
> <awffull at stedee.id.au>
> Subject: Re: [AWFFULL] mangle urls?
>
> On Sun, July 13, 2008, 2:27 am, Steve McInerney wrote:
>> on 12/07/08 02:47 Anthony J. Biacco said the following:
>>> Is there a way to get awffull to trim back urls for counting?
>>
>> Within awffull? No. External pre-filtering via SED or AWK, piped into
>> awffull, is likely your best bet.
>>
>>
>>> i.e. I have some urls in my logs of the format /pt/t/1xxxxxxxxx
>>> I want to count these, but I'd actually like to just count them
>>> as 1 url, namely /pt/t. Kind of like a MangleAgent, but for urls?
>>> Reason being, all the /pt/t/1xxxxxxxxx urls fill up my incremental
>>> webalizer.current file, so that it's like 500+ megs 10 days through the
>>> month.
>>
>> :-(
>>
>>
>>> Is this possible?
>
> I think you can use GroupURL, like this:
> /pt/t/1* MyGroup
>
> You'll get hits and volume on 1* as a total, but not visits :(
>
> javier
>
>
--
javier wilson - guegue.com - tel. (505) 252 4056