Regex not matching 0 for AWStats

         

JAB Creations

9:34 am on Feb 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My logs include connection preferences (file.php?connection=0 or file.php?connection=1); however, the default regex in AWStats does not detect the value zero no matter how I change the regex. Here are a couple of examples...

ExtraSectionFirstColumnValues2="QUERY_STRING,connection=([^&]+)"

ExtraSectionFirstColumnValues2="QUERY_STRING,connection=([0-1]+)"

How can I force Perl to detect the value zero? One possibility is that it treats zero as false rather than as a number, though I haven't found anything specifically helpful about detecting zero.

- John
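
On the question of zero being treated as false: in Perl a regex match against "connection=0" succeeds, but the captured string "0" is itself false in boolean context. Below is a minimal sketch of that difference. The helper extract_connection is a hypothetical name, not part of AWStats, and whether AWStats actually tests the captured value this way is not established in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from AWStats: returns the captured
# value of the "connection" parameter, or undef if there is no match.
sub extract_connection {
    my ($query) = @_;
    return $query =~ /connection=([^&]+)/ ? $1 : undef;
}

for my $query ('audio=1&connection=0', 'audio=1&connection=1') {
    my $value = extract_connection($query);

    # Truthiness test: the string "0" is false in Perl, so a zero is skipped.
    my $by_truth = $value ? 'counted' : 'skipped';

    # Definedness test: any successful capture is kept, including "0".
    my $by_defined = defined($value) ? 'counted' : 'skipped';

    print "$query: truthiness=$by_truth, defined=$by_defined\n";
}
```

If code downstream of the match uses the first style of test, a captured zero would be dropped even though the regex itself matched.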

perl_diver

7:27 pm on Feb 2, 2008 (gmt 0)

10+ Year Member



can we see the regexp?

JAB Creations

9:51 pm on Feb 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ExtraSectionFirstColumnValues2="QUERY_STRING,connection=([^&]+)"

ExtraSectionFirstColumnValues2="QUERY_STRING,connection=([0-1]+)"

Documentation...

# ExtraSectionFirstColumnValuesX is a string to tell AWStats which field to
# extract value from
# (URL,URLWITHQUERY,QUERY_STRING,REFERER,UA,HOST,VHOST,extraX)
# and how to extract the value (using regex syntax). Each different value
# found will appear in first column of report on a different row. Be sure
# that list of different possible values will not grow indefinitely.

Here are a couple of other things I've tried...

ExtraSectionCondition1="REFERER,\/?||URL,\/?||URLWITHQUERY,\/?"
ExtraSectionFirstColumnValues1="QUERY_STRING,cudio=([0])"

Documentation for ExtraSectionCondition...

# ExtraSectionConditionX are conditions you can use to count or not the hit,
# Use one of the field condition
# (URL,URLWITHQUERY,QUERY_STRING,REFERER,UA,HOST,extraX)
# and a regex to match, after a coma. Use "||" for "OR".

The main problem is that the regex does not detect the exact number of instances in the access log.

- John

perl_diver

11:00 pm on Feb 2, 2008 (gmt 0)

10+ Year Member



Well, I'm not familiar with AWStats, so all of that means nothing to me. Those are not regexps that you have posted; I assume they are patterns you pass to the program or that are defined in a file of some sort. How they get interpreted by the program is impossible to tell just from looking at the patterns. I would suspect they get interpreted just as if they were hard-coded in a regexp, something like:

if ($blah =~ /([0-1]+)/)

The above would certainly find a 0 (or a 1) in a string. Why it does not work for what you are attempting to do (which is not clear at this point) is a mystery to me.
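
The patterns posted in this thread can be tried directly in Perl, outside AWStats, to confirm what each one captures. This is a minimal sketch. The helper first_capture is a hypothetical name, and interpolating the pattern into a match is only a guess at how AWStats applies it.

```perl
use strict;
use warnings;

# Patterns copied from the config lines in this thread (the part after
# "QUERY_STRING,"). How AWStats itself applies them is an assumption here.
my @patterns = (
    'connection=([^&]+)',
    'connection=([0-1]+)',
    '(connection=[0-2])',
);

# Hypothetical helper: returns the first capture group, or undef.
sub first_capture {
    my ($pattern, $string) = @_;
    return $string =~ /$pattern/ ? $1 : undef;
}

my $query = 'audio=1&connection=0';
for my $pattern (@patterns) {
    my $got = first_capture($pattern, $query);
    printf "%-24s => %s\n", $pattern, defined($got) ? "'$got'" : 'no match';
}
```

All three patterns match this query string. The first two capture the string "0", which Perl treats as false in boolean context, while the third captures "connection=0", which is true.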

JAB Creations

11:23 pm on Feb 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here are the three places where ExtraFirstColumnValues appears in the main Perl script...

if ($param =~ /^ExtraSectionFirstColumnValues(\d+)/) { $ExtraFirstColumnValues[$1]=$value; next; }

if (! $ExtraFirstColumnValues[$extracpt]) { error("Extra section number $extracpt is defined without ExtraSectionFirstColumnValues$extracpt parameter"); }

else { error("Wrong value of parameter ExtraSectionFirstColumnValues$extranum"); }

I don't know if this helps but something is not adding up here!

Using only the following lines as the access log, there are 9 instances of 'connection=0'…

0.0.0.0 - - [31/Jan/2008:18:43:28 +0000] "GET /home/?audio=1&connection=0 HTTP/1.1" 200 591
0.0.0.0 - - [31/Jan/2008:23:43:07 +0000] "GET /home/?audio=1&connection=0 HTTP/1.0" 200 579 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
0.0.0.0 - - [01/Feb/2008:11:18:27 +0000] "GET /home/home-news.php?audio=0&connection=0 HTTP/1.0" 200 9114 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
0.0.0.0 - - [01/Feb/2008:22:53:33 +0000] "GET /home/?audio=1&connection=0 HTTP/1.0" 200 1106 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
0.0.0.0 - - [01/Feb/2008:22:57:26 +0000] "GET /home/?audio=1&connection=0 HTTP/1.0" 200 1106
0.0.0.0 - - [02/Feb/2008:03:35:48 +0000] "GET /home/?audio=1&connection=0 HTTP/1.1" 200 1118 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"
0.0.0.0 - - [02/Feb/2008:04:11:52 +0000] "GET /home/home-news.php?audio=0&connection=0 HTTP/1.1" 200 40908 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"
0.0.0.0 - - [02/Feb/2008:04:13:32 +0000] "GET /home/?audio=1&connection=0 HTTP/1.1" 200 591 "-" "WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru-RU)"
0.0.0.0 - - [02/Feb/2008:04:13:37 +0000] "GET /home/home-news.php?audio=0&connection=0 HTTP/1.1" 200 9114 "-" "WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru-RU)"

Here is the extra section configuration I have that detects the most instances…

ExtraSectionName2="HTTP Query Requests"
ExtraSectionCodeFilter2="200 304"
ExtraSectionCondition2="URLWITHQUERY,\/?"
ExtraSectionFirstColumnTitle2="connection Preferences"
ExtraSectionFirstColumnValues2="QUERY_STRING,(connection=[0-2])"
ExtraSectionFirstColumnFormat2="%s"
ExtraSectionStatTypes2=HBL
ExtraSectionAddAverageRow2=0
ExtraSectionAddSumRow2=1
MaxNbOfExtra2=20
MinHitExtra2=2

AWStats detects only six (6) instances. The closest statistical guess I have is seven (7), though I don't see any matching patterns there. I feel as though I'm staring directly at the issue, but I really don't get AWStats' algorithms for figuring stuff out.

- John
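
To get a baseline that does not depend on AWStats, the hits can be counted straight from the log lines. This is a minimal sketch, assuming the request is the first double-quoted field of each line, as in the excerpt above. The name count_query_hits is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: counts log lines whose request URL has a query
# string matching $pattern. Assumes the request looks like
# "GET /path?query HTTP/1.1" inside double quotes.
sub count_query_hits {
    my ($pattern, @lines) = @_;
    my $count = 0;
    for my $line (@lines) {
        # Pull out the query string (everything after "?" in the URL).
        next unless $line =~ /"[A-Z]+ \S*\?(\S*) HTTP/;
        my $query = $1;
        $count++ if $query =~ /$pattern/;
    }
    return $count;
}

# Three sample lines copied from the log excerpt in this post.
my @sample = (
    '0.0.0.0 - - [31/Jan/2008:18:43:28 +0000] "GET /home/?audio=1&connection=0 HTTP/1.1" 200 591',
    '0.0.0.0 - - [01/Feb/2008:22:53:33 +0000] "GET /home/?audio=1&connection=0 HTTP/1.0" 200 1106 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '0.0.0.0 - - [01/Feb/2008:22:57:26 +0000] "GET /home/?audio=1&connection=0 HTTP/1.0" 200 1106',
);

print count_query_hits('connection=0', @sample), "\n";
```

Running the same helper over the full log file gives a number to compare against the AWStats report. Note that AWStats may also filter hits on its own terms (for example by status code via ExtraSectionCodeFilter, or through MinHitExtra), so the two counts need not agree.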

JAB Creations

12:16 am on Feb 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are a total of 8 instances. By default AWStats counts 7. When I remove any line, the count drops to 6. I tested this by deleting each line in turn, and no single removal keeps the count at 7.

A quick note - I added agents and blank referrers compared to the original log I emailed you and modified the MSN bot to include unique single digit numbers in the user agent.

65.208.187.194 - - [01/Feb/2008:11:18:25 +0000] "GET /home/?audio=1&connection=0 HTTP/1.1" 200 591 "-" "msnbot/1.0 (+http://search.msn.com/msnbot1.htm)"
74.6.22.167 - - [01/Feb/2008:11:18:26 +0000] "GET /home/?audio=1&connection=0 HTTP/1.0" 200 579 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
74.6.22.102 - - [01/Feb/2008:11:18:27 +0000] "GET /home/home-news.php?audio=0&connection=0 HTTP/1.0" 200 9114 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
65.55.208.191 - - [01/Feb/2008:22:53:33 +0000] "GET /home/?audio=1&connection=0 HTTP/1.0" 200 1106 "-" "msnbot/1.0 (+http://search.msn.com/msnbot2.htm)"
65.55.165.16 - - [01/Feb/2008:22:57:26 +0000] "GET /home/?audio=1&connection=0 HTTP/1.0" 200 1106 "-" "msnbot/1.0 (+http://search.msn.com/msnbot3.htm)"
61.135.190.16 - - [02/Feb/2008:03:35:48 +0000] "GET /home/?audio=1&connection=0 HTTP/1.1" 200 1118 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"
77.91.224.5 - - [02/Feb/2008:04:13:32 +0000] "GET /home/?audio=1&connection=0 HTTP/1.1" 200 591 "-" "WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru-RU)"
77.91.224.15 - - [02/Feb/2008:04:13:37 +0000] "GET /home/home-news.php?audio=0&connection=0 HTTP/1.1" 200 9114 "-" "WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru-RU)"

Hmm, this line doesn't seem to be detected (but I originally tested it). I could have missed it though...

65.55.165.16 - - [01/Feb/2008:22:57:26 +0000] "GET /home/?audio=1&connection=0 HTTP/1.0" 200 1106 "-" "msnbot/1.0 (+http://search.msn.com/msnbot3.htm)"

No other lines have the same IP, time, or user agent.

Other lines use the same HTTP method, HTTP version, HTTP code, bandwidth, and lack of referrer.

Can we maybe find something truly unique that we can duplicate?

I duplicated the line exactly and received the same count. I then changed the time stamp and received the same results.

- John

perl_diver

3:27 am on Feb 3, 2008 (gmt 0)

10+ Year Member



This appears to be where the pattern is searched for:

if ($param =~ /^ExtraSectionFirstColumnValues(\d+)/)

Try to see what the value of $param is at that point. Write it to a file or print it to the screen while the script executes (if you can).
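
A minimal sketch of such a trace follows. The helper debug_log and the default file path are assumptions for illustration, and where to call it inside the AWStats script is not shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: appends a labelled value to a trace file so it can
# be inspected after the script has run. The default path is an assumption.
sub debug_log {
    my ($label, $value, $path) = @_;
    $path //= '/tmp/awstats_debug.log';
    open my $fh, '>>', $path or die "Cannot open $path: $!";
    # Show undef explicitly and quote the value so "0" and "" are visible.
    printf {$fh} "%s = %s\n", $label, defined($value) ? "'$value'" : 'undef';
    close $fh or die "Cannot close $path: $!";
    return;
}

# Example calls with stand-in values like those discussed in the thread.
my $param = 'ExtraSectionFirstColumnValues2';
my $value = 'QUERY_STRING,connection=([^&]+)';
debug_log('param', $param);
debug_log('value', $value);
```

Quoting the value in the trace makes it easy to tell a captured "0" from an empty string or an undefined value.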

perl_diver

3:30 am on Feb 3, 2008 (gmt 0)

10+ Year Member



I just noticed that there is no $ preceding ExtraSectionFirstColumnValues in this line:

if ($param =~ /^ExtraSectionFirstColumnValues(\d+)/)

Unless it has been defined as a constant somewhere, the regexp is literally looking for "ExtraSectionFirstColumnValues".

perl_diver

8:43 pm on Feb 3, 2008 (gmt 0)

10+ Year Member



I think I misunderstood what that regexp is doing. It is building an array of search patterns from a file (I think), not searching the log file. There must be an array somewhere, @ExtraFirstColumnValues, that is being used to search the log file.
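
The two stages described here can be sketched as follows. Stage 1 mirrors the config-parsing line quoted earlier in the thread. Stage 2 (splitting the stored value into a field name and a pattern, then matching) is an assumption about how such an array might be used, not the actual AWStats code. The names load_config_line and extract_value are hypothetical.

```perl
use strict;
use warnings;

my @ExtraFirstColumnValues;

# Stage 1: store the config value under its section number. The inner
# regex mirrors the line quoted from the script; the rest is simplified.
sub load_config_line {
    my ($line) = @_;
    my ($param, $value) = $line =~ /^(\w+)="?(.*?)"?$/ or return;
    if ($param =~ /^ExtraSectionFirstColumnValues(\d+)/) {
        $ExtraFirstColumnValues[$1] = $value;
    }
    return;
}

# Stage 2 (assumed): split the stored value into a field name and a
# pattern, then apply the pattern to that field of a log record.
sub extract_value {
    my ($section, $record) = @_;
    my $spec = $ExtraFirstColumnValues[$section];
    return unless defined($spec);
    my ($field, $pattern) = split /,/, $spec, 2;
    return unless defined($pattern) && defined($record->{$field});
    return $record->{$field} =~ /$pattern/ ? $1 : undef;
}

load_config_line('ExtraSectionFirstColumnValues2="QUERY_STRING,connection=([^&]+)"');
my $value = extract_value(2, { QUERY_STRING => 'audio=1&connection=0' });
print defined($value) ? "captured '$value'\n" : "no match\n";
```

Searching the script for where @ExtraFirstColumnValues is read, rather than where it is assigned, should lead to the code that actually runs the pattern against each log line.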