A Comprehensive Strategy for Using Web Site Statistics
Track The Effectiveness Of Your Marketing Effort
Part 3
By Carlton Lovegrove
Originally Published: March 15, 2005
Continued From:
<<< A Comprehensive Strategy for Using Web Site Statistics Part 2
Log file data
IP and Domain Name Counting
You can also learn something about visitors by studying their domain names. Though the log file may record IP addresses, your log analysis program can determine from many of these IP numbers the associated domain or ISP. This might tell you if your most important client -- or competitor -- has been looking at your web pages.
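As a rough illustration of domain counting, the sketch below tallies requests per domain. It assumes the log analysis program has already resolved IP addresses to host names where it could; the function names and sample hosts are hypothetical, and taking the last two labels of a host name is a deliberate simplification that gives the wrong answer for suffixes such as co.uk.

```python
from collections import Counter
import ipaddress


def domain_of(host):
    """Return a naive domain for a host name, or 'unresolved' for a bare IP."""
    try:
        ipaddress.ip_address(host)
        return "unresolved"  # the log held an IP that was never resolved
    except ValueError:
        pass  # not an IP address, so treat it as a host name
    labels = host.lower().rstrip(".").split(".")
    # Simplification: keep only the last two labels (example.com).
    return ".".join(labels[-2:])


def count_domains(hosts):
    """Count how many requests came from each domain."""
    return Counter(domain_of(h) for h in hosts)


# Hypothetical host column taken from a log file.
sample_hosts = [
    "proxy1.example.com",
    "cache-7.example.com",
    "192.0.2.10",
    "dialup-33.example.net",
]
domain_counts = count_domains(sample_hosts)
```

Sorting the result with `domain_counts.most_common()` puts the busiest domains first, which is usually where a client or competitor would stand out.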
The simplest assumption to make about users is that each IP address or domain name represents a unique user. Using this method, all the requests made by the same host are treated as though they came from a single user. When a new host is detected, a new user profile is created and the corresponding requests are associated with the new user. Several refinements that use additional information recorded in the access logs, or other heuristics, are also possible. One refinement is to use the user agent field: new users are identified as above, and also whenever requests coming from the same machine carry different user agents. Another refinement is to place session timeouts on requests made from the same machine. The intuition is that if a certain amount of time has elapsed between requests, the old user has left the site and a new user has entered.
When using these methods for identifying users, the following situations occur when sequentially processing access logs:
- a new IP address is encountered (assume this is a new user);
- an already processed IP address is encountered, and either
  - the user agent matches prior requests (assume this is the same user), or
  - the user agent field does not match any prior requests from the same IP (assume this is a new user);
- a session is terminated due to a timeout (assume a new user has entered the site).
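The cases above can be sketched as a single pass over the log. This is only an illustration under stated assumptions: each request is a hypothetical (ip, user_agent, timestamp) tuple already parsed from the log, and the 30-minute timeout is an assumed value, since no particular timeout is specified here.

```python
from datetime import datetime, timedelta

# Assumed timeout; choose a value that suits your own site.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_users(requests, timeout=SESSION_TIMEOUT):
    """Count inferred new users by the reason each one was inferred.

    requests: iterable of (ip, user_agent, timestamp) tuples in log order.
    """
    known_ips = set()
    last_seen = {}  # (ip, user_agent) -> time of the most recent request
    counts = {"new_ip": 0, "new_agent": 0, "timeout": 0}
    for ip, agent, ts in requests:
        key = (ip, agent)
        if ip not in known_ips:
            counts["new_ip"] += 1      # new IP address: assume a new user
            known_ips.add(ip)
        elif key not in last_seen:
            counts["new_agent"] += 1   # known IP, unseen agent: new user
        elif ts - last_seen[key] > timeout:
            counts["timeout"] += 1     # same IP and agent, but session expired
        # Otherwise the request is attributed to the same user as before.
        last_seen[key] = ts
    return counts


t0 = datetime(2005, 3, 15, 10, 0)
sample_requests = [
    ("192.0.2.1", "Mozilla", t0),
    ("192.0.2.1", "Mozilla", t0 + timedelta(minutes=5)),
    ("192.0.2.1", "Opera", t0 + timedelta(minutes=6)),
    ("192.0.2.2", "Mozilla", t0 + timedelta(minutes=7)),
    ("192.0.2.1", "Mozilla", t0 + timedelta(minutes=50)),
]
user_counts = count_users(sample_requests)
estimated_users = sum(user_counts.values())
```

Keeping the three reasons separate, rather than a single total, makes it easier to see how much of the estimate rests on the weaker user agent and timeout heuristics.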
Therefore, if your statistics show that many of the new hosts and timeouts came from hosts in the same domain or IP address space, you can infer one of three things: a large number of your users connect to the Web through ISPs with load-balancing proxies; a large number of different users access the site from within the same domain, as would occur with a large company; or some combination of the two.
Regardless, a significant number of page requests can be ambiguous, so that it is not possible to determine with certainty whether a new user is present. The incidence rate varies considerably from one web site to another, but the results can be inaccurate wherever IP-based methods and their derivatives stand in for unique identifiers such as cookies.
Caching
Another major problem that dilutes the quality of the data is caching. There are two major types of caching. First, browsers automatically cache files when they are downloaded, so it is not necessary to download the entire page again on a later visit. Depending on its settings, the browser may check with the server to see whether the page has changed; in that case the server does learn of the visit, and a page request is recorded. However, if the browser is not set to verify whether a page has changed, the user can read the page without any entry being recorded in the web log.
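When a browser does check whether a page has changed and it has not, the server normally answers with status 304 (Not Modified), and that answer appears in the log. The sketch below separates full downloads (status 200) from such revalidations. It assumes log lines in Common Log Format; the function name and the sample lines are hypothetical, and reads served entirely from a cache still leave no trace to count.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it.
REQUEST_RE = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def tally_revalidations(lines):
    """Return (full_downloads, revalidations) as per-page Counters."""
    full = Counter()
    revalidated = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        if match["status"] == "200":
            full[match["path"]] += 1
        elif match["status"] == "304":
            revalidated[match["path"]] += 1
    return full, revalidated


# Hypothetical log lines, for illustration only.
sample_log = [
    '192.0.2.10 - - [15/Mar/2005:10:00:00 +0000] "GET /index.html HTTP/1.0" 200 5120',
    '192.0.2.10 - - [15/Mar/2005:10:20:00 +0000] "GET /index.html HTTP/1.0" 304 -',
    '192.0.2.44 - - [15/Mar/2005:10:25:00 +0000] "GET /prices.html HTTP/1.0" 200 2048',
]
full_downloads, revalidations = tally_revalidations(sample_log)
```

A page with many revalidations relative to full downloads is being reread from browser caches, which hints that its true readership is higher than the download count alone suggests.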
Second, almost all ISPs now have their own cache. When a user requests a page that anyone else at the same ISP has requested recently, the ISP's cache will have saved a copy and will serve it without any request being made to the original site. Many people could therefore read a site's pages from the same cache without the original web site (or its logs) ever knowing about it.
For example, AOL uses caching extensively, and a single user with an AOL account may appear in your server logs under several different IP numbers, because AOL uses its caching to fetch the files on the user's behalf. When this happens, the logs will fail to identify a repeat customer. Nor can the logs record whether a visitor typed a URL into their browser after seeing a particular advertisement. And if the page is already cached when it is requested, no page request at all may show up in the logs.
Continued:
A Comprehensive Strategy for Using Web Site Statistics Part 4 >>>
Carlton Lovegrove holds a PhD in Information Systems