Analyze your Web site traffic

[This article was originally published in c|net's Builder.com on February 3, 1998. For updated information, run a search on "traffic analysis" at Builder.com, c|net's "Web Building" site.]

If you run a Web site, you're probably already thinking about tracking and analyzing the traffic it gets. Knowing how many pages are accessed, when, by whom, and for what purposes can mean the difference between simply having a Web site and building a sound Web strategy.

Understanding how people use your site can help you--and your sales and marketing team--generate more traffic. If you can track your audience, learn which pages and resources are most popular, and identify technical problems and system bottlenecks, you can deliver a better experience. And that's the best way to keep people coming back to your site.

The ways to analyze a site's traffic vary as much as the ways to design and build one. Before deciding on a Web analysis product or service, you need to understand how traffic analysis works--the units of measurement and the importance of profiling your users. Once you decide what's important, you can narrow down the available products and services to find the ones that will work for you.

To make it easier to find what you're looking for, we've broken up the market into four subcategories:

1. Log file analysis tools

2. Dynamic analysis tools

3. Tools specifically for managing and analyzing advertising

4. Services for third-party auditing and Web ratings

If you can't find what you're looking for, don't panic. The market is still so new that you stand a good chance of convincing vendors to incorporate your requirements into future versions.

Understanding Web Metrics

There are essentially two types of Web metrics. Basic metrics comprise the data that anyone can track easily. Basic metrics require no special tools--the information is available in the standard log files generated by most Web servers. This information includes the IP addresses of visitors, date and time stamps, files accessed, where the visitor came from (the HTTP referer), and the type of browser used (the user agent).
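As a rough illustration of how those basic metrics can be pulled out of a log file, here is a minimal Python sketch. It assumes the server writes the widely used "combined" log format (host, timestamp, request, status, bytes, referer, user agent); the pattern, the function name `parse_log_line`, and the sample line are illustrative assumptions, not part of any product discussed here.

```python
import re

# Assumed layout (combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)


def parse_log_line(line):
    """Return a dict of basic metrics for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line, made up for this example.
sample = (
    '192.0.2.10 - - [03/Feb/1998:10:15:32 -0800] '
    '"GET /garden/seeds.html HTTP/1.0" 200 4523 '
    '"http://www.example.com/index.html" "Mozilla/4.04 [en] (Win95; I)"'
)
hit = parse_log_line(sample)
print(hit["ip"], hit["timestamp"], hit["path"], hit["referer"], hit["user_agent"])
```

Servers configured with a different log layout would need a different pattern, so treat this as a starting point rather than a universal parser.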

But while basic metrics are easy to track, they're not all that useful. More advanced methods combine several basic metrics to yield more sophisticated and useful information, such as impressions and unique and nonunique visitors.

Some analysis tools estimate unique visitors by treating each distinct combination of IP address and browser as one visitor--basically by looking at the log files line by line and counting up the matches, either manually or automatically. But tying technical transaction data to actual people isn't very reliable.
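The IP-plus-browser heuristic can be sketched in a few lines of Python. This assumes each hit has already been parsed into a dictionary with `ip` and `user_agent` keys (hypothetical names chosen for this example); the sample data is made up.

```python
def estimate_unique_visitors(hits):
    """Count distinct (IP address, user agent) pairs among parsed hits."""
    return len({(hit["ip"], hit["user_agent"]) for hit in hits})


# Made-up sample: the first two hits share an IP address and a browser,
# so the heuristic counts them as one visitor.
hits = [
    {"ip": "192.0.2.10", "user_agent": "Mozilla/4.04", "path": "/"},
    {"ip": "192.0.2.10", "user_agent": "Mozilla/4.04", "path": "/a.html"},
    {"ip": "192.0.2.10", "user_agent": "MSIE 4.0", "path": "/"},
    {"ip": "198.51.100.7", "user_agent": "Mozilla/4.04", "path": "/"},
]
print(estimate_unique_visitors(hits))  # prints 3
```

The weakness is easy to see: two people behind the same address with the same browser collapse into one visitor, and one person using two browsers counts twice.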

The most accurate way to gather visitor data is to make users enter a unique ID and password and then track their travels through your site with cookies or server objects. Unfortunately, many users resist IDs and passwords, so you could end up reducing the traffic you're trying to measure.

You can get a rough estimate of nonunique visitors by looking at the IP addresses in your incoming server-traffic logs (either manually or by writing code to automatically check and analyze the logs). Unfortunately, big proxy servers such as UUNet and America Online don't provide useful information about individuals coming to your site, so traffic analysis applications typically discard these addresses.

Similarly, smart applications detect and eliminate IP addresses from robots, agents, and Webcrawlers. To deduce unique visitors coming from proxy servers, some tools and services use an algorithm to examine IP strings within IP addresses.
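A simple version of this filtering might look like the following Python sketch. The robot markers and the proxy address prefix are placeholders invented for the example; a real tool would maintain its own, much longer lists, and the matching rules here are an assumption rather than any vendor's method.

```python
# Hypothetical markers and prefixes, for illustration only.
ROBOT_MARKERS = ("crawler", "spider", "robot")
PROXY_PREFIXES = ("203.0.113.",)


def is_countable(hit):
    """Return False for hits that look like robots or known proxy traffic."""
    agent = hit["user_agent"].lower()
    if any(marker in agent for marker in ROBOT_MARKERS):
        return False
    if hit["ip"].startswith(PROXY_PREFIXES):
        return False
    return True


hits = [
    {"ip": "192.0.2.10", "user_agent": "Mozilla/4.04 [en] (Win95; I)"},
    {"ip": "198.51.100.7", "user_agent": "ExampleCrawler/1.0"},
    {"ip": "203.0.113.5", "user_agent": "Mozilla/4.04 [en] (Win95; I)"},
]
countable = [hit for hit in hits if is_countable(hit)]
print(len(countable))  # prints 1
```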

Once you are able to track pages and visitors separately, the next logical step is to track the relationships between pages and visitors--in fact, the relationships between all measurable traffic data. Web site managers and advertisers both want to know how many pages--and which pages--each visitor sees.
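To make the page-to-visitor relationship concrete, here is a small Python sketch that groups page requests by visitor. It assumes a visitor is identified by the IP-address-and-browser pair, with all the unreliability that implies; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def pages_by_visitor(hits):
    """Map each (IP, user agent) pair to the list of pages it requested, in order."""
    pages = defaultdict(list)
    for hit in hits:
        visitor = (hit["ip"], hit["user_agent"])
        pages[visitor].append(hit["path"])
    return dict(pages)


def average_pages_per_visitor(hits):
    """Average number of page requests per visitor (0.0 if there are no hits)."""
    grouped = pages_by_visitor(hits)
    if not grouped:
        return 0.0
    return sum(len(paths) for paths in grouped.values()) / len(grouped)


hits = [
    {"ip": "192.0.2.10", "user_agent": "Mozilla/4.04", "path": "/"},
    {"ip": "192.0.2.10", "user_agent": "Mozilla/4.04", "path": "/garden/seeds.html"},
    {"ip": "198.51.100.7", "user_agent": "MSIE 4.0", "path": "/"},
]
print(pages_by_visitor(hits))
print(average_pages_per_visitor(hits))  # prints 1.5
```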

On the advertising side, standards and measures have been in flux since the birth of the World Wide Web, but the Internet Advertising Bureau is attempting to create media measures everyone can agree upon.

Why collect Web statistics?

The three most common motivations for analyzing Web traffic are business development, marketing and advertising sales, and technical resource and capacity planning:

Business development: Knowing who is accessing what content--as well as why, how, and where--goes a long way toward helping you develop better content and expand your audience. For example, a business strategist at a software mail-order company might assume that because sales of business software titles exceed sales of entertainment titles in the company's printed catalog, the same will be true of online sales. But traffic analysis might reveal that most of the visits to the Web site are made by teenagers looking for new computer games. That knowledge might lead the business strategist to reallocate resources to the entertainment software section of the site.

Marketing and advertising sales: Accurate Web stats are vital to maximizing revenue. If you can show that the gardening section of your site gets 10,000 impressions a day, it'll be much easier to convince the vegetable-seed company to advertise within that section. Similarly, if you know that your site is popular with vegetable gardeners but not with flower gardeners, then you can market your site to the appropriate niche audience.

Technical resource and capacity planning: You need to know how much traffic your site generates in order to justify the cost of equipment and services--servers and hardware, routers, software, and bandwidth. You also need a handle on traffic numbers to support hiring and the outsourced services needed to develop and maintain the technical architecture of your site.

Individuals and Groups

Knowing how many hits, page views, and unique users your site attracts is essential. But you also need to know about the people who visit your site. What are your visitors' occupations, hobbies, and objectives in coming?

The standard way to find out about users is to ask them to register and then store their information in a database. Netscape, Microsoft, and other industry leaders are working on ways to incorporate user profiles into browsers so that individual user data will be automatically available to the Web sites they visit. This is intended to reduce the need for users to fill out forms on each site they visit, but there's no guarantee that users will fill out the profiles within their browsers in the first place.

Knowing about individual users is important, but so is learning about your audience as a group. What are their age and income ranges? What are their occupations? Where do they live? What kind of computer equipment do they use? Are they coming to your site from home? From school? From the office? And, perhaps most importantly, what do they want when they visit your site?

Gathering this information lets you develop, expand, and refine your content. If you discover that your audience is very different from what you originally anticipated, you can adjust your site's content, design, style, product line, and features to recapture the audience you originally sought or to satisfy the audience you've actually attracted. You can convince potential advertisers that your users are people who will be willing and able to buy their products. The more information you can supply--raw data, specific trends, and detailed facts--the more you can advertise to vertical markets.

Log file analysis tools

The first log file analysis tools were homegrown solutions to particular problems. Since then, a number of small operations have developed public domain and shareware tools. One that's proven its worth is wwwstat, originally developed by Roy Fielding and later revised by Chris Lehr into cjlstat. This log analysis tool, like many others, was written in Perl and works best in a Unix environment.

Many companies now use proprietary solutions based on public domain tools like wwwstat. It might take a good programmer six months to modify wwwstat into a customized solution. The biggest advantage of this approach is that, besides the cost of the programmer's time, it's free. Miller Freeman, for instance, a large publishing company that maintains 180 different Web sites and supplies 60 report formats, uses an in-house adaptation of wwwstat in combination with other homegrown or low-profile tools. This custom solution offers views of the top 10 percent of files accessed and the top 10 "hot" files, and breaks down page-views by day, time, and domain.
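Reports like "top 10 hot files" and page views by time of day are straightforward to sketch. The Python below is not how wwwstat or Miller Freeman's system works; it is only an illustration that assumes hits have already been parsed into dictionaries with `path` and `timestamp` keys, with timestamps in the common `03/Feb/1998:10:15:32 -0800` log style.

```python
from collections import Counter
from datetime import datetime


def top_files(hits, n=10):
    """Return the n most requested paths as (path, count) pairs."""
    return Counter(hit["path"] for hit in hits).most_common(n)


def views_by_hour(hits):
    """Count page views per hour of the day (0-23), using each hit's own time zone."""
    counts = Counter()
    for hit in hits:
        when = datetime.strptime(hit["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
        counts[when.hour] += 1
    return dict(counts)


# Made-up sample data.
hits = [
    {"path": "/a.html", "timestamp": "03/Feb/1998:10:15:32 -0800"},
    {"path": "/a.html", "timestamp": "03/Feb/1998:10:20:00 -0800"},
    {"path": "/b.html", "timestamp": "03/Feb/1998:10:45:10 -0800"},
    {"path": "/a.html", "timestamp": "03/Feb/1998:14:02:00 -0800"},
    {"path": "/b.html", "timestamp": "03/Feb/1998:14:30:00 -0800"},
    {"path": "/c.html", "timestamp": "03/Feb/1998:23:59:59 -0800"},
]
print(top_files(hits, n=2))
print(views_by_hour(hits))
```

A breakdown by day or by domain follows the same pattern: pick a different key for the counter.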

Of course, commercial tools are also available. They come with basic technical and customer support and provide a number of graphical report formats. One early contender was net.Genesis, and similar products include Aquas's Bazaar Analyzer and WebTrends (formerly e.g. Software). (For a more complete list of products, see our chart of Web traffic analysis tools.)

Today, some tools come bundled with Web server software, such as Netscape's flexible logging system or Microsoft's recently acquired Interse.

Dynamic analysis tools

Tracking Web traffic poses an essential dilemma: the more people using your site, the more data you have to track. Simple log file analysis is based on a long flat-file approach. Basically, each hit is equal to a row in a table. The more hits your site gets, the bigger the log file gets, which can slow data retrieval.

Several traffic analysis solutions attempt to address this problem. Market researcher AberdeenGroup contends that Andromedia's Aria is the current market leader because it does not rely on growing volumes of log files to track data. Instead, Aria conducts non-log-based, dynamic analysis in real time using HTTP components and C++ objects for each user.

However, Ted Julian of market research house IDC predicts competing products will arrive soon. Features to look for as these new log-file-based solutions emerge include the ability to group data together and analyze it as metadata. These tools should also be able to track individual Web surfers using a variety of methods, including site registration, digital certificates, and cookies. Julian also expects to see more tools that work at the network level, not just at the application level (for instance, to track when a user's session times out).

Traffic analysis checklist

If you're looking for a Web traffic analysis tool, here's what to consider.

Features available now

Real-time report access. You should be able to get data concerning the last 10 minutes of your Web site activity immediately and view it in several useful graphical formats. (This feature is available in Accrue Insight and Andromedia Aria.)

Data drill-down. How many different dimensions and layers of your data can you view? What units of time can you examine (yearly, quarterly, monthly, weekly, daily, hourly)? It should be no problem to combine different types of data and query the database for just what you need. (See our chart for products and database integration.)

Flexible formats. In addition to supplying a large number of ready-made report formats, the product should let you easily create your own and view them in standard settings (spreadsheet, database, or Web page).

Scalability. How large a site, and how many sites, can the product handle? Some entry-level tools balk at sites turning more than 100,000 pages a day. Traffic of 100,000 to 1,000,000 pages constitutes a midrange site; if you turn more than a million pages a day, you'll need a large-scale solution. (Accrue Insight and Andromedia Aria offer premium-priced large-scale solutions.)

Ease of use. The best tools are easy to use without sacrificing access to information.
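The scalability thresholds in the checklist above can be written down as a small Python helper. The tier names and the numbers come from the checklist; how to treat traffic exactly at 100,000 or 1,000,000 pages a day is an assumption made for this sketch, and `site_tier` is a hypothetical name.

```python
def site_tier(pages_per_day):
    """Classify a site by daily page volume, using the checklist's rough thresholds."""
    if pages_per_day < 100_000:
        return "entry-level"
    if pages_per_day <= 1_000_000:
        return "midrange"
    return "large-scale"


for volume in (50_000, 250_000, 2_000_000):
    print(volume, site_tier(volume))
```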

Features to expect soon

The ability to track several sites at once. If your organization hosts multiple Web sites, you'll want not only analysis of the data for each site, but a meta-analysis of all your sites. Knowing which sites have more traffic at certain times, for instance, can dramatically change marketing and IT strategies.

Data synthesis. Theoretically, there's no reason why you shouldn't be able to combine your Web traffic data with customer, sales, and business information. In other words, you should be able to analyze Web and non-Web information together. For example, being able to extract existing customers from a list of site visitors who have accessed information about a recently released product could let your sales staff offer those customers a special deal.

Everything in one package. Many products offer one or more of the features mentioned here, but no product has them all. The ideal product or service would supply all of these features and allow you to pay only for the features you need.
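The data synthesis example above, picking existing customers out of the visitors who looked at a new product, comes down to matching two lists. The Python sketch below assumes visitors are registered, so that each hit can carry a `user_id` that also appears in the customer database; those field names, the function name, and the sample data are all hypothetical.

```python
def customers_who_viewed(hits, customer_ids, product_path):
    """Return the sorted IDs of known customers who requested product_path."""
    viewers = {hit.get("user_id") for hit in hits if hit["path"] == product_path}
    viewers.discard(None)  # drop anonymous hits that carry no user ID
    return sorted(viewers & set(customer_ids))


# Made-up sample data: the last hit is anonymous.
hits = [
    {"user_id": "u1", "path": "/products/new.html"},
    {"user_id": "u2", "path": "/products/new.html"},
    {"user_id": "u3", "path": "/index.html"},
    {"path": "/products/new.html"},
]
customers = {"u2", "u3", "u9"}
print(customers_who_viewed(hits, customers, "/products/new.html"))  # prints ['u2']
```

Without registration or some other shared identifier there is nothing reliable to join on, which is one reason this feature is harder than it sounds.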

Top traffic analysis tools

Many companies offer Web traffic analysis tools, but only a few enjoy a significant customer base. Here's a list of major players in general Web traffic analysis.

Accrue Insight: Insight 1.1: Cost based on individual configuration. By monitoring live network transactions and log files, high-end Accrue Insight provides specific data, including how many pages are actually delivered to users (rather than just sent by the server), how often users stop their browsers in the middle of a page download, and individual users' effective Internet connection speed.

Andromedia Aria: Aria 2.0: $9,895 for the basic package. Aria, currently the most complex and robust traffic analysis tool on the market, is for serious Web business developers. Its three components are the monitor (a server API plug-in that gets HTTP requests and is assigned to a Web port), the recorder (a multithreaded, persistent object-based dynamic server), and the reporter (which produces graphical reports of the results).

Marketwave Hit List: Hit List Enterprise 3.5: $7,000; Hit List Live 3.5: $16,000. Hit List provides detailed insight into visitors and their visits. Features include linking data such as IP addresses to corresponding records in ODBC-compliant databases.

net.Genesis net.Analysis: net.Analysis Pro NT 3.1: $2,500; net.Analysis Pro UX 3.0: $6,500. net.Analysis continues to offer useful features such as detailed data analysis, ready-to-go report formats, and customizable reports and filters.

WebTrends: WebTrends Log Analyzer 4.0: $300; WebTrends Professional Suite 1.0: $500; WebTrends Enterprise Suite 1.0: $1,500. WebTrends offers low prices compared to other market leaders. The Log Analyzer is a basic tool for marketing or sales managers. The Professional Suite has a proxy server analysis cartridge to conduct reverse lookups on Web servers and a Web quality control and link analysis tool to prevent broken links. The Enterprise solution lets customers connect log files to a database. They're all good choices to augment in-house applications.

Advertising analysis tools

One important reason to analyze Web traffic is to acquire information, both for your own sales and marketing staff and for outside advertisers. Ad traffic analysis applications usually come as part of a complete package that provides tools for managing and distributing ads, combining traffic information with other types of sales information. The two companies that have emerged as leaders in the advertising analysis tools market are NetGravity and Accipiter.

NetGravity AdServer family: AdServer: One-time baseline fee of $25,000; AdServer Network: One-time baseline fee of $100,000. AdServer is intended for single-site management; AdServer Network is for managing a group of sites. Although it's relatively difficult to implement, AdServer is easy to use once the system is in place, and NetGravity provides 24-hour support and an implementation consultant. Smaller sites can outsource to NetGravity's AdCenter service bureau to manage ad traffic.

Accipiter AdManager: Perpetual license starts at $17,000; subscription license starts at $7,000 plus $4,000 per quarter. Accipiter provides consulting, training, courseware, and 24-hour support. AdManager features detailed data-mining, a focus on individual users, real-time ODBC-compliant database interaction, and graphical reports. Accipiter's partners include I/PRO, ABC Interactive, RelevantKnowledge, and MBInteractive. Smaller sites can outsource to Accipiter's AdBureau service to manage ad traffic. (Note: CNET uses Accipiter's AdManager.)

Third-party auditors

Web traffic analysis tools and services give you everything you need, with one exception: a guarantee to customers, advertisers, sponsors, and investors that your data is accurate. Advertisers in particular want to be assured that they are receiving the Web count they've been promised. For that you need a third-party auditing service.

Auditing services check your Web site traffic data--and usually compile their own. A good audit verifies the accuracy of the data within a site's log file. Currently, there are three major auditing services:

Audit Bureau of Circulations (ABC): Audit Bureau of Verification Services (ABVS) (fee based on service provided). ABC traditionally audited the circulation of print publications. It formed ABVS to verify Web ad traffic. ABVS works closely with NetGravity to give customers both advertising management and tracking services and independent verification of ad traffic data.

BPA International (fee based on service provided). BPA is an independent auditing company that provides a convenient way of setting up an audit (with the Start Me Up page), as well as an informative FAQ about Web site auditing. Its customers include CDnow, Computer Weekly, Four11, the Internet Movie Database, and Upside.com.

I/PRO NetLine and I/AUDIT: NetLine: Reports (up to 250,000 hits a day, 12 standard reports) start at $750 per month. I/PRO's data-counting service, NetLine, does everything that a general traffic analysis tool can, without requiring you to set up equipment. NetLine processes traffic data and then creates and delivers reports. NetLine customers include Marriott, NBC, Dow Jones, Hotmail, and USA Today.

I/PRO's auditing service, I/AUDIT, aims to deliver accurate, unbiased numbers to advertisers. I/AUDIT uses a variety of methods, including regularly requesting log files from customer servers, testing for doctored files or data, random site visits by software robots, physically examining report data, and validating URLs.

(Note: Although I/PRO is the most visible auditing service, critics say that it is a conflict of interest for I/PRO to both analyze and verify data.)

Web ratings services

Web ratings services provide Web builders and advertisers with crucial information about a site's audience, sometimes comparing it with the audiences of other sites. These syndicated services are trying to establish themselves as the Web's answer to television and radio audience measurement service A. C. Nielsen.

Interestingly, Nielsen is coming late to the Web ratings game, but says it will begin a pair of new Web-based services this year. One service will measure Web content at what Nielsen claims is a "new level" of data analysis; the other service will measure audience ratings for individual Web ads. Additionally, Nielsen has several alliances with other companies in the field, most notably I/PRO.

The three current powerhouses in Web ratings are Media Metrix, MBInteractive (a division of Millward Brown), and RelevantKnowledge.

Media Metrix uses the PC Meter to measure every aspect of how people use their computers at home. The PC Meter is software that tracks users' click-by-click, minute-by-minute usage of software applications, including online services and Web browsers. While the PC Meter's focus extends beyond the Web, it provides a detailed tabulation of page-level viewing. The software is available for Windows machines only, and users must save their data onto a floppy disk and snail-mail it back to Media Metrix.

MBInteractive focuses on determining who's visiting your site, why they're there, and what they expect to find. The company employs user profiling (sometimes referred to as Web site evaluation), enhanced ad reporting, and a consumer network. To profile users, MBInteractive gives a ten-minute survey to a random sample of Web users. For enhanced ad reporting, MBInteractive works with Accipiter to track the effectiveness of specific ads. The MBInteractive Consumer Network is a panel of 10,000 U.S.-based Web users intended to track public reaction to online ads.

RelevantKnowledge's methodology resembles TV and radio ratings systems. RelevantKnowledge's Panel2000 comprises 8,000 Web users intended to represent the age (12 and up), gender, occupation, location, points of access, computer platforms, and browsers of the "Web universe." Panel members download software that automatically sends their Web surfing clickstreams to RelevantKnowledge. RelevantKnowledge delivers monthly reports to clients over the Web through a Java applet.

About the author

Mariva H. Aviram develops Web sites, consults, and writes.