
Web Analytics Association: Standard Metrics Definitions

(UPDATE: Having woes with the Captcha. Argh. Have disabled comments till I can get it fixed. Sorry.... Thanks Judah for the heads up!)
(UPDATE2: Captcha Woes Fixed. Now using a new improved "Math" Captcha. Apologies for the mess!)
(UPDATE3: Pulled item 20. Stephen Turner gently corrected me that timestamps are daylight savings independent. I should know that!)

The Web Analytics Association (WAA) has recently released a document of 26 Standard Definitions (PDF) to "... Promote Consistency across the ... Web Analytics Community".

They are by no means a complete list, but you can read various reactions around the Web Analytics industry via Avinash, Judah and/or Robbin. All have a slightly different view on things. And all, to the best of my hearsay knowledge, are members of the WAA itself. I'm personally not a member, more due to slack/lazy inertia than a deliberate conscious decision to not join.

I finally had some time today to read the document in detail. As is natural with any new document, there are issues, minor or not. So being a good little bug reporter type, I thought I'd write 'em down. And then email the list through. But the list seemed to grow just a tad, and I figured that perhaps these concerns and issues could benefit from a public airing. Or at least, that I could be shot down publicly... Foot-in-mouth, just deserts et al. Pick one. Or three.

Notwithstanding the below (hopefully constructive) criticisms, this is a pretty good document! Really! It's way past time we did have a standard on what these terms mean in this industry.

Now I have had all too much experience with writing policy documents, and was even Defence's representative on a Standards Australia committee. (Which sounds way more glamorous than it is. What's that? It doesn't sound glamorous? That's what I just said!) So please forgive the anal retentiveness of what follows. I do mean this for the best! No particular order, though I have tended to start from the beginning of the document and gradually work through to the end.

  1. Copyright: As I raised in comments on Avinash's blog, there is currently a pretty big question mark over the distribution of this document, or portions thereof. Based on feedback I've received, it was a simple oversight and certainly not grounds for criticism itself! It just complicates things. Currently.
    To put this in context: I have two Open Source projects that could, would and should use these definitions. I personally use both for doing real Web Analytics, both @home, and @work. I would like to use these definitions within those products, but currently I cannot. Or rather I would have a silly situation of having to refer people to a remote PDF to explain the terms I use within the man pages and so on.
  2. Introduction: It is not clear within the document as to WHY the document exists at all. This makes it a little hard to truly be fair in any criticism, as the actual goal is unknown.
  3. Target Readership: It is not clear who is supposed to care about the definitions contained therein. Those who develop and design WA tools? WA Analysts? CIOs? CEOs?
  4. Assumption: I fully support the removal of Robots et al from the metrics, no argument here! Problem: What is the definition of a robot, so that I can remove them? I frequently use tools (eg wget) as a primary web browser, that others would call a robot, or that can be used in a robot-like fashion.
  5. Missing Definition(s): With the removal of 'bots etc there is no defined way remaining to describe entire site usage. Humans Plus Robots. This is highly necessary from a systems perspective where one WILL (Should!) be doing capacity planning, disaster planning and the like.
  6. Stylistic: Use of defined terms eg "Pages" in the document should be highlighted, such that they are shown to refer to defined terms. "Pages" has a specific meaning in terms of a book. As this document shows, it has an alternate meaning. Highlighting displays a nice clue that the term has a specific meaning in this context. eg. "The new visitor metric,..." Is this "the NEW <pause> visitor metric"? Or "the NEW_VISITOR metric"? Well I know which one is meant, new_visitor, but it will not be clear to the casual reader.
  7. Stylistic: Consistency. Again on "Page". This is referred to in Page Views as "a page (an analyst-definable unit of content)" but the same qualifier is not used elsewhere. The aforementioned highlighting may help solve this?
  8. References: Page Views refers to HTTP status codes. It would be strongly advisable to reference the appropriate Internet RFC(s) where these codes are strictly defined. eg RFC 2616 for HTTP/1.1. It avoids issues of clarity and tightly couples to already accepted standards.
  9. Accessibility and Data Collection: The document appears to have an inbuilt bias towards certain data collection methodologies: eg. "As an alternative, image based page tags can be placed inside such content to track...". This has issues when used in conjunction with certain other technologies, such as those used by vision-impaired persons, or by those who deliberately choose to use more "primitive" tools. In essence, is this a document for everyone who uses the Internet, or just those who conform to the majority? I hasten to add that neither position is right or wrong per se, just different.
    This could, perhaps, be handled by another assumption statement? The danger of too many broad stroke assumptions being that the definitions could date very quickly, or be wildly incorrect for certain market segments.
  10. Missed Opportunity: "A typical time-out period for a visit is 30 minutes,...". It would have been preferable to state that Visits/Sessions ARE timed out after 30 minutes, BUT that an analyst may derive additional insights by altering the time-out period. Such a time-out change should be referred to as (eg) "X Visits (10Min T/O)"... or similar.
  11. Loose Specification: Again on Visits, 30 minutes should be more tightly defined. See below on Visit Duration.
  12. Correctness: Is there scope to be specific in phrasing? eg Internet RFCs define key words like "MUST", "SHOULD" and so on.
  13. Accuracy/Correctness: "the most predominant method of identifying unique visitors is via a persistent cookie". Predominant according to whom? I submit that there are easily far more web sites that use IP Addresses alone to identify Visitors than there are those that use Cookies. Perhaps "preferred" would be a better choice?
  14. Long Term Relevance?: The document seems very much rooted in the here'n'now. Doesn't take into consideration re-analysis of old data - eg pre-cookies, and seems to ignore any future changes - whatever they could be. That may be hard, but some parts could be more generalised.
  15. Technology Specific: eg The references to Page Tagging and Logging. But ignores other data collection methods. I would prefer a more general document that I could use irrespective of my collection methodology. eg Panels, Sniffing (variant on logging?), Server Side capture etc. etc. etc.
  16. Confusion: "Repeat Visitor" and "Return Visitor". Both are 6 letter words starting with 'R'. This is just asking for confusion, especially as both "Repeat" and "Return" have similar meanings. eg. I'd hate to see how this translates into another language. I would strongly suggest that an alternate word is chosen for one or both of these metrics. Unfortunately I have no suggestion to offer.
  17. Clarity: "Entry page should not be equated or confused with landing page." Sure! Why? The explanation is on the next page, I would argue it should be on both pages, but from the context of (a) why an Entry Page is not a Landing Page and (b) Vice-Versa.
  18. Examples: Particularly with Landing vs Entry, case study examples would help improve understanding and further help define the meaning.
  19. Technology Specific: Exit Page - "The use of cookies to track visit sessions or another reliable visit session method is necessary to accurately track this measure." Again, this forces a specific technology into the definition of a standard. It might be preferred to use Cookies or similar but it is not NECESSARY. In essence what this states is that any tool that provides "Exit Page", without cookies, is suddenly and magically wrong. I would argue, using current technology, that the use of client side JavaScript would be generally preferable to cookies as a method of more accurately tracking the Exit Page. The appropriate script may use cookies, but JavaScript would be the driver.
  20. Accuracy/Correctness: Visit Duration: "Calculation is typically the timestamp of the last activity in the session minus the timestamp of the first activity of the session." This is simply wrong. Worse, it is in conflict with the label: "Duration". The calculation as written is the delta between two timestamps. It can be positive OR negative. Think daylight savings changeovers, and how they would impact on such a simple calculation. That most of us, including me!, do it this way doesn't make it right.
  21. Clarity: Again on Visit Duration: Should the Duration be reported as Zero? Or Unknown? Or??? Would be an ideal place to define what SHOULD go here.
  22. Clarity: Referrer: "current page view or object." Sure. What's an object? Obviously for me, an object is a superset of Pages, but where is this spelt out? If it is a superset, is the use of "Page Views" redundant? This definition itself could be tightened.
  23. Loose Specification: "These are often reported as "No Referrer" or "Direct Navigation"." Which is to be the new standard? eg These SHOULD be reported as "No Referrer". Non compliant tools may use "Direct Navigation" or "-".
  24. Inconsistency: Internal/External Referrers. Both have the same problem: "... as defined by the user.". Um. No. As defined by the Analyst, as has been used earlier. Others may differ, but to me "User" speaks as the Visits/Visitors we're tracking. Not the person doing the analysis.
  25. Examples: Again, for Internal/External, I would advise the use of examples to highlight the wording. Diagrams even. Show how different domains could be considered to be Internal and so on.
  26. Confusion: "Visit Referrer" and "Original Referrer" appear to be much the same? I gather these are tool specific, and hence obvious to practitioners of that Tool(s)?
  27. Unnecessary Definitions: "Click-through" and "Click-through Rate/Ratio" appear to be a fancy way of saying "Page" or "Referrer" respectively, in the two definitions. And hence are unnecessary. Some pages are more equal than others, perhaps?
  28. Unnecessary Definition: "Page Views per Visit". This is already defined, and hence redundant. If we have already defined "Pages" and defined "Visits" then the ratio logically follows. Consistency also. Don't use "Page" originally and suddenly switch to "Page Views". It's either "Page" or we have a new Object called "Page Views" - which is missing from the definition. Ignore: Steve is a wally.
  29. Clarity: All the ratios et al presented in "Content Characterization" should be explicitly described/drawn as mathematical formulae. Common Symbols/descriptions should be used throughout.
  30. Clarity: "Single-Page Visits" and "Single Page View Visits" are explicitly mentioned as being different "not to be confused", but there is no obvious difference from the descriptions?
  31. I'm personally not convinced that any of the "Content Characterization" Terms should be included in this document, mainly as they are all derivatives of earlier Terms. Perhaps these would be better placed in a separate document, or as an Appendix-like chapter?
  32. Loose Specification: "Event" seems to be referring to Significant Things Happening. But the definition is for "Hit" in the common and technical parlance from, eg. server logs. It feels like this definition started out being more specific, but in trying to be all things to everyone has become too loose?
  33. Loose Specification: Conversion. Similar to Event this definition feels like it's lost something. eg. Who defines a "Target Action"? What is a "Target Action"? How do I know if I have one?
  34. Consistency: Again in Conversion: "They provide the marketer an additional tool...". Analyst vs Marketer, perhaps?
  35. MIA: "Hits" simply screams out by its absence. I would far prefer to have it introduced as "This is what it means, this is why it's not used in Analytics." Else one of the most commonly used and abused terms is still with us. It needs to be forcibly explained into obsolescence. And also, explained as to where it is useful. For Hits does have real business use left in it - eg. The aforementioned Capacity Planning and the like.
    I did enjoy the irony in reading the definition for "Event" and noting that "Hits" was no longer with us. But then... I'm evil.
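
For what it's worth, items 10, 11 and 21 can be sketched in a few lines of Python. This is a rough illustration only, not anything taken from the WAA document. The function names (split_into_visits, visit_duration, visits_label) are made up for the example. It also assumes that a gap of exactly 30 minutes stays within the same visit, and that a single-hit visit reports its duration as unknown (None) rather than zero. Both are exactly the sort of choice the standard would need to settle.

```python
from datetime import datetime, timedelta, timezone

# The 30 minute default quoted in the definitions document.
DEFAULT_TIMEOUT = timedelta(minutes=30)


def split_into_visits(timestamps, timeout=DEFAULT_TIMEOUT):
    """Group one visitor's hit timestamps into visits.

    Assumption: a new visit starts only when the gap since the
    previous hit is strictly greater than `timeout`.
    """
    visits = []
    for ts in sorted(timestamps):
        if visits and ts - visits[-1][-1] <= timeout:
            visits[-1].append(ts)   # still inside the current visit
        else:
            visits.append([ts])     # gap too large (or first hit)
    return visits


def visit_duration(visit):
    """Last timestamp minus first; None (unknown) for a single-hit visit.

    Assumption: "unknown" rather than zero is my choice for the example.
    """
    if len(visit) < 2:
        return None
    return visit[-1] - visit[0]


def visits_label(count, timeout=DEFAULT_TIMEOUT):
    """Label a visit count, flagging any non-default time-out."""
    if timeout == DEFAULT_TIMEOUT:
        return f"{count} Visits"
    minutes = int(timeout.total_seconds() // 60)
    return f"{count} Visits ({minutes}Min T/O)"


# Hypothetical hits from a single visitor, all in UTC.
hits = [
    datetime(2007, 8, 16, 9, 0, tzinfo=timezone.utc),
    datetime(2007, 8, 16, 9, 12, tzinfo=timezone.utc),
    datetime(2007, 8, 16, 9, 50, tzinfo=timezone.utc),
]
default_visits = split_into_visits(hits)
short_visits = split_into_visits(hits, timedelta(minutes=10))

print(visits_label(len(default_visits)))
print(visits_label(len(short_visits), timedelta(minutes=10)))
print([visit_duration(v) for v in default_visits])
```

With the sample hits above, the default time-out gives two visits and a 10 minute time-out gives three, so only the second count carries the "(10Min T/O)" qualifier.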

All of this discussion is based on the 20070816 Document.

So that's my first run through. I've not been as anal as I would be in work-life, where a physical copy would be covered in cryptic red scrawl picking on each and every infraction. And do note that, generally, I have no qualms about the definitions themselves.

Yet again, my profound thanks to all the folk who have contributed to this document. I sincerely do appreciate how hard something like this can be. And have been on the receiving end of the red ink heaps of times myself. So consider me fore-punished for doing it to someone else!

 
