User Guide

Author: Alexina Palmer
Urchin 5.000 Urchin Administration/User Guide

© Copyright 2003 Urchin Software Corporation. All rights reserved. Printed Date: 01/25/2005 03:03:02 Modified Date: 01/23/2005 11:41:26

Table of Contents

Chapter 1: Getting Started
  Welcome to Urchin!
  System Requirements
    Supported Platforms and Hardware Requirements
    Urchin Setup Requirements
  Installation
    Quickstart Installation Guide
    Installation Guide (Windows)
    Installation Guide (UNIX)
    Installation Guide (Mac OS X 10.2.x)
    Installation Guide (Sun Cobalt)
    Uninstalling Urchin 5
    Troubleshooting Install Problems
  Upgrades
    Upgrading Urchin 4
    Upgrading Urchin 3
    Upgrading Urchin 3 on Sun Cobalt
    Urchin 3, 4, & 5 Reporting Differences
    Upgrading Urchin 5
  Initial Configuration
    E-commerce Reporting
    Setup Recommendations

Chapter 2: Visitor Tracking
  Using UTM with E-commerce
  Visitor Identification Methods
  Urchin Traffic Monitor (UTM)
  Session-ID Identification
  UTM Quick-Install (Apache)
  Installing UTM On Every Page (Apache)
  UTM Quick-Install (IIS)
  Using UTM with Domain Aliases
  Using UTM with Multiple Sites
  Tracking Flash and Browser Events (UTM-5 only)
  Tracking Banner Ad Exits and Other Outbound Links

Chapter 3: Urchin Administration
  Administration Overview
  Profiles
    Importing Profiles (Windows)
    Working with Profiles
  Log Files
    Working with Log Sources
    Log Management
    Log Rotation Best Practices
    Logging - Apache and IIS
    Logging - iPlanet
    Logging: Tomcat (Apache Jakarta Project)
    Logging - Other Webservers
    Wildcard & Date Substitution in Log Path
    Processing Historical Logs
    Log Reprocessing
  Filtering
    Filtering Overview
    Filter Fields
    Exclude/Include Filters
    Decode URL Filters
    Search & Replace
    Lookup Table Filters
    Advanced Filters
    DynamicURL Filters (deprecated)
    Regular Expression Overview
  Affiliations, Users & Groups
    Working with Affiliations
    Working with Users & Groups
  Scheduling Tasks
    Working with the Task Scheduler
  System Settings
    Changing the Port Number
    Licensing Urchin
    DNS Database Update

Chapter 4: Reporting Interface
  Report-Side Filtering
  Reporting Interface Overview
  Exporting Data
  Date Range

Chapter 5: E-commerce Module
  E-commerce Overview
  ELF & ELF2 Log Formats
  Custom E-commerce Logs
  Visitor Correlation
  Cancelling E-commerce Transactions

Chapter 6: Campaign Tracking Module
  Campaign Tracking Overview
  The Five Dimensions of Campaign Tracking
  Step 1: Track Campaign Data (Set up UTM-3)
  Step 2: Install and License Campaign Tracking
  Step 3: Define a Conversion Goal
  Tagging Your Online Links 1-2-3
  Import Cost Data from Google
  Import Cost Data from Overture
  Adding Cost and Impression Data
  How To Analyze Keyword Buying
  How To Track Content-Targeted Ads
  How To Track Email Campaigns
  How To Use Master Tracking Codes
  URL Builder
  Implementation Checklist

Chapter 7: Advanced Topics
  Utilities
    Administration Utilities Overview
    geo-update: DNS Database Update Utility
    inspector: Urchin Installation Integrity Checker
    u3importer: Urchin 3 Data Import Utility
    uconf-driver: Configuration Management Utility
    uconf-export: Text-based Configuration Export Utility
    uconf-import: Text-based Configuration Import Utility
    uconf-schedule: Global Scheduling Utility
    udb-sanitizer: Database Maintenance Utility
    urchinctl: Urchin Services Control Utility
    urchin: Urchin Log Processing Engine
  Integration
    NFS locking requirement
    Overview of Urchin Integration Capabilities
    Changing the Location of the Urchin Data Directory
    Using an Existing Apache Webserver (UNIX-type Platforms)
    Using an Existing IIS Webserver (Windows Platforms)
    Using External Authentication or Authentication Bypass
    Linking Directly to Urchin Reports
    Script-based Configuration Management Overview
    Data Export
  Customization
    Custom Log Formats
    Custom Navigation
    Custom Reports
    Custom Date/Time Formats
    Custom DNS Entries
    Custom Lookup Tables
    Cobranding Urchin
  Hosting Automation Solutions
    How are H-Sphere and Urchin 5 Integrated?
    Using Urchin with Plesk PSA 5.0
    Ensim Webppliance
    Sphera's HostingDirector
  Performance & Tuning
    Global Filtering of Hits from Monitoring Software
    Reducing Disk Storage for Urchin Profile Monthly Databases
  Security Features
    Activating SSL on the Urchin Webserver

Chapter 8: Reference
  Integer Field List
  Regular Field List
  Regular Report List
  Configuration Table and Directive List
  Error code list for failed FTP and HTTP remote webserver log transfers

Chapter 1: Getting Started

Welcome to Urchin!

Urchin 5 represents 7 years of development, and is in our view the most advanced web analytics package available today. Combining proven datacenter-class performance with unprecedented ease-of-use, Urchin 5 is the best choice for businesses and hosting providers of all sizes.

What is Urchin?

Urchin is a web analytics system designed to enable businesses to easily analyze the traffic to their website(s) and create detailed, insightful, and intuitive reports. Basically, Urchin is a log-analysis program, but its sophisticated unique visitor reporting goes far beyond what was available up until now.

How Does Urchin Work?

Urchin consists of 4 primary components:
• The Admin Server
• The log-processing and DNS resolution engine
• The Visitor Interaction Data Architecture (VIDA) database
• The Scheduler

The Admin Server is Urchin's nerve center. It is a web-based control panel system, powered by a customized Apache web server, that controls all the other Urchin components. With the Admin Server, you can access and control the Urchin system from any computer on the Internet (by turning on remote access and reporting).

The log-processing and DNS resolution engine does the heavy lifting in the Urchin system, converting large raw log files into meaningful data, translating IP addresses to domains, and entering that information into the VIDA database.

The VIDA system is our highly specialized, optimized, proprietary database for quickly entering and extracting web analytics data. This analytics-specific database is a significant part of Urchin's speed advantage over the competition.

The Scheduler regularly checks the configuration database for scheduled tasks that need to be run, and executes Urchin to process them at their scheduled times.

Who should use Urchin?

Urchin is ideal for any individual or business that has access to its website's log file(s) and HTML. If you do not have access to your site's log file(s), ask your hosting provider to install Urchin. It is very popular among hosts. Contact [email protected].

System Requirements

Supported Platforms and Hardware Requirements

Urchin runs on numerous architectures and operating systems. An Urchin installation is only needed on a system that will be processing logs. For viewing reports, only a web browser is required.

Supported Platforms

Windows
• Windows 2003 Server
• Windows XP
• Windows 2000 (Professional and Server)
• Windows NT 4.x

UNIX-type Systems
• Mac OS X (10.1 and higher)
• Mac OS X Server (10.1 and higher)
• Linux x86
  ♦ RedHat Enterprise 3.0, RedHat 9, RedHat 8, RedHat 7.x, RedHat 6.x
  ♦ Fedora Core 2, Fedora Core 1
  ♦ SuSE 9
  ♦ Other Linux OSes should be compatible; see the list in the Non-Explicitly Supported Platforms section
• FreeBSD 5.2, FreeBSD 4.x
• Solaris 2.6, 7, 8, 9 (SPARC)
• Solaris 9 (x86)
• Sun Cobalt RaQ550, Qube3, RaQ4, RaQ3

Anticipated OS Support

The following OSes should have a native build of Urchin released in the timeframe noted for each one:
• FreeBSD 5.3 - first quarter 2005
• Solaris 10 - first quarter 2005

If you don't see your OS listed, and a substitute cannot be found in the compatibility list in the next section, contact us to suggest it as a possible inclusion.

Non-Explicitly Supported Platforms

We strive to make Urchin available natively on as many platforms as is economically reasonable. If there is no specific Urchin distribution for your platform, you may find an available Urchin distribution that is compatible with your OS as explained below.

Windows 98, Windows 3.x: Urchin cannot be installed on Windows 98 or 3.x, but these platforms can be used to view reports with Internet Explorer 4.x and newer.

Linux: There are many different variants of Linux and we don't build an individual Urchin distribution for all of them. However, there is typically a high degree of compatibility across Linux flavors, so one of our distributions will almost certainly work on your machine. Some known compatible distributions are:
• RedHat Enterprise Linux 2.1: use the RedHat 7.2 distribution of Urchin
• SuSE Linux 8: use the RedHat 7.2 distribution of Urchin

For all other x86-based Linux variants you can determine which Urchin distribution to use by looking at our FAQ article on this topic.

Solaris:
• For SPARC systems, any OS release prior to Solaris 2.6 is not supported.
• For x86 systems, any OS release prior to Solaris 9 x86 is not supported.

Urchin 5 System Requirements

Urchin's superior performance allows you to get more from less hardware investment. For instance, an older Pentium II might be too slow for desktop use, but will make a fine Urchin server. And Urchin's unmatched portability means you can use whichever operating system you like. Below, we provide a recommended level of hardware for high performance.

Recommended Systems

Single Small to Medium Website Analysis
• 500 MHz or better processor
• 128 MB RAM
• 10 GB+ IDE hard disk
• Ethernet interface

Service Provider / Enterprise Installations
• 1 GHz Pentium IV / 500 MHz UltraSPARC / similar MHz range PPC/MIPS/etc.
• 256 MB RAM
• Ultra2/Wide SCSI hard disk (such as a Seagate Cheetah)
• 100base-T Ethernet
• Backup system

Memory/System/Disk Usage
• Urchin memory (RAM) usage can be configured between 20 and 500 MB
• Urchin can be configured to run at low, normal or high priority
• Urchin's data storage will use approximately 10% of the size of the raw logs
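As a rough planning aid, the 10% storage figure above can be turned into a quick estimate. The shell sketch below is illustrative only: the function name estimate_urchin_db_mb is hypothetical (it is not part of Urchin), and the ratio is simply the approximate figure quoted above, so actual usage may differ.

```shell
#!/bin/sh
# Hypothetical helper, not part of Urchin: estimate the disk space Urchin's
# data storage may need, using the approximate 10%-of-raw-logs figure.
# Argument: total size of the raw logs in megabytes (whole number).
estimate_urchin_db_mb() {
    raw_log_mb=$1
    # Integer arithmetic; the result is rounded down to a whole megabyte.
    echo $(( raw_log_mb / 10 ))
}

# Example: estimate storage for 5000 MB of raw logs.
estimate_urchin_db_mb 5000
```

Treat the result as a starting point and leave headroom, since the databases grow as more months of logs are processed.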

Urchin Setup Requirements

This article lists the operational issues that should be anticipated prior to installing and running Urchin. Some of the information is required to operate Urchin successfully. Other items are important for using Urchin most effectively once the software is installed.

Basic Urchin Installation Considerations
• On Windows you must install while logged in as the Administrator.
• On UNIX-type systems you may install as any user, but if you do not install as the superuser, you will be restricted in which areas of the file system you may install to.
• Urchin comes bundled with an Apache webserver binary for configuration and report delivery. Your systems administrators should be aware that this new web service will be running after Urchin is installed.
• Although the Urchin distribution itself is small, taking up only about 25 megabytes, you should install in a disk location that has plenty of room (e.g. several hundred megabytes at least) to allow for the growth of the Urchin databases over time. See the Performance and Management Issues section for additional considerations.
• If you are upgrading from Urchin 3, you will need to import your databases into Urchin 5 using the u3importer utility. Running the Urchin 5 installer does not by itself upgrade Urchin 3 to Urchin 5. See Upgrades in the Getting Started section of the Documentation Center.
• Upgrading from Urchin 3 or Urchin 4 to Urchin 5 requires relicensing your product.

Basic Urchin Processing Considerations
• Access to webserver logs - you must know the path to the log files for a given site, and you must have permission to access these files. If the logs are on a remote system, then you will also need an account name and password to use when retrieving the logs.
• Properly configured log format - although Urchin can process custom log formats, you will simplify management if you configure your webserver to log in a standard format. It is recommended that you use either Extended Combined Log Format (e.g. NCSA or Apache logs), or W3C Extended Log Format (e.g. IIS logs). For IIS sites, logging of Process Accounting should be turned off. See the Advanced Configuration section for additional considerations.
• Unique user account for Urchin processes - on UNIX-type systems it is desirable to enhance security by having Urchin programs run as a special user id that is used exclusively for Urchin and has only limited privileges. Setting up such an account will require that you have elevated or superuser privileges on the system in question.
• Scheduling - you will need to choose a run schedule for Urchin processing that delivers reports in a timely fashion and allows for the time needed to process large data sets.

Advanced Urchin Processing Considerations
• If you desire Unique Visitor tracking then you will have to perform the following basic steps:
  ♦ Install the UTM sensor code in the web pages on your site
  ♦ Activate cookie logging in the log format for your webserver
  ♦ Set the tracking methodology in the Urchin Profile for the website to be UTM
• If you choose not to use Unique Visitor tracking then you should consider what level of granularity you desire for visitor or session reporting, and select the appropriate alternative Visitor Tracking Method for each site. Besides UTM the choices are IP only, IP/User-Agent (the default), Session ID, or Username.
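One way to spot-check the log-format and cookie-logging points above is to count the quoted fields in a sample log line. This sketch rests on an assumption not stated in this article: that a combined-format line carries three quoted fields (request, referrer, user agent) and that enabling cookie logging adds a fourth. The function name and the sample line are made up for illustration; consult the logging articles for the exact format Urchin expects.

```shell
#!/bin/sh
# Illustrative only: count the double-quoted fields in one log line.
# Assumption: combined format has 3 quoted fields (request, referrer,
# user agent) and a logged cookie adds a 4th. Lines that contain escaped
# quotes inside a field will be miscounted.
count_quoted_fields() {
    # Splitting on '"' yields 2n+1 pieces for n quoted fields.
    printf '%s\n' "$1" | awk -F'"' '{ print int((NF - 1) / 2) }'
}

# Made-up sample line with a trailing quoted cookie field.
sample='10.0.0.1 - - [23/Jan/2005:11:41:26 -0700] "GET /index.html HTTP/1.0" 200 2326 "http://www.example.com/" "Mozilla/4.0" "id=abc123"'
count_quoted_fields "$sample"
```

A count lower than expected on real log lines would suggest that the referrer, user agent, or cookie is not being logged, which is worth checking before enabling UTM tracking.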

Performance and Management Issues
• Log rotation - if you do not have some external mechanism for archiving or removing webserver logs after they have been processed by Urchin, you can configure Urchin to perform this task in the Advanced Settings for each Log Source.
• Retaining past Urchin databases for historical reporting - once the databases for a given month are created they are available from then on for historical analysis. Users should consider how far back they need to keep historical data so they can plan for purging unnecessary data to save disk space. Urchin can be configured to compress databases that are older than a certain date.
• Memory requirements - Urchin has configuration controls to limit the amount of RAM it utilizes when processing logs. The default is set to 20 MB, which may be too conservative for sites with logs greater than 10 MB in size. Plan to have sufficient system RAM so that you may increase Urchin memory usage as needed and tune the software's memory settings for maximum processing performance.
• Location of Urchin data storage - using the etc/urchin.conf file, Urchin can be configured so that the report databases are stored in a file system area outside the Urchin distribution. This allows you to allocate sufficient dedicated file system space for database growth where it's most convenient.

Remote Access and Integration Issues
• Using SSL for Urchin administration and reporting - the webserver that is bundled with Urchin is compiled with support for SSL. SSL is not activated in the default configuration, but the user can turn it on as desired.
• Firewall configuration - if your network topology includes firewalls, proxy servers, or other elements that sit between the Urchin processing server and either the users trying to view reports or the systems that hold logs to be retrieved, then those devices will have to be configured so that they don't interfere with Urchin's remote access. This typically can be done without subverting the security that such a topology is intended to provide.
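For the memory requirements point above, a quick size check on a log file shows whether it crosses the 10 MB mark at which the default memory setting may be too conservative. This is a minimal sketch under stated assumptions: the function name log_exceeds_10mb is hypothetical, 10 MB is taken as 10 * 1024 * 1024 bytes, and the script only reports a size; it does not change any Urchin setting.

```shell
#!/bin/sh
# Illustrative only: flag log files larger than 10 MB, the size above which
# the default 20 MB memory setting may be too conservative.
# Hypothetical helper; 10 MB is taken as 10 * 1024 * 1024 bytes.
log_exceeds_10mb() {
    # wc -c may pad its output with spaces; arithmetic expansion strips them.
    size_bytes=$(( $(wc -c < "$1") ))
    [ "$size_bytes" -gt 10485760 ]
}

# Usage: pass the path to a webserver log file as the first argument.
log_path=${1:-}
if [ -n "$log_path" ] && [ -f "$log_path" ]; then
    if log_exceeds_10mb "$log_path"; then
        echo "$log_path: larger than 10 MB; consider raising Urchin's memory setting"
    else
        echo "$log_path: 10 MB or smaller; the default memory setting may be adequate"
    fi
fi
```

Run it against the largest log you expect Urchin to process in one pass, since that is the case that matters for tuning.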

Installation

Quickstart Installation Guide

This Quickstart article is for first-time installers of Urchin. If you have an existing installation, read the Upgrades section. When you have completed the installation steps, log in to the Urchin administration interface to perform configuration. The initial username and password are:

Username: admin
Password: urchin

Reset the password during your initial configuration in the Setup Wizard. If you require unique visitor and session tracking, complete the steps in this Quickstart Guide and continue with the UTM Quick-Install article in the Visitor Tracking section.

Installing on Windows Systems
• Go to www.urchin.com and click the Download link.
• Download the Urchin for Windows installer to your desktop.
• Once the download has completed, double-click the installer file to start the InstallShield® wizard.
• Follow the on-screen instructions. The defaults should be acceptable for most installations.
• Once the installer has completed, go to Start -> Programs -> Urchin -> Urchin Administration and log in. You will get a License Urchin screen. Click on "Obtain Demo License". The interface will connect to the licensing server at the Urchin Software Corporation website and walk you through the process.
• Once licensing is completed you will be presented with a Setup Wizard. Follow the instructions to complete your initial configuration. Please make sure to reset the password for the admin account and to record this password somewhere for safekeeping.
• When the Setup Wizard has completed you'll be taken to the Profile configuration screen. Click Add to create a new Profile.
• Once you have created the appropriate Profiles, you're ready to start processing logs so that you can view Report data for your sites.

To access the administration interface remotely or for users to see their individual reports, use the URL http://yourhost:9999, where yourhost should be replaced by the name of the system where Urchin is installed.

Installing on UNIX-type Systems
• Go to www.urchin.com and click the Download link.
• Select the installer for the OS type that most closely matches your platform. The name of the installer image will include the Urchin version and the operating system type (e.g. urchin5000_freebsd4x.sh, urchin5000_redhat9.tar.gz).
• If necessary, upload the installer to a temporary location on the system on which you are installing Urchin.
• If you are not on the system's console, telnet (or use ssh if available) to the system and cd to the directory where the installer is located.
• Installers will have either a .sh or a .tar.gz suffix. Depending on the type of installer you will do one of the following:
  ♦ For a shell archive (e.g. urchin5000_freebsd4x.sh) simply type the name of the file like so: ./urchin5000_freebsd4x.sh. This will unpack several files that comprise the installation kit.
  ♦ For a tar.gz image (e.g. urchin5000_redhat9.tar.gz), uncompress and unpack the installation files with the commands:

    gunzip urchin5000_redhat9.tar.gz
    tar xf urchin5000_redhat9.tar

• From the command line execute the main installation script by typing: ./install.sh
• The script will prompt you for input as needed; just follow the instructions.
• When the installer has finished, you will be given the URL to access the Urchin administration interface, as well as the default admin password.
• Copy/paste the URL into a browser window, and enter the admin username and password to start configuring Urchin. You will get a License Urchin screen. Click on "Obtain Demo License". The interface will connect to the licensing server at the Urchin Software Corporation website and walk you through the process.
• Once licensing is completed you will be presented with a Setup Wizard. Follow the instructions to complete your initial configuration. Please make sure to reset the password for the admin account and to record this password somewhere for safekeeping.
• When the Setup Wizard has completed you'll be taken to the Profile configuration screen. Click Add to create a new Profile.
• Once you have created the appropriate Profiles, you're ready to start processing logs so that you can view Report data for your sites.

To access the administration interface remotely or for users to see their individual reports, use the URL http://yourhost:9999, where yourhost should be replaced by the name of the system where Urchin is installed.

Installing on Mac OS X 10.2.x Systems
• Go to www.urchin.com and click the Download link.
• Download the Urchin installation archive for Mac OS X 10.2.x.
• If the installer is downloaded directly via a browser to the system where it will be installed, an Urchin 5 folder will automatically be created on the desktop. If it was downloaded via some other mechanism such as ftp, double-click the installation archive icon, which will unpack the archive and create the desktop folder.
• Open the Urchin 5 folder, and double-click the Urchin.mpkg file, which will launch an interactive installation process. You must use an account with administration privileges to install.
• At the end of the installation a browser will launch and take you to the Urchin administration screen. You will get a License Urchin screen. Click on "Obtain Demo License". The interface will connect to the licensing server at the Urchin Software Corporation website and walk you through the process.
• Once licensing is completed you will be presented with a Setup Wizard. Follow the instructions to complete your initial configuration. Please make sure to reset the password for the admin account and to record this password somewhere for safekeeping.
• When the Setup Wizard has completed you'll be taken to the Profile configuration screen. Click Add to create a new Profile.
• Once you have created the appropriate Profiles, you're ready to start processing logs so that you can view Report data for your sites.

To access the administration interface remotely or for users to see their individual reports, use the URL http://yourhost:9999, where yourhost should be replaced by the name of the system where Urchin is installed.

Installing on Sun Cobalt Systems

• Use a web browser from your desktop system to connect to http://www.urchin.com/download/urchin5
• On the download page select the installer that most closely matches your platform. The name of the installer will include the Urchin version and the Sun Cobalt system type. For example: urchinc-5.0.00_cobalt_raq550.i386.pkg
• Save the .pkg file to your desktop or to a temporary folder.
• Using your browser, connect to the main Site Administrator's page for your Cobalt box.
• Navigate to the section of the interface for installing new third party software. The location of this area in the Cobalt interface will be platform specific:
  ♦ RaQ 3, RaQ 4: click on Maintenance in the left hand frame, then click Install Software in the top row.
  ♦ Qube 3, RaQ 550: click on the BlueLinQ tab, then click Third Party Software, then click the Install Manually button.
  ♦ XTR: click on the BlueLinQ tab, then click New Software, then click the Install Manually button.
• Prepare and launch the package installer:
  ♦ RaQ 3, RaQ 4: In the Software Package box select the Upload radio button, then click the Browse button to the right and navigate to the location on your desktop system where you saved the .pkg file you downloaded. Once you click the Open button in the browse window, the pkg filename will be entered into the Upload text box. Then click the "Install a pkg Package" button. When the installation is finished an Urchin link will appear in the lower box for installed software.
  ♦ Qube 3, XTR, RaQ 550: In the Location box select the Upload radio button, then click the Browse button to the right and navigate to the location on your desktop system where you saved the .pkg file you downloaded. Once you click the Open button in the browse window, the pkg filename will be entered into the Upload text box. Then click the Prepare button. Once the package has been prepared, an Install Software window will appear. In this window click the Install button. When the installation is finished Urchin 5 will be listed under the Programs tab.
• Click on the Urchin link in your Cobalt administration interface and the Urchin administration login window will appear. Enter the admin username and password to start configuring Urchin. You will get a License Urchin screen. Click on "Obtain Demo License". The interface will connect to the licensing server at the Urchin Software Corporation website and walk you through the process.
• Once licensing is completed you will be presented with the Setup Wizard. Follow the instructions to complete your initial configuration. Please make sure to reset the password for the admin account and to record this password somewhere for safekeeping.
• When the Setup Wizard has completed you'll be taken to the Profile configuration screen. Click Add to create a new Profile.
• Once you have created the appropriate Profiles, you're ready to start processing logs so that you can view Report data for your sites.

To access the administration interface remotely or for users to see their individual reports, use the URL http://yourhost:9999, where yourhost should be replaced by the name of the system where Urchin is installed.
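Every platform in this Quickstart ends with the same access URL pattern, http://yourhost:9999. As a small illustration, the sketch below builds that URL from a host name and an optional port. The function name urchin_admin_url and the host stats.example.com are made up for this example and are not part of Urchin; 9999 is the default port named in this guide.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Urchin): build the URL of the
# Urchin administration/report interface.
# Usage: urchin_admin_url HOST [PORT]
# PORT defaults to 9999, the default port named in this guide.
urchin_admin_url() {
    host=$1
    port=${2:-9999}
    # Reject an empty host or a port that is not purely numeric.
    if [ -z "$host" ]; then
        printf 'usage: urchin_admin_url HOST [PORT]\n' >&2
        return 1
    fi
    case $port in
        ''|*[!0-9]*)
            printf 'port must be a number: %s\n' "$port" >&2
            return 1
            ;;
    esac
    printf 'http://%s:%s\n' "$host" "$port"
}

# Example with a placeholder host name; substitute your own system name.
urchin_admin_url stats.example.com
```

If you chose a different port during installation, pass it as the second argument, for example urchin_admin_url stats.example.com 8080.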

Installation Guide (Windows)

The installer is an executable which guides you through all the steps necessary to install Urchin. The basic components of the Urchin 5 installation process are:

• Creating the distribution directory and unpacking the files
• Installing and starting an Apache webserver as an NT service to allow web-based configuration and report delivery
• Installing and launching the Urchin task scheduler, which manages log processing jobs, as an NT service
• Initial configuration and demo licensing of Urchin via the administration interface

Installation Preparation

You must be logged in as Administrator on the console of your system in order to install Urchin.

By default the Urchin webserver service will use port 9999 when it launches. You will have the option of choosing a different port number during installation. Please verify that any port you choose does not conflict with existing operational services on your system.

You will need access to the Internet from your machine. Internet access is required to complete the demo licensing and activate your Urchin distribution once it is installed.

Installation Instructions

If you are upgrading an existing installation of Urchin, please consult the Upgrades section of the Documentation Center for relevant details.

Double-click on the urchin5XXX_win_setup.exe (e.g. urchin5000_win_setup.exe) icon to launch the installer, and follow the instructions in the dialog screens.

Initial Configuration Using the Administration Interface

Once Urchin is installed you can connect to your Urchin administration interface by going to the Start Menu and selecting Programs->Urchin->Urchin Administration. Alternatively, you can enter the direct URL http://localhost:port_number into your browser, where port_number is either 9999 or the number you chose during the installation.

When you initially connect to the configuration interface, you will be presented with a License Urchin wizard. Click on "Obtain Demo License". The interface will connect to the licensing server at the Urchin Software Corporation website and walk you through the process. When finished with the license process, you will be returned to the Urchin administration interface, where you will be led through a Setup wizard that will set some required initial configuration parameters.

Remote Access Configuration

If you connect to the Urchin configuration interface by using the hostname in the URL (e.g. http://yourhost:9999 instead of http://localhost:9999), the program will treat this as remote access (even if you are on the console of the machine you're connecting to) and will prompt you for a username and password. The default settings for logging in with administration privileges are:

    Username: admin
    Password: urchin

Managing Urchin Services

There are two programs that are installed as NT services, the Urchin Task Scheduler and the Urchin Webserver. These services may be manually stopped and started by using the Disable Services and Enable Services shortcuts under Start Menu->Programs->Urchin. When these shortcuts are used, both services are turned off or on together.

User Access to Reports

Users should use the URL http://yourhost:portnumber, where yourhost is the name of the system where Urchin is installed and portnumber is the port number for the Urchin webserver (9999, unless you specified a different number during installation).

Advanced Reporting Options

If you require unique visitor and session tracking, continue with the Visitor Tracking section of the Documentation Library. If you would like to know about processing e-commerce data, please see the E-commerce Module section as well.

Installation Guide (UNIX)

The basic components of the Urchin 5 installation process are:

• Creating the distribution directory, unpacking the files, and setting appropriate ownership and file permissions
• Configuring and launching an Apache webserver to allow web-based configuration and report delivery
• Launching the Urchin task scheduler daemon, which manages log processing jobs
• Initial configuration and demo licensing of Urchin via the administration interface

The installer image you download is in the form of an archive, which will unpack into an install script, some support files, and the Urchin distribution. Urchin can be installed by any legitimate user on your system. It neither expects nor requires any special system privileges to install or operate, and is specifically designed to run as a non-root user for security reasons.

Installation Preparation

You may install as any user, with two exceptions: you will have to install as the superuser if you install in a directory that has write access restrictions, or if you configure your webserver to respond to requests on a port number lower than 1024, since only the superuser can bind to those ports. Please verify that the port you choose does not conflict with existing operational services on your system. The installation process will attempt to check for conflicts.

If you are installing as root, you will also be asked for a user account name and a group name, which are used in the configuration file for the webserver and to set the ownership of the installed Urchin distribution. The user and group names you select must be valid logins recognized by your system; you cannot choose arbitrary names for these. For system security reasons you are also not allowed to use root as the login that owns the Urchin files.

If you are not logged in as root while installing, you will typically not have the privileges to set the ownership of the files to the user of your choice. The install script will automatically detect this and install the distribution with your login as the owner of the files.

Lastly, you will need access to the Internet from your machine, since you must connect to the urchin.com site to complete the demo licensing and activate your Urchin distribution once it is installed.

Installation Instructions

The installer archive could be either a .tar.gz or a .sh archive, depending on your OS, and will be labeled with a name that identifies it for your OS type (e.g. urchin5000_freebsd4x.sh, urchin5000_redhat9.tar.gz). Copy the archive to any writeable area on your system and, depending on your install image type, do one of the following:

• For a shell archive (e.g. urchin5000_freebsd4x.sh), simply type the name of the file like so:

      ./urchin5000_freebsd4x.sh

  If you get a "Permission Denied" error, then run the command in this fashion:

      sh ./urchin5000_freebsd4x.sh

• For a .tar.gz image (e.g. urchin5000_redhat9.tar.gz), uncompress and unpack the installation files with the commands:

      gunzip urchin5000_redhat9.tar.gz
      tar xf urchin5000_redhat9.tar

Once the archive has been unpacked you should have the following files:

• install.sh (the installation script)
• install.txt (instructions similar to this document)
• license.txt (legal restrictions, licensing, and purchasing info)
• inspector (verifies the installed distribution)
• gunzip (supplied to unpack urchin.tar.gz)
• urchin.tar.gz (a tarred and compressed Urchin distribution)

To install simply type:

    ./install.sh

and follow the instructions.

Initial Configuration Using the Administration Interface

The installation script will start the Urchin webserver and Task Scheduler daemons. Once they are started you can connect to your Urchin administration interface by using the URL http://yourhost:9999, where yourhost is the DNS hostname for your system. If you have changed the default port number from 9999 to some other port during the installation, then you should use that port number in the URL. You will get a login screen. Use these initial login values:

    Username: admin
    Password: urchin

Upon initial login, the interface will take you to a License Urchin wizard. Click on "Obtain Demo License". The interface will connect to the licensing server at the Urchin Software Corporation website and walk you through the process. When finished with the license process, you will be returned to the Urchin administration interface, where you will be led through a Setup wizard that will set some required initial configuration parameters.

Managing Urchin 5 Services

There are 2 daemons, urchind and urchinwebd, that need to be running in order for log processing, reporting, and configuration administration to occur. These daemons are stopped and started by the urchinctl program in the bin subdirectory of your Urchin distribution. To start or stop both daemons, use:

    ./urchinctl start
    ./urchinctl stop

You can also start or stop only one daemon at a time by using the -w option for the webserver or the -s option for the scheduler. To see all of the available options, execute urchinctl with the -h option.

Any errors encountered when one of the daemons is launched should be reported on the command line. For the urchinwebd daemon, once you think it is running successfully, you should also check the var/error_log file for any startup problems.

At install time, the install.sh script will create a bootup/shutdown script that you can use in conjunction with your system rc files to cause the Urchin services daemons to be started at boot time and halted at shutdown. The script is named urchin_daemons and is located in the util subdirectory of your Urchin distribution.

User Access to Reports

Users should use the URL http://yourhost:portnumber, where yourhost is the name of the system where Urchin is installed and portnumber is the port number for the Urchin webserver (9999, unless you specified a different number during installation).

Advanced Reporting Options

If you require unique visitor and session tracking, continue with the Visitor Tracking section of the Documentation Library. If you would like to know about processing e-commerce data, please see the E-commerce Module section as well.
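The choice between the two installer types in the Installation Instructions above depends only on the file suffix. The following sketch, which is an illustration and not part of the Urchin kit, prints the unpack commands for a given installer name without executing anything. The function name unpack_steps is an assumption made for this example, and it assumes the installer sits in the current directory.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Urchin): print the commands that
# unpack an Urchin installer image, chosen by file suffix. Nothing is
# executed, so the output can be reviewed before it is run.
# Assumes FILE is in the current directory.
unpack_steps() {
    file=$1
    case $file in
        *.tar.gz)
            # gunzip strips the .gz suffix, leaving a .tar file to unpack.
            printf 'gunzip %s\n' "$file"
            printf 'tar xf %s\n' "${file%.gz}"
            ;;
        *.sh)
            # Running the archive through sh sidesteps the "Permission
            # Denied" error seen when the file is not marked executable.
            printf 'sh ./%s\n' "$file"
            ;;
        *)
            printf 'unrecognized installer type: %s\n' "$file" >&2
            return 1
            ;;
    esac
}

# The two example installer names used in this guide.
unpack_steps urchin5000_redhat9.tar.gz
unpack_steps urchin5000_freebsd4x.sh
```

After either set of commands has been run, the kit is unpacked and ./install.sh can be started as described above.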

Installation Guide (Mac OS X 10.2.x)

These installation notes pertain to installing Urchin 5 on systems running a minimum of Mac OS X 10.2. For older Mac OS X versions please see the general instructions for UNIX-type installations. The Mac OS X 10.2 installer is a point-and-click package-style installer that is downloaded in the form of a disk image. The basic components of the Urchin 5 installation process are:

• Download Urchin and unpack the installation archive
• Double-click the Urchin.mpkg file, which will launch an interactive installation process

The installer will install 3 distinct parts:

• Urchin binaries, utilities, and support files, including an Apache webserver for administration and report delivery
• Urchin StartupItems
• Urchin Preference Pane

Installation Preparation

The Mac OS X installer requires Mac OS X 10.2 or higher. Users of older Mac OS X systems need to use the Mac OS X 10.1.x shell archive installer.

The installing user must be able to authenticate using an account that has administrative privileges on the system, since the installer will be installing files in restricted locations.

During installation, a dialog will ask which disk you want to install on. Currently, you must install on the Startup volume.

Installation Instructions

If you are upgrading an existing installation of Urchin, please consult the Upgrades section of the Documentation Center for relevant details.

If the installation disk image is downloaded to your system via a browser, it will automatically unpack and create an Urchin 5 folder on the desktop. If the installer image is downloaded via ftp or another mechanism, double-click the disk image to uncompress it and create the desktop folder. Inside the folder the contents will be as follows:

    Urchin.mpkg
    Readme.rtf
    Install.rtf
    License.rtf
    uninstall_urchin.sh
    Packages folder

Double-click the Urchin.mpkg icon and follow the prompts in the dialog boxes to complete your installation.

Initial Configuration Using the Administration Interface

The installer will start the Urchin webserver and Task Scheduler daemons and launch a browser to connect you to the Urchin administration interface. You will get a login screen. Use these initial login values:

    Username: admin
    Password: urchin

Upon initial login, the interface will take you to a License Urchin wizard. Click on "Obtain Demo License". The interface will connect to the licensing server at the Urchin Software Corporation website and walk you through the process. When finished with the license process, you will be returned to the Urchin 5 administration interface, where you will be led through a Setup wizard that will set some required initial configuration parameters.

At any time in the future you can connect to your Urchin 5 administration interface by using the URL http://yourhost:9999, where yourhost is the DNS hostname for your system.

Managing Urchin 5 Services

The Urchin 5 services can be controlled or monitored by launching System Preferences and clicking the Urchin icon.

User Access to Reports

Users should use the URL http://yourhost:9999, where yourhost is the name of the system where Urchin is installed.

Advanced Reporting Options

If you require unique visitor and session tracking, continue with the Visitor Tracking section of the Documentation Library. If you would like to know about processing e-commerce data, please see the E-commerce Module section as well.

Installation Guide (Sun Cobalt)

The installer is a .pkg file, installed via the Cobalt Administration interface. The basic tasks of the Urchin 5 installer for Sun Cobalt are:

• Create the /home/urchinc distribution directory, unpack the files, and set appropriate ownership and file permissions
• Configure and launch a light Apache webserver to allow web-based configuration and report delivery
• Launch the Urchin task scheduler daemon, which manages log processing jobs
• Permit initial configuration and demo licensing of Urchin via the administration interface

Installation Preparation

You must have root access to install Urchin on a Sun Cobalt system. Although Urchin itself does not require any special system privileges to operate, and is specifically designed to run as a non-root user for security reasons, installation requires superuser access to some areas of your system.

You should download the appropriate package file for your system. This can be done in one of 2 ways:

• Use a web browser from your desktop system to download from http://www.urchin.com/download/urchin5 and save the .pkg file on your local machine until you're ready to install
• Use ftp directly from your Cobalt system to ftp.urchin.com/pub/urchin5, and put the downloaded .pkg file into the /home/packages directory

Your Cobalt system will need access to the Internet, since you must connect to the urchin.com site to complete the demo licensing and activate your Urchin distribution once it is installed.

RaQ 550 owners should read and understand the information on RaQ 550 web.log permissions issues when installing Urchin.

Installation Instructions

If you are upgrading an existing installation of Urchin, please consult the Upgrades section of the Documentation Center for relevant details.

Begin by connecting with your browser to the main Site Administrator's page for your Cobalt box, and navigate to the section of the Sun Cobalt administration interface used to install new third party software. The location of this area in the Cobalt interface will be platform specific:

• RaQ 3 or RaQ 4: click on Maintenance in the left hand frame, then click Install Software in the top row
• RaQ 550 or Qube 3: click on the BlueLinQ tab, then click Third Party Software, then click the Install Manually button

In the new software area, prepare and launch the package installer using the directions appropriate for your platform:

• RaQ 3 or RaQ 4
  ♦ If you downloaded the Urchin package by using a browser and saving the .pkg file on your desktop system, then in the Software Package box select the Upload radio button, then click the Browse button to the right and navigate through your local filesystem until you locate the file. Once you click the Open button in the browse window, the pkg filename will be entered into the Upload text box.
  ♦ If you copied the software into your Cobalt system's /home/packages directory, then select the radio button labeled Loaded. Then choose your package installer from the drop down box to the right of this button.
  ♦ Click the "Install a pkg Package" button. When the installation is finished, an Urchin link will appear in the lower section labeled Software on the Sun Cobalt Server.
• RaQ 550 or Qube 3
  ♦ If you downloaded the Urchin package by using a browser and saving the .pkg file on your desktop system, then in the Location box select the Upload radio button, then click the Browse button to the right and navigate through your local filesystem until you locate the file. Once you click the Open button in the browse window, the pkg filename will be entered into the Upload text box.
  ♦ If you copied the software into your Cobalt system's /home/packages directory, then in the Location box, select the radio button labeled "Packages in /home/packages", and choose your package installer from the drop down box to the right of this button.
  ♦ Click the Prepare button. Once the package has been prepared, an Install Software window will appear. In this window click the Install button. When the installation is finished, Urchin will be listed under the Programs tab.

Initial Configuration Using the Administration Interface

Click on the Urchin link in your Cobalt administration interface and the Urchin administration login window will appear.
Alternatively, you can connect directly to your Urchin administration interface without going through the Cobalt administration interface by using the URL http://yourhost:9999, where yourhost is the DNS hostname for your Cobalt system. Enter the admin username and password to start configuring Urchin. Use these initial login values:

    Username: admin
    Password: urchin

Upon initial login, the interface will take you to a License Urchin wizard. Click on "Obtain Demo License". The interface will connect to the licensing server at the Urchin Software Corporation website and walk you through the process. When finished with the license process, you will be returned to the Urchin administration interface, where you will be led through the Setup Wizard, which will set some required initial configuration parameters. Follow the instructions to complete your initial configuration. Please make sure to reset the password for the admin account and to record this password somewhere for safekeeping.

When the Setup Wizard has completed you'll be taken to the Profile configuration screen. Click Add to create a new Profile. You will see a list of some sample Cobalt profiles and Log Sources that you can use as templates. Once you have a Profile you're ready to start processing logs and viewing Report data.

Managing Urchin Services

There are 2 daemons, "urchind" and "urchinwebd", that need to be running in order for log processing, reporting, and configuration administration to occur. These daemons are automatically launched by the installation process and are configured for your system so that they should always restart if the system is rebooted. However, you may need to control these processes manually. The daemons are stopped and started by the urchinctl program in the bin subdirectory of your Urchin distribution in /home/urchinc. To start or stop both daemons, use:

    /home/urchinc/bin/urchinctl start
    /home/urchinc/bin/urchinctl stop

You can also start or stop only one daemon at a time by using the -w option for the webserver or the -s option for the scheduler. To see all of the available options, execute urchinctl with the -h option.

Any errors encountered when one of the daemons is launched should be reported on the command line. For the urchinwebd daemon, once you think it is running successfully, you should also check the var/error_log file for any startup problems.

User Access to Reports

Users should use the URL http://yourhost:portnumber, where yourhost is the name of the system where Urchin is installed and portnumber is the port number for the Urchin webserver (9999, unless you specified a different number during installation).

Urchin Traffic Monitor

On Sun Cobalt systems, due to the combination of the default webserver logging format, the automated webserver log splitting mechanism, and the built-in statistics gathering software, it is currently not possible to utilize the Urchin UTM.
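When the daemons need to be cycled by hand, the stop, start, and log check described under Managing Urchin Services can be wrapped in one small function. The sketch below is an assumption-laden illustration, not an Urchin tool: the function name restart_urchin is hypothetical, it uses only the urchinctl start and stop actions named in this guide, and it assumes the log lives at var/error_log under the distribution directory (/home/urchinc by default here).

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with Urchin): restart both Urchin
# daemons, then show the tail of the webserver error log.
# Usage: restart_urchin [URCHIN_DIR]
# URCHIN_DIR defaults to /home/urchinc, the Sun Cobalt distribution
# directory named in this guide.
restart_urchin() {
    dir=${1:-/home/urchinc}
    ctl=$dir/bin/urchinctl
    if [ ! -x "$ctl" ]; then
        printf 'urchinctl not found or not executable: %s\n' "$ctl" >&2
        return 1
    fi
    # A failed stop is tolerated, since the daemons may already be down.
    "$ctl" stop
    "$ctl" start || return 1
    # urchinwebd may appear to start and still log problems, so show the
    # most recent lines of its error log if the file exists.
    if [ -f "$dir/var/error_log" ]; then
        tail -n 20 "$dir/var/error_log"
    fi
}
```

On a Cobalt system this would be called as restart_urchin with no argument; on other UNIX-type systems, pass the directory where Urchin was installed.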

Uninstalling Urchin 5

Windows

Uninstalling on a Windows system can be done in two ways:

• Using the Add/Remove Programs control panel: go to Start->Settings->Control Panel and double-click on Add/Remove Programs. Highlight Urchin and click the Change/Remove button. An InstallShield window should launch and present you with a dialog box with 3 radio button choices: Modify, Repair, and Remove. Select the Remove button and click Next, then follow the remaining dialog boxes to complete.
• Re-running an Urchin installer: running the setup.exe you installed Urchin with should detect that Urchin is already installed and present you with the dialog box with the Modify, Repair, and Remove radio buttons.

When the uninstall process is completed there will be an Urchin data folder left in its original installation location. This folder contains Urchin report and configuration data, and is not removed during uninstallation. If you are completely removing Urchin from your system, you may remove the Urchin folder to reclaim disk space.

UNIX-type Systems

Using the urchinctl program, stop the Urchin webserver and Urchin task scheduler services like so:

    /path/to/urchin/bin/urchinctl stop

Once this is done you can remove the entire Urchin installation directory. If you have installed the urchin_daemons boot script that causes the Urchin services to start/stop when the system is rebooted, you should remove this script from the startup initialization area of your system.

Mac OS X 10.2.x and later

In the Urchin installer disk image (e.g. the urchin5.XXXX_macosx102.dmg), there is a script that can be used to automate removing Urchin.

• Mount the Urchin installer disk image by double-clicking on it. This will mount a new volume on your desktop named Urchin 5.XXX (where 5.XXX is the Urchin version number, e.g. 5.702).
• Open a Terminal window by launching the Finder and selecting Applications from the Go menu. Navigate into the Utilities folder and double-click on Terminal.
• In your terminal window run the command:

      sudo "/Volumes/Urchin 5.XXX/uninstall_urchin.sh"

  where 5.XXX is the Urchin version number. The quotes are needed because the volume name contains a space.

The uninstall_urchin.sh script will remove all of the Urchin binaries and support files, but leave your configuration and report data intact. If you want to remove all data as well, then you should manually delete the /usr/local/urchin directory with the command:

    sudo rm -r /usr/local/urchin

Sun Cobalt

Connect to the main Site Administration page for your Cobalt system and follow the directions below for your system type:

• RaQ 3 or RaQ 4: Click on Maintenance in the left hand frame, then click Install Software in the top row. In the list of installed software on the system, click the Urchin link. In the Urchin management screen, click Uninstall Urchin, then click to confirm that you want to uninstall.
• Qube 3, XTR, or RaQ 550: Select the BlueLinQ tab, then click Installed Software in the left hand frame. In the software list you will see an entry for Urchin. The right hand column of the Urchin entry has an uninstall icon. Click the icon and then click OK to confirm that you want to uninstall.

When the uninstall has completed the Cobalt Administration interface should refresh and any entry for Urchin should be gone. Once this is done you can remove the /home/urchinc directory. Please note that removing /home/urchinc will irretrievably delete any remaining configuration and report data.
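For UNIX-type systems, the manual uninstall described earlier in this section is a two-step sequence: stop the services with urchinctl, then remove the installation directory. The sketch below shows one cautious way to script it. The function name remove_urchin is hypothetical and not part of Urchin, and the guard that looks for bin/urchinctl is an added safety assumption. Note that this deletes configuration and report data along with the binaries.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Urchin): stop the Urchin services
# and then remove the whole installation directory, including report and
# configuration data.
# Usage: remove_urchin URCHIN_DIR
remove_urchin() {
    dir=$1
    # Guard: only proceed if the directory looks like an Urchin
    # installation, to avoid deleting the wrong path.
    if [ -z "$dir" ] || [ ! -x "$dir/bin/urchinctl" ]; then
        printf 'not an Urchin installation: %s\n' "$dir" >&2
        return 1
    fi
    # Stop the webserver and task scheduler first; warn but continue if
    # the stop fails, since the daemons may not have been running.
    "$dir/bin/urchinctl" stop ||
        printf 'warning: urchinctl stop reported an error\n' >&2
    rm -rf -- "$dir"
}
```

If the urchin_daemons boot script was installed into your system's startup area, it still has to be removed separately, as noted above.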

Troubleshooting Install Problems

All Platforms

Please pay close attention to the output from your installer. Read all dialog boxes, requests for user input, and output text carefully. If your installer fails, please record the complete and exact error messages that are generated, including any error codes. This information is required for full analysis of your problem.

Windows

To create debugging output of what's happening during a Windows installation, you can have the setup.exe program log its activities to a file. This is particularly useful when you have some unknown error during installation. To trigger logging, you'll have to launch setup.exe by using the Run Command mechanism. Go to the Start menu and select Run... and in the Open: entry box, enter the full path to the setup.exe along with the appropriate logging options like so:

    C:\temp\setup.exe /v"/Lv C:\temp\installer.log"

Pay close attention to the syntax of this command. The spaces, quotes, slashes, and backslashes should all be entered exactly as shown. C:\temp\setup.exe should be replaced by the real path to setup.exe on your system. The file C:\temp\installer.log is where the execution logging output will be stored.

UNIX-type Systems

If you have problems executing the shell archive installer, you may download a compressed tar archive of the installation kit, which you simply gunzip and untar. Then you can run the install.sh installation script and complete the install as described in the installation guide notes for UNIX.
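On UNIX-type systems, one simple way to keep the "complete and exact error messages" asked for above is to copy everything the installer prints into a file while still seeing it on screen. The following is a minimal sketch using the standard tee utility. The function name run_logged and the log file name install.log are assumptions made for this example, not Urchin features.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Urchin): run a command, showing
# its output on screen and saving a copy to a log file.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    log=$1
    shift
    # 2>&1 merges error messages into normal output so the log is
    # complete; tee writes the merged stream to both the screen and the
    # log file. The exit status returned is tee's, not the command's.
    "$@" 2>&1 | tee "$log"
}
```

For example, run_logged install.log sh ./install.sh would run the installation script and leave a transcript of its output in install.log. Keyboard input is not redirected, so the script's prompts should still be answerable as usual.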

Upgrades

Upgrading Urchin 4

Overview

Upgrading from Urchin 4 to Urchin 5 requires using the procedure that is specific to both your operating system and the Urchin version you're running. This document contains upgrade sections that cover all supported platforms and Urchin versions. Please verify that you are following the appropriate instructions for your situation.

Before performing any upgrade, please make sure you do the following:

1. Shut down the Urchin services and back up your entire existing Urchin installation. Having the services disabled guarantees that there is no database activity while the backup is in progress.
2. Have a record of the existing installation location and port number of the webserver.

Considerations When Upgrading

• Licensing: Urchin 4 licenses are not compatible with Urchin 5. You will have to upgrade your Urchin 4 licenses. Speak with your sales rep for assistance with this.
• Differences in report numbers: After upgrading to Urchin 5, you may notice some changes in the session counts in your reports compared to Urchin 4. Please see the article entitled Urchin 3, 4, &5 Reporting Differences in this section for details on these differences.
• Visitor tracking with UTM-1: If you are using UTM-1 with Urchin 4 to track unique visitors to your site, no website changes are required. Urchin 5 will process and report on UTM-1 data. Once you have upgraded to Urchin 5, it is advisable, although not required, to edit the Profiles for any UTM-enabled sites. Go to the Profile Settings tab and select UTM-Enabled All for the Default Report Set.
• It is strongly advised that you upgrade to at least Urchin 5.6 and update your website to use UTM-4. UTM-4 improves visitor tracking metrics and options for campaign tracking. If you do not need the campaign tracking capability UTM-4 provides, you can reduce log space overhead by editing the __utm.js file and setting __utmctm=0.

Procedures

Windows with Urchin 4.10x

1. Double-click the urchin5xxx_win_setup.exe file and follow the instructions in the Welcome and License Agreement dialog screens.
2. In the dialog screen labeled Preparing to Upgrade Urchin Installation, the installer presents a list showing the directory location and webserver port number it has determined for your existing installation. It will use these parameters for your upgrade.
3. If you decide you don't want to use these installer settings, you may exit the installation by clicking the Cancel button. You cannot alter the settings of your current installation during the software upgrade, since the upgrade has to match the previous configuration information stored in your Urchin 4 databases.
4. Click the Next button and the installer will proceed with converting your installation to the new version. Your report and configuration data will automatically be preserved during this process.

Windows with Urchin 4.00x

To migrate properly from Urchin 4.00x installations to Urchin 5, you will have to save some of your existing configuration and data files.

1. Disable your currently running Urchin services by going to Start->Programs->Urchin->Disable Services.
2. Navigate to your Urchin installation folder (e.g. C:\Program Files\Urchin) and rename the data folder to data-saved.
3. Copy etc\httpd.conf to var\urchinwebd.conf.
4. Launch the urchin5xxx_win_setup.exe installer. The installer should detect your previous installation and determine the configuration parameters. Just follow the instructions in the dialog windows of the installer.
5. After the installer has finished, go to Start->Programs->Urchin->Disable Services to deactivate your currently running Urchin 5 services.
6. In the Urchin installation folder (e.g. C:\Program Files\Urchin), rename the new data folder to data-notused, and rename the data-saved folder from step 2 back to data.
7. Launch the Urchin services by going to Start->Programs->Urchin->Enable Services.

UNIX with all versions of Urchin 4

The Urchin 5 install.sh script will properly handle all existing installations of Urchin 4 on UNIX-type systems. When running install.sh, be sure to select Upgrade as the installation type when prompted. Please see the Installation Guide (UNIX) section of the Documentation Center for full instructions.

Mac OS X 10.2.x

Note: Mac OS X 10.1 users should use the section on upgrades for UNIX-type systems.

In the majority of cases the Urchin 5 package installer for Mac OS X 10.2 systems will automatically detect existing Urchin 4 installations and upgrade them. You simply use the instructions in the Installation Guide for Mac OS X 10.2.

The one exception to this standard package installer upgrade procedure is if you previously installed using the shell archive installer but did not install in the default location of /usr/local/urchin4, and are now using the package installer to upgrade. In this case you should take the following steps:

• Turn off your existing Urchin services using the urchinctl program (i.e. ./urchinctl stop).
• Install normally using the package installer, but do not do any initial configuration using the admin interface. When the browser launches at the end of the install, simply close the window.
• After the package installer has completed, go to the Urchin Preference Pane in the System Preferences and stop all running Urchin services.
• In the new Urchin 5 installation directory, /usr/local/urchin, rename the data subdirectory to data-saved.
• Move the entire data subdirectory from your old Urchin version install directory into /usr/local/urchin.
• In /usr/local/urchin, update the ownership of the data subdirectory using the command:

    chown -R www:www data

• Launch your new Urchin 5 services by using the Urchin preference pane in System Preferences to start the scheduler and webserver.

Your old configuration and report data should now be available in your new Urchin installation. Once you have confirmed that the configuration and processing are normal, you can remove your old Urchin 4 distribution, as well as the /usr/local/urchin/data-saved directory.

Sun Cobalt

Log in to your Sun Cobalt administration interface and navigate to the section where Urchin is listed. Uninstall Urchin 4 by clicking the Uninstall Urchin 4 link. Once the system reports that the uninstall process is complete, you can install the Urchin 5 pkg installer normally per the instructions in the installation section of this guide. The Urchin 5 installation will detect the existing Urchin 4 data and move it into place as part of the Urchin 5 installation.
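For the Windows with Urchin 4.00x procedure above, steps 2 and 3 (saving the data folder and copying the webserver configuration) can be scripted. This is a minimal sketch under stated assumptions, not an Urchin tool: the function name is hypothetical, the Urchin services are assumed to be disabled already (step 1), and the var folder is assumed to exist in the installation folder.

```python
import shutil
from pathlib import Path


def prepare_urchin_4_00x_upgrade(install_dir: Path) -> None:
    """Perform steps 2 and 3 of the Urchin 4.00x procedure in this guide.

    Step 2: rename the data folder to data-saved.
    Step 3: copy etc/httpd.conf to var/urchinwebd.conf.
    """
    data = install_dir / "data"
    saved = install_dir / "data-saved"
    httpd_conf = install_dir / "etc" / "httpd.conf"
    var_dir = install_dir / "var"

    # Check everything first so that a failure leaves the folder untouched.
    if not data.is_dir():
        raise FileNotFoundError(f"missing data folder: {data}")
    if saved.exists():
        raise FileExistsError(f"already exists: {saved}")
    if not httpd_conf.is_file():
        raise FileNotFoundError(f"missing config file: {httpd_conf}")
    if not var_dir.is_dir():
        raise FileNotFoundError(f"missing var folder: {var_dir}")

    data.rename(saved)
    shutil.copy2(httpd_conf, var_dir / "urchinwebd.conf")
```

A call such as prepare_urchin_4_00x_upgrade(Path(r"C:\Program Files\Urchin")) would use the example installation folder from this guide. The remaining steps (running the installer and swapping the data folders back) still have to be carried out as described.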


Upgrading Urchin 3

Overview

Urchin 5 is an entirely new product with thoroughly revised internal workings and data formats that are not compatible with Urchin 3. An existing Urchin 3 installation therefore cannot be upgraded by simply installing Urchin 5 in its place. However, it is possible to install Urchin 5 side by side with Urchin 3 so that you may migrate report and configuration data from one to the other.

The basic Urchin 3 to Urchin 5 upgrade process consists of:

• Installing and licensing Urchin 5
• Deactivating Urchin 3 log processing
• Running a migration tool to import Urchin 3 data into Urchin 5
• Post-migration configuration of Urchin 5 processing

There are some special circumstances to consider for Urchin 3 to Urchin 5 migrations:

• Not all Urchin 3 configurations can be migrated. In particular, existing configurations that rely on the Urchin 3 SubreportMode directive cannot be imported directly into Urchin 5, which does not support SubreportMode.
• u3importer cannot be used to migrate Urchin 3 data between differing platform types as part of the import process. You cannot, for example, take Urchin 3 databases created on a Windows platform and import them into an Urchin 5 installation on a Sun system. u3importer must be run on a platform of the type where the Urchin 3 databases were created.
• If you are upgrading a Sun Cobalt server, you should use the instructions in the dedicated section of the Upgrades documentation on Upgrading Urchin 3 on Sun Cobalt.

Procedure

You should already have downloaded the Urchin 5 installer appropriate for your system. You will also need to know the full path to your Urchin 3 config file to complete the upgrade process. Proceed as follows:

• Install Urchin 5 as appropriate for your platform per the instructions in the Installation section of the Documentation Center.
• Obtain a license for Urchin 5 and perform basic configuration of global settings, such as assigning an admin password, but do not create any Profiles.
• Run the inspector program in the Urchin 5 util subdirectory to verify that your installation is correct. If any errors are reported, correct them before proceeding with your Urchin 3 migration.
• If necessary to guarantee that no changes are made to your Urchin 3 databases during migration, deactivate your Urchin 3 log processing as follows:
  ♦ Windows - launch the Urchin 3 configuration interface and set reports to Off as appropriate
  ♦ UNIX-type systems - edit your crontab and comment out the line that controls Urchin processing
• Run the u3importer program located in the Urchin 5 util subdirectory. This program will prompt you for the full path to your Urchin 3 config file, then prompt you to indicate which sites you want to import into Urchin 5.

Once u3importer has finished, your Urchin 3 report and configuration data should be established in Urchin 5. Connect to the Urchin 5 administration interface and verify that you have correct Profiles for all your websites.

Ecommerce Processing

Urchin 5 has the ability to process ELF logs and correlate the data with access logs. The ELF log source can be added to the regular log source for a profile if the Urchin 5 E-commerce Module is installed.
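For the UNIX-type deactivation step in the procedure above (commenting out the crontab line that controls Urchin processing), the edit can be sketched as a small text transformation. This is only an illustration: the function name is hypothetical, and matching on the substring "urchin" is an assumption, since this guide does not show what the Urchin 3 crontab entry looks like. Review the result before installing it as your crontab.

```python
def comment_out_matching_jobs(crontab_text: str, needle: str = "urchin") -> str:
    """Return crontab_text with every active line containing needle commented out.

    Lines that are blank or already commented are left unchanged.
    The default needle is an assumption, not a documented value.
    """
    result = []
    for line in crontab_text.splitlines():
        stripped = line.lstrip()
        is_active = bool(stripped) and not stripped.startswith("#")
        if is_active and needle in line:
            result.append("# " + line)
        else:
            result.append(line)
    trailing_newline = "\n" if crontab_text.endswith("\n") else ""
    return "\n".join(result) + trailing_newline
```

One way to use it is to feed it the output of crontab -l, inspect the returned text, and load it back with crontab only once you are satisfied that the right line was disabled. Removing the leading "# " later restores Urchin 3 processing.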

Upgrading Urchin 3 on Sun Cobalt

Overview

Please read and understand all of these instructions before proceeding with your migration from Urchin 3 to Urchin 5 on Sun Cobalt systems.

The recommended way to upgrade on Sun Cobalt is to start with a fresh installation of Urchin 5 which has not gone through any configuration other than running the initial Setup Wizard. You should not manually configure any Profiles, Log Sources, Users, etc. after installing. The migration utilities will create these as needed while importing your Urchin 3 data.

The basic steps in upgrading a Sun Cobalt system are:

1. Back up your entire Urchin 3 distribution
2. Deactivate Urchin 3 log processing
3. Install Urchin 5, connect to the administration interface and run the Setup Wizard
4. Run u3importer to import Urchin 3 report configurations as Urchin 5 profiles and convert Urchin 3 data to Urchin 5 format
5. Download and run the u5_cobalt_import.pl script to import other Urchin 3 for Cobalt config settings, such as Customers and Users, into your Urchin 5 configuration
6. Schedule tasks to process logs for each newly created Profile

Procedure

You will have to telnet or ssh into your Cobalt system as root to perform some of these instructions. You should keep a terminal window open so that you can move back and forth between the command line and the graphical interfaces as necessary.

1. Back up the Urchin 3 distribution - this is suggested strictly as a standard precaution. The process of importing your Urchin 3 data into Urchin 5 does not alter your Urchin 3 installation.

2. Deactivate Urchin 3 log processing - on Cobalt systems this requires that you move the two scripts that manage the daily execution of Urchin. On the command line in your terminal window, execute these commands:

    mv /etc/cron.daily/urchin /home/urchin3da/admin/bin
    mv /etc/cron.daily/urchin_purge_weblogs /home/urchin3da/admin/bin

3. Install Urchin 5 following the instructions for Sun Cobalt in the Installation section of this guide, and run the Setup Wizard to do the initial configuration. Important: in the Admin Settings screen of the Setup Wizard, you must set Data Center Mode to On.

4. Run u3importer - this program, located in the util subdirectory of your Urchin 5 installation, will prompt you to import report directives from your Urchin 3 config file. When prompted for the location of your config file, enter:

    /home/urchin3da/config

Subsequently you will see additional prompts that say Import Urchin 3 Configurations and then Import Urchin 3 Data. Just hit the return key to accept the default response of all for these last two steps. When u3importer has finished, you should verify that correct Profiles were created by examining the configuration via the Urchin administration interface.

5. Download and run u5_cobalt_import.pl - since u3importer only deals with importing Urchin 3 databases and creating Urchin 5 Profiles for your existing Urchin 3 reports, other configuration information that is specific to Cobalt installations has to be imported separately using this tool. You can download u5_cobalt_import.pl from ftp://ftp.urchin.com/urchin5/support. Put this script in the /home/urchin/util directory on your Cobalt system. Then, in your terminal window, execute the program from that directory like so:

    ./u5_cobalt_import.pl

The script will prompt you for input as needed. When the script has finished, the configuration import portion of the migration process will be complete.

6. Schedule tasks to process logs for the Urchin 5 profiles - when you import Urchin 3 data using u3importer, a task is created in the Scheduler for each Profile, but the scheduled time to run is not set. You can either set each schedule manually via the Urchin 5 administration interface, or use the uconf-schedule utility to set all tasks simultaneously.


Urchin 3, 4, &5 Reporting Differences

Differences to Note When Migrating Between Urchin Products

This document is an overview describing basic differences in data analysis as well as certain migration issues when moving from one major Urchin version to another. Each major Urchin version is listed along with a summary explaining key elements of how data is analyzed. The latter portion of this page covers issues to anticipate for particular migration scenarios.

Urchin 3

• Visitor tracking is done by incoming IP address only. There is no distinction between a visitor and a session.
• All MIME types except images (gif/jpg/png) are treated as pageviews.
• Pageview hits with a HEAD request type are treated as actual pageviews.
• Pageviews are not required to count a visitor, so a request for a single image file could be counted as a new visitor.
• Hits with error codes of 404 or 5xx are considered legitimate visits and could increment the visitor count.
• Traffic->Hourly report and Tracking reports (e.g. Top Entrances, Top Exits) data is stored on a monthly basis, so these reports are only available for a date range of a single month.

Urchin 4

• The default visitor tracking method uses a combination of IP address plus the User-Agent field from log entries. Other tracking options include UTM-1, session id, and IP-only (i.e. Urchin 3 style tracking).
• Urchin 4 provides UTM-1 to enable optimal visitor and session tracking. UTM-1 utilizes client-side cookies to identify unique individuals as opposed to relying on IP addresses, which are not necessarily unique to a particular person or system.
• By default a session requires a legitimate pageview to be counted. A request for an image is not considered a pageview, nor is a request with a status code other than 2xx, 302, or 304. This will typically reduce counts for visitors, sessions, pageviews and related reports in Urchin 4 using IP-Only tracking when compared to Urchin 3 reports for the same data. The pageview requirement is configurable by the Urchin administrator for sites whose design makes counting images as pageviews desirable.
• When using UTM-1 tracking, sessions without UTM cookie info will be processed using the default of IP+User-Agent.
• Traffic->Hourly report and Tracking reports (e.g. Top Entrances, Top Exits) data is stored on a daily basis, so these reports are available for any time period of a day or greater.

Urchin 5

• The default visitor tracking method uses a combination of IP address plus the User-Agent field from log entries. Other tracking options include UTM (either UTM-2 or UTM-1), session id, username, and IP-only (i.e. Urchin 3 style tracking).

• Urchin 5 provides improved visitor and session tracking based on UTM-2, which uses client-side cookies with a configurable session timeout. With this technology, hits with the same cookies spread out over a long period of time can be counted as multiple sessions as opposed to a single long session. This produces more meaningful averages in the reports.
• For both UTM-1 and UTM-2 tracking, the processing logic has changed so that only hits with UTM cookie information are processed when counting visitors, sessions, and pageviews. Hits without UTM info do not fall back to processing using IP+User-Agent as in Urchin 4. Such non-cookie sessions are tracked only for reports that are based on hits and bytes. This can lower counts for visitors, sessions, pageviews, and related reports when compared to Urchin 4 because it significantly reduces the effect of robot traffic on your statistics.
• An explicit include or exclude MIME type list is now used to define what a pageview is. By default, Urchin 5 excludes the following MIME types from the pageview list:

    gif,jpg,jpeg,png,js,css,cur,ico,ida

  All other MIME types are considered to be pageviews or downloads.
• Pageview hits which use HEAD as the request type only cause the Hits count for that page to be incremented; the pageview count is not incremented.
• By default a session requires a legitimate pageview to be counted. A request for an image is not considered a pageview, nor is a request with a status code other than 2xx, 302, or 304. This will typically reduce counts for visitors, sessions, pageviews and related reports in Urchin 5 using IP-Only tracking when compared with older Urchin version reports for the same data. The pageview requirement is configurable by the Urchin administrator for sites whose design makes counting images as pageviews desirable.
• With the exception of the Status and Errors report, all reports that graph vs. hits are based on valid hits. Previously, such graphs were based on all hits (i.e. valid hits and hits with errors).
• Traffic->Hourly report and Tracking reports (e.g. Top Entrances, Top Exits) data is stored on a daily basis, so these reports are available for any time period of a day or greater.

Migrating from Urchin 3 to Urchin 5

Reporting

Since in Urchin 3 the Tracking reports and Traffic->Hourly Graph data is only stored on a monthly basis, and in Urchin 4 and Urchin 5 this data is stored on a daily basis, a side-by-side comparison of these reports requires that you set the date range to one month in the newer products. Also, when importing Urchin 3 data there is no way to break out the monthly data for these reports into individual days, so all data for these specific reports for a given month will be placed into the first day of the month in the newer Urchin version. Here too, setting the date range to one month will allow the imported historical data to be viewed in the correct context.

Administration

Administration is primarily via a graphical interface and is based on a binary configuration database. However, command line tools and the ability to import a flat file configuration are available for those who are used to and prefer the config file approach of Urchin 3.

Migrating from Urchin 4 to Urchin 5

Reporting

Urchin 4 databases are fully compatible with Urchin 5. Report data will be immediately available once you upgrade. As noted above in the product descriptions, Urchin 5 uses a different logic for processing hits, so once you upgrade you will initially see a difference in report numbers compared to recent historical data generated with Urchin 4. These variances will differ depending on which visitor tracking method you've been using.

Log Tracking

Logtracking data in Urchin 4 is kept in a single tracking file. In Urchin 5 this data is kept in individual monthly databases. When an Urchin 4 installation is upgraded to Urchin 5, the old logtracking data is converted into equivalent Urchin 5 monthly logtracking databases, and the Urchin 4 logtrack file is archived.

Migrating from UTM-1 to UTM-2

Reporting

IMPORTANT: UTM-2 cannot be used with Urchin 4. You must be running Urchin 5 before switching your website to use UTM-2.

The improved accuracy in identifying unique visitors that UTM-2 provides means that you may see some differences in reported numbers compared to what you have been seeing using UTM-1. These differences should be on the order of 10% or less.
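To make the Urchin 5 default pageview rules described earlier in this article concrete, the sketch below applies them to a single hit. It is an illustration of the documented rules, not Urchin's code. The function names are hypothetical, and it assumes that the "MIME type" of a hit can be taken from the file extension of the requested path, which this guide does not state explicitly.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Default exclusion list quoted in this guide for Urchin 5.
EXCLUDED_TYPES = {"gif", "jpg", "jpeg", "png", "js", "css", "cur", "ico", "ida"}


def is_valid_status(status: int) -> bool:
    """Status codes that can count toward a pageview: 2xx, 302, or 304."""
    return 200 <= status <= 299 or status in (302, 304)


def counts_as_pageview_or_download(method: str, url: str, status: int) -> bool:
    """Apply the documented default rules to one hit.

    HEAD requests only increment the Hits count, error statuses are not
    legitimate pageviews, and excluded types (assumed here to be matched
    by file extension) are never pageviews.
    """
    if method.upper() == "HEAD":
        return False
    if not is_valid_status(status):
        return False
    extension = PurePosixPath(urlsplit(url).path).suffix.lstrip(".").lower()
    return extension not in EXCLUDED_TYPES
```

For example, a GET of /index.html with status 200 passes all three checks, while a GET of /img/logo.gif with status 200 does not. The administrator-configurable include and exclude lists mentioned above would replace EXCLUDED_TYPES in a fuller model.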

Upgrading Urchin 5

Overview

Upgrading Urchin 5 is a straightforward process. The installers typically deal automatically with upgrading existing installations while leaving your configuration and report data intact. This document contains upgrade sections that cover all supported platforms. Please verify that you are following the appropriate instructions for your situation.

Before performing any upgrade, please make sure you do the following:

1. Shut down the Urchin services. Having the services disabled guarantees that there is no database activity while the backup is in progress.
2. Back up your entire existing Urchin installation, in particular any customized configuration files.
3. Have a record of the existing installation location and port number of the webserver.

Considerations When Upgrading

• It is always advisable to install on your website the latest __utm.js provided with the current release when upgrading Urchin. In addition, as of Urchin 5.7 there is a new UTM, and all users of Urchin 5.x products are encouraged to upgrade to this UTM version even if they do not upgrade to 5.7 at this time.

• Campaign Tracking Module users who download Google CPC data must modify their Google download process when upgrading to Urchin 5.6 or 5.7. Please see the help article on importing Google cost data in the Campaign Tracking Module section.
• Visitor tracking with UTM-1, UTM-2, or UTM-3: Urchin 5.6 and newer versions are backwards compatible when processing all older versions of UTM data. Although not required, it is strongly advised that you upgrade your website to UTM-4 or later regardless of the Urchin 5 version you are using.
• Optimizing UTM-4 settings: UTM-4 improves visitor tracking metrics and options for campaign tracking. If you do not need the campaign tracking capability, you can reduce log space overhead by editing the __utm.js file and setting __utmctm=0. This will still allow you to benefit from the improved UTM-4 visitor tracking.

Procedures

Windows

1. Double-click the urchin5xxx_win_setup.exe file and follow the instructions in the Welcome and License Agreement dialog screens.
2. In the dialog screen labeled Preparing to Upgrade Urchin Installation, the installer presents a list showing the directory location and webserver port number it has determined for your existing installation. It will use these parameters for your upgrade. You cannot alter the settings of your current installation during the software upgrade, since the upgrade has to match the previous configuration information stored in your Urchin 5 databases.
3. Click the Next button and the installer will proceed with converting your installation to the new version. Your report and configuration data will automatically be preserved during this process.

UNIX

The install.sh installation script, which is bundled with all UNIX-type installers, will properly upgrade any older version of Urchin 5 installed on UNIX-type systems. When running install.sh, be sure to select Upgrade as the installation type when prompted. Otherwise the upgrade procedure for UNIX is identical to a new installation.

When using install.sh interactively to do an upgrade, at one point you will be presented with the prompt:

    Please select the installation type [Default: 1]
    1. New
    2. Upgrade
    ->

Be sure to select 2 to trigger an upgrade. If you are using install.sh in non-interactive mode by specifying command line options, be sure to use the -m option to specify an upgrade. Please see the Installation Guide (UNIX) section of the Documentation Center for full instructions on using install.sh.

Mac OS X 10.2.x

Note: Mac OS X 10.1 users should use the section on upgrades for UNIX-type systems.


If you have previously used an Urchin 5 package installer for Mac OS X 10.2, then a newer package installer will automatically detect your existing Urchin 5 installation and upgrade it. Users in this situation should simply use the instructions in the Installation Guide for Mac OS X 10.2 and skip the rest of this subsection, as in this case the instructions for new installation and upgrade are the same.

If you did not previously use the package installer, but installed using the install.sh installation script and did not install in the default location of /usr/local/urchin, then you must use a modified procedure to upgrade. The package installer will only install in /usr/local/urchin, so it cannot be used to automatically upgrade another install location. In this situation, you have two choices:

1. Do not use the package installer to upgrade. Instead, download and use the same type of installer you used previously. This means you can follow the standard upgrade instructions for UNIX-type systems detailed in the previous subsection.
2. If you prefer to start using the package installer to upgrade, you can take the steps listed below, but realize that this procedure will cause your Urchin installation to be relocated to the default of /usr/local/urchin:
  ♦ Turn off your existing Urchin services using the urchinctl program (i.e. ./urchinctl stop).
  ♦ Move the current Urchin installation directory to /usr/local/urchin. You will need to move the entire directory structure starting with the top level directory of your current Urchin installation. For example, if you previously had installed Urchin in /applications/urchin, then you would use the following command:

        mv /applications/urchin /usr/local/urchin

    You should verify that you have enough disk space in the /usr/local file system for your current Urchin installation before doing the move.
  ♦ Once you've relocated Urchin to the proper location, you may launch the Urchin 5 pkg installer and follow the interactive instructions to upgrade.

Your old configuration and report data should now be available in your updated Urchin installation.

Sun Cobalt

The Urchin 5 pkg installers for Sun Cobalt systems automatically detect existing installations and upgrade the Urchin 5 files as needed. Simply follow the instructions for a new Sun Cobalt installation to perform an upgrade.
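The pre-upgrade backup recommended in the overview of this article can be taken with any archiving tool. As one possibility, the sketch below archives a complete installation directory into a compressed tar file using Python's standard library. The function name, the backup location, and the archive label are hypothetical, and the Urchin services are assumed to have been shut down first so that no database activity occurs during the backup.

```python
import shutil
from pathlib import Path


def backup_installation(install_dir: Path, backup_dir: Path, label: str) -> Path:
    """Archive install_dir as backup_dir/<label>.tar.gz and return its path.

    backup_dir should be outside install_dir so that the archive does
    not end up containing itself.
    """
    install_dir = install_dir.resolve()
    backup_dir = backup_dir.resolve()
    if not install_dir.is_dir():
        raise NotADirectoryError(str(install_dir))
    backup_dir.mkdir(parents=True, exist_ok=True)
    archive = shutil.make_archive(
        base_name=str(backup_dir / label),
        format="gztar",
        root_dir=str(install_dir.parent),  # archive paths start at the folder name
        base_dir=install_dir.name,
    )
    return Path(archive)
```

For a default UNIX-type installation this might be called as backup_installation(Path("/usr/local/urchin"), Path("/var/backups"), "urchin-before-upgrade"), where the last two arguments are placeholders. Confirm that the target file system has enough free space before running it.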

Initial Configuration

E-commerce Reporting

Urchin is capable of extensive e-commerce reporting in conjunction with its standard web traffic reports. To accomplish this, two basic elements are required:

• Shopping cart software that produces activity logs in the ELF/ELF2 format (many can be configured to do so).
• The Urchin E-commerce Module, which is available as an add-on to any Urchin 5.x license.

To set up a Profile for ELF/ELF2 processing, use the Profile Setup Wizard in the Urchin admin interface and choose Profile type E-commerce. In the Log Source Wizard (which you will be taken through in the Profile setup process), you will need to specify two Log Sources: the standard website access log, and the ELF log.

ELF: Processing existing ELF logs with Urchin requires only that you set LogFormat in the Log Source to ELF (or auto), and that the Visitor Tracking method in the Profile for the site be set to IP-ONLY.

ELF2: To use ELF2 you must configure your shopping cart software to generate log entries formatted as shown below. The ELF2 log format is based on the ELF log format and specification, with some additional fields added to improve visitor tracking. Any fields containing internal tab characters must be quoted.

The transaction line starts with an exclamation character '!' and contains the following fields separated by tabs:

    !orderid
    remote host IP (as given by %h in NCSA extended/combined log format)
    time (as given by %t in NCSA extended/combined log format)
    store
    sessionid
    total
    tax
    shipping
    billcity
    billstate
    billzip
    billcountry
    cs_useragent
    cs_cookie

The item line does not start with an exclamation character and contains the following fields separated by tabs:

    orderid
    remote host IP (as given by %h in NCSA extended/combined log format)
    time (as given by %t in NCSA extended/combined log format)
    productcode
    productname
    variation
    price
    quantity
    upsold
    cs_useragent
    cs_cookie
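If your shopping cart software lets you script its logging, the ELF2 field lists above translate into tab-joined lines. The sketch below shows one way to build them, under stated assumptions: the function names and dictionary keys are hypothetical, fields with internal tabs are wrapped in double quotes (this guide says such fields must be quoted but does not name the quote character), and value formats for fields such as total or upsold are not specified here and are left to the caller.

```python
from datetime import datetime

TRANSACTION_FIELDS = (
    "orderid", "remote_host", "time", "store", "sessionid", "total", "tax",
    "shipping", "billcity", "billstate", "billzip", "billcountry",
    "cs_useragent", "cs_cookie",
)
ITEM_FIELDS = (
    "orderid", "remote_host", "time", "productcode", "productname",
    "variation", "price", "quantity", "upsold", "cs_useragent", "cs_cookie",
)


def ncsa_time(when: datetime) -> str:
    """Format a timezone-aware datetime the way %t appears in NCSA logs."""
    if when.tzinfo is None:
        raise ValueError("ncsa_time needs a timezone-aware datetime")
    return when.strftime("[%d/%b/%Y:%H:%M:%S %z]")


def quote_if_needed(value: str) -> str:
    """Quote a field that contains a tab (double quotes are an assumption)."""
    return f'"{value}"' if "\t" in value else value


def _join(fields, record) -> str:
    return "\t".join(quote_if_needed(str(record[name])) for name in fields)


def elf2_transaction_line(record: dict) -> str:
    """Build a transaction line: '!' followed by the tab-separated fields."""
    return "!" + _join(TRANSACTION_FIELDS, record)


def elf2_item_line(record: dict) -> str:
    """Build an item line: the tab-separated fields with no leading '!'."""
    return _join(ITEM_FIELDS, record)
```

Each order would produce one transaction line followed by one item line per product, all sharing the same orderid so that the items can be tied back to the transaction.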


Setup Recommendations

Overview

Once Urchin is installed, there are some initial operational parameters that will have to be configured. This is done via a Setup Wizard that runs when you connect to your Urchin administration interface for the first time, and during the first stages while you are establishing Profiles. These initial configuration actions include:

• Licensing Urchin
• Configuring Admin Settings for remote report and administration access, as well as establishing Data Center Mode operation
• Setting the Urchin Administrator account password
• Scheduling tasks for each of your profiles to process data
• Log management

Procedure

Connect to the Urchin administration interface. On Windows systems you can go to Start->Programs->Urchin->Urchin Administration. On UNIX-type systems, Sun Cobalt, and Mac OS X you can use the URL http://hostname:9999, where hostname is the registered hostname of your Urchin system. For a new installation you should use the following to log in:

    Username: admin
    Password: urchin

You'll be presented with an Urchin Setup Wizard welcome screen. Click Continue to proceed through each of the following wizard screens. Note that the choices you make in this initial configuration can always be altered later on.

License Urchin

You have to choose one of the links under the Action Items area of this screen to license Urchin before you can proceed with configuring and using the software. Click Buy License to purchase and install a license via the web right away. If you purchased a license via a sales rep prior to installing Urchin, click Activate Pre-Purchased License. Otherwise, click Obtain Demo License to install an expiring license.

Admin Settings

• Remote Access Settings - select On for each case if you want to allow remote browser connections. If you select Off for either of these, then the only allowed access is on the console of the system where Urchin is installed.
• Data Center Mode - this setting determines whether Urchin is configured to allow creation of Affiliations, which allow you to logically organize Profiles, Groups, and Users into restricted access categories. If you are undecided, it is best to set this to On, as it adds no overhead and gives you the flexibility to use it in the future.


Administrative User

Reset the password for the admin account and record this password for safekeeping.

Scheduling Tasks

When you create profiles you are given the option to schedule the time at which the task that processes the data for that profile will run. You should check the settings for each profile to be sure that the timing of your task makes sense in terms of when the log data will be available, how long it will take to process that data, and when you want the updated reports available to your users.

Log Management

Urchin includes a log tracking module which keeps track of how far into each log it has processed. Thus, log file rotation does not necessarily need to be coordinated with Urchin operation. However, Urchin does provide automation for log rotation or removal under the Advanced Settings for each Log Source.
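The idea behind log tracking, remembering how far into a log has already been processed, can be illustrated generically. The sketch below is not Urchin's implementation and does not reflect its database format. It assumes a hypothetical JSON state file that stores a byte offset per log path, and it only detects rotation when the log has become smaller than the stored offset.

```python
import json
from pathlib import Path


def read_new_lines(log_path: Path, state_path: Path) -> list:
    """Return the complete lines added to log_path since the previous call.

    The byte offset reached is stored in state_path (a JSON file, which
    is an assumption of this sketch) so that the next call resumes there.
    """
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    key = str(log_path)
    offset = state.get(key, 0)

    # A file shorter than the stored offset was rotated or truncated.
    if log_path.stat().st_size < offset:
        offset = 0

    with log_path.open("rb") as handle:
        handle.seek(offset)
        data = handle.read()

    # Consume only complete lines; a partial last line waits for next time.
    complete = data.rfind(b"\n") + 1
    state[key] = offset + complete
    state_path.write_text(json.dumps(state))
    return data[:complete].decode("utf-8", errors="replace").splitlines()
```

Because the position is remembered between runs, each run only sees new entries, which is why rotation does not have to be synchronized with processing. A rotated log that has already grown past the old offset would not be detected by this simple size check.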


Chapter 2: Visitor Tracking

Using UTM with E−commerce

Overview

Since the key aspect of UTM is the ability to identify and correlate visitor activity, using it on an e-commerce site in conjunction with the E-Commerce Module allows visitor activity that generates revenue to be tracked across your sites and reported on collectively. Transactions on the server that hosts your shopping cart software can be correlated with sessions on your other webserver, allowing session variables, such as referrals and keywords, to be reported against the revenue they generate. When used with the Campaign Tracking Module, the UTM provides multi-session tracking that follows the visitor from source to purchase or goal. Conversion ratios and ROI reports in the Campaign Tracking Module provide detailed results of on-line marketing efforts including keyword buying, e-mail campaigns, and link exchanges.

Same Domain Configuration

If the front-end website and secure e-commerce site use the same domain, installing the UTM on your e-commerce site is no different from installing it on other types of websites. The other articles in this section on UTM installation provide the specifics. Pay special attention to the areas explaining how to set the UTM domain appropriately for your e-commerce and other sites. Further information on e-commerce transaction log formats is provided in the E-commerce section of the documentation.

Cross Domain Configuration

It is increasingly common for websites with e-commerce shopping carts to outsource the e-commerce component to another organization such as Amazon, PayPal, or Yahoo Stores. This can create a problem for UTM tracking, because the UTM domain changes as a visitor goes from the main website to the secure store. For the secure store to use the same UTM visitor ID as the main website, the visitor ID must be passed in the link to the secure store. The UTM contains a __utmLinker function that wraps the link with the appropriate ID before sending the visitor to the store. Instead of linking directly to the store, pass the link to the __utmLinker function. The specific instructions for using __utmLinker are:

1. Edit the __utm.js file in the document root of both websites and set the __utmdn variable to "none".
2. Set the UTM Domain for the profile to nothing (blank).
3. Change the links from the main site to the secure site in the form: link-to-shopping-cart

Visitor Identification Methods

Overview

Urchin has five different methods for identifying visitors and sessions, depending on available information. Of these, the patent-pending Urchin Traffic Monitor (UTM) is a highly accurate system that was specifically designed to identify unique visitors, sessions, exact paths, and return frequency behavior. A number of visitor loyalty and client reports are only available when using the UTM System. The UTM System is easy to install and is highly recommended for all businesses. In addition to the UTM System, Urchin can use IP addresses, User-Agents, Usernames, and Session-IDs to identify sessions. The following table compares the abilities of each of the five identification techniques:

Ability                             IP Only  IP+User-Agent  Username  Session ID  UTM
Identifies non-proxied sessions     X        X              X         X           X
Identifies some proxied sessions    -        X              X         X           X
Uniquely identifies each session    -        -              -         X           X
Defeats session IP proxying         -        -              -         X           X
Defeats most provider caching       -        -              -         X           X
Defeats browser caching             -        -              -         -           X
Uniquely identifies visitors        -        -              X         -           X
Captures exact path sequence        -        -              -         -           X
Captures visitor loyalty metrics    -        -              -         -           X
Captures browser capabilities       -        -              -         -           X

Data Model The underlying model within Urchin for handling unique visitors is based on a hierarchical notion of a unique set of visitors interacting with the website through one or more sessions. Each session can contain one or more hits and pageviews. Pageviews are kept in order so that a path through the website for each session is understood. As shown in the diagram, the Visitor represents an individual’s interaction with the website over time. Each unique visitor will have one or more sessions, and within each session is zero or more pageviews that comprise the path the visitor took for that session.
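The hierarchy described above can be pictured as nested records. The following is a minimal JavaScript sketch of that shape, not Urchin's internal storage format; the field names and the helper functions pathOf and summarize are hypothetical and exist only for illustration.

```javascript
// Illustrative only: a visitor owns sessions, and each session owns an
// ordered list of pageviews. Field names are assumptions, not Urchin's schema.
const visitor = {
  id: "visitor-1",
  sessions: [
    { start: "2005-01-23T10:00:00Z", pageviews: ["/", "/products", "/cart"] },
    { start: "2005-01-24T09:30:00Z", pageviews: ["/", "/support"] },
  ],
};

// The path for a session is its pageviews in the order they occurred.
function pathOf(session) {
  return session.pageviews.join(" -> ");
}

// Count the sessions and the total pageviews recorded for one visitor.
function summarize(v) {
  const pageviews = v.sessions.reduce((n, s) => n + s.pageviews.length, 0);
  return { sessions: v.sessions.length, pageviews };
}
```

Keeping pageviews as an ordered list per session is what makes path reporting possible; a flat list of hits without session boundaries would lose that ordering.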

Proxying and Caching

In attempting to identify and track unique visitors and sessions, we are basically going against the nature of the web, which is anonymous interaction. Particularly troublesome for tracking visitors are the increasingly common proxying and caching techniques used by service providers and by the browsers themselves.

Proxying hides the actual IP address of the visitor and can use one IP address to represent more than one web user. A user’s IP address can change between sessions, and in some cases multiple IP addresses will be used to represent a cluster of users. Thus, it is possible that one visitor will have different IP addresses for each hit and/or different IP addresses between sessions.

Caching of pages can occur at several locations. Large providers look to decrease the load on their network by caching, or remembering, commonly viewed pages and images. For example, if thousands of users from a particular provider are viewing the CNN website, the provider may benefit from caching the static pages and images of the website and delivering those pieces to the users from within the provider’s network. The result is that pages are delivered without the knowledge of the actual website.

Browser caching adds to the problem. Most browsers are configured to check content only once per session. If a visitor lands on the home page of a particular website, clicks to a subpage, and then uses the back button to return to the home page, the second request for the home page is most likely never sent to the website server, but pulled from the browser’s memory. An analysis of paths may therefore produce an incomplete path that is missing the cached pages.

In the above diagram, the actual path taken through the website by the client is shown at the top, while the apparent path from the server’s point of view is shown at the bottom. In this case, before proceeding to Page-3 the user goes back to Page-1. The server never sees this request, and from its point of view it appears the user went directly from Page-2 to Page-3. There may not even be a link from Page-2 to Page-3.

Visitor Identification Methods

As mentioned previously, Urchin has five different methods for identifying visitors, sessions and paths. The more sophisticated methods, which can address the above issues, may require special configuration of your website. The following descriptions explain the workings of each method in more detail.

1. IP-Only: The IP-Only method is provided for backward compatibility with Urchin 3, and for basic IT reporting where uniquely identifying sessions is not needed. This method uses only the IP address to identify visitor sessions. Thirty minutes of inactivity will constitute a new session. The only data requirements for using this method are a timestamp and the IP address of the visitor.

2. IP-Agent: The default method uses the IP address and user-agent (browser) information to determine unique sessions. A configurable thirty-minute timeout is used to identify the beginning of a new session for a visitor. While this method is still susceptible to proxying and caching, the addition of the user-agent information can help detect multiple users from one IP address. In addition, this method includes a special AOL filter, which attempts to reduce the impact of AOL's round-robin proxying techniques. This method does not require any additional configuration.

3. Usernames: This method is provided for secure sites that require logins, such as Intranets and Extranets. Websites that are only partially protected should not use this method. The Username identification is taken directly from the username field in the log file. This information is generally logged if the website is configured to require authentication. This method uses a thirty-minute period of inactivity to separate sessions from the same username.

4. Session ID: The fourth visitor identification method available in Urchin is the Session ID method, which can use pre-existing unique session identifiers to uniquely identify each session. Many content delivery applications and web servers provide session IDs to manage user interaction with the webserver. These session IDs are typically located in the URI query or stored in a cookie. As long as this information is available in the log data, Urchin can be configured to take advantage of these identifiers. Using session IDs provides a much more accurate measurement of unique sessions, but still does not identify returning unique visitors. This method is also susceptible to some forms of caching, including the above example. In many cases, the ability to use session IDs may already be available, and thus the time required to configure this feature may be short. For dynamically generated sites, taking advantage of this feature should be straightforward. The result is more accurate visitor session and path analysis.

5. Urchin Traffic Monitor (UTM): The last method for visitor identification available in Urchin is the Urchin Traffic Monitor. This system was specifically designed to negate the effects of caching and proxying and allow the server to see every unique click from every visitor without significantly increasing the load on the server. The UTM system tracks return visitor behavior, loyalty and frequency of use. The client-side data collection also provides information on browser capabilities. The UTM is installed by including a small amount of JavaScript code in each of your webpages. This can be done manually or automatically via server side includes and other template systems. Complete details on installing UTM are covered in the articles later in this section. Once installed, the Urchin Traffic Monitor is triggered each time someone views a page from the website. The UTM Sensor uniquely identifies each visitor and sends one extra hit for each pageview. This additional hit is very lightweight, and most systems will not see any noticeable additional load. The Urchin engine identifies these extra hits in the normal log file and uses this additional data to create an exact picture of every step taken by the users. This method also identifies visitors and sessions uniquely so that return visitation behavior can be properly analyzed. While this method takes a little extra time to configure, it is highly recommended for comprehensive, detailed analytics.
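To make the thirty-minute inactivity rule concrete, the sketch below counts sessions for the IP-Only and IP-Agent methods. It illustrates the idea only and is not Urchin's engine; the hit fields (ip, agent, time), the key functions and the sample data are all hypothetical.

```javascript
// Illustrative sketch: count sessions per visitor key. A gap of at least
// the timeout since that key's previous hit starts a new session.
const THIRTY_MINUTES_MS = 30 * 60 * 1000;

// Hypothetical key builders for two of the methods described above.
const keyByIp = (hit) => hit.ip;
const keyByIpAgent = (hit) => `${hit.ip}|${hit.agent}`;

function countSessions(hits, keyOf, timeoutMs = THIRTY_MINUTES_MS) {
  const lastSeen = new Map(); // key -> time of that key's previous hit
  let sessions = 0;
  const ordered = [...hits].sort((a, b) => a.time - b.time);
  for (const hit of ordered) {
    const key = keyOf(hit);
    const previous = lastSeen.get(key);
    if (previous === undefined || hit.time - previous >= timeoutMs) {
      sessions += 1;
    }
    lastSeen.set(key, hit.time);
  }
  return sessions;
}

// Two users share one proxy IP address but use different browsers.
const MINUTE_MS = 60 * 1000;
const sampleHits = [
  { ip: "10.0.0.1", agent: "BrowserA", time: 0 },
  { ip: "10.0.0.1", agent: "BrowserB", time: 1 * MINUTE_MS },
  { ip: "10.0.0.1", agent: "BrowserA", time: 5 * MINUTE_MS },
  { ip: "10.0.0.1", agent: "BrowserA", time: 50 * MINUTE_MS },
];
```

With this sample, keying on the IP address alone merges the two browsers into one visitor, while keying on IP address plus user-agent separates them. Neither key can tell apart two people behind the same proxy who run the same browser, which is the gap the Session ID and UTM methods address.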

Urchin Traffic Monitor (UTM)

Overview

The patent-pending Urchin Traffic Monitor (UTM), originally available in Urchin 4, was specifically designed to provide the most accurate measurements of unique website visitors. For businesses looking to get a deeper understanding of their online visitor behavior, the UTM is an extremely valuable technology that combines the best of client-side and server-side information while letting you control the data. Easy to install, this technology allows business owners to exactly identify unique visitors, click paths, and return loyalty metrics including first-time visitors, returning visitors, and frequency of use.

The second version of UTM, UTM-2, released with Urchin 5, expands these capabilities, capturing additional browser parameters and loyalty metrics. UTM-3, released with Urchin 5.5, adds a powerful campaign tracking capability. Subsequent versions of the UTM, released with Urchin 5.6 and Urchin 5.7, contain a number of enhancements to the campaign tracking capability.

There are two components to the Urchin Traffic Monitor System: the UTM Sensor, which is a lightweight module installed into the content of the website; and the UTM Engine, which is part of the log processing Urchin Engine. The UTM Sensor enables client-side data collection, which is then funneled back through the web server, augmenting the normal logfile. The client-side information is combined with the existing server-side data by the UTM Engine to provide a more accurate and complete picture of website activity.

The UTM Sensor is a small amount of JavaScript code that accomplishes two important functions. First, the Sensor negates the effects of caching by forcing at least one hit to progress to the original web server for each pageview. The impact on the server is minimal, and the details about the additional hit are logged into the normal web logfiles, resulting in a more complete data set. Second, the UTM Sensor uniquely identifies each visitor by using client-side "1st party" cookies to keep track of first-time and returning visitors. This cookie identifier is a communication tag viewable only by your web server, in the same manner as session IDs. It is not a third-party cookie, which provides information outside your system and violates many privacy policies.

The above diagram illustrates the operation of the UTM System. The web server in the middle of the diagram provides two basic functions: content delivery and logging. The content of the website includes the UTM Sensor which is delivered to the user’s browser, shown on the left. The UTM Sensor sets unique identifiers and sends an additional request back to the same web server. This additional request is logged into the normal log file along with all of the normal traffic. The UTM Engine, which is part of the Urchin log processing engine, understands this additional data and merges the two types of data together providing an accurate and more complete picture of visitor behavior. UTM Sensor The UTM Sensor increases the accuracy and completeness of logfile data by negating the effects of caching and proxying. The following example illustrates how the UTM Sensor handles caching. Shown in the figure below, the user receives the content of a pageview from the cached memory of the browser. This typically occurs when the user goes back to a previously viewed page. The same model applies if the caching is provided by a service provider. In the example, the content for page "X" is not delivered from the web server, but from the cached memory of the browser. At this point, there is no knowledge of the pageview as it is not seen by the web server. However, the UTM Sensor activates an additional unique hit that forces at least one small record back to the original web server. This information is logged in the normal logfile, which now has knowledge of the originating "X" pageview.


The second important function of the UTM Sensor is to uniquely identify both sessions and unique visitors. Through a patent−pending combination of browser cookies, the Sensor detects and initializes the unique visitor and session identifiers allowing exact monitoring of new and returning visitors regardless of service provider proxy behavior. Most service providers take advantage of proxying by recycling IP addresses and clustering users behind firewalls. This can cause problems with normal logfile tracking, which typically utilizes the IP address as an identifier of the user. In the example shown in the figure below, the UTM Sensor is able to pierce the veil of the proxy by utilizing the cookie identifiers instead of the IP addresses. In the figure, a first time unique visitor accesses the website through a firewall with IP address #1. The delivered Pageview contains the UTM Sensor, which sets the identifier on the visitor’s browser. On the return visit by the same visitor shown in the bottom of the figure, the unique id is passed to the web server along with each request. So even if the user is now assigned a second IP address, the UTM technology properly identifies the visitor with the original id. In addition to negating the effects of complex proxying techniques, this also tracks visitors who travel and may use their laptops from several locations and through several providers.
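The difference between counting by IP address and counting by cookie identifier can be shown with a few made-up hits. This is a sketch under stated assumptions: the visitorId field stands in for the identifier the Sensor stores in its cookie, and the addresses are documentation-range examples, not real log data.

```javascript
// Count the distinct values of one field across a list of hits.
function countDistinct(hits, field) {
  return new Set(hits.map((hit) => hit[field])).size;
}

// Hypothetical hits: one traveller seen from two IP addresses, plus three
// different people who share a single proxy address.
const proxiedHits = [
  { ip: "192.0.2.10", visitorId: "A" },   // traveller, first location
  { ip: "198.51.100.7", visitorId: "A" }, // same traveller, second provider
  { ip: "203.0.113.5", visitorId: "B" },  // behind a shared proxy
  { ip: "203.0.113.5", visitorId: "C" },  // behind the same proxy
  { ip: "203.0.113.5", visitorId: "D" },  // behind the same proxy
];

const visitorsByIp = countDistinct(proxiedHits, "ip");
const visitorsByCookie = countDistinct(proxiedHits, "visitorId");
```

Counting by IP address makes two opposite errors here: the traveller is counted twice and the three proxied users are counted once. Counting by the cookie identifier avoids both.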

Once the additional UTM data is recorded in the normal web server log, the UTM Engine will recognize and process these additional hits in order to create an exact analysis of each click of the user. During installation, it is important to check that the logging format includes both referral and cookie logging so that all of the appropriate data is stored.

Installation

There are four steps to installing the UTM system, which can be accomplished in a very short amount of time. Complex sites may be able to take advantage of existing server-side includes or centralized delivery methods to shorten the installation. During installation, you will need access and permissions to modify the content of the website. You may also need to modify the logging of the web server, which may require a different set of permissions. The following four steps do not necessarily need to be performed in order.

Upgrade Note: UTM-2, which ships with Urchin 5, is not recognized by Urchin 4. Once UTM-2 is installed, you will no longer be able to run Urchin 4. All versions of Urchin 5 recognize both UTM-1 and UTM-2. Urchin 5.5 introduced UTM-3, and only Urchin 5.5 and up can process UTM-3 data. Therefore, when upgrading, it is important to migrate to the appropriate version of Urchin before installing a more recent version of the UTM sensor. UTM-4 data, however, can be processed by any Urchin 5.x version.

1. Install UTM Sensor into content: The first step in installing the UTM is to include the JavaScript and GIF components of the UTM Sensor in the content of the site. The two files needed for this step are included in the util/utm/ folder within the Urchin distribution. It is important that the names of these two files are not changed and that they are copied to the document root directory of the website. Either drag and drop, upload, or copy the __utm.js and __utm.gif files into the main directory of your website.

Once these files are in place, you will need to include the __utm.js file at the beginning of each webpage in the website. If your site utilizes server side includes and you use a header include for each file, it is possible to include the UTM in the beginning of this include file only. It will then automatically be a part of each webpage. For static HTML sites that do not use includes, you will need to modify and add the UTM entry to each page individually. For dynamic sites that use a content generation engine, the UTM can be included at the beginning of the template that is delivered to the customer. In any case, the following line of code should be included in the beginning of each HTML page, but after any META tags, that is delivered to the end user. For static sites, edit each webpage and add the line below before the rest of the HTML content (but after any META tags).
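For sites that assemble pages through a template or build step, the placement rule above (at the beginning of the page, but after any META tags) can be automated. The helper below is a hedged sketch: the include line is assumed to be a script tag pointing at /__utm.js in the document root, which follows from the file name and location given in step 1, and the function name addUtmInclude is hypothetical.

```javascript
// Assumed form of the include line; the path follows from copying
// __utm.js into the document root of the website.
const UTM_INCLUDE = '<script src="/__utm.js"></script>';

// Insert the include after the last META tag, or at the very start of the
// page when no META tag is present. Pages that already contain the include
// are returned unchanged, so the helper is safe to run more than once.
function addUtmInclude(html) {
  if (html.includes(UTM_INCLUDE)) return html;
  const metaTags = [...html.matchAll(/<meta\b[^>]*>/gi)];
  if (metaTags.length === 0) return UTM_INCLUDE + "\n" + html;
  const last = metaTags[metaTags.length - 1];
  const cut = last.index + last[0].length;
  return html.slice(0, cut) + "\n" + UTM_INCLUDE + html.slice(cut);
}
```

A regular expression is a blunt tool for HTML and will not cope with META tags inside comments or scripts; a site with a real template system is better served by adding the line once to the shared header include.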

For sites that undergo regular maintenance or have multiple authors, be sure to build the addition of this line into your internal website authoring procedures, guidelines, and QA processes. If you are using a package like "HTML Tidy", you may want to include the JavaScript line in the HEAD area of your page to make it more palatable, for instance: ...

2. Set UTM Domain (if necessary): The UTM (beginning with UTM-2) has a domain setting that controls the scope of the cookies. For single websites, the default setting, "auto", can be left alone. If you have multiple websites that share a common root domain and you wish to process them together, then the domain should be set to the common root domain. To set the domain, edit the __utm.js file that was copied into your document root in step 1. Towards the top, you will see the line:

    var __utmdn="auto"; /*-- ...

Change the word "auto" to the domain that the cookies should apply to. The domain must be part or all of the actual URL for this site. Example:

    var __utmdn="urchin.com"; /*-- ...
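When choosing a value for __utmdn, it can help to check that the candidate domain really covers every host name you intend to process together. The sketch below uses a simplified version of cookie domain matching (exact match or subdomain suffix); it is an illustration rather than the logic inside __utm.js, and the function names and the store.urchin.com host are hypothetical.

```javascript
// Simplified cookie domain check: a cookie set for a domain is sent to that
// host and to its subdomains, but not to unrelated domains.
function domainCoversHost(cookieDomain, host) {
  const domain = cookieDomain.toLowerCase().replace(/^\./, "");
  const name = host.toLowerCase();
  return name === domain || name.endsWith("." + domain);
}

// True when one domain setting would cover every listed host name.
function coversAll(cookieDomain, hosts) {
  return hosts.every((host) => domainCoversHost(cookieDomain, host));
}

// Example: two sites under a common root domain.
const sharedRootOk = coversAll("urchin.com", ["www.urchin.com", "store.urchin.com"]);
```

Requiring a leading dot before the suffix matters: without it, a host such as noturchin.com would wrongly be treated as part of urchin.com.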

3. Activate cookies in the logging: The third step in installing the UTM System is to verify and, if necessary, modify the logging format of your web server. For the UTM to function properly, both referral and cookie information must be logged. You will need access to the configuration of the web server. The following general guidelines should work for most IIS and Apache users; however, you should check with your system administrator to ensure proper formats.

For Apache Users: Apache servers typically use a configuration file called "httpd.conf". Within this file, configuration directives determine the format and location of logfiles. By default, most Apache configurations log in the NCSA Extended Combined format, which includes referrals and user-agents but is missing the cookie information. Be sure that your log format contains the "%{Cookie}i" field specification. To modify your logging format from the default, a "special" LogFormat directive can be added, and the log files can then reference this format using the CustomLog directive.

The above example is provided as a reference and does not apply to all possible Apache settings. Please refer to the Apache documentation and consult with your system administrator on the actual directives needed to activate cookie logging. The LogFormat directive specifies the format of the log file. The example shows the addition of the cookie information to the end of the log format. This format is then named "special" so that it can be identified in the virtual host configurations. The CustomLog directive in the virtual host specification identifies the location of the log file and the format to use. The example uses the "special" format as defined previously.

For Microsoft IIS users: The Internet Services Manager provides a point-and-click interface for adjusting the web server configuration. To access this manager, you will need to log in to the web server with the appropriate administrator privileges. Click on the "Start" menu --> Settings --> Control Panel, double-click on the Administrative Tools folder, and then double-click on the Internet Services Manager icon to open the manager.

Modifications can be made either to each website individually or to the entire server. In the left window, either right-click on the server name to modify the entire server, or right-click on the website name, such as "mysite1.com". Select the "Properties" option to open the properties dialog box. For the entire server, click on the "Edit" button with "WWW Services" selected in the menu to bring up the Properties dialog box shown on the left below.

As shown in the above figure, be sure that logging is enabled and set to the "W3C Extended Log File Format". Then select the "Properties" button to configure the log file format specifics.


The window shown above will appear. Click on the "Extended Properties" tab, scroll down and make sure both the "Cookie" and "Referrer" boxes are checked. If not, check these boxes and "Apply" the changes to the site.

Whether you use IIS, Apache or another web server, please refer to your server documentation for more information on configuring logfile formats. All major web servers support the logging of cookies and are easily modified to activate this feature.

4. Set Urchin configuration to UTM: The final step in configuring the UTM for your site is to enable UTM tracking in the Urchin Configuration. This is done either when the Profile is created or afterwards by editing the Profile. Open the Urchin Configuration either directly on the machine or by logging in remotely as the "admin" user. Your installation instructions provide more details on how to access the configuration. Once it is open, click on the "Configuration" icon on the left to display a list of the existing Profiles in the configuration. To enable UTM tracking for a particular Profile, click the "Edit" button to the right of the profile name. (Note: if you have not already added the profile, do so now using the "Add" button.) After clicking on the "Edit" button, click on the "Reporting" tab to bring up the Reporting Settings window.

Under the "Visitor Tracking Options" section, use the menu to select "Urchin Traffic Monitor (UTM)" for the Visitor Tracking Method. If you explicitly set the UTM Domain in step 2, then set the UTM Domain setting in the above figure to the same value as in step 2. If you did not specify the domain in step 2, then set the UTM Domain to the address of your website without the "www.". If your website domain does not start with "www.", then use the whole domain. Click the "Update" button to save your settings. That’s it. The installation is complete, and future traffic will be recorded with, and benefit from, the UTM System.
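Before relying on UTM data, it is worth confirming that the Apache log format from step 3 carries both required fields. The sketch below checks a LogFormat string for the Referer and Cookie header specifiers. The two sample format strings are assumptions for illustration (the first is the widely used NCSA combined format, the second appends the cookie header to it), and the names REQUIRED_FIELDS and missingFields are hypothetical.

```javascript
// Field specifiers that an Apache LogFormat string needs for UTM data:
// the Referer and Cookie request headers.
const REQUIRED_FIELDS = ["%{Referer}i", "%{Cookie}i"];

// Return the required specifiers that are missing from a format string.
// Header names are compared case-insensitively.
function missingFields(logFormat) {
  const lower = logFormat.toLowerCase();
  return REQUIRED_FIELDS.filter((field) => !lower.includes(field.toLowerCase()));
}

// NCSA combined format, written here without the backslash escaping that
// httpd.conf requires around the inner double quotes.
const combinedFormat = '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"';

// The same format with the cookie header appended (an assumed example).
const combinedWithCookie = combinedFormat + ' "%{Cookie}i"';
```

Running missingFields on the combined format reports the cookie specifier as missing, which matches the note in step 3 that the default format lacks cookie information.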

Session−ID Identification

Overview

Many application servers, including ASP pages, use a unique session number to identify individuals currently on the site. While this information doesn't usually contain any historical tracking, it does provide an accurate way of identifying unique sessions. Session IDs are typically located in either the URL query parameters or in a cookie that is assigned to the user. As long as this information is written to the log file, Urchin can use it to uniquely identify each session. Using Session IDs increases the accuracy of reporting by defeating the effects of proxy servers. Using Session IDs does not provide unique visitor tracking like the UTM system, but if you already have Session IDs in place, it can be an easy way to increase the session accuracy immediately.

Session ID Location

Before configuring Urchin to use Session IDs, check your log file to make sure the IDs are coming through and make a note of the field and format. If the Session IDs are in the request, then the 'request_query' field will contain the variable string. If they are in the cookie field, then the 'cs_cookie' field should be used.

Make a note of the field and the variable name used to mark the identifier. If you don't see the ID in the log file and you are sure you are using Session IDs, check that the logging format contains the appropriate field.

Urchin Configuration

Once you have the Session ID information, you can easily set your Profile in Urchin to use this for visitor identification. Bring up the Urchin Administration Interface and under Configuration, click on Edit next to the Profile you wish to configure (or click Add if you don't have the Profile configured yet). Click on the Reporting tab to bring up the "Visitor Identification" settings:

As shown in the above image, change the Visitor Tracking Method to "Session ID" and set the Session Field to either request_query or cs_cookie as determined above. Then enter what comes before and after the Session ID in the two Parsing boxes. For the first example provided at the beginning of this document, sid=12345, enter "sid=" and "" in the two boxes. Click Update and you are ready to go.
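The effect of the two Parsing boxes can be sketched as taking the text between a "before" marker and an "after" marker. The function below is an illustration of that idea only; how Urchin actually parses the field is not documented here, and the name extractSessionId and the longer sample query string are hypothetical.

```javascript
// Extract the text between a "before" marker and an "after" marker.
// An empty "after" marker means the ID runs to the end of the field.
// Returns null when the "before" marker is not present.
function extractSessionId(field, before, after) {
  const start = field.indexOf(before);
  if (start === -1) return null;
  const from = start + before.length;
  if (after === "") return field.slice(from);
  const end = field.indexOf(after, from);
  return end === -1 ? field.slice(from) : field.slice(from, end);
}

// The example from the text: "sid=" before, nothing after.
const simpleId = extractSessionId("sid=12345", "sid=", "");

// A query string with other variables needs an "after" marker as well.
const embeddedId = extractSessionId("page=2&sid=12345&lang=en", "sid=", "&");
```

Note that a plain substring search also matches a variable whose name merely ends in the marker (for example "xsid="), so choose a "before" string that is unambiguous in your own logs.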

UTM Quick−Install (Apache)

The following is intended as a quick run-through on installing the UTM for websites running on an Apache server on all platforms except for Sun Cobalt. For more detailed information on the UTM, please see the article entitled "Urchin Traffic Monitor (UTM)" found in this section.

Step 1: Copy UTM files to website document root. The files __utm.js and __utm.gif are located in the "util/utm" directory in the Urchin distribution. Copy these two files to the main directory of your website content. IMPORTANT: the filenames start with two underscore characters.

Step 2: Reference UTM in your HTML. Enter the following line in all of your HTML pages. While it can go anywhere in the pages, we recommend putting it in the HEAD section. If you use a common include or template, you can enter it there. IMPORTANT: the filename starts with two underscores.

If you are using a package like "HTML Tidy", you may want to include the Javascript line in the HEAD area of your page to make it more palatable, for instance:


...

Step 3: Enable cookies in your Apache logging. If not already enabled, you can use the following httpd.conf example to enable cookie logging:

Step 4: Set Urchin Profile to use UTM. In the Urchin Administration interface, edit the profile in question and click on the Reporting tab. Set the Visitor Tracking Method to UTM. Set the UTM Domain to the address of your website without the www. When done click the Update button. Then click on the Profile Settings tab and choose UTM−Enable All for the Default Report Set, then click Update again. That's it! Your website will now begin logging UTM data into your normal log file which will be identified the next time you run Urchin. Is it working? To see if the UTM is successfully making entries to your log file, examine the log after you have installed the UTM and clicked on a few pages of the site. You should see an entry similar to the following at the end of the log file: ... "GET /__utm.gif?..." 200 ..."__utma=..."

If you don't see the __utma entries, check that cookie logging was enabled properly. If the status code is not 200, check that the files were properly copied to your document root.
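The same check can be scripted for a log with many entries. The function below inspects one access-log line for the three things described above: a request for /__utm.gif, a 200 status code, and a __utma cookie value. It is a sketch that assumes the request is logged in double quotes followed by the status code, as in the Apache combined format; the function name checkUtmLine is hypothetical.

```javascript
// Check one access-log line for a UTM sensor hit:
// a request for /__utm.gif, status 200, and a __utma cookie value.
function checkUtmLine(line) {
  const request = line.match(/"GET \/__utm\.gif\?[^"]*" (\d{3})/);
  if (!request) return { utmHit: false, statusOk: false, hasCookie: false };
  return {
    utmHit: true,
    statusOk: request[1] === "200",
    hasCookie: line.includes("__utma="),
  };
}
```

A line that reports utmHit but not statusOk points to the files not being in the document root, while utmHit without hasCookie points to the cookie field missing from the log format.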

Installing UTM On Every Page (Apache)

Installing the UTM sensor on every page of a website allows Urchin to provide the most accurate analytics possible. This article describes how to easily install the UTM sensor on every page of a large site.

How can I install UTM on every page?

mod_layout is an Apache module that provides both a Footer and a Header directive to automatically include output from other URIs at the beginning and end of every web page. You can use it to include the __utm.js calls on every page of a site. It is an invaluable tool for service providers who do not wish to modify their clients' web pages, as well as for single sites with a large number of web pages.

To install mod_layout:

1. Download mod_layout from tangent.org
2. Extract the compressed file and read the README.
3. Install mod_layout as described in INSTALL
4. Create an html file called utm.html
5. Add the __utm.js include line to utm.html
6. Modify your current Apache configuration file to include the utm.html file.

Example:

    ServerName urchin.com
    ServerAlias www.urchin.com
    LayoutHeader /path/to/file/utm.html
    ...

UTM Quick−Install (IIS)

The following is intended as a quick run-through on installing the UTM for websites running on a Microsoft IIS server on any Windows platform. For more detailed information on the UTM, please see the article entitled "Urchin Traffic Monitor (UTM)" found in this section.

Step 1: Copy UTM files to website document root. The files __utm.js and __utm.gif are located in the "utils\utm" folder in the Urchin distribution. Copy these two files to the main folder of your website content. IMPORTANT: the filenames start with two underscore characters.

Step 2: Reference UTM in your HTML. Enter the following line in all of your HTML pages. While it can go anywhere in the pages, we recommend putting it in the HEAD section. If you use a common include or template, you can enter it there. IMPORTANT: the filename starts with two underscores.

If you are using a package like "HTML Tidy", you may want to include the Javascript line in the HEAD area of your page to make it more palatable, for instance: ...

Step 3: Enable cookies in your IIS logging. Open the IIS Manager and bring up the Properties window for your website. Make sure the logging is enabled and set to the W3C Extended format. Click the Properties button next to the format and under the Extended Properties tab, check the box next to Cookie.

Step 4: Set Urchin Profile to use UTM. In the Urchin Administration interface, edit the profile in question and click on the Reporting tab. Set the Visitor Tracking Method to UTM. Set the UTM Domain to the address of your website without the www. When done, click the Update button. Then click on the Profile Settings tab and choose UTM-Enable All for the Default Report Set, then click Update again. That's it! Your website will now begin logging UTM data into your normal log file, which will be identified the next time you run Urchin.

Is it working?

To see if the UTM is successfully making entries to your log file, examine the log after you have installed the UTM and clicked on a few pages of the site. You should see an entry similar to the following at the end of the log file:

    ... "GET /__utm.gif?..." 200 ..."__utma=..."

If you don't see the __utma entries, check that cookie logging was properly enabled in the IIS logging properties. If the status code is not 200, check that the files were properly copied to your document root.
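The check described above can be sketched as a small script. This is a minimal illustration, not part of Urchin: the function name check_utm_line is hypothetical, and the pattern assumes the quoted request layout shown in the sample entry; a log written with a different field layout would need a different pattern.

```python
import re

# Assumed layout (taken from the sample entry above): a quoted request for
# /__utm.gif followed by the three-digit status code.
UTM_HIT = re.compile(r'"GET /__utm\.gif\?[^"]*"\s+(\d{3})')

def check_utm_line(line):
    """Classify one log line for the UTM installation check (hypothetical helper)."""
    match = UTM_HIT.search(line)
    if match is None:
        return "not-utm"      # not a request for __utm.gif
    if match.group(1) != "200":
        return "bad-status"   # files probably not copied to the document root
    if "__utma=" not in line:
        return "no-cookie"    # cookie logging probably not enabled
    return "ok"
```

Running the function over the last few lines of the log and looking for at least one "ok" result mirrors the manual inspection.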

Using UTM with Domain Aliases

Background

Because cookies are domain-based objects, there are some important considerations when a site has multiple domains. A cookie that is set under a domain, "mysite.com", will be passed to all subdomains such as "www.mysite.com". However, this cookie will not be passed to "mysite.net" or any other root domain.

If your website has only one domain responding to "mysite.com" and "www.mysite.com", you can follow the standard UTM installation. However, if you have a website with one or many aliases, it is recommended to redirect traffic from the aliases to the primary site. This ensures that the UTM visitor tracking is set under the primary domain and that all visitors are tracked consistently. Without the redirect, a visitor may appear as two visitors if they access the same site through two separate domains. The following instructions provide an example of how to redirect aliased domains to the primary domain in Apache and IIS servers.

Redirecting Aliases in Apache

If you are using an Apache webserver, the configuration can be easily modified to redirect all traffic originating under one of the aliases to the primary site. One way to do this is to create two VirtualHost entries. The first will be the primary domain, which will include your normal configuration; the second VirtualHost will be for all the aliases and will redirect to the primary. Example:

#---primary virtualhost
Servername www.mysite.com
Serveralias mysite.com
...
#---second virtualhost
Servername mysite.org
Serveralias www.mysite.org mysite.net www.mysite.net
RewriteEngine on
RewriteRule ^(.*) http://www.mysite.com$1 [R=301]

The second VirtualHost uses a rewrite rule with a 301 (Moved Permanently) redirect code to forward all traffic to the original site. A single 301 hit will still be recorded in the log file, which is useful for tracking which domains people are entering on, but all remaining traffic will be forced under the one domain. At this point, as far as the UTM is concerned, the site appears to be a one-domain site and is ready for normal UTM installation.

Note: please be advised that you should work with your administrator and reference the apache.org site on configuration parameters.

Redirecting Aliases in IIS

If you are using a Microsoft IIS webserver, the configuration can be easily modified to redirect all traffic originating under one of the aliases to the primary site. One way to do this is to create two websites in the IIS configuration. The first will be the primary domain (www.mysite.com), which will include your normal configuration; the second will be for all the aliases (mysite.net, mysite.org, etc.) and will redirect to the primary.

In the IIS Manager, right-click on one of the websites and bring up the properties dialog. On the "Web Site" tab, click the "Advanced..." button. This brings up the window where additional domains can be assigned to the website using the "Host Header" field. Set the primary domain in the primary website, and use the second website to house all of the aliases.

Once the second website housing all of the aliases is configured and enabled, create a blank homepage with the following redirect code:

This will instruct the visitor's browser to immediately redirect to the primary URL. At this point, the primary website appears to be a simple one-domain configuration, and normal UTM installation can proceed with default settings.
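The redirect behavior that the alias setups above aim for can be summarized in a few lines of logic. The sketch below only illustrates the rule, using the example hostnames from this article; the function name redirect_for is hypothetical, and the actual redirect is performed by the webserver configuration, not by a script like this.

```python
PRIMARY_HOST = "www.mysite.com"

# Alias hostnames from the example; requests for these are sent to the primary.
ALIAS_HOSTS = {"mysite.org", "www.mysite.org", "mysite.net", "www.mysite.net"}

def redirect_for(host, path):
    """Return (status, location) for an alias host, or None if no redirect applies."""
    if host.lower() in ALIAS_HOSTS:
        # 301 = Moved Permanently; the requested path is carried over unchanged.
        return 301, "http://" + PRIMARY_HOST + path
    return None
```

Requests that already arrive under www.mysite.com or mysite.com return None and are served normally, so the UTM cookie is always set under the primary domain.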

Using UTM with Multiple Sites

Multiple Sites - Same Root Domain

Multiple sites with the same root domain (e.g., www.urchin.com and help.urchin.com) can be processed either together or separately, depending on the UTM Domain setting of the two sites. If the UTM Domain is set to the default, "auto", then the two sites will be processed separately. This means that Visitor tracking information will be kept separate for each site. Visitor reporting for one site will not be affected by visitor traffic to the other site.

Process Together

If you wish to process the sites together, sharing Visitor tracking information, then the UTM Domain can be explicitly set to the common domain (e.g., urchin.com). You will need to set this in both the UTM code and the Urchin configuration. To set this in the UTM code, edit the __utm.js file in the document root of each site. Towards the top, you will see the line:

var __utmdn="auto"; /*-- ...

Change the word "auto" to the common domain:

var __utmdn="urchin.com"; /*-- ...

Next, in the Urchin configuration, create a single Profile with UTM activated, and set the UTM Domain to the common domain. You will be processing the logs from both sites in the same Profile.

When processing the logs for two sites together, it is recommended to apply a Filter to one of the logs in order to distinguish pages and paths. For the www.urchin.com and help.urchin.com example, inserting '/help' in front of the URLs in the help.urchin.com log will allow you to distinguish between http://www.urchin.com/foo.html and http://help.urchin.com/foo.html. The resulting pages will be referenced as "/foo.html" and "/help/foo.html", respectively. Create a search and replace filter on the 'request_stem' field with the following settings:

Filter Field: request_stem
Search String: ^/
Replace String: /help/

In our example, this filter would then be applied to the log file for help.urchin.com. Running the two separate Log Sources together will require an additional Load Balanced Server Module in the license. Please contact your sales representative for details.
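The effect of the search and replace filter above can be previewed outside Urchin. The following sketch assumes that the Search String behaves like an ordinary regular expression anchored at the start of the field; the function name apply_help_filter is hypothetical and only mimics the filter settings listed in the example.

```python
import re

def apply_help_filter(request_stem):
    """Mimic the example filter: Search String ^/  ->  Replace String /help/."""
    # ^/ matches only the leading slash, so the rest of the path is untouched.
    return re.sub(r"^/", "/help/", request_stem)

print(apply_help_filter("/foo.html"))  # prints /help/foo.html
```

Because only the log for help.urchin.com passes through the filter, pages from www.urchin.com keep their original paths.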

Tracking Flash and Browser Events (UTM-5 only)

You can track any browser-based event, including Flash and JavaScript events, if you have installed the UTM-5 (available at ftp://ftp.urchin.com/urchin5/utm-5/) on your website. To track an event, call the urchinTracker JavaScript function with an argument specifying a name for the event. For example, calling:

javascript:urchinTracker('/homepage/flashbuttons/button1');

will cause each occurrence of the calling Flash event to be logged as though it were a pageview under the name /homepage/flashbuttons/button1. The argument must begin with a forward slash. The event names may be organized into any directory-style structure you wish. For example, if you wish to organize Flash events by page and by type of event, you might organize a hierarchy along these lines:

• '/homepage/flashbuttons/button1'
• '/homepage/clips/clip1'

Flash Code Examples

on (release) {
  // Track with no action
  getURL("javascript:urchinTracker('/folder/file');");
}

on (release) {
  // Track with action
  getURL("javascript:urchinTracker('/folder/file');");
  _root.gotoAndPlay(3);
  myVar = "Flash Track Test";
}

onClipEvent (enterFrame) {
  getURL("javascript:urchinTracker('/folder/file');");
}

HTML Code Examples

The following illustrates how to log an onClick event:

The following illustrates how to log a rollover event:

Tracking Banner Ad Exits and Other Outbound Links

If you publish advertising banners on your site, there is an easy way for you to track which banners visitors click on to leave your site and which advertisers they visit. First, make sure that you have installed the UTM-5 (available at ftp://ftp.urchin.com/urchin5/utm-5/) on your website. Next, you will need to add some code to each of the banners. For an animated GIF or other type of static banner ad, you would add the following code:

This code causes each click on the banner to be logged as though it were a pageview named /bannerads/advertisername/bannername. It is a good idea to log all of your advertising banners into a logical directory structure such as /bannerads/the name of the advertiser/the name of the banner. This way, you will be able to easily identify the number of referrals to each advertiser. The equivalent code for a Flash banner is provided below:

on (release) {
  getURL("javascript:urchinTracker('/bannerads/advertisername/bannername');");
  getURL("http://www.advertisersite.com");
}

Chapter 3: Urchin Administration

Administration Overview

Introduction

The Urchin Administration Interface is a browser-based command center from which you can control virtually everything related to running Urchin, including setting up Profiles, scheduling log processing events, managing Users and Groups, configuring Filters, and much more.

To get started, log in to your Urchin system using a browser. If the default port was used during installation, then the URL should be http://your.server.com:9999/, replacing 'your.server.com' with the actual name of the system Urchin is running on. Alternatively, http://localhost:9999/ can be used if you are directly on the system. On Windows platforms, there is an 'Urchin Administration' shortcut in the Start menu. The default password for the 'admin' account is 'urchin'. Be sure to change this to a more secure password.

Controls

After logging into the system and proceeding through the startup wizard, you will see the administration screen with a navigation menu on the left side. The three primary buttons are 'View Reports', 'Configuration', and 'Preferences'. Click on the 'Configuration' button to begin configuring Urchin.

This menu provides access to all of the critical configuration controls. Click on the arrows to expand a particular section. The darkened color indicates which control is currently being displayed. When first clicking on one of the configuration sections, a list of existing entries may be shown with appropriate 'edit' buttons next to each entry.

Clicking on the 'Edit' button next to a particular entry will allow you to modify the configuration for the entry. To add new entries, click the 'Add' button in the upper right shown in the above figure. After clicking 'Edit' on a particular entry, the set of configuration screens available for that entry is shown using tabs across the top to select the particular configuration subject.

Click on a particular tab to access the configuration settings under the tab. After changing any settings, be sure to click the 'Update' button provided at the bottom of each screen.

Once you have a long list of entries in a particular area, there are some additional controls that make it easier to find those entries. The Next/Previous buttons, located just above and below the list of entries, scroll through the entries. The number shown can also be increased to display more entries at one time.

Shown in the above image, the + − Filter option can help you quickly find a particular entry. Simply enter all or part of the entry's name and press return. Details about each section are provided further in this manual and by clicking the 'Help' link provided at the bottom of each admin screen. Definitions about each configuration parameter are generally found by clicking the 'Help' link.

Profiles

Importing Profiles (Windows)

Overview

Urchin's Import Profiles function is a convenient way for users with systems running the Microsoft Internet Information Server to set up Profiles for each of their IIS sites quickly. Urchin can read the IIS configuration, determine what websites are running on the server, and then build basic Profiles for each website that use the IIS logs as their Log Sources. You can then customize the Profiles or add additional Profiles as desired for the imported sites.

How to Import Profiles

To get started importing Profiles, log in to the Urchin administration system as admin and click on the Configuration button at left. Click the Import button at top-right and you will be taken to the Import Profiles screen. This screen allows you to select which, if any, websites to import. Once you've checked the sites to import, click the Import button. Click Done when you've finished with all your import choices.

Recommendations

It's a good idea to create at least one Profile for each website on the server so that you get a complete picture of traffic to the server via Urchin's Summary Report. The Summary Report gives you overall traffic information for the server, as well as a ranking of each site by various traffic parameters. This is very handy if you are a host and bill according to bandwidth usage. Note that the Summary Report only shows data based on Profiles that have been configured; sites without functioning Profiles are not included.

Working with Profiles

Overview

A Profile is the term used for a set of reports for a website and the configuration settings needed to create those reports. In general, you will need to set up a Profile for each website for which you want reporting. If needed, multiple Profiles can be used for the same website with different filtering options. The configuration of a Profile includes information about the website, log file sources, filters, and the schedule for processing. Once a Profile is created and configured, it needs to be 'Run' in order to process raw log file data.

Licensing Information

The Urchin base license includes 100 Profiles. If you need more Profiles, the license can be upgraded by contacting your sales representative or by clicking on the Settings --> License --> Upgrade link within the configuration. The base license also includes one server per Profile. If you need additional Load Balanced Servers, you will need to upgrade your license.

Creating a Profile

To get started creating a Profile, log in to the Urchin administration interface as an Urchin admin and click on the Configuration button at left. To create a new Profile, click the Add button at top-right as shown in the image below. You will be taken to the Add Profile Wizard. This is a simple series of steps designed to help you get the Profile set up in basic form quickly and easily. Each screen in the Wizard has explicit help information that is available by clicking on the ? icon.

Once a Profile is created, the configuration can be modified by clicking the 'Edit' button next to that entry in the list. Tabs are provided at the top of the configuration area to easily access the different configuration screens.

Recommendations

• Urchin has several different methods for identifying visitors and sessions, depending on available information. Of these, the patent-pending Urchin Traffic Monitor (UTM) is a highly accurate system that was specifically designed to identify unique visitors, sessions, exact paths, and return frequency behavior. There are a number of visitor loyalty and client reports that are only available when using the UTM System. The UTM System is easy to install and is highly recommended for all businesses. To install UTM, please refer to the UTM install instructions in the Visitor Tracking section of this documentation.
• If you intend to set up one or more Filters in conjunction with your Profile, it is advisable to have more than one Profile for that website or part thereof. We recommend having one "master" Profile that contains everything. If you wish, for example, to filter out spiders or robots, it's a good idea to put these Filters in a second Profile so you can easily compare the results of the Filters to the master Profile.

Log Files

Working with Log Sources

Overview

You will generally add a Log Source in the course of creating a Profile. A Log Source is Urchin's way of identifying the characteristics of an access log (sometimes called a transfer log) for one of your websites. Access logs contain all the hits, or requests for web documents, that are made to your website. Some of the log file characteristics that are associated with a Log Source are the path to the log file, the format of the log file (e.g. W3C or NCSA), whether the log is local or on a remote system, and whether a filter should be applied to the log file during processing.

An important concept to understand is that Log Sources exist independently of Profiles. Every Profile must have at least one Log Source associated with it to obtain reporting. However, several Profiles can use the same Log Source. For example, you may want to create multiple Profiles using the same Log Source, but give each Profile a different filter to produce varying report results. So there is not necessarily a 1:1 ratio between Log Sources and Profiles.

Configuring Log Sources

To get started adding a Log Source to the system, log in to the Urchin administrative system as the administrator and click on the Configuration button at left. Next, click the Log Manager button. To create a new Log Source, click the Add button at the top right of the screen. You will be taken to the Add Log Source Wizard. This is a simple series of steps designed to help you get the Log Source set up quickly and easily. Each screen in the Wizard has explicit help information to explain the configuration information displayed on that screen.

In the Log Settings screen you will note that you have to choose a Log Format. This setting tells Urchin how the data in your log file is arranged. It is important that you select the correct format for your log or Urchin will not be able to produce meaningful report data.
Urchin understands a default set of log formats that you can choose from via a dropdown menu. They are:

• Auto: Urchin uses this format to automatically detect NCSA, W3C, Netscape, ELF, and ELF2 log formats. Instead of explicitly selecting one of these, you may choose Auto and Urchin will correctly deduce how to read the data if your log format is in this list.
• NCSA: Apache modified Extended/Combined format (see Logging - Apache and IIS for a description of this format).
• W3C: Microsoft IIS servers typically use this format, although other webservers can also be configured to produce W3C logs.
• Netscape: Netscape and iPlanet servers use this format by default.
• ELF/ELF2: E-Commerce Log Format; see the specification in the E-commerce Module section for details.
• Google: If you have licensed the Campaign Tracking Module, use this format for logs containing Google cost-per-click spending data. Note that the Google log format cannot be auto-detected.
• Overture: If you have licensed the Campaign Tracking Module, use this format for logs containing Overture cost-per-click spending data. Note that the Overture log format cannot be auto-detected.
• Custom: Although not initially listed in the dropdown menu, you can create your own custom log formats, which will automatically appear in the dropdown menu when properly configured. Please refer to the "Custom Log Formats" article in the Advanced Topics -> Customization section of the Documentation Center.

If you don't believe your webserver currently produces logs in one of the recognized default formats, you can either reconfigure your webserver to log in one of these formats or create a custom log format that conforms to how your webserver currently logs. If you want to reconfigure your webserver logging, it is recommended that you choose the W3C or NCSA style of logging.

Load Balancing and Parallel Log Processing

If you have purchased a Load Balancing License, the Log Source Wizard provides a Parallel Log Processing option. When Parallel Log Processing is enabled, Urchin opens all of the log files at once and reads them in a rotating fashion, one section at a time, each section corresponding to 15 minutes of log activity. Enabling Parallel Log Processing significantly increases performance on load balanced sites.
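To make the "rotating fashion" of Parallel Log Processing more concrete, the sketch below visits several logs in turn, one 15-minute section at a time. It is only an illustration of the reading order described above and not Urchin's implementation: the names read_in_sections and floor_to_section are hypothetical, and each log is represented as an in-memory list of (timestamp, text) pairs already in time order.

```python
from datetime import datetime, timedelta

SECTION = timedelta(minutes=15)

def floor_to_section(ts):
    """Round a timestamp down to the start of its 15-minute section."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)

def read_in_sections(logs):
    """Yield (section_start, log_index, entry), rotating through the logs
    one 15-minute section at a time."""
    positions = [0] * len(logs)
    starts = [log[0][0] for log in logs if log]
    if not starts:
        return
    section_start = floor_to_section(min(starts))
    while any(pos < len(log) for pos, log in zip(positions, logs)):
        section_end = section_start + SECTION
        for index, log in enumerate(logs):
            # Consume every entry of this log that falls in the current section.
            while positions[index] < len(log) and log[positions[index]][0] < section_end:
                yield section_start, index, log[positions[index]]
                positions[index] += 1
        section_start = section_end
```

Reading this way keeps entries from all load balanced servers roughly in chronological order without first merging the files into one.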

Log Management

Overview

Log management is an important concern when running software such as Urchin. Because busy sites will build up large log files fairly quickly (up to several gigabytes in one month in some cases), log management should be considered carefully. It is recommended that a standard log rotation practice be established. Compressing and otherwise archiving files offline are standard practices. Please see the article on Log Rotation Best Practices in this section for further information on establishing such a procedure.

Log management is necessary only for disk resource usage considerations, not for purposes of avoiding reprocessing data. Urchin does not need any sort of log rotation to avoid data duplication, as it is equipped with a log tracking capability that ensures that previously read log data is not reprocessed. Because Urchin should never need to re-read a log file once it has been processed, you may delete the log(s) after each processing run at your discretion. However, it is not uncommon to keep old logs for a specified amount of time for historical or auditing reasons.

Managing Logs via Urchin

Each Log Source has a Log Destiny setting with the options Don't Touch, Archive/Compress, and Delete. Once all Profiles that are utilizing a Log Source have finished their processing, Urchin uses the Log Destiny setting to determine the disposition of the Log Source. The Log Destiny setting is accessible under the Advanced Settings tab for a given Log Source. It is recommended to set Log Destiny to Archive/Compress so that you save disk space if you want to keep your logs for some period of time. If you are comfortable with a log being removed once it has been processed, you can choose a Log Destiny of Delete. However, realize that this means you will not have the option of rerunning Urchin against that log in the future unless you have a backup elsewhere.

Considerations

A few special situations should be noted:

• Do not use the Archive/Compress or Delete options with a Log Source if you are processing live logs. A live log is one that is being actively written to by a webserver. Using these settings with a live log will cause a loss of data.
• If Log Destiny for a remotely retrieved Log Source is set to Don't Touch, then that log will grow continually unless there is some process external to Urchin that is handling log management on the machine where the log is created. Since Urchin must transfer a copy of the remote logfile to the local system before processing, it will take Urchin longer and longer to transfer the file as the log file grows. This will have the side effect of lengthening your overall Urchin run time.

Log Rotation Best Practices

Overview

In most operating environments, system services and applications such as webservers generate logfiles that record actions and events related to those services. In most cases, it is also standard practice for the operating system and/or applications to perform regular maintenance on the logfiles to keep their size in check. This prevents the logfiles from growing without bounds and eventually exhausting disk space. A common approach to managing logs is to have a regularly scheduled log rotation task that renames the existing logs with a timestamp and then restarts the service or application with a new, zero-length logfile. It is also standard practice for the log rotation task to compress the old logfiles, and to delete logfiles after a certain age or rotation cycle threshold has been reached.

In the specific case of webserver logs, the rotation is usually handled on a daily basis to ensure that the logs remain at a manageable size. In addition, a daily rotation schedule is generally a good granularity to facilitate post-processing of webserver logs with an analysis tool like Urchin. Some webservers such as Microsoft's IIS have built-in log rotation functionality, which, when enabled, will rotate logs on a daily basis by default. Other webservers such as Apache have no explicit log rotation handler, but provide tools for easily restarting the webserver (without loss of web service) to accommodate the log rotation operation (e.g. apachectl restart).

Log Rotation in Previous Versions of Urchin

Unlike Urchin 4 and 5, previous versions of Urchin have no built-in log tracking mechanism to determine which logs have already been processed, so those earlier Urchin versions depend heavily on a reliable log rotation scheme to ensure that logs are only processed a single time. As such, pre-Urchin 4 versions have the option of providing simple log rotation functionality and the ability to restart the webserver as part of the overall processing duties. If this Urchin log rotation mechanism is not utilized, the responsibility of reliable log rotation must be handled completely by an external log management mechanism. This has traditionally been the function of a larger overall system log management scheme provided as part of the operating system (e.g. the open-source "logrotate" found in many Linux distributions).

Log Rotation Practices with Urchin 5

With the advent of Urchin 4, the need for log rotation to avoid duplicate processing of logs has been eliminated thanks to Urchin's Log Tracking technology. This allows Urchin 5 much greater flexibility in processing of logs, such as the ability to process "live" logs that are still being written by the webserver, or to process logs that are rotated on a manual or irregular basis.

Important Note: Unlike previous versions of Urchin, Urchin 5 does not provide hooks for invoking a log rotation procedure or restarting a webserver after log rotation tasks have been performed, although certain post-log-processing actions are possible as described below.

While Urchin 5 operation does not require that webserver logs be rotated regularly or at all, it is recommended that a standard log rotation scheme be implemented to ensure smooth operation and to keep the Log Tracking utility from having to do a lot of unnecessary processing. It is much more efficient from both a system and application standpoint to manage several smaller logs than one very large log, as file operations tend to slow considerably as files get larger.
Smaller files are also much easier to back up and restore in the event of a disk failure or other system failure.

Log rotation mechanisms needn't be overly complex. In most cases, a simple shell script or Perl script run daily from cron on UNIX-type systems is all that is necessary. The script merely needs to rotate the existing webserver log and timestamp it (using the %Y%m%d or YYYYMMDD formats is recommended), and restart the webserver. Additional logic can be added to prune old logfiles to keep disk space usage in check. A sample log rotation script written in Perl can be downloaded from http://www.urchin.com/support in the Helper Scripts area. This script rotates one or more logs and timestamps them appropriately, then removes logs that are older than a certain (configurable) number of days.

Note: If you are running IIS on a Windows system, the log rotation functionality is included as part of the IIS management and no external script is needed.

Configuring Urchin 5 for Use with Log Rotation

Once you have your log rotation scheme in place, it is a simple matter to configure Urchin to process your rotated log. When configuring a Log Source, you can either set up the Log File Path specification to use a wildcard which matches the time-stamped log filename pattern (e.g. access-log.* for Apache logs or ex*.log for IIS logs) or you can use Urchin's built-in timestamp pattern matching (e.g. access-log.%Y%m%d for Apache, ex%y%m%d.log for IIS). When Urchin encounters this pattern, it will substitute yesterday's date for the %Y%m%d pattern and process the log with the resulting filename (e.g. access-log.20020617). For further information on the date matching pattern, please see the article in this section entitled Wildcard & Date Substitution in Log Paths.
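The date substitution described above can be previewed with a short script, which is handy when checking that a Log File Path pattern will match the names your rotation scheme produces. This is a sketch under the assumption that %Y, %y, %m and %d carry the same meaning in Urchin's patterns as in the standard strftime codes; the function name expand_log_path is hypothetical.

```python
from datetime import date, timedelta

def expand_log_path(pattern, today):
    """Fill a date pattern such as access-log.%Y%m%d with yesterday's date."""
    yesterday = today - timedelta(days=1)
    return yesterday.strftime(pattern)

# A run on 18 June 2002 looks for the log rotated for 17 June 2002.
print(expand_log_path("access-log.%Y%m%d", date(2002, 6, 18)))  # prints access-log.20020617
```

The IIS-style pattern ex%y%m%d.log expands the same way, using a two-digit year.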

The wildcard specification has the advantage of allowing you to place a number of unprocessed logs in a single directory and have Urchin process them the next time it runs. This is especially convenient for handling situations where the expected logfiles are not in place when Urchin runs, e.g. due to a remote webserver being down or loss of network connectivity. The disadvantage is that Urchin must open up the directory and search each log file to determine if it has already been processed, and this can induce significant overhead when many log files are resident in the directory. If you deem your log rotation scheme to be reliable, using the YYYYMMDD pattern matching scheme is a more efficient method.

You may also wish to have Urchin 5 delete or archive/compress the log once it has been processed. Different Log Destiny options can be set in the Advanced Settings of a Log Source. For more information on these Log Destiny settings, please see the Log Management document in the Log Files section of the Urchin Administration area.

Important! Log Destiny options should not be used with live logs that have not been rotated!

Configuring Log Rotation on UNIX-type systems

Due to the large variation in operating system functionality and webserver configurations, and the high likelihood that log rotation procedures are site-specific, there is no cookbook method for establishing webserver log rotation on UNIX-type systems. However, a sample log rotation script called WebLogRotate is available from the Urchin web site in the Helper Scripts area. This script is written in Perl to make it as portable as possible, and is typically invoked from cron on a daily basis.

Configuring Log Rotation for Windows IIS Webservers

As mentioned above, the management functions of IIS allow for automatic log rotation of webserver logs, though this functionality is not enabled by default. Please follow the steps below to configure an IIS webserver for proper log rotation. It is recommended that the logs be rotated daily, and that the log rotation be set to happen in relation to local time. By default, IIS will rotate logs at midnight GMT rather than local time.

Under Windows 2000, you should ensure that the IIS webserver is configured properly to do log rotation. This is accomplished using the Computer Management function of Windows 2000. Windows NT, Windows XP and Windows 2003 Server utilize a similar procedure. To open Computer Management and establish log rotation, perform the following actions:

• Click Start -> Settings -> Control Panel
• Double-click Administrative Tools
• Double-click Computer Management
• Double-click on Internet Information Services
• Right-click on Default Web Site and select Properties
• In the pop-up window, select the Web Site tab
• At the bottom of the window, click on the Properties button
• Click the Daily radio button under the New Log Time Period heading
• Click the Use local time for file naming and rollover checkbox

This will ensure that IIS rotates the webserver logs on a daily basis just after midnight.

Logging - Apache and IIS

Overview

It is critical to set up your webserver logging in a format that allows Urchin to properly interpret the data and produce fully detailed reporting. This article explains the process for the most common webservers, Apache and Microsoft IIS.

For maximum reporting depth, it is important to enable logging to include Referral and User Agent information. To enable unique visitor reporting when using the Urchin Tracking Module (UTM), it is additionally required to enable cookie logging. UTM-based tracking is the only way to get true unique visitor reporting.

It's advisable, although not required, that you decide whether you want to use UTM prior to changing your webserver logging. If so, you should enable cookies in your logs now. It will not hurt if you enable cookies but do not install UTM on your website immediately. You may want to look over the section on Visitor Tracking to familiarize yourself with the UTM installation before proceeding.

Configuration

Apache

By default, Apache generally logs in what's called common log format, and also provides an option to log in a more detailed format known as NCSA extended/combined log format. For optimal reporting, Urchin requires a variation of the NCSA extended/combined format. To configure Apache to use the appropriate format, do the following:

1. Make a backup copy of your httpd.conf file. Then use a text editor to open your original httpd.conf.
2. Locate the section containing lines that begin with the word LogFormat.
3. Insert a new LogFormat line using one of the forms shown below, depending on whether you will be using UTM or not. The LogFormat entry must be added to your configuration file as a single line without carriage returns or line breaks. Pay close attention to entering all the characters correctly.
For websites that will not use UTM LogFormat "%h %v %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User−Agent}i\"" urchin For UTM−enabled websites: LogFormat "%h %v %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User−Agent}i\" \"%{Cookie}i\"" urchin The word "urchin" at the end of the LogFormat line is a nickname that will be used elsewhere in your httpd.conf to apply this format to a log file. This string can be anything you choose. Using "urchin" will help identify that this entry was created to accommodate Urchin processing. Chapter 3: Urchin Administration


4. Examine the entry for which you wish to enable this new logging format. Deactivate any existing TransferLog or CustomLog entries within that group by inserting a # in front (e.g. TransferLog becomes #TransferLog). Then insert the following new CustomLog entry, replacing the string path_to_log with the appropriate path to your log location:
    CustomLog path_to_log/access.log urchin
If you chose some identifier other than "urchin" as the nickname for your LogFormat entry earlier, use that nickname in place of "urchin" in the CustomLog entry.
5. Save the edits to your httpd.conf file.
6. IMPORTANT! Check the syntax of your new httpd.conf by running the command:
    apachectl configtest
This should produce the response Syntax OK. If not, double-check your httpd.conf file and fix any errors. If you cannot get the correct response, do not continue with this procedure. Instead, make a backup copy of your edited file, then restore the original by overwriting this version with the copy of httpd.conf you saved at the start of this procedure. This will ensure that your webserver continues to work normally while you figure out what is wrong with your changes.
7. Once you have confirmed the syntax of your httpd.conf, restart Apache. The preferred method is to call the apachectl script, which is typically installed with Apache:
    apachectl restart
8. Check the logging. Open a browser and hit the site in question a few times. Then examine the last few lines of the log file specified in your CustomLog entry. You should see that several recent hits have been written to the log.
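As an aid to the check in step 8, the following Python sketch parses a log line written with the "urchin" LogFormat and reports the individual fields. It is only an illustration: the function name parse_urchin_line and the regular expression are assumptions of this example, not part of Urchin or Apache, and the pattern does not handle escaped quotes inside logged values.

```python
import re

# Matches: %h %v %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" ["%{Cookie}i"]
# Simplification (assumption): logged values contain no embedded double quotes.
URCHIN_LINE = re.compile(
    r'(?P<host>\S+) (?P<vhost>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
    r'(?: "(?P<cookie>[^"]*)")?$'
)

def parse_urchin_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = URCHIN_LINE.match(line.strip())
    return match.groupdict() if match else None

# Sample hit in the UTM-enabled form (cookie field present).
sample = (
    '64.40.51.27 www.urchin.com - [28/Aug/2002:15:11:01 -0700] '
    '"GET //var/www/urchin_help-test/images/urchin_header_logo.gif HTTP/1.1" '
    '200 3017 "http://www.urchin.com/" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" '
    '"__utma=171060324.1378004559.1063331913.1063334677.1063521838.3; '
    '__utmb=171060324; __utmc=171060324"'
)

fields = parse_urchin_line(sample)
if fields is None:
    print("line does not match the urchin LogFormat")
else:
    for name, value in fields.items():
        print(f"{name}: {value}")
```

A line that fails to parse usually means the LogFormat entry was split across lines or a quote was mistyped. A missing cookie value on a UTM-enabled site suggests the non-UTM form of the LogFormat line is still in use.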
For the Urchin modified extended/combined log format, a log line will look similar to this:
    64.40.51.27 www.urchin.com - [28/Aug/2002:15:11:01 -0700] "GET //var/www/urchin_help-test/images/urchin_header_logo.gif HTTP/1.1" 200 3017 "http://www.urchin.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
If you have configured UTM on your site and have turned on cookie logging, a log line will look similar to this:
    64.40.51.27 www.urchin.com - [28/Aug/2002:15:11:01 -0700] "GET //var/www/urchin_help-test/images/urchin_header_logo.gif HTTP/1.1" 200 3017 "http://www.urchin.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" "__utma=171060324.1378004559.1063331913.1063334677.1063521838.3; __utmb=171060324; __utmc=171060324"
Note the additional UTM cookie information at the end of the line.
Microsoft Internet Information Server (IIS)
Note: Microsoft IIS uses a W3C logging format.


Urchin can provide very basic reporting if your IIS log files have, at the very least, the following fields:
• Date
• Time
• C-IP
• CS-URI-Stem
• SC-Status
• SC-Bytes
These are required fields. Without them you will not get meaningful reporting. However, this minimal logging does not provide enough information for Referral and Browser reporting. Therefore it is advisable to set more detailed logging properties for your IIS server.
IIS logging properties are configured either separately for each domain on the server, or globally. For servers with more than a few domains, the global option is recommended. The following steps will ensure that the required log file fields are being recorded. If you elect to log additional fields, Urchin will simply ignore them at processing time. However, logging unneeded fields will increase the size of your log files, so it is best to log only the fields needed by Urchin.
1. Launch the IIS services management tool by going to Start -> Programs -> Administrative Tools -> Computer Management.
2. Expand the Services and Applications tree, then select Internet Information Services, which should bring up a list of websites (except on Windows 2003 Server, where you must further expand the Web Sites folder to get a listing of sites).
3. Right-click the entry for the site you want to modify and select Properties.
4. Select the Web Site tab and, in the section at the bottom of this screen, verify that the Enable Logging checkbox is checked. Then from the Active Log Format dropdown menu choose W3C Extended Log File Format.
5. Click the Properties button next to the Active Log Format box.
6. Select the Extended Properties tab.
7. Check the boxes for the following fields:
    Date [ date ]
    Time [ time ]
    Client IP Address [ c-ip ]
    User Name [ cs-username ]
    Method [ cs-method ]
    URI Stem [ cs-uri-stem ]
    URI Query [ cs-uri-query ]
    Protocol Status [ sc-status ]
    Bytes Sent [ sc-bytes ]
    User Agent [ cs(User-Agent) ]
    Referer [ cs(Referer) ]
    Cookie [ cs(Cookie) ] (this field is only required for UTM tracking)
8. Make sure the Process Accounting box is unchecked, as it does not provide useful web access activity information.
9. Select Apply and OK on each window to save your settings.
10. It is not necessary to restart IIS. Your logs should immediately begin logging according to the new settings.
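To confirm that the new settings took effect, you can inspect the #Fields: directive that W3C extended logs carry near the top of each file. The Python sketch below does this for a few header lines. The function name missing_w3c_fields and the sample header are assumptions made for the example, and the required and recommended sets simply restate the field lists given above.

```python
# Field names as they appear in the "#Fields:" directive of a W3C extended log.
REQUIRED = {"date", "time", "c-ip", "cs-uri-stem", "sc-status", "sc-bytes"}
RECOMMENDED = {"cs(User-Agent)", "cs(Referer)"}

def missing_w3c_fields(log_lines, needed):
    """Return the needed field names absent from the first #Fields: directive."""
    for line in log_lines:
        if line.startswith("#Fields:"):
            present = set(line[len("#Fields:"):].split())
            return sorted(set(needed) - present)
    # No directive found: treat every needed field as missing.
    return sorted(needed)

# Hypothetical header of a log written with only the minimal fields enabled.
header = [
    "#Software: Microsoft Internet Information Services 5.0",
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status sc-bytes",
]

print("missing required:", missing_w3c_fields(header, REQUIRED))
print("missing recommended:", missing_w3c_fields(header, RECOMMENDED))
```

With the sample header, no required field is missing, but both recommended fields are, which corresponds to the case where basic reports work and Referral and Browser reports stay empty.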


Logging − iPlanet

Overview
This article provides a brief overview of how to configure logging for an iPlanet webserver to facilitate proper processing and reporting for Urchin. Use the "Netscape" type for the Log Source setting.
There is a set of minimally required fields necessary for Urchin to produce reports. They are:
• date
• time
• hostname or IP address of the requesting system
• request (i.e. which document the requesting system asked for from your webserver)
• status code generated by the request (numeric)
• bytes (bytes transferred from server to client)
In addition, for the most complete reporting you need the following fields:
• referral
• user-agent
• cookies (if the Urchin Traffic Monitor is installed on your site)
Configuration
    Init fn=flex-init access="$accesslog" format.access="%Ses->client.ip% - %Req->vars.auth-user% [%SYSDATE%] \"%Req->reqpb.clf-request%\" %Req->srvhdrs.clf-status% %Req->srvhdrs.content-length% \"%Req->headers.user-agent%\" \"%Req->headers.referer%\" \"%Req->headers.cookie%\""

Logging: Tomcat (Apache Jakarta Project)


Overview
This article describes how to configure the Tomcat webserver for use with Urchin.
Standard logging format without cookies:
    <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs" prefix="access_log" suffix=".log" pattern="%h %v %u %t &quot;%r&quot; %s %b &quot;%{Referer}i&quot; &quot;%{User-Agent}i&quot;" resolveHosts="false"/>
You must have Tomcat 5 to log cookies:
    <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs" prefix="access." suffix=".log" pattern="%h %v %u %t &quot;%r&quot; %s %b &quot;%{Referer}i&quot; &quot;%{User-Agent}i&quot; &quot;%{Cookie}i&quot;" resolveHosts="false"/>

Logging − Other Webservers

Overview
This article provides a brief overview of how to configure logging for webservers other than Apache and IIS to facilitate proper processing and reporting for Urchin. Urchin will process any webserver log as long as it can understand how the data is organized in each log file entry. The information in this article applies only to access logs. If you are interested in details of e-commerce logging, please see the E-commerce Module section of the Documentation Center.
Regardless of your webserver type or logging format, there is a set of minimally required fields necessary for Urchin to produce reports. They are:
• date
• time
• hostname or IP address of the requesting system
• request (i.e. which document the requesting system asked for from your webserver)
• status code generated by the request (numeric)
• bytes (bytes transferred from server to client)
In addition, for the most complete reporting you need the following fields:
• referral
• user-agent
• cookies (if the Urchin Traffic Monitor is installed on your site)
Configuration
The specifics of how to change logging characteristics for every webserver would be too cumbersome to list. In general, the easiest approach is to configure your logging to conform to either Urchin's NCSA or W3C form, then choose the appropriate default format from the Log Format dropdown menu in the Log Source. If your webserver can support this approach, see the document Logging - Apache and IIS. The information there on how the necessary data fields are set up may be useful to you, even though the details of how to make the changes won't necessarily apply to your webserver.

Wildcard & Date Substitution in Log Path

Overview
Urchin 5 allows you to specify wildcard and date matching variables in the path to a log file. When an Urchin task is executed and the log path is read, these variables are converted and compared for matches with the directories and filenames on your system. The date matching capabilities in Urchin 5 are more extensive than those provided in previous versions of Urchin, namely:
• Date substitution may happen at any point in the pathname of the logfile; previous versions of Urchin only allowed substitutions in the actual filename specification
• A more robust and flexible date pattern matching algorithm has been implemented, although the previous YYYYMMDD-style pattern matching is still supported for backward compatibility
The most commonly used time matching variables and formats are shown next. The full set of supported time formatting variables is listed at the end of this article.
    *       an asterisk matches zero or more consecutive characters
    DD      is replaced by the 2 digit numeric day of the month, e.g. 01-31
    %d      is equivalent to DD
    MM      is replaced by the 2 digit numeric month, e.g. 01-12
    %m      is equivalent to MM
    YY      is replaced by the 2 digit numeric year, e.g. 01-99
    YYYY    is replaced by the 4 digit numeric year, e.g. 0001-2003
    %Y      is equivalent to YYYY
Note that the asterisk in this context behaves like filename matching in a UNIX or DOS command shell, not like regular expression matching, where this character would match zero or more instances of the preceding character.
These variables can be combined in any way the user chooses. The list below shows how instances of these variables would translate on 08/13/2003. Note that the day specifiers DD and %d are converted into the day before the 13th, i.e. 12.
• YYYYMMDD would translate into 20030812
• %Y-%m-%d would translate into 2003-08-12
• %Y/%m/%d would translate into 2003/08/12 (note that this has implications in a path)
• *YYYYMMDD would match any filename ending in the string 20030812
The DD and %d day specifiers are converted into the previous day by default because of the way webserver logs and Urchin processing are typically managed. Your logs will usually be rotated daily to keep them from growing too large and so that each log contains primarily data for a single day. This rotation happens most frequently just before midnight. Urchin processing would usually occur after this, when the clock has moved past midnight to the next day of the month. If you were adding a YYYYMMDD-style timestamp to your log file name as it is rotated, then that date and Urchin's run time would differ by one day. Evaluating a day conversion at the time Urchin is run would result in a failure to find the correct log name, since the log timestamp would read 20030812 but Urchin would be executed on 20030813.
Although this is the most common model for log management, it isn't the only option. Urchin therefore has a configuration parameter that controls how these variables are resolved to a particular day. The Date/Time Wildcard Substitution in Log Path Name setting can be used to adjust a time offset that controls how DD and %d are evaluated. This setting is explained in greater detail at the end of this article.
The year, month, and day variables can be used either in the log file name or in the directory/folder path to the log file. The asterisk can only be used in the filename portion of the log file path. In addition, time format variables can be repeated within a log source path, but the asterisk may only be used once. The examples in the Procedure section will help clarify this.
Procedure
When creating or editing a Log Source, use the time variables in the path you enter in the Log File Path box under the Log Settings tab. As an example, a typical daily Apache webserver log rotation scheme creates a log with a datestamp indicating the date of the log entries, e.g. at 1 minute after midnight on 07/16/2002 the log rotation mechanism archives the log:
    /var/log/httpd/access.log

and saves it as
    /var/log/httpd/access.log.20020715
To match this pattern in the log source for an Urchin Profile, you'd simply specify
    /var/log/httpd/access.log.YYYYMMDD
in the Log File Path, and Urchin will automatically look for the previous day's log when it runs that day.
As another example, when Microsoft's IIS webserver is configured to rotate logs daily, it will name the logfile and include the current date as part of the filename, e.g. ex021127.log. Therefore, to process a daily IIS log, you would use a logfile specification something like:
    C:\WINNT\System32\LogFiles\W3SVC1\exYYMMDD.log
in the Log File Path field of the Log Source for the Profile.
To allow Urchin to process logs that are rotated more frequently than daily, you can use a combination of the YYYYMMDD syntax and wildcards to match all logfiles created the previous day. To do this, you would need to ensure that the rotated log files are named consistently, e.g. with an hour appended to the filename. In the Log File Path specification, you'd then use a pattern such as:
    /var/log/httpd/access.log.YYYYMMDD*
or
    C:\WINNT\System32\LogFiles\W3SVC1\exYYMMDD*.log
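The effect of the trailing asterisk can be pictured with shell-style matching. The Python sketch below assumes the date part of the pattern has already been resolved to 20030812, and uses the standard fnmatch module as a stand-in for Urchin's own matcher. That substitution is an assumption of this example, and fnmatch also gives special meaning to ? and [ ], which this guide does not describe for Urchin. The filenames are hypothetical.

```python
from fnmatch import fnmatchcase

def matching_logs(filenames, pattern):
    """Return the filenames that match a shell-style pattern, sorted."""
    return sorted(name for name in filenames if fnmatchcase(name, pattern))

# Hypothetical directory listing with hourly rotated logs (hour appended).
names = [
    "access.log.2003081123",
    "access.log.2003081200",
    "access.log.2003081213",
    "access.log.20030813",
]

# access.log.YYYYMMDD* as it would read when Urchin runs on 08/13/2003.
print(matching_logs(names, "access.log.20030812*"))
```

Only the two files stamped 20030812 are selected. The asterisk also matches zero characters, so a file named exactly access.log.20030812 would be picked up as well.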

A more complex usage would be one where logs are stored in directories named so that they reflect the year, month, and day. Suppose you had the following directory paths for storing logs:
    /logs/2003/07
    /logs/2003/08
    /logs/2003/09
and you kept all logs for a given month in their respective directories, with the day of the month appended to each log name (e.g. access.log.01, access.log.02). To allow Urchin to figure out which logs to process, you could use one of the following log path formats:
    /logs/YYYY/MM/access.log.DD
    /logs/%Y/%m/access.log.%d
At log processing time, Urchin will then process all logs matching yesterday's date pattern (with any suffix, where a trailing wildcard is used). As with any use of wildcards in the Log File Path field specification, it is important that Log Tracking for the Profile be enabled to ensure that Urchin does not re-process logs.
Considerations
To determine the date for the replacement pattern, Urchin subtracts 24 hours from the current time, based on the local time. It will properly handle month and year boundaries. However, this can be modified using the Date/Time Wildcard Substitution in Log Path Name setting under the Advanced Settings tab of a log source. You can select either Localtime or GMT as the basis for your time adjustments, then use the Hours edit box to specify a plus or minus offset in hours.
Complete Date and Time Format Reference
This is the full list of supported time format variables, which follows the conventions used in the Standard C Library strftime() routine:
• %A = national representation of the full weekday name.
• %a = national representation of the abbreviated weekday name.
• %B = national representation of the full month name.
• %b = national representation of the abbreviated month name.
• %d = the day of the month as a decimal number (01-31).
• %e = the day of month as a decimal number (1-31); single digits are preceded by a blank.
• %H = the hour (24-hour clock) as a decimal number (00-23).
• %I = the hour (12-hour clock) as a decimal number (01-12).
• %j = the day of the year as a decimal number (001-366).
• %k = the hour (24-hour clock) as a decimal number (0-23); single digits are preceded by a blank.
• %l = the hour (12-hour clock) as a decimal number (1-12); single digits are preceded by a blank.
• %M = the minute as a decimal number (00-59).
• %m = the month as a decimal number (01-12).
• %p = national representation of either "ante meridiem" or "post meridiem" as appropriate.
• %S = the second as a decimal number (00-60).
• %s = the number of seconds since the Epoch, UTC (see mktime(3)).
• %w = the weekday (Sunday as the first day of the week) as a decimal number (0-6).
• %Y = the year with century as a decimal number.
• %y = the year without century as a decimal number (00-99).
• %z = the time zone offset from UTC; a leading plus sign stands for east of UTC, a minus sign for west of UTC, hours and minutes follow with two digits each and no delimiter between them (common form for RFC 822 date headers).
• %% = `%' (for use when a literal percent sign is needed inside a date/time entry).
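The substitution rules described in this article can be approximated in a few lines of Python. This is a sketch under stated assumptions, not Urchin's implementation: the function name expand_log_path is hypothetical, the legacy YYYY/MM/DD/YY tokens are replaced by plain string substitution (so a path that happens to contain a literal "MM" or "DD" would be altered), and the offset is applied to the whole timestamp, which is how month and year boundaries come out correctly.

```python
from datetime import datetime, timedelta

# Legacy tokens and their strftime equivalents; YYYY is handled before YY.
LEGACY_TOKENS = [("YYYY", "%Y"), ("MM", "%m"), ("DD", "%d"), ("YY", "%y")]

def expand_log_path(path, now, offset_hours=-24):
    """Resolve date variables in a log path for a given run time.

    offset_hours mirrors the default behaviour described above: the date
    used for substitution is the run time minus 24 hours.
    """
    target = now + timedelta(hours=offset_hours)
    for token, directive in LEGACY_TOKENS:
        path = path.replace(token, directive)
    return target.strftime(path)

run_time = datetime(2003, 8, 13, 0, 30)  # Urchin runs shortly after midnight
print(expand_log_path("/var/log/httpd/access.log.YYYYMMDD", run_time))
print(expand_log_path("/logs/%Y/%m/access.log.%d", run_time))
print(expand_log_path("/logs/%Y/%m/access.log.%d", datetime(2003, 9, 1, 0, 10)))
```

The last call shows the month boundary case: a run on 09/01/2003 resolves to the 08 directory and day 31. Passing offset_hours=0 corresponds to a setup where log names carry the date on which Urchin runs.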

Processing Historical Logs

Overview
You may wish to process your historical logs after installing Urchin. This is easily accomplished. Simply specify a directory and a partial filename and/or wildcard (including regular expressions) in the Log Manager's Log Settings screens.
NOTE: You may not use wildcards on remote HTTP and HTTPS log sources.
How to Process Historical Logs
First, add a Log Source to the system. Click the Configuration button at left, and then the Log Manager button. On the main screen, click the Add button at top-right. On the first screen, select Add Local Log Source, and continue. On the next screen, click Browse, which will bring up the File Browser.
Locate the correct directory in the left-side window. The right-side window will display the files in the directory, and the left side will display any other directories. When you are in the correct directory, enter a partial filename and an asterisk (or other regular expression), and click the Verify button. A window will open showing all the matches to your pattern. Click any of the filenames to get information on the file: location, size, modification date, and file permissions. If the pattern match is correct, click OK, and then OK again in the File Browser window.
Next, if you have not already done so, associate this log file with a Profile by clicking the Configuration button at left, and then the Profiles button. Once the association has been completed (see Working with Profiles), click the Run/Schedule button next to the Profile in the main Profiles listing, and schedule the execution of the Profile, or click the Run Now button for immediate processing.
Urchin does not need any sort of log rotation to avoid data duplication. Urchin is equipped with a log tracking capability that ensures only new hits are processed. However, as mentioned above, logs can quickly consume large volumes of disk space, so it is a good idea to periodically compress and archive log files. Because Urchin never needs to re-read log files once they have been processed, it is perfectly acceptable to delete the log(s) after each processing run. However, many people keep logs for a specified amount of time in case they are needed later, for example if a new Profile is created for that site and historical analysis is desired.
Recommendations
Log management is not essential from the outset, but as logs grow, it becomes important. We recommend deciding on a log management plan when you initially deploy Urchin.

Log Reprocessing

Overview
Certain circumstances may warrant re-processing of log data, such as a DNS server being down when the processing was run, incorrectly applied filters, and so on. This document describes the procedure for backing out and reprocessing webserver log data. Please note that reprocessing logs requires the use of Urchin utilities that are only available from a command line shell environment. It is not possible to do the complete procedure exclusively from the Urchin web-based administrative GUI.
Reprocessing a Single Day:
• In the Urchin admin GUI, edit the Profile and turn off Log Tracking under the Storage/DB tab. Be sure to click Update to save your change.
• Under the Log Sources tab, ensure that the proper log file(s) to be re-processed are specified. The log data should only contain hits for the date(s) for which you are zeroing out the statistics.
• Invoke a command shell on the Urchin system.
• Run the udb-sanitizer utility in the 'util' directory/folder of the Urchin distribution with the command
    udb-sanitizer -p profile-name -d YYYYMM
  where YYYYMM is the year and month containing the day you wish to reprocess.
• Select option 5, Zero out one or more days. The utility will prompt you for the correct day and will zero out the statistics for that particular day. If you have a range of contiguous days you'd like to zero out, you can specify that range by using the numbers of the start and end days separated by a hyphen (e.g. 5-10 to zero out days 5 through 10 of the month). If you cannot use a range, re-invoke the utility to zero out statistics for additional days in that month.
• Click the Run Now button under the Run/Schedule tab for the Profile to reprocess the log data.
• Reset the Log Source by changing the Log File Path back to its original setting.
• Under the Storage/DB tab in the profile edit area, turn Log Tracking back on.
Reprocessing an Entire Month:
The procedure for reprocessing an entire month's worth of data is identical to the single day procedure above, except that when invoking the udb-sanitizer utility you select Option 2, Delete this month entirely, instead of Option 5.
Additional information:
The udb-sanitizer utility provides additional functionality for managing Urchin databases. Please see the udb-sanitizer article in the Advanced Topics -> Utilities section for further information about its capabilities and usage.

Filtering
Filtering Overview

This article describes data processing filters, which are applied before reports are generated. In addition to data processing filters, Urchin provides report filtering in the reporting interface. Read Reporting Interface -> Report Side Filtering for information. To create a filter, click the Add button in the Filter Manager screen.
Filtering Sequence
Each time the scheduler runs a profile, each entry in the log files passes through the steps shown in the figure below. Before any of the report tables are updated, the 'raw' fields in the log file entry are parsed, which creates a number of 'auto' calculated fields. For example, the browser and platform fields are calculated from the raw cs_useragent field.

Filtering is applied once all of the fields have been populated, and before any entries are made in the report tables. Filters can be applied to any type of field, including calculated fields. No additional parsing occurs after filters are applied. Thus, it is important to apply the filter to the correct field. A list of the purpose of each available field is provided in the next section.
Filters are applied in the following order:
1. Advanced Filters, Search & Replace Filters, and DynamicURL Filters
2. Decode URL and Japanese Encoding Filters
3. Lookup Tables
4. Include and Exclude Filters
For example, if an Exclude Filter is applied to the same field as the Decode URL Filter, the Exclude Filter must take into account that encoded characters, such as %20, will already have been translated.
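The consequence of this ordering can be seen in a small Python sketch. It assumes, for illustration only, that the Decode URL step behaves like standard URL decoding and that an exclude pattern is matched as a regular expression; the function name survives_exclude is hypothetical and nothing here is Urchin's own code.

```python
import re
from urllib.parse import unquote

def survives_exclude(field_value, exclude_pattern, decode_first=True):
    """Return True if the hit is kept, False if the exclude pattern matches.

    decode_first=True models the documented order: Decode URL runs
    before Include and Exclude filters see the field.
    """
    if decode_first:
        field_value = unquote(field_value)  # '%20' becomes a space
    return re.search(exclude_pattern, field_value) is None

stem = "/docs/annual%20report.pdf"

# Pattern written against the encoded form: it no longer matches, so the
# hit is NOT excluded.
print(survives_exclude(stem, r"annual%20report"))

# Pattern written against the decoded form: it matches, so the hit is excluded.
print(survives_exclude(stem, r"annual report"))
```

In other words, when Decode URL is active on a field, exclude and include patterns for that field should be written against the decoded text.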


Filter Types
• Exclude Pattern: This type of filter excludes log file lines (hits) that match the Filter Pattern. Matching lines are ignored in their entirety; for example, a filter that excludes Netscape will also exclude all other information in that log line, such as visitor, path, referral, and domain information.
• Include Pattern: This type of filter includes log file lines (hits) that match the Filter Pattern. All non-matching hits will be ignored, and any data in non-matching hits is unavailable to the Urchin reports.
• Decode URL: This is a predefined filter that decodes URL-encoded characters back to their original form. For example, '%20' in a URL is replaced with a space. Apply this filter to URI stems and queries to see the original text.
• Japanese Encode (UTF-8): This is a predefined filter, generally applied to the keywords field or another potentially multi-encoded field, that looks for Japanese encoded words and converts the encoding to UTF-8 for consistent storage and display.
• Search & Replace: This is a simple filter that can be used to search for a pattern within a field and replace the found pattern with an alternate form. See the section on Search & Replace Filters for more information.
• Dynamic URL (deprecated): This type of filter is used to translate arcane dynamically generated URLs into more human-readable page names. Note: the new Page Query Terms Report duplicates the bulk of this function, and the Advanced Filter encompasses all of DynamicURL's features and more. It is strongly recommended that you either eliminate old Dynamic URL filters if possible or convert them to one of the newer forms of filter.
• Advanced: This type of filter allows you to build a field from one or two other fields. The filtering engine will apply the expressions in the two Extract fields to the specified fields and then construct a field using the Constructor expression. Read the Advanced Filters article for more information.
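The whole-hit behaviour of the Exclude Pattern and Include Pattern types can be sketched as follows. This is a simplified Python model under stated assumptions: each hit is a dictionary of already-populated fields, patterns are treated as regular expressions, and the function name apply_filters and the sample hits are invented for the example.

```python
import re

def apply_filters(hits, include=None, exclude=None):
    """Keep or drop whole hits based on (field, pattern) pairs.

    exclude: drop any hit whose field matches the pattern.
    include: drop any hit whose field does NOT match the pattern.
    """
    kept = []
    for hit in hits:
        if exclude and re.search(exclude[1], hit.get(exclude[0], "")):
            continue  # the entire hit is ignored, not just the matched field
        if include and not re.search(include[1], hit.get(include[0], "")):
            continue
        kept.append(hit)
    return kept

# Hypothetical hits using field names from the Filter Fields article.
hits = [
    {"browser_base": "Netscape", "request_stem": "/a.html"},
    {"browser_base": "Internet Explorer", "request_stem": "/b.html"},
    {"browser_base": "Internet Explorer", "request_stem": "/img/x.gif"},
]

without_netscape = apply_filters(hits, exclude=("browser_base", r"Netscape"))
html_only = apply_filters(hits, include=("request_stem", r"\.html$"))
print([h["request_stem"] for h in without_netscape])
print([h["request_stem"] for h in html_only])
```

Excluding on browser_base removes the /a.html pageview as well, because the hit disappears as a unit. This is the side effect the Exclude Pattern description warns about.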
Choosing Where To Apply a Filter
Filters can be applied either to profiles or to individual log sources. The scope of the filter can be different for each of these cases. A filter applied to a profile will affect all log sources processed for that profile. A filter applied to a log source will always affect that specific log source, even if multiple profiles are using the same log source. In general, you should apply filters to the profile unless one of the following cases occurs:
• You have multiple log sources for a profile and you do not want the filter to apply to all of the log sources.
• You have multiple profiles using the same log source, and you want all of the profiles to use the same filter.
In these two cases, apply the filter to the specific log source. Otherwise, it is recommended to apply the filter to the profile.
Creating and Managing Filters
In the Urchin administration interface, click Configuration, then Urchin Profile -> Filter Manager. Click the Add button to launch the Filter Wizard. Once you have created a filter, edit the profiles or log sources to which you wish to apply the filter, and add the filter.


To create a filter while editing a profile or log source, click the Profile Filters tab or Log Filters tab. A window appears showing the currently active filters. Click the Add button on this window to launch the Filter Wizard. The filter creation screen has a dropdown menu at the top with selectable built−in filters for common filtering tasks such as filtering out robot traffic to your site. These built−in filters also serve as examples of how to set up various kinds of filters.

Filter Fields

Overview
When a hit or line in a log file is read during processing, the hit is broken down into 'Raw Fields'. Fields are generally separated by spaces, tabs, or commas. The Log Format chosen in the Log Source -> Log Settings screen determines how these Raw Fields are assigned internally. Once the Raw Fields are read, Urchin automatically calculates the 'Auto Fields' using the values in the Raw Fields. Most reports use data in these Auto Fields for updating. Filters can be applied to either Raw or Auto Fields.
The following two tables provide insight into the purpose of each Field. The first table lists the Fields used for standard reports. A dash in the Fields Used column means that the report in question summarizes numbers generated in other reports and therefore is not tied specifically to the data in particular fields. The second table lists all available Fields and their purpose.

Report Field List

Report Name                             Fields Used
Traffic
  Sessions Graph                        -
  Pageviews Graph                       -
  Hits Graph                            -
  Bytes Graph                           -
  Summary                               -
Visitors & Sessions
  Visitors by Day                       -
  Sessions by Day                       -
  Unique Visitors                       -
  Unique Sessions                       -
  Visitor Loyalty                       utm_session_number
  Session Frequency                     -
  Summary                               -
Pages & Files
  Requested Pages                       request_stem
  Downloads                             request_stem
  All Files                             request_origfilepath
  Directory by Pages Drilldown          request_stem
  Directory by Files Drilldown          request_origfilepath
  Directory by Bytes Drilldown          request_stem
  File Types by Hits                    request_origmime
  File Types by Bytes                   request_origmime
  Page Query Terms                      request_stem|request_query
  Posted Forms                          request_stem
  Status and Errors                     sc_status|request_errordetail
Navigation
  Entrance Pages                        request_stem
  Exit Pages                            request_stem
  Click Paths                           request_stem
  Click To and From                     request_stem
  Length of Pageview                    request_stem
  Depth of Session                      -
  Length of Session                     -
  Click To and From Report              request_stem
Referrals
  Referrals                             referral_domainandstem
  Referral Drilldown                    referral_domainandstem
  Search Terms                          referral_domain|referral_keywords
  Search Engines                        referral_domain|referral_keywords
  Referral Errors                       referral_errordetail|referral_domainandstem
Domains & Users
  Domains                               domain_primary|domain_complete
  Domain Drilldown                      domain_primary|domain_complete
  Countries                             domain_primary|domain_complete
  IP Addresses                          c_ip
  IP Drilldown                          c_ip
  Usernames by Hits                     cs_username
  Usernames by Bytes                    cs_username
  Usernames by Sessions                 cs_username
Browsers & Robots
  Browsers by Sessions Drilldown        useragent_complete
  Browsers by Hits Drilldown            useragent_complete
  Browsers by Bytes Drilldown           useragent_complete
  Platforms by Sessions Drilldown       useragent_complete
  Platforms by Hits Drilldown           useragent_complete
  Platforms by Bytes Drilldown          useragent_complete
  Combos by Sessions                    useragent_complete
  Robots by Hits Drilldown              browser_base
  Robots by Bytes Drilldown             browser_base
Client Parameters
  Screen Resolution                     utm_screen_resolution
  Screen Colors                         utm_screen_colors
  Languages                             utm_language
  Java Enabled                          utm_java_enabled
  Timezone Offset                       utm_timezone_offset
  Javascript Version                    utm_js_version
E-Commerce
  Revenue                               -
  Number of Transactions                -
  Products by Revenue                   elf_productname|elf_productcode
  Products by Quantity                  elf_productname|elf_productcode
  Products by Revenue Drilldown         elf_productname|elf_productcode
  Products by Quantity Drilldown        elf_productname|elf_productcode
  E-Commerce Summary                    -
Revenue Source
  Revenue by Region Drilldown           elf_region
  Revenue by City                       elf_region
  Revenue by Referrals                  referral_domainandstem
  Revenue by Search Terms               referral_domain|referral_keywords
  Revenue by Search Engines Drilldown   referral_domain|referral_keywords
  Revenue by Domains Drilldown          domain_primary|domain_complete
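As a rough picture of how Auto Fields relate to a Raw Field, the Python sketch below splits a raw referrer value into several of the referral_* pieces named in the Complete Field List. It is an illustration only: it relies on Python's standard URL parser rather than Urchin's own logic, the function name referral_auto_fields is hypothetical, and fields that need extra knowledge (such as referral_domain or referral_keywords) are left out.

```python
from urllib.parse import urlsplit

def referral_auto_fields(cs_referer):
    """Derive a few referral_* style values from a raw referrer URL."""
    parts = urlsplit(cs_referer)
    # URI = stem plus query, without the host part.
    uri = parts.path + ("?" + parts.query if parts.query else "")
    return {
        "referral_protocol": parts.scheme,
        "referral_host": parts.hostname or "",
        "referral_port": str(parts.port) if parts.port else "",
        "referral_uri": uri,
        "referral_stem": parts.path,
        "referral_query": parts.query,
    }

# Hypothetical raw referrer value as it might appear in cs_referer.
raw = "http://www.google.com:8080/search?q=urchin+software"
for name, value in referral_auto_fields(raw).items():
    print(f"{name}: {value}")
```

Because a filter sees these values after they are populated, a pattern meant to match only the query string belongs on a query field rather than on the raw referrer. This is the point of the advice to apply each filter to the correct field.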

Complete Field List

id   Field               Type    Purpose
1    iis_date            (RAW)   IIS raw date of hit field.
2    iis_time            (RAW)   IIS raw time of hit field.
3    apache_time         (RAW)   Apache raw date & time of hit field.
4    c_ip                (RAW)   Client IP Address.
5    cs_username         (RAW)   Client username (if any).
6    cs_request          (RAW)   Apache raw entire request field.
7    cs_method           (RAW)   IIS raw request method field.
8    cs_uristem          (RAW)   IIS raw request stem field.
9    cs_uriquery         (RAW)   IIS raw request query field.
10   sc_status           (RAW)   Return status code from server.
11   sc_bytes            (RAW)   Number of bytes transferred for request.
12   c_host              (RAW)   Client hostname (converts to c_ip if necessary).
13   cs_useragent        (RAW)   Browser user-agent information.
14   cs_cookie           (RAW)   Cookies sent by browser.
15   cs_referer          (RAW)   Raw Referral information (could be internal).
16   custom_date         (RAW)   Used for datestamp in Custom Logs.
17   custom_time         (RAW)   Used for timestamp in Custom Logs.
19   cs_host             (RAW)   Requested virtualhost by Client.
20   s_port              (RAW)   Server port number.
21   cs_version          (RAW)   IIS Raw HTTP version.
22   s_sitename          (RAW)   IIS Server site name.
23   s_computername      (RAW)   IIS Computer name.
24   s_ip                (RAW)   IIS Server IP address.
25   elf_orderid         (RAW)   E-commerce order id number.
26   elf_store           (RAW)   E-commerce store name.
27   elf_sessionid       (RAW)   E-commerce session id.
28   elf_total           (RAW)   E-commerce transaction amount.
29   elf_tax             (RAW)   E-commerce tax amount.
30   elf_shipping        (RAW)   E-commerce shipping amount.
31   elf_billcity        (RAW)   E-commerce customer city.
32   elf_billstate       (RAW)   E-commerce customer state.
33   elf_billzip         (RAW)   E-commerce customer zip code.
34   elf_billcountry     (RAW)   E-commerce customer country.
35   elf_productcode     (RAW)   E-commerce product code.
36   elf_productname     (RAW)   E-commerce product name.
37   elf_variation       (RAW)   E-commerce product variation.
38   elf_price           (RAW)   E-commerce product price.
39   elf_quantity        (RAW)   E-commerce product quantity.
40   elf_upsold          (RAW)   E-commerce upsold variable.
76   referral_protocol   (AUTO)  Referral protocol (http/https/etc.).
77   referral_host       (AUTO)  Referral complete hostname.
78   referral_domain     (AUTO)  Referral domain name.
79   referral_port       (AUTO)  Referral port number (if any).
80   referral_url        (AUTO)  Referral complete URL (includes host).
81   referral_uri        (AUTO)  Referral complete URI (no host).
82   referral_stem       (AUTO)  Referral URI stem without query info.
83   referral_query      (AUTO)  Referral Query info by itself.

79

84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 119 120 121 122 124

referral_anchor referral_directory referral_filename referral_mime referral_keywords referral_domainandstem referral_errordetail request_method request_url request_version request_protocol request_host request_port request_uri request_stem request_query request_anchor request_directory request_filename request_mime request_origfilepath request_origmime request_errordetail useragent_complete browser_base browser_version platform_base platform_version domain_primary domain_complete sid utm_cookiea utm_cookieb utm_cookiec utm_cookie1 utm_cookie2 utm_cookie3 utm_unique_id utm_page

(AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO)

Chapter 3: Urchin Administration

Referral information after # tag. Referral directory up to filename. Referral filename without directory. Referral mime type (file extension) Referral search engine keywords Referral domain and URI stem together. Referral error detail information. Request method (GET/POST/etc.). Request complete URL (if provided). Request protocol version. Request protocol (HTTP/etc.). Request hostname (if any). Request port number (if any). Request URI with query. Request URI without query. Request query information (e.g., after ?) Request information after # tag Request directory without filename. Request filename without directory. Request mime type (file extension). Request original uri stem if UTM. Request original mime type if UTM. Request detail for error hits. Complete user− agent. Browser name (e.g., Netscape). Browser version. Platform (e.g., Windows). Platform version. First level domain. (e.g. com). Complete domain. (e.g. urchin.com). Session id (if any). UTM−2 cookie−a UTM−2 cookie−b UTM−2 cookie−c UTM−1 cookie−1 UTM−2 cookie−2 UTM−3 cookie−3 UTM unique visitor id. UTM page variable (used for request_ variables).

80

125 126 127 128 129 130 131 132 133 134 135 145

utm_referral utm_screen_resolution utm_screen_available utm_browser_size utm_screen_colors utm_language utm_java_enabled utm_cookies_enabled utm_timezone_offset utm_js_version utm_session_number elf_region

(AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO) (AUTO)

UTM Referral (used for referral_ variables). Screen resolution (e.g., 800x600). Available screen resolution in pixels. Browser size in pixels. Screen color bit depth. Browser language code setting. yes|no if java is enabled. yes|no if cookies are enabled. +/−HHMM timezone offset value of browser. Javascript version info. Number of sessions for this visitor. E−Commerce region drilldown information.

Exclude/Include Filters

Introduction
Exclude and Include Filters, set up in the admin interface and applied to a log source or profile, are used to eliminate unwanted hits when processing a log file. The filters use POSIX regular expressions when matching against data in the fields of a hit. If you are unfamiliar with regular expressions, please read the Regular Expression Overview document in this section before proceeding.

How Urchin Uses Exclude/Include Filters
These filters are applied after the Decode URL, Japanese Encode, Dynamic URL, Search & Replace and Advanced filters. Urchin applies the Exclude/Include filters in succession.

If the filter being applied is an Exclude Filter and the pattern matches, the hit is thrown away and Urchin continues with the next hit. If the pattern does not match, Urchin applies the next filter to the hit. This means that you can create either a single Exclude Filter with multiple patterns separated by '|' or multiple Exclude Filters with a single pattern each.

Include Filters are applied with the reverse logic. When an Include Filter is applied, the hit is thrown away if the pattern does not match the data. If multiple Include Filters are applied, the hit must match every applied Include Filter in order to be saved. To include multiple patterns for a specific field, create a single Include Filter that contains all of the individual expressions separated by '|'.

Using Exclude/Include Filters


In the figure above, the exclude filter requires a filter expression and a filter field. During processing, the filter expression is compared with data in the filter field and the hit is thrown away if the filter matches. See the Filter Fields article for a complete list of fields that are available. The above example illustrates how to filter out image hits by filtering out all mime types that match gif, jpg, png, jpeg, and ico. This list can be customized to match any mime type.

In the figure above, the include filter requires a filter expression and a filter field. During processing, the filter expression is compared with data in the filter field and the hit is thrown away if the filter does not match. See the Filter Fields article for a complete list of fields that are available. This example shows how to filter in only html pages by requiring the mime type of the request to be html.

Controls
The 'Case Sensitive' control allows you to specify whether the filter should be applied with or without case sensitivity.
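The filter logic described above can be sketched in a few lines of Python. This is only an illustration of the rules, not Urchin's implementation: the `HitFilter` class, the `keep_hit` function, the anchored patterns, and the default for the case-sensitivity flag are all assumptions, and Python's `re` module stands in for POSIX regular expressions.

```python
import re
from dataclasses import dataclass


@dataclass
class HitFilter:
    kind: str                     # "exclude" or "include"
    field: str                    # filter field, e.g. "request_mime"
    pattern: str                  # filter expression
    case_sensitive: bool = True   # 'Case Sensitive' control (default is an assumption)


def keep_hit(hit, filters):
    """Return True if the hit survives every filter, applied in succession."""
    for f in filters:
        flags = 0 if f.case_sensitive else re.IGNORECASE
        matched = re.search(f.pattern, hit.get(f.field, ""), flags) is not None
        if f.kind == "exclude" and matched:
            return False          # exclude filter matched: throw the hit away
        if f.kind == "include" and not matched:
            return False          # include filter did not match: throw the hit away
    return True


# Hypothetical filters mirroring the two examples in the text.
filters = [
    HitFilter("exclude", "request_mime", r"^(gif|jpg|png|jpeg|ico)$"),
    HitFilter("include", "request_mime", r"^html$"),
]

for mime in ("html", "gif", "css"):
    print(mime, keep_hit({"request_mime": mime}, filters))
```

Note that a hit must pass every Include Filter, which is why alternatives for one field belong in a single expression joined with '|'.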

Decode URL Filters

Introduction
The Decode URL filter is used to convert data from a URL encoded form to a more readable form. Encodings such as %20, for example, are converted into spaces.

Using Decode URL Filters

To use the Decode URL filter, select a Filter Field. During processing, the data in the Filter Field is decoded and stored back in that field. The field can then be displayed in the reports. Refer to the Filter Fields article for a complete field list. Although the above example illustrates how to use a Decode URL filter to decode referral keywords, the filter is also useful for decoding request stem and request query.
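As a rough illustration of what decoding does to a field, the sketch below uses Python's standard `urllib.parse.unquote`. The `decode_field` helper and the sample keyword string are hypothetical, and Urchin's own decoding rules (for example, how it treats `+`) may differ from this function.

```python
from urllib.parse import unquote


def decode_field(hit, field):
    """Return a copy of the hit with one field URL-decoded and stored back."""
    decoded = dict(hit)
    if field in decoded:
        decoded[field] = unquote(decoded[field])
    return decoded


# Hypothetical hit with URL-encoded referral keywords.
hit = {"referral_keywords": "web%20analytics%20software"}
decoded = decode_field(hit, "referral_keywords")
print(decoded["referral_keywords"])
```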

Search & Replace

Introduction
Use the Search & Replace Filters to replace a matched expression with another string. This type of filter is a simplified version of an advanced filter.

Using Search & Replace Filters

The search & replace filter requires a filter field, an expression to search for, and a replace expression. The search expression is a POSIX regular expression. The replace expression is any text that you wish to have replace the matched part. Refer to the Filter Fields article for a complete field list.

The above example illustrates how to use a search & replace filter to remove a leading directory from the path of a page. Another use for this type of filter would be to replace category id numbers with descriptive words in the query string of a request. For example, suppose that samples of the requested file with attached queries look as follows:

/docs/document.cgi?id=1000
/docs/document.cgi?id=2000

Using the search and replace filter, you could convert the 1000 or 2000 ids to their equivalents. For example, 1000 could be changed to books and 2000 to magazines. This would make the pages report more useful for people who are not familiar with the codes used to identify the individual items.
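Both uses mentioned above can be approximated with Python's `re.sub`. This is a sketch of the idea rather than Urchin's behavior: the function names are hypothetical, each entry in `CATEGORY_NAMES` stands for one separate filter, and Python's `\b` word boundary is a convenience that POSIX expressions may not offer in the same form.

```python
import re

# Mapping taken from the example in the text; each pair would be one filter.
CATEGORY_NAMES = {"1000": "books", "2000": "magazines"}


def search_and_replace(value, search, replace):
    """Replace every match of the search expression with the replace text."""
    return re.sub(search, replace, value)


def label_categories(request_query):
    """Apply one search & replace per category id."""
    for code, name in CATEGORY_NAMES.items():
        request_query = search_and_replace(
            request_query, rf"\bid={code}\b", f"id={name}"
        )
    return request_query


print(label_categories("id=1000"))
print(search_and_replace("/docs/document.cgi", r"^/docs", ""))
```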

Lookup Table Filters

The Lookup Table filter is available beginning with Urchin 5.6. The Lookup Table filter can be used to:
• implement master tracking codes for campaign tracking. Read Campaign Tracking Module-->How To Use Master Tracking Codes.
• implement an external data table to look up and replace character field values when a match occurs. Lookup tables can match against a single field and update multiple fields. Read Advanced Topics-->Customization-->Custom Lookup Tables.
• map Japanese phone manufacturer/model abbreviations to full names. See below.

To apply the Japanese phone filter:
1. In the Filter Wizard:Settings screen, enter your desired filter name (Filter Name field).
2. Select Lookup Table as shown in the screen image below.
3. Select platform_version (AUTO) from the Filter Field drop down menu, as shown in the screen image below.
4. Select phone models from the Table Name drop down menu, as shown below.
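The idea of matching against a single field and updating multiple fields can be pictured with a small Python sketch. Everything here is hypothetical: the in-memory `EXAMPLE_TABLE`, its placeholder values, and the `apply_lookup` helper are illustrations only and say nothing about Urchin's actual lookup table file format.

```python
# Hypothetical table: value found in the match field -> fields to overwrite.
EXAMPLE_TABLE = {
    "abc1": {"platform_base": "Example Maker", "platform_version": "Example Model One"},
}


def apply_lookup(hit, match_field, table):
    """Return a copy of the hit, updated from the table row keyed by match_field."""
    updated = dict(hit)
    row = table.get(hit.get(match_field))
    if row is not None:
        updated.update(row)   # one match may update several fields
    return updated


print(apply_lookup({"platform_version": "abc1"}, "platform_version", EXAMPLE_TABLE))
print(apply_lookup({"platform_version": "zzz9"}, "platform_version", EXAMPLE_TABLE))
```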


Advanced Filters

Introduction
The Advanced Filter option allows you to construct Fields for reporting from one or two existing Fields. POSIX regular expressions and corresponding variables can be used to capture all or parts of Fields and combine the result in any order you wish. For general information on how filtering works and a list of what each Field is used for, see the Filtering Overview and Filter Fields articles at the beginning of this section.

Using Advanced Filters

As shown in the figure above, the Advanced Filter takes up to two fields, Field A and Field B, and constructs the Output Field. The construction occurs in the following manner. The Extract A expression is applied to Field A, and the Extract B expression is applied to Field B. These expressions can use complete or partial text matches and include wildcards. The following is a list of the most common wildcards and their meanings. The expressions conform to POSIX regular expressions.

Wildcard   Meaning
.          match any single character
*          match zero or more of the previous item
+          match one or more of the previous item
?          match zero or one of the previous item
()         remember contents of parentheses as item
[]         match one item in this list
-          create a range in a list
|          or
^          match to the beginning of the field
$          match to the end of the field
\          escape any of the above

Use the parentheses () to capture parts of the Fields. These can be referenced in the Constructor using the $A1, $A2, $B1, $B2 notation. The A or B refers to the Field, and the number refers to which set of parentheses to grab. In the above example, the entire A Field and the entire B Field are captured and assembled as the new field. The Output Field can be a separate field or the same field as Field A or Field B.

Controls
The 'Override Output Field' control allows you to decide what to do if the Output Field already exists. The 'Required Field' control allows you to decide what to do if one of the expressions does not match.
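The construction step can be sketched in Python as follows. This is a simplified model under stated assumptions: the `advanced_filter` function and its parameter names are hypothetical, a hit is left unchanged when either expression fails (the real behavior depends on the 'Required Field' control), and the 'Override Output Field' control is not modeled.

```python
import re


def advanced_filter(hit, field_a, extract_a, field_b, extract_b,
                    constructor, output_field):
    """Build output_field from captures of Field A ($A1..) and Field B ($B1..)."""
    match_a = re.search(extract_a, hit.get(field_a, ""))
    match_b = re.search(extract_b, hit.get(field_b, ""))
    if match_a is None or match_b is None:
        return dict(hit)   # assumption: leave the hit unchanged on a failed match

    def substitute(ref):
        source = match_a if ref.group(1) == "A" else match_b
        return source.group(int(ref.group(2))) or ""

    result = dict(hit)
    result[output_field] = re.sub(r"\$([AB])(\d)", substitute, constructor)
    return result


# Capture all of Field A and all of Field B, then assemble them as the new field.
hit = {"request_host": "www.example.com", "request_stem": "/index.html"}
out = advanced_filter(hit, "request_host", r"(.*)", "request_stem", r"(.*)",
                      "$A1$B1", "request_stem")
print(out["request_stem"])
```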

DynamicURL Filters (deprecated)

Note
Urchin 5 and later display query terms in a separate report by default. Unlike versions 4 and earlier, it is not necessary to create a filter to display query strings. However, also unlike versions 4 and earlier, the data is not displayed in the 'Pages' report. Urchin 5 displays this data in a drill-down report titled 'Page Query Terms.' This report can be found under the 'Pages & Files' report menu.

Many sites today use a CGI, ASP or other scripting mechanism to provide dynamic content. Often, a single script is used to deliver multiple pages of information. While this can be a handy way to track user sessions or provide "live" content, it poses an additional challenge for meaningful reporting. By default, Urchin strips all the parameters associated with a page request (e.g. those that would typically be used with a CGI or ASP) and stores only the pathname of the page requested in its database. The DynamicURL filtering feature allows you to use regular expressions to selectively capture these parameters and present them in an intuitive way.

As an example, a CGI script might be used to deliver information about all products in a catalog. The script draws from a database, and uses parameters passed through the request to determine which product to display. The resulting hit in the webserver log for this request might look like:

/cgi-bin/showProduct.cgi?sessionId=123456789
|______________________| |_________________________________|

Under normal operation, Urchin will record that the showProduct.cgi page was requested, and everything from the "?" onward will be stripped. By using a DynamicURL filter, Urchin can store some or all of the parameters and produce a unique page record based on the parameter list.

Now in this example, we don't necessarily want to capture the entire second part of the request because of the "sessionId." Let's assume that this parameter changes for each visit and we get 30,000 visits per day. Including this piece of information would create far too many unique pages and render the Pages reporting useless. Instead we just want to capture the "productId" and report only on that information.

/cgi-bin/showProduct.cgi?sessionId=123456789


We may still want to know which script was used as well as which product was implicated in the request. By using a DynamicURL filter, we can capture multiple parts of the request and recombine them into a new, formatted request ready for reporting. Here is an example of a filter that could be used with the page request above:

(/cgi-bin/showProduct.cgi\?).*productId=(.*)

This regular expression will match the above request no matter what the values of the sessionId or productId are, and the parentheses capture the parts of the request that we want to keep for reporting. The effective request of the above example would look like:

/cgi-bin/showProduct.cgi/knobs
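To see how the captures could be recombined, here is a minimal Python sketch. It rests on several assumptions: the sample request printed in this guide appears truncated, so the sketch assumes it ended in `&productId=knobs`; the rule for joining captures (drop the trailing `?` from the first capture, then append the rest separated by `/`) is inferred from the effective request shown above; and `apply_dynamic_url` is a hypothetical name, not an Urchin function.

```python
import re

# Filter expression from the text.
DYNAMIC_URL_FILTER = r"(/cgi-bin/showProduct.cgi\?).*productId=(.*)"


def apply_dynamic_url(request, pattern):
    """Recombine the captured parts of a request; leave non-matching requests alone."""
    m = re.search(pattern, request)
    if m is None or not m.groups():
        return request                       # unmatched requests are left unmodified
    base, *rest = [g or "" for g in m.groups()]
    return base.rstrip("?") + "/" + "/".join(rest)


# Assumed full form of the sample request (the printed sample is cut off).
request = "/cgi-bin/showProduct.cgi?sessionId=123456789&productId=knobs"
print(apply_dynamic_url(request, DYNAMIC_URL_FILTER))
print(apply_dynamic_url("/index.html", DYNAMIC_URL_FILTER))
```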

Up to 5 sets of parentheses can be used, and multiple filters can be applied. If a request does not match the DynamicURL filter, it is left unmodified, but still included in the reporting. This allows you to use multiple DynamicURL filters for each area of a site. Keep in mind there is a slight performance hit for each filter used.

Note that DynamicURL filters can only be applied to the base URL and query string that form the page request. They cannot be used to filter referrals or any other fields in the log file. Also, when DynamicURLs and FilterIn/FilterOut are used together, the DynamicURL will be applied after the other filters, so consideration must be given to how one set of filters affects the others when choosing what to filter.

Examples

Example 1: We want to capture all the specific Knowledgebase article IDs in the Urchin 4 report for help.urchin.com. Here's a sample of what the Request portion of the hit looks like in the log file:

GET /knowledge.cgi?cmd=2

The proper Dynamic URL filter to extract the article ID is:

(/knowledge\.cgi\?)cmd=2

and this produces Top Pages reports that look like:

1. /knowledge.cgi            1,081   46.43%
2. /knowledge.cgi/id=767       244   10.48%
3. /knowledge.cgi/id=807       136    5.84%
4. /knowledge.cgi/id=768        50    2.15%
5. /knowledge.cgi/id=777        40    1.72%

Example 2: We want to capture all the search keywords used in the Urchin 4 report for help.urchin.com. Here's a sample of what the Request portion of the hit looks like in the log file:

GET /knowledge.cgi?cmd=1PE=0= utm

The proper Dynamic URL filter to extract the keyword information is:

(/knowledge.cgi\?).*s_(keyword=[^

and this produces Top Pages reports that look like:

1. /knowledge.cgi                        1,373   68.65%
2. /knowledge.cgi/keyword=utm               29    1.45%
3. /knowledge.cgi/keyword=default+page      18    0.90%
4. /knowledge.cgi/keyword=no+referral       11    0.55%
5. /knowledge.cgi/keyword=scheduler         10    0.50%

Regular Expression Overview

Introduction
POSIX regular expressions are used to match or capture portions of a field using wildcards and metacharacters. They are often used for text manipulation tasks. Most of the filters included in Urchin use these expressions to match the data and perform an action when a match is achieved. For instance, an exclude filter is designed to exclude the hit if the regular expression in the filter matches the data contained in the field specified by the filter.

Regular expressions are text strings that contain characters, numbers, and wildcards. A list of common wildcards is contained in the table below. Note that these wildcard characters can be used literally by escaping them with a backslash '\'.

Wildcard   Meaning
.          match any single character
*          match zero or more of the previous item
+          match one or more of the previous item
?          match zero or one of the previous item
()         remember contents of parentheses as item
[]         match one item in this list
-          create a range in a list
|          or
^          match to the beginning of the field
$          match to the end of the field
\          escape any of the above

Tips for Regular Expressions
1. Make the regular expression as simple as possible. Complex expressions take longer to process or match than simple expressions.
2. Avoid the use of .* if possible since this expression matches everything and may slow down processing the expression. For instance, if you need to match index.html, use index\.html, not .*index\.html.*
3. Try to group patterns together when possible. For instance, if you wish to match a file suffix of .gif, .jpg, or .png, use "\.(gif|jpg|png)" not "\.gif|\.jpg|\.png".
4. Be sure to escape the regular expression wildcards or metacharacters if you wish to match those literal characters.
5. Use anchors whenever possible. The anchor characters are ^ and $, which match the beginning and the end of the field respectively. Using these when possible will speed up processing. For instance, to match the foo directory in /foo/bar, use ^/foo/ instead of /foo/. Using the ^ will force the expression to match at the beginning and will improve processing speed.
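The tips on grouping, escaping, and anchoring can be tried out with a few lines of Python. Python's `re` module is not a POSIX engine, so this is only an approximation, but the constructs used here behave the same way for these simple cases. The sample paths are made up for the demonstration.

```python
import re

# Tip 3: one grouped alternation instead of three separate patterns.
image_suffix = re.compile(r"\.(gif|jpg|png)")

# Tip 4: escape the dot so it matches a literal "." only.
escaped = re.compile(r"index\.html")
unescaped = re.compile(r"index.html")

# Tip 5: anchor the expression to the beginning of the field.
anchored = re.compile(r"^/foo/")
unanchored = re.compile(r"/foo/")

print(bool(image_suffix.search("/images/logo.png")))
print(bool(escaped.search("indexXhtml")), bool(unescaped.search("indexXhtml")))
print(bool(anchored.search("/bar/foo/")), bool(unanchored.search("/bar/foo/")))
```

The unescaped and unanchored patterns match strings the author of the filter probably did not intend, which is the practical reason behind tips 4 and 5.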

Affiliations, Users & Groups

Working with Affiliations

Overview
An Affiliation is a high level association that is used to group together related Profiles, Log Sources, Users and Groups under a single identifying label, which is typically a corporate or client name. For Urchin installations where there is a need to support multiple complex client organizations, creating an Affiliation allows the Urchin administrator to easily keep track of all the Urchin reporting components for a particular client or corporate entity.

Access rights to Urchin reports can be controlled via an Affiliation association, and within an Affiliation even more granular access rights to certain reports can be assigned to Groups or Users, thereby protecting your data as desired at multiple levels. In addition, Affiliation level administration rights can be assigned to a user who can then act as a local Urchin administrator for the Affiliation. This allows distribution of the responsibility for managing, configuring, and maintaining the Urchin reports within an Affiliation.

It is important to note that since Affiliations are the highest level organizational "element" in the Urchin administration interface, you should create the Affiliations before you create any Profiles, Log Sources, Users, or Groups that will be associated with the Affiliation. The choice of Affiliation must be made when an Urchin element is created, and it cannot be changed afterwards. If you do not choose a specific Affiliation at creation time, the default Affiliation of (NONE) will be set. This tells Urchin that the element in question has no Affiliation.

Creating Affiliations
To create an Affiliation, go to the Configuration->Users & Groups->Affiliation screen and click the Add button. Only the Affiliation Name is required; Contact and Contact Email are not used by Urchin and are strictly informational fields for the benefit of the administrator.

Report Data Location (optional) specifies where to store the report data for all the Profiles that belong to the Affiliation. The default location is the data/reports directory within the Urchin distribution. Changing the Report Data Location allows you to physically separate within your file system the report data for different organizations.

Directory Browsing Location (optional) is provided as a security measure when giving an Affiliation administrator access to create Profiles. A directory entered into this field will limit the Affiliation admin's ability to browse for log files to only that directory. By default, there are no restrictions on where an Affiliation admin can browse for log files.

Using Affiliations
To assign a Profile, Log Source, User, or Group to an Affiliation, use the dropdown menu labeled Optional Affiliation in the initial screen of the setup wizard when creating the given element.

Once an Affiliation is assigned to a Profile, Log Source, User, etc., the Urchin admin interface will restrict modification choices to those elements that are associated with the Affiliation. In this way the Affiliation acts to control access rights at a high level so that you can isolate organizations from one another without the need to set specific access permissions on every report.

Affiliations also aid in distributing management responsibilities for the Urchin configuration. Within an Affiliation, the primary Urchin administrator can assign local admin privileges to Affiliation users. There are three administrative levels of control that can be assigned to a User. See the Working with Users & Groups article in this section for details.

When viewing the admin screens for Urchin configuration parameters, you may filter the entries to selectively show only those with a particular Affiliation name by using the Affiliation dropdown menu in the top bar of the table. Although (NONE) means no affiliation, for purposes of filtering (NONE) is shown as an option in the Affiliation dropdown so that you may view only entries that are not affiliated.

Working with Users & Groups

Overview
Urchin's Users & Groups functionality allows Urchin administrators to easily set up any number of users and grant them access to whichever reports are deemed appropriate. These users can then be put into groups to expedite and simplify management of large numbers of users. If a group of users is granted report access, all users in that group will have access upon logging in to the system. Users never have report access unless it is specifically granted.

How to Use Users
1. Log in to your Urchin system as an administrator. Note: the URL for accessing the Urchin system is identical regardless of the user type.
2. Click on Configuration in the main left-side navigation.
3. Click on Users & Groups.
4. Click the Add button at upper-right to enter the User Wizard.
5. Select a username (it should be lower-case and must not have spaces) and choose a password.
6. Enter the user's real name; this will be displayed in the Urchin system. Click the Next button.
7. Determine what level of control the user should have. Unless you are running Urchin in Datacenter mode, the only choices will be User and Super Admin. If you are running in Datacenter mode, you also have the choice of Affiliate Admin (see the embedded help by clicking on the help link for specifics on affiliate admin settings). Click the Next button.
8. Once the user has been created, click on the Edit icon next to that user.
9. Click on the Report Access tab. The available Profiles will be shown in the box at left.
10. Select one or more Profiles. To select multiple Profiles, use the command key or control key depending on your platform.
11. Click the right-facing arrow to move the Profile(s) to the Access Granted box.
12. Click Update to save changes.

How to Use Groups
1. Log in to your Urchin system as an administrator and click on Configuration at left.
2. Click on Users & Groups.
3. Click on Groups.
4. Click the Add button at top-right.
5. Enter the Group Name; this can be anything descriptive.
6. Enter the Group Description; this might be something to do with the location or composition of the group.
7. Click Finish.
8. Click Done and then click Edit next to the group name.
9. Click the Users in Group tab to select users to add to the group. To select multiple users, use the command key or control key depending on your platform.
10. Click the right-facing arrow to move users to the Users in Group box.
11. Click the Update button to save changes.
12. To add users to the group, click the Users in Group tab and add users as described in the procedure above.

Recommendations
♦ Any "Super Admin" level user has complete control over the Urchin system, so it is advisable to grant that privilege to only one person.
♦ Passwords should contain one or more capital letters and/or symbols to make them difficult to guess.
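The access rule from the Overview (no report access unless granted, either directly or through a group) can be expressed as a short Python sketch. The data structures, names, and sample users below are hypothetical illustrations and do not reflect how Urchin stores its configuration; administrator-level access is not modeled.

```python
def can_view(user, profile, user_grants, group_grants, memberships):
    """A user may view a profile only via a direct grant or a group grant."""
    if profile in user_grants.get(user, set()):
        return True
    return any(
        profile in group_grants.get(group, set())
        for group in memberships.get(user, set())
    )


# Hypothetical configuration.
user_grants = {"alice": {"www.example.com"}}
group_grants = {"marketing": {"shop.example.com"}}
memberships = {"bob": {"marketing"}}

print(can_view("alice", "www.example.com", user_grants, group_grants, memberships))
print(can_view("bob", "shop.example.com", user_grants, group_grants, memberships))
print(can_view("bob", "www.example.com", user_grants, group_grants, memberships))
```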

Scheduling Tasks

Working with the Task Scheduler

Overview
The Task Scheduler is the nerve center of Urchin: it is responsible for the actual scheduling and execution of Urchin log processing events for all Profiles. From the Scheduler, you can run tasks immediately or add them to the list of Urchin events for repeated execution at nearly any interval desired.

How to Use the Scheduler
1. Log in to your Urchin Administration Interface and click Configuration in the main left-side navigation, then Urchin Profiles.
2. Locate the Profile you wish to schedule and click Edit.
3. Click the Run/Schedule tab.
4. Under Task Settings, select the desired interval. Daily is recommended.
5. Set the time of day for the task.
6. Click Update to save changes.
7. To run the task immediately, click Run Now. Subsequent scheduled tasks will occur according to the schedule you have set.

Recommendations
♦ Most tasks should be scheduled for daily execution, since that is the log rotation schedule for many webservers. However, Urchin's log tracking facility makes it possible to read the same log multiple times without doubling data, so this is not required.

Notes on the Scheduler's Operation
♦ All tasks are handled sequentially by Urchin, so multiple tasks given the same time of execution will still be processed one at a time.
♦ To see the results of all tasks that have been executed, see the Task History screen in the Scheduler navigation section.


System Settings

Changing the Port Number

The default port number that the Urchin webserver will listen on is 9999. Changing this number consists of two basic steps:
♦ Changing the port number in the Server Settings screen
♦ Stopping and starting the Urchin services, which will be a slightly different process for Windows versus Unix-type systems

The detailed process is as follows:
♦ Log in to the Urchin administration interface
♦ Navigate to Configuration->Settings->Access Settings and click on the Server Settings tab
♦ Set your new port number in the Server Port Number box
♦ Click on the Update button

Now you must restart the Urchin services:
♦ On Unix-type systems go to the bin directory of your Urchin distribution and run:

    ./urchinctl restart

♦ On Windows systems, from the console, go to Start->Programs->Urchin and choose Disable Services, then choose Enable Services.

The webserver should now be listening for connection requests on the new port number. This means that the URL used to view reports and configure the Urchin software has changed, and your users should be notified regarding the new URL.

Notes
Please note that on many systems, root privileges may be required to use port numbers less than 1024. Also, if another service is already running on the port specified, Urchin will fail to start.
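Before choosing a new port, it can help to check the two conditions mentioned in the Notes. The Python sketch below is one way to do that and is not part of Urchin: `check_port` is a hypothetical helper that simply tries to bind the port on the local machine, and the status strings are arbitrary. Run it on the server that hosts Urchin, as the same user that runs the Urchin services.

```python
import socket


def check_port(port, host="127.0.0.1"):
    """Try to bind the port and report why it may be unusable."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except PermissionError:
            return "needs elevated privileges"   # typical for ports below 1024
        except OSError:
            return "in use or unavailable"       # another service holds the port
    return "free"


print(check_port(9999))
```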

Licensing Urchin

Overview
Urchin must be licensed in one of three ways before it can be used:
♦ Obtain Demo License
♦ Buy License
♦ Activate Pre-Purchased License


If you are trying out Urchin for the first time, you will want to install a demo license. This is a free 15-day evaluation, with no limitations on Urchin's function.

Installing a Demo License
To install a demo license, log in to your Urchin Administration Interface with a web browser (usually http://your.server.com:9999), click "Install Demo License", and follow the on-screen steps, including entering your contact information. It's important to enter your real information, as it will be necessary if and when you decide to purchase Urchin later. Click the Install Demo License link to complete the process.

Buy License
To purchase a license, log in to the Urchin system as an administrator and click on Configuration at left. Next, click on the Settings button and then the License button. On the main screen, click the Buy License link, which will take you to our online licensing center. Once you have completed the purchase, the Urchin system will be fully operational in perpetuity.

Activate Pre-Purchased License
To activate a pre-purchased license (such as if you purchased Urchin on CD or you have moved the Urchin installation to a new server), log in to the Urchin system as an administrator and click on Configuration at left. Next, click on the Settings button and then the License button. On the main screen, click the Activate Pre-Purchased License link, which will take you to our online licensing center. Once you have completed the process, the Urchin system will be fully operational in perpetuity.

Installing a License Without Internet Access
It is possible to license Urchin without internet access, such as behind a firewall. To accomplish this, you will need to run the "inspector" utility, which is bundled with Urchin and found in the "util" directory. Attach its output to an email and send it to [email protected] for assistance.

Recommendations
♦ Please enter your real contact information when activating your demo, as it will be necessary for billing purposes if you decide to buy. We will also need to know who you are in order to provide support.

DNS Database Update

Urchin includes a DNS database which provides the information used in creating the Domain Reports, including the conversion of IP addresses to domain names. These databases are stored in the Urchin data directory and need to be updated on a periodic basis.

Urchin includes geo-update, a utility that checks for updates and downloads new updates when they are available. The utility is scheduled to check for updates once a month; you can set the day and time for the download or disable the downloads. The geo-update utility can also be used to import custom entries into the DNS databases. For more information, see the geo-update utility article in the Advanced Topics -> Utilities section.

Considerations
The geo-update utility needs an internet connection to be able to check for and download new updates. The utility uses port 80 to communicate with the webserver providing the updates. Proxy servers and firewalls may interfere with Urchin's ability to successfully download updates.


Chapter 4: Reporting Interface

Report−Side Filtering

Urchin is capable of sophisticated filtering of any text−based report via the reporting interface. To filter in or out any text string, enter it into the Filter box and click the "+" (include) or "−" (exclude) button. The Urchin reporting system will re−query the database and only display corresponding results. To conduct more complex filtering operations, POSIX regular expressions can be used (POSIX is a standard for text manipulation which is beyond the scope of this guide).
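As a rough illustration of what the "+" and "-" buttons do, the sketch below applies an include or exclude filter to a list of report entries. It uses Python's `re` module, which is not a POSIX implementation, although simple patterns such as the ones shown behave the same way. The function name `apply_filter`, the sample page list, and the assumption that a pattern may match anywhere in the entry are all illustrative and are not taken from Urchin itself.

```python
import re

def apply_filter(rows, pattern, include=True):
    """Keep rows that match the pattern (include) or that do not match it (exclude)."""
    regex = re.compile(pattern)
    return [row for row in rows if bool(regex.search(row)) == include]

# Hypothetical entries from a list-type report.
pages = [
    "/index.html",
    "/products/widget.html",
    "/images/logo.gif",
    "/cart/checkout.cgi",
]

html_only = apply_filter(pages, r"\.html$", include=True)     # like the "+" button
no_images = apply_filter(pages, r"^/images/", include=False)  # like the "-" button

print(html_only)
print(no_images)
```

A plain text string works as a pattern too, provided it contains no characters that are special in regular expressions.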

Reporting Interface Overview

Welcome to the Urchin Reporting Interface! Overview


The Urchin Reporting Interface is the system that displays the actual Urchin reports. To access the Reporting Interface, log in to your Urchin Administration Interface and select a report to view. If you are the Administrator, you will have access to all reports. If you are a User, you will have access to those reports specified by the Administrator. Each Profile that has been configured has its own set of reports. Click the magnifying glass icon next to a profile to view the reports for that profile.

Controls

Note the Date Range at the top of any report. All data shown is for that time period only. To change the timeframe, just select a different date range from the controls at bottom-left of the screen. See the Date Range article in this section for more information.

♦ Standard/SVG: Urchin can display reports either in standard HTML or via Adobe's Scalable Vector Graphics (SVG) format. By default, Urchin will attempt to determine if the browser in use has the SVG plug-in installed. If so, reports will be displayed in SVG format. If not, Urchin will use standard HTML. If the user attempts to select SVG with a browser that does not have the SVG plug-in, a link will be provided to a web page with information on getting the plug-in.
♦ Search: To instantly find any item in list-type reports, enter a search term or phrase into the Search box and press Enter. The list will be updated with any matches.
♦ Filter: To filter in or out any items in list-type reports, enter a string of text into the Filter box and click the "+" (include) or "-" (exclude) button. The list will be changed accordingly.
♦ #Shown: To show a different number of items in the report being viewed, simply select the desired number from the pulldown menu.
♦ Go To#: If you know the position number of the entry you would like to see, enter it here and press Enter.
♦ Export:
  ◊ Tab: click the "T" button to export data in tab-delimited format.
  ◊ Word: click the Word icon to export data in Microsoft Word native format.
  ◊ Excel: click the Excel icon to export data in Microsoft Excel native format.
♦ Printing: click the printer icon to get a print-friendly view of the data; click the Print Page link from that screen to actually print the report.

Recommendations

♦ Try different Date Range settings to see how your data changes over time. For low-traffic sites, a month may be a better timeframe than a week, since traffic might not be statistically significant over so short a period.

See Also ♦ Glossary of Terms

Exporting Data

Overview

Urchin's data export function makes it easy to extract data from any Urchin report. This is useful for bringing report data into a spreadsheet, word processor, database, etc. for further analysis.

How to Use Export Data

To export data from any report, select the appropriate type based on the application you plan to use to manipulate the data. For general database importing, use tab-separated format. For Word and Excel export, the application should launch automatically after the data is exported, and the new document should be populated with the data you have exported.

♦ Tab: click the "T" button to export data in tab-delimited format.
♦ Word: click the Word icon to export data in Microsoft Word native format.
♦ Excel: click the Excel icon to export data in Microsoft Excel native format.
♦ Printing: click the printer icon to get a print-friendly view of the data; click the Print Page link from that screen to actually print the report.

Recommendations

♦ To export data to a database, tab-separated is usually the preferred format.
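Once a report has been exported in tab-delimited format, it can be read by almost any scripting language. The following is a minimal sketch in Python, assuming the export begins with a single header row; the column names `Page` and `Visits` and the sample data are hypothetical, and a real export may carry extra title or date-range lines that would need to be skipped first.

```python
import csv
import io

def read_tab_export(handle):
    """Read a tab-delimited export into a list of dicts keyed by the header row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    return [dict(zip(header, row)) for row in reader]

# Hypothetical export contents; in practice, open the exported file instead.
sample = "Page\tVisits\n/index.html\t120\n/about.html\t45\n"

rows = read_tab_export(io.StringIO(sample))
total_visits = sum(int(row["Visits"]) for row in rows)

print(rows)
print(total_visits)
```

To read a real file, pass `open(path, newline="")` in place of the `io.StringIO` object.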

Date Range

Overview Urchin's Date Range function allows you to view report data by any timeframe desired, from 1 day to the entire period of time for which data exists, or any part thereof. The Date Range feature makes it easy to specify either a standard timeframe (such as a week, month, or year), or any custom timeframe. Using the Date Range Function


♦ Standard Date Range: To view data by a standard timeframe such as a day, week, month, or year, just click the desired period in the Date Range navigation area, and the report's data will change accordingly. The Date Range Calendar is clickable in many ways to accomplish this: ◊ Year: click the year to display data for the entire calendar year. ◊ Month: click the name of the month. ◊ Week: click the arrow to the left of the calendar for the week you are interested in. ◊ Date: click the date you are interested in. ◊ Day: to only show data for every instance of a particular day of the week in the currently selected Date Range, click the day name. ◊ Custom: click the Enter Range button, which brings up the Urchin Calendar. Select the starting date in the calendar at left, and the ending date at the calendar at right. Click Apply Date Range, and the report will change to show data for the timeframe selected.

After selecting a custom Date Range, all reports you examine that are compatible with the selected timeframe will display data for that period until either the browser is closed or a different Date Range is specified. Recommendations ♦ If you are examining a low−traffic site, try looking at a longer timeframe to get more meaningful data. ♦ If you are interested in traffic trends over the life of your site (and the data exists), try analyzing a year or more worth of data −− Urchin will adjust the size of bar graph elements to accommodate the selection. Urchin 5.6 feature: You can also see the data displayed hourly, daily, or monthly over your selected date range. Select hourly, daily, or monthly from the Date View pulldown, as shown in the image below.


Chapter 5: E−commerce Module

E−commerce Overview

Introduction Urchin's E−commerce reporting module expands the power of Urchin's reporting to allow you to follow visitors all the way to the point of conversion and actually measure your ROI on various aspects of your website and marketing campaigns. There are two sections of reporting enabled by the module. The first section of reports, E−Commerce, provides trend analysis of on−line revenue, transactions, and product detail. The second commerce reports section, Revenue Source, correlates revenue against visitor parameters including keywords and search engines.


This valuable reporting capability allows you to exploit cross-system on-line business resources and optimize on-line campaigns. Easily calculate your ROI from CPC and organic search engine placements.

System Overview

When a visitor to a website makes a purchase, the shopping cart software makes an entry in a transaction log file which, when processed along with normal web traffic logs, creates a complete picture of the E-commerce system.

Urchin processes these logs together and correlates the web site session with the E-commerce transaction. The purchase and product information is stored in the Urchin databases, ready for viewing in the E-Commerce reports.

Configuration

There are three key elements for configuring Urchin's e-commerce processing:

♦ Establish a usable e-commerce log format: Urchin needs to understand how your e-commerce logs are constructed. The choices are ELF2, ELF, or a custom log format. See the ELF & ELF2 Log Formats or Custom E-commerce Log Formats articles in this section for details.
♦ Coordinate processing of e-commerce and webserver access logs: typically e-commerce transactions are tracked separately from normal webserver activity, and frequently the sites are not even hosted on the same machine. You'll have to make sure that both sets of logs are available to Urchin so that they can be processed together. Both logs should be listed as log sources in the single profile that is set up to handle your e-commerce reporting.
♦ Choose a visitor tracking method: the visitor tracking method determines how well Urchin can correlate e-commerce activity with normal website activity. You should decide which visitor tracking method will yield the level of analysis you desire. The more accurate UTM method requires making some simple modifications to your website documents to achieve the most complete analysis. These modifications should be made to all websites involved in your online business. See the Visitor Tracking section of the Documentation Center for details on setting up UTM.

Considerations

It is strongly advised to have your shopping cart software log in ELF2 format if possible. This will reduce some of the Urchin administration overhead in setting up your e-commerce reporting, since Urchin has a built-in capability to deal with this format automatically.

ELF & ELF2 Log Formats

Overview

The E-commerce log formats (ELF & ELF2) were designed to record information about customer transactions from online shopping sites. ELF was originally created for use with Urchin 3 and may be used with Urchin 5 when processing data with the IP-Only visitor method. ELF2 is similar to ELF and includes additional fields that allow for visitor correlation using the IP+UserAgent, UTM, and other visitor tracking methods. It is recommended to log data in the ELF2 format since it provides better visitor correlation with your webserver data. If you cannot set up your shopping cart software to log in ELF/ELF2, then you must configure your own Urchin custom log format before attempting to process your e-commerce data. This document describes the format of the ELF and ELF2 log files that are created by the shopping cart software and explains how to configure Urchin for processing of e-commerce logs.

Configuring Urchin for ELF/ELF2 Log Files

You must select specific Urchin configuration parameters depending on your e-commerce log type.

ELF processing
♦ In the Log Source->Log Settings screen, set the Log Format to either elf or auto.
♦ In the Profile->Reporting screen, set the Visitor Tracking Method to IP-ONLY, which is the only method supported when using the ELF e-commerce log format.

ELF2 processing
♦ In the Log Source->Log Settings screen, set the Log Format to either elf2 or auto.
♦ In the Profile->Reporting screen, set the Visitor Tracking Method to any of the choices, which are all supported when using ELF2.

Your e-commerce log should be listed as a second log source along with your main website log in the profile that is created to handle your e-commerce reporting. The logs are processed sequentially by Urchin.

ELF/ELF2 Log Format Description

Both ELF and ELF2 are tab-separated multi-line log formats. The first line begins with an '!' exclamation character and contains overall information about the purchase. Subsequent lines contain detailed information about the items purchased. The first line is referred to as the transaction and the subsequent lines are referred to as items. Blank fields should contain a '-' character. Since tabs are used to separate fields, the tab character is not allowed within a field. A typical ELF/ELF2 log file will have the following general form:

!transaction1
item1
item2
item3
!transaction2
item1
item2
...

ELF2 Log Format

ELF2 Transaction Line

The ELF2 transaction line begins with an '!' exclamation character and contains the following tab-separated fields (empty fields should contain a '-' character):

!%{ORDERID} %{REMOTE_HOST} %{DATE/TIME} %{STORE} %{SESSIONID} %{TOTAL} %{TAX} %{SHIPPING} %{BILL_CITY} %{BILL_STATE} %{BILL_ZIP} %{BILL_COUNTRY} %{USER_AGENT} %{COOKIES}

where:
♦ %{ORDERID} is the order number
♦ %{REMOTE_HOST} is the hostname/ip address of the remote machine
♦ %{DATE/TIME} is the time in the common log format [dd/mmm/yyyy:HH:MM:SS +/-ZZZZ]
♦ %{STORE} is the name/id of the storefront
♦ %{SESSIONID} is the unique session identifier of the customer
♦ %{TOTAL} is the transaction total including tax and shipping (decimal only, no '$' characters)
♦ %{TAX} is the amount of tax charged to the subtotal
♦ %{SHIPPING} is the amount of shipping charges
♦ %{BILL_CITY} is the billing city of the customer
♦ %{BILL_STATE} is the billing state of the customer
♦ %{BILL_ZIP} is the billing zip code of the customer
♦ %{BILL_COUNTRY} is the billing country of the customer
♦ %{USER_AGENT} is the user agent of the customer's browser
♦ %{COOKIES} are the incoming cookies contained in the headers from the customer's browser

ELF2 Item Line

The ELF2 item line contains the following tab-separated fields (empty fields should contain a '-' character):

%{ORDERID} %{REMOTE_HOST} %{DATE/TIME} %{PRODUCT_CODE} %{PRODUCT_NAME} %{VARIATION} %{PRICE} %{QUANTITY} %{UPSOLD} %{USER_AGENT} %{COOKIES}

where:
♦ %{ORDERID} is the order number
♦ %{REMOTE_HOST} is the hostname/ip address of the remote machine
♦ %{DATE/TIME} is the time in the common log format [dd/mmm/yyyy:HH:MM:SS +/-ZZZZ]
♦ %{PRODUCT_CODE} is the identifier of the product
♦ %{PRODUCT_NAME} is the name of the product
♦ %{VARIATION} is an optional variation of the product for colors, sizes, etc.
♦ %{PRICE} is the unit price of the product (decimal only, no '$' signs)
♦ %{QUANTITY} is the quantity ordered of this product
♦ %{UPSOLD} is a boolean (0|1) indicating whether the product was on sale
♦ %{USER_AGENT} is the user agent of the customer's browser
♦ %{COOKIES} are the incoming cookies contained in the headers from the customer's browser

ELF2 Log File Example

The following 2 lines demonstrate a transaction and corresponding item entry in an ELF2 log:

!36530 123.123.123.123 [21/Aug/2003:11:31:45 -0800] - - 895.00 - - Virginia Beach VA 23452 US "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" "__utma=171060324.2002410569.1061216915.1061216915.1061490246.2; __utmb=171060324;__utmc=171060324"
36530 123.123.123.123 [21/Aug/2003:11:31:45 -0800] U5-BASE Urchin 5 Base License - 895.00 1 - "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" "__utma=171060324.2002410569.1061216915.1061216915.1061490246.2; __utmb=171060324;__utmc=171060324"
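If you want to check that your shopping cart is writing well-formed ELF2, a short script can read the log back using the field layout described above. The following Python sketch is one way to do that; it is not Urchin's own parser. The function names are hypothetical, and it assumes the fields are separated by real tab characters and that a '-' marks an empty field.

```python
TRANSACTION_FIELDS = [
    "ORDERID", "REMOTE_HOST", "DATE/TIME", "STORE", "SESSIONID", "TOTAL", "TAX",
    "SHIPPING", "BILL_CITY", "BILL_STATE", "BILL_ZIP", "BILL_COUNTRY",
    "USER_AGENT", "COOKIES",
]
ITEM_FIELDS = [
    "ORDERID", "REMOTE_HOST", "DATE/TIME", "PRODUCT_CODE", "PRODUCT_NAME",
    "VARIATION", "PRICE", "QUANTITY", "UPSOLD", "USER_AGENT", "COOKIES",
]

def split_fields(line, names):
    """Split one tab-separated line and map '-' placeholders to None."""
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields, found {len(values)}")
    return {name: (None if value == "-" else value) for name, value in zip(names, values)}

def parse_elf2(lines):
    """Group item lines under the transaction line ('!' prefix) that precedes them."""
    orders = []
    for raw in lines:
        if not raw.strip():
            continue
        if raw.startswith("!"):
            order = split_fields(raw[1:], TRANSACTION_FIELDS)
            order["items"] = []
            orders.append(order)
        elif orders:
            orders[-1]["items"].append(split_fields(raw, ITEM_FIELDS))
        else:
            raise ValueError("item line found before any transaction line")
    return orders

# The example entry from this article, rebuilt with explicit tab separators.
agent = '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"'
cookies = '"__utma=171060324.2002410569.1061216915.1061216915.1061490246.2; __utmb=171060324;__utmc=171060324"'
stamp = "[21/Aug/2003:11:31:45 -0800]"
sample = [
    "!" + "\t".join(["36530", "123.123.123.123", stamp, "-", "-", "895.00", "-", "-",
                     "Virginia Beach", "VA", "23452", "US", agent, cookies]),
    "\t".join(["36530", "123.123.123.123", stamp, "U5-BASE", "Urchin 5 Base License",
               "-", "895.00", "1", "-", agent, cookies]),
]

orders = parse_elf2(sample)
print(orders[0]["TOTAL"], orders[0]["items"][0]["PRODUCT_CODE"])
```

A field-count error from a script like this usually means that a field contains a stray tab or that an empty field was written without its '-' placeholder.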

ELF Log Format

ELF Transaction Line

The ELF transaction line begins with an '!' exclamation character and contains the following tab-separated fields (empty fields should contain a '-' character):

!%{ORDERID} %{REMOTE_HOST} %{STORE} %{SESSIONID} %{DATE/TIME} %{TOTAL} %{TAX} %{SHIPPING} %{BILL_CITY} %{BILL_STATE} %{BILL_ZIP} %{BILL_COUNTRY}

where:
♦ %{ORDERID} is the order number
♦ %{REMOTE_HOST} is the hostname/ip address of the remote machine
♦ %{STORE} is the name/id of the storefront
♦ %{SESSIONID} is the unique session identifier of the customer
♦ %{DATE/TIME} is the time in the common log format [dd/mmm/yyyy:HH:MM:SS +/-ZZZZ]
♦ %{TOTAL} is the transaction total including tax and shipping (decimal only, no '$' characters)
♦ %{TAX} is the amount of tax charged to the subtotal
♦ %{SHIPPING} is the amount of shipping charges
♦ %{BILL_CITY} is the billing city of the customer
♦ %{BILL_STATE} is the billing state of the customer
♦ %{BILL_ZIP} is the billing zip code of the customer
♦ %{BILL_COUNTRY} is the billing country of the customer

ELF Item Line

The ELF item line contains the following tab-separated fields (empty fields should contain a '-' character):

%{ORDERID} %{PRODUCT_CODE} %{PRODUCT_NAME} %{VARIATION} %{PRICE} %{QUANTITY} %{UPSOLD}

where:
♦ %{ORDERID} is the order number
♦ %{PRODUCT_CODE} is the identifier of the product
♦ %{PRODUCT_NAME} is the name of the product
♦ %{VARIATION} is an optional variation of the product for colors, sizes, etc.
♦ %{PRICE} is the unit price of the product (decimal only, no '$' signs)
♦ %{QUANTITY} is the quantity ordered of this product
♦ %{UPSOLD} is a boolean (0|1) indicating whether the product was on sale

ELF Log File Example

The following lines demonstrate 2 transactions and corresponding item entries in an ELF log:

!12313 ppp-46.mia-tc-2.netrox.net ZongStore 1102323131 [27/Jul/1999:11:43:02 -0700] 198.12 8.12 10.00 Cedar Rapids Iowa 52403 US
12313 102 T Shirt XL 10.00 10 0
12313 103 Boxers L 9.00 10 0
!12314 213.12.54.123 - 110123413 [27/Jul/1999:11:43:02 -0700] 11.75 0.75 1.00 Santa Ana CA 92705 US
12314 102 T Shirt S 10.00 1 0
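If you are adding ELF logging to your own shopping cart, the lines can be assembled with a few helper functions. The sketch below is a minimal Python example that follows the ELF field order given above. The function names and parameters are hypothetical, and amounts are passed in as already-formatted decimal strings so that no rounding takes place.

```python
def elf_field(value):
    """Render one field: '-' for blanks; tabs and newlines are not allowed in a field."""
    text = "" if value is None else str(value)
    if "\t" in text or "\n" in text:
        raise ValueError("tabs and newlines are not allowed inside a field")
    return text if text else "-"

def elf_transaction_line(order_id, remote_host, store, session_id, date_time,
                         total, tax, shipping, city, state, zip_code, country):
    """Build an ELF transaction line, which starts with '!'."""
    fields = [order_id, remote_host, store, session_id, date_time,
              total, tax, shipping, city, state, zip_code, country]
    return "!" + "\t".join(elf_field(f) for f in fields)

def elf_item_line(order_id, product_code, product_name, variation,
                  price, quantity, upsold):
    """Build an ELF item line for one purchased product."""
    fields = [order_id, product_code, product_name, variation,
              price, quantity, upsold]
    return "\t".join(elf_field(f) for f in fields)

# Rebuild the second transaction from the example above (no store name given).
transaction = elf_transaction_line(
    "12314", "213.12.54.123", None, "110123413",
    "[27/Jul/1999:11:43:02 -0700]", "11.75", "0.75", "1.00",
    "Santa Ana", "CA", "92705", "US",
)
item = elf_item_line("12314", "102", "T Shirt", "S", "10.00", "1", "0")

print(transaction)
print(item)
```

Rejecting tabs inside a field when the line is written keeps the field count stable when Urchin later reads the log.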

Custom E−commerce Logs


Overview

Many shopping carts capture and log valuable information about purchases in formats other than ELF or ELF2, which Urchin cannot process automatically. This article explains how to create a custom log format for your E-commerce log file if you cannot alter your shopping cart to generate ELF/ELF2. Before continuing, please read the article titled "Custom Log Formats" in the Advanced Topics->Customization section of the Document Library, which explains the creation of custom logs in detail.

E-commerce Log Format Types

Shopping carts log information about purchases and the items purchased in either a single-line format or a multi-line format. In a single-line format, each line contains all the information necessary to completely describe a transaction and the items purchased, and all lines have the same layout. In multi-line formats, multiple lines are used to describe a purchase, with one format for the transaction lines and another format for the items purchased. ELF/ELF2 logs are multi-line formats. You must examine your E-commerce logs to determine if the data is single-line or multi-line, as this will affect how you set up your custom log format. Please follow the instructions below depending on your type of log format.

General E-commerce Logging Requirements

Regardless of the format of the log entries your shopping cart produces, each entry must contain the date and time and at least one of the following fields to provide visitor correlation:

♦ Remote Host or IP Address (for IP-Only or IP-Useragent visitor methods)
♦ Useragent (for IP-Useragent visitor method)
♦ Cookies (for UTM or SID visitor method)
♦ Session ID (for SID visitor method)

If the fields required by your chosen visitor method are missing, Urchin will not produce meaningful analysis of your revenue.
Urchin also defines the following E-commerce fields:

♦ %{ORDERID} is the order number
♦ %{STORE} is the name/id of the storefront
♦ %{SESSIONID} is the unique session identifier of the customer
♦ %{TOTAL} is the transaction total including tax and shipping (decimal only, no '$' characters)
♦ %{TAX} is the amount of tax charged to the subtotal
♦ %{SHIPPING} is the amount of shipping charges
♦ %{BILL_CITY} is the billing city of the customer
♦ %{BILL_STATE} is the billing state of the customer
♦ %{BILL_ZIP} is the billing zip code of the customer
♦ %{BILL_COUNTRY} is the billing country of the customer
♦ %{PRODUCT_CODE} is the identifier of the product
♦ %{PRODUCT_NAME} is the name of the product
♦ %{VARIATION} is an optional variation of the product for colors, sizes, etc.
♦ %{PRICE} is the unit price of the product (decimal only, no '$' characters)
♦ %{QUANTITY} is the quantity ordered of this product
♦ %{UPSOLD} is a boolean (0|1) indicating whether the product was on sale

Single-line Format Logs

Follow these instructions if your E-commerce log file only contains hits that all have the same line format as explained above.

1. Create a new custom log format in the lib/custom/logformats directory by making a copy of the custom.lf.sample logformat file. Name your copy with a .lf suffix.
2. Edit your new custom log format file and set the following entries based on the recommendations below:
  ◊ PrimaryPositions: This entry specifies the order of fields in your log file. Create a comma-separated list of field ids which describes your field order. The field names and ids are found in the lib/reporting/logformats/fieldlist.txt file. See example below.
  ◊ SecondaryPositions: Leave this as '-' since it is not used for single-line format log files.
  ◊ PrimaryKey: Leave this as '-' since it is not used for single-line format log files.
  ◊ SecondaryKey: Leave this as '-' since it is not used for single-line format log files.
  ◊ PrimaryContent: Valid entries for this field are TRANSACTION or ITEM. If the hits in your log file describe the purchase of each individual product, set this to ITEM. If the hits in the log file describe the entire purchase, set this to TRANSACTION.
  ◊ SecondaryContent: Leave this as '-' since it is not used for single-line format log files.
  ◊ CommentKey: If some of the lines in your log file are comments or are not considered hits and begin with a specific character, enter the character here.
  ◊ FieldSeparator1: The field separators define which characters are considered field separators. Typical entries are tabs (\t) and spaces (\s). Set these appropriately based on the characters between the fields in your log file.
  ◊ FieldSeparator2: See FieldSeparator1 above.
  ◊ QuotesEscapeSep: This specifies whether field separators will be ignored inside a field that contains quote "" characters. This should probably be left as YES.
  ◊ BracketsEscapeSep: This specifies whether field separators will be ignored inside a field that contains bracket [] characters. This should probably be left as YES.
  ◊ MergSuccessiveSep: This specifies whether to consider two separator characters in a row as one separator. This can probably be left as NO.
  ◊ CleanWhiteSpace: This specifies whether to remove white space from the ends of the fields when they are parsed. This can probably be left as NO.
  ◊ StatusRequired: Leave this set to NO unless your hits contain web server type status codes.
  ◊ CustomDateFormat: If your log format contains a custom date format, set the appropriate strptime format that describes the entry.
  ◊ CustomTimeFormat: If your log format contains a custom time format, set the appropriate strptime format that describes the entry.
3. Save your custom log format in the lib/custom/logformats directory.
4. Select the custom log format for your log source in the Urchin Admin interface.
5. Process your log file(s) with Urchin.

Single-line Format Example


The following example is a single hit from a log that only has transaction data.

12345 123.123.123.123 "Urchin Store" [26/Aug/2003:11:43:02 -0700] 192.73 "San Diego" "CA" 92101 "US" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)" "__utma=171060324.2734232095.1061444425.1061444425.1061444763.2"

The list below shows each field name listed with the id number obtained from the lib/reporting/logformats/fieldslist.txt file. The id numbers thus assigned are used in the PrimaryPositions field in your custom log format file.

Position  Field name                 Id
1.        Transaction ID             25
2.        Remote Host or IP Address  12
3.        Store Name                 26
4.        Apache Date/Time           3
5.        Total Cost                 28
6.        Bill City                  31
7.        Bill State                 32
8.        Bill Zip                   33
9.        Bill Country               34
10.       User Agent                 13
11.       Cookies                    14

Based on the list above, you would set the following entries in the custom logformat file:

PrimaryPositions: "25, 12, 26, 3, 28, 31, 32, 33, 34, 13, 14"
SecondaryPositions: -
PrimaryKey: -
SecondaryKey: -
PrimaryContent: TRANSACTION
SecondaryContent: -
CommentKey: #
FieldSeparator1: \s
FieldSeparator2: \t
QuotesEscapeSep: YES
BracketsEscapeSep: YES
MergSuccessiveSep: NO
CleanWhiteSpace: NO
StatusRequired: NO
CustomDateFormat: -
CustomTimeFormat: -
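To see how these settings divide the sample hit into fields, the following Python sketch imitates the behaviour they describe: space and tab act as separators, while separators inside quotes or brackets are ignored (QuotesEscapeSep and BracketsEscapeSep set to YES). This is an approximation for illustration only, not Urchin's parser, and the labels in `FIELD_NAMES` are hypothetical names attached to the ids from the list above.

```python
import re

# A token is a "quoted string", a [bracketed string], or a run of non-separator characters.
TOKEN = re.compile(r'"([^"]*)"|\[([^\]]*)\]|([^ \t]+)')

# Hypothetical labels for the field ids used in PrimaryPositions.
FIELD_NAMES = {
    25: "transaction_id", 12: "remote_host", 26: "store", 3: "date_time",
    28: "total", 31: "bill_city", 32: "bill_state", 33: "bill_zip",
    34: "bill_country", 13: "user_agent", 14: "cookies",
}
PRIMARY_POSITIONS = [25, 12, 26, 3, 28, 31, 32, 33, 34, 13, 14]

def split_line(line):
    """Split on spaces and tabs, keeping quoted and bracketed fields whole."""
    return [quoted or bracketed or bare for quoted, bracketed, bare in TOKEN.findall(line)]

def parse_hit(line, positions=PRIMARY_POSITIONS):
    """Map each field of one hit to the label of the id at the same position."""
    fields = split_line(line)
    if len(fields) != len(positions):
        raise ValueError(f"expected {len(positions)} fields, found {len(fields)}")
    return {FIELD_NAMES[field_id]: value for field_id, value in zip(positions, fields)}

sample = (
    '12345 123.123.123.123 "Urchin Store" [26/Aug/2003:11:43:02 -0700] 192.73 '
    '"San Diego" "CA" 92101 "US" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)" '
    '"__utma=171060324.2734232095.1061444425.1061444425.1061444763.2"'
)

hit = parse_hit(sample)
print(hit["store"], hit["total"], hit["bill_city"])
```

Without the quote and bracket handling, "Urchin Store" and the date/time would each split into two fields and every later position would shift.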

The PrimaryPositions entry specifies the field order, and the PrimaryContent entry tells Urchin that this log contains transactions (general information about purchases). The field separators were set to space and tab since the fields were separated by white space. The custom date/time formats were not specified since the date/time was formatted as an Apache date.

Multi-line Format Logs

Urchin can read multi-line formats as long as the first character of a line identifies which format is being used. For example, the ELF/ELF2 log files contain a '!' exclamation character as the first character in the transaction line. The item lines do NOT contain a leading '!' character.

Follow these instructions if your E-commerce log file contains two different line formats, one for the transaction and the other for product or item details.

1. Create a new custom log format in the lib/custom/logformats directory by making a copy of the custom.lf.sample logformat file. Name your copy with a .lf suffix.
2. Edit the new custom log format file and set the following entries based on the recommendations below:
  ◊ PrimaryPositions: This entry specifies the order of fields in the lines identified by PrimaryKey. Create a comma-separated list of field ids which describes your field order. The field names and ids are found in the lib/reporting/logformats/fieldlist.txt file.
  ◊ SecondaryPositions: This entry specifies the order of fields in the lines identified by SecondaryKey. Create a comma-separated list of field ids which describes your field order. The field names and ids are found in the lib/reporting/logformats/fieldlist.txt file.
  ◊ PrimaryKey: Set the primary key to the character that identifies a log file line as having the format described by PrimaryPositions.
  ◊ SecondaryKey: Set the secondary key to the character that identifies a log file line as having the format described by SecondaryPositions.
  ◊ PrimaryContent: Valid entries for this field are TRANSACTION or ITEM. If the hits in your log file describe the purchase of each individual product, set this to ITEM. If the hits in the log file describe the entire purchase, set this to TRANSACTION.
  ◊ SecondaryContent: See PrimaryContent above.
  ◊ CommentKey: If some of the lines in your log file are comments or are not considered hits and begin with a specific character, enter the character here.
  ◊ FieldSeparator1: The field separators define which characters are considered field separators. Typical entries are tabs (\t) and spaces (\s). Set these appropriately based on the characters between the fields in your log file.
  ◊ FieldSeparator2: See FieldSeparator1 above.
  ◊ QuotesEscapeSep: This specifies whether field separators will be ignored inside a field that contains quote "" characters. This should probably be left as YES.
  ◊ BracketsEscapeSep: This specifies whether field separators will be ignored inside a field that contains bracket [] characters. This should probably be left as YES.
  ◊ MergSuccessiveSep: This specifies whether to consider two separator characters in a row as one separator. This can probably be left as NO.
  ◊ CleanWhiteSpace: This specifies whether to remove white space from the ends of the fields when they are parsed. This can probably be left as NO.
  ◊ StatusRequired: Leave this set to NO unless your hits contain web server type status codes.
  ◊ CustomDateFormat: If your log format contains a custom date format, set the appropriate strptime format that describes the entry.
  ◊ CustomTimeFormat: If your log format contains a custom time format, set the appropriate strptime format that describes the entry.
3. Save your custom log format in the lib/custom/logformats directory.
4. Select the custom log format for your log source in the Urchin Admin interface.
5. Process your log file(s) with Urchin.
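If you have not written a strptime format before, it can help to test a candidate format string against sample values from your log before entering it as CustomDateFormat or CustomTimeFormat. The sketch below does this with Python's `datetime.strptime`. The sample date and time and the two format strings are hypothetical, and it is an assumption that the directives shown (%Y, %m, %d, %H, %M, %S) are interpreted by Urchin in the same way as by Python.

```python
from datetime import datetime

def check_formats(date_text, date_format, time_text, time_format):
    """Parse a date and a time with separate strptime formats and combine them."""
    parsed_date = datetime.strptime(date_text, date_format).date()
    parsed_time = datetime.strptime(time_text, time_format).time()
    return datetime.combine(parsed_date, parsed_time)

# Hypothetical log values, for example from a cart that writes "2003-08-26" and "11:43:02".
stamp = check_formats("2003-08-26", "%Y-%m-%d", "11:43:02", "%H:%M:%S")
print(stamp.isoformat())

# A format string that does not describe the value raises ValueError.
try:
    check_formats("26/08/2003", "%Y-%m-%d", "11:43:02", "%H:%M:%S")
except ValueError as error:
    print("format does not match:", error)
```

A format that parses several representative log values without error is a reasonable candidate to try in the custom log format file.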


Visitor Correlation

Overview

Visitor correlation is the process of identifying visitor behavior even if the sessions come from different log files or independent web or E-commerce servers. Urchin uses data from the various log sources to analyze relationships between sessions and transactions and then correlates this information to provide a clear picture of how visitor activity relates to purchases, thereby providing valuable return on investment reporting. For example, referrals from search engines and specific keywords can be correlated with the amount purchased from your E-commerce site to tell you which referrals and search terms are yielding the most revenue. Typically you will have one log for your public-facing website, and another log from your secure transaction website. In order to correlate disparate data sources, Urchin must use the same visitor identification method for each site.

Types of Visitor Correlation

Urchin is capable of correlating visitors based on several methods. These methods are described in the Visitor Identification Methods article in the Visitor Tracking section. Your choice of visitor tracking method will directly affect the accuracy of the information Urchin has at its disposal to correlate the E-commerce transactions with other website activity. Please choose the method that is suitable for the level of detail you desire. The following data is required in the E-commerce log file for each of the visitor methods:

♦ UTM: cookies
♦ SID: cookies or SID field
♦ Username: username
♦ IP+UserAgent: remote host or IP address and user-agent (i.e. browser type)
♦ IP-Only: remote host or IP address

These items need to be considered when you examine your E-commerce logging format. Please review the ELF & ELF2 Log Formats and Custom E-commerce Log Formats documents in this section for more detail on how to ensure the proper data is in your logs.

Configuration

This section presents a general overview of e-commerce visitor correlation configuration issues.
Non-UTM Sites

If you do not have the UTM sensor installed on your sites, then your visitor correlation configuration will depend on what e-commerce format you are using.

♦ ELF: Use IP-Only as your visitor tracking method. In such a case Urchin will automatically correlate the various sessions as long as all logs contain the IP fields. Simply choose IP-Only for the Visitor Tracking Method in the Profile's Reporting screen.
♦ ELF2: Use IP+User-Agent as your visitor tracking method. In such a case Urchin will automatically correlate the various sessions as long as all logs contain the IP and User-Agent fields. Simply choose IP+User-Agent for the Visitor Tracking Method in the Profile's Reporting screen.

UTM-Enabled Sites

For UTM-enabled sites, the same version of the UTM sensor must be installed on all the pages you want to track. In the Profile Reporting screen set Visitor Tracking Method to Urchin Traffic Monitor. In general you would set the UTM Domain to the domain that is common to the sites you're processing. For example, when processing web logs from ads.urchin.com along with logs from secure.urchin.com the UTM Domain would be set to urchin.com. For specific details on installing and configuring UTM, please see the Visitor Tracking section of the Documentation Library.
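The effect of the visitor tracking method on correlation can be pictured with a small example. The Python sketch below is not Urchin's correlation algorithm; it simply builds a visitor key from the fields each method relies on and credits each transaction to the referral of the first session with the same key. The record layout, the function names, and the first-session attribution rule are all assumptions made for illustration.

```python
from collections import defaultdict

def visitor_key(record, method):
    """Build the key used to decide that two log entries belong to the same visitor."""
    if method == "ip-only":
        return (record["remote_host"],)
    if method == "ip+useragent":
        return (record["remote_host"], record["user_agent"])
    raise ValueError(f"unsupported method: {method}")

def revenue_by_referral(sessions, transactions, method):
    """Credit each transaction total to the referral of the matching session."""
    referral_for = {}
    for session in sessions:
        # Keep the first referral seen for each visitor key.
        referral_for.setdefault(visitor_key(session, method), session["referral"])
    revenue = defaultdict(float)
    for transaction in transactions:
        referral = referral_for.get(visitor_key(transaction, method))
        if referral is not None:
            revenue[referral] += float(transaction["total"])
    return dict(revenue)

# Two visitors behind the same IP address (for example, a shared proxy).
sessions = [
    {"remote_host": "10.0.0.1", "user_agent": "BrowserA", "referral": "google"},
    {"remote_host": "10.0.0.1", "user_agent": "BrowserB", "referral": "newsletter"},
]
transactions = [
    {"remote_host": "10.0.0.1", "user_agent": "BrowserA", "total": "895.00"},
    {"remote_host": "10.0.0.1", "user_agent": "BrowserB", "total": "11.75"},
]

print(revenue_by_referral(sessions, transactions, "ip-only"))
print(revenue_by_referral(sessions, transactions, "ip+useragent"))
```

With IP-Only, both purchases are credited to the same referral because the two visitors cannot be told apart. Adding the user agent separates them, which is one reason the methods that use more data give more accurate revenue attribution.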

Cancelling E−commerce Transactions

It is sometimes necessary to back out or cancel an e-commerce transaction. Cancelling orders which did not go through or which were disallowed for one reason or another ensures that your Urchin reports, including Campaign Tracking reports, provide accurate information. To cancel an order or transaction, find the transaction in your ELF or ELF2 log. Then, create a duplicate entry which contains a negative transaction total that cancels out the original transaction. For example, if the original transaction total is $699, add a duplicate entry with -699 as the transaction total. Read ELF & ELF2 Log Formats to understand the E-commerce log format that applies to you.
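The duplicate entry can be produced by hand or with a short script. The following Python sketch copies a transaction line and negates its total. It relies on %{TOTAL} being the sixth field of the transaction line, which holds for both the ELF and ELF2 layouts described in this section, and it assumes real tab separators. The function name is hypothetical, and whether the tax, shipping, and item lines should also be negated is not covered here, so they are left unchanged.

```python
from decimal import Decimal

# %{TOTAL} is the sixth field (index 5) of both the ELF and ELF2 transaction line.
TOTAL_INDEX = 5

def cancelling_entry(transaction_line):
    """Return a copy of a transaction line whose total is negated."""
    if not transaction_line.startswith("!"):
        raise ValueError("expected a transaction line beginning with '!'")
    fields = transaction_line.rstrip("\r\n").split("\t")
    fields[TOTAL_INDEX] = str(-Decimal(fields[TOTAL_INDEX]))
    return "\t".join(fields)

# An ELF transaction line, using the field values from the ELF example.
original = "\t".join([
    "!12314", "213.12.54.123", "-", "110123413",
    "[27/Jul/1999:11:43:02 -0700]", "11.75", "0.75", "1.00",
    "Santa Ana", "CA", "92705", "US",
])

cancel = cancelling_entry(original)
print(cancel)
```

Append the resulting line to the e-commerce log so that it is picked up the next time the profile is processed.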


Chapter 6: Campaign Tracking Module

Campaign Tracking Overview

The Urchin Campaign Tracking Module accurately tracks visitors from a source, such as a search engine or email link, to a conversion or transaction on your site.

With the Urchin Campaign Tracking Module, you gain the benefits of:
♦ Multi−Session Tracking: Track visitors from lead to conversion across multiple sessions.
♦ ROI Analysis: Buy the keywords that convert. Cut those that don't.
♦ Goal Conversion: Verify conversion to purchase or any other goal.
♦ A/B Testing: Test content and go with what works.
♦ Click Fraud Reporting: Identify and take action against click fraud.
♦ Day Parts Reporting: Don't waste money when your audience is away.
♦ Multi−Dimensional Comparisons: Marketing campaigns, advertising channels, e−mail blasts, search engines, specific keywords, organic searches, and more.

How does it work?

The Urchin Campaign Tracking Module tracks data from a variety of sources to provide closed−loop ROI analysis. Let's look at the steps.

Step 1: From Link to Web Page

Each visitor to your site enters via a link indicating where they clicked from, the keywords they used, if any, as well as campaign and medium information. The patent−pending Urchin Traffic Monitor (UTM−3 and UTM−4), which is part of the Campaign Tracking Module, parses the link to obtain this information.

The UTM is a small amount of JavaScript code in each of your web pages. You can install the UTM−3 or UTM−4 manually in each web page or automatically via server side includes and other template systems. Once installed, the UTM is triggered each time a visitor views the page. The UTM performs three tasks:
♦ it ensures that a page hit is registered in the web log if the page was cached or proxied,
♦ it parses the link to obtain and log campaign information, and
♦ it updates visitor activity information.

Step 2: Parsing the Link

The UTM parses the incoming link to obtain the campaign information. For example, http://www....com/?utm_source=google&utm_medium=cost-per-click indicates that the visitor clicked on a cost−per−click link on the Google search engine. (UTM−4 automatically detects the keywords that the visitor searched on.) This particular link uses only two variables, utm_source and utm_medium, which indicate the source, Google, and the medium, "cost−per−click". Your links may incorporate three additional variables: utm_campaign, utm_content, and utm_term. These three variables indicate a specific marketing initiative, ad content, and a paid search term (necessary for UTM−3), respectively. Information on these variables and how to set up your Urchin Campaign Tracking Module software is provided in the article Step 1: Track Campaign Data.

The UTM is not limited to parsing links that you embed in emails or paid keywords; it also parses keyword information from organic links. This is important because it enables you to make side−by−side comparisons of paid versus unpaid search results. The UTM recognizes links from the top search engines and parses out the source and keyword information. In addition, the Campaign Tracking Module can be configured to recognize and parse links from custom organic search engines, if required. Information on how to do this is provided in the article Adding a Custom Search Engine.

Step 3: Logging Campaign Information and User Activity

The UTM does two things with the campaign information it parses from the links:
♦ it formats a web document request that allows the web server to make a special entry in the web log, and
♦ it updates the client first−party cookie.

The UTM formats the information it parses from the link into the appropriate web document request that will result in the web server adding the referral information to the web log. The UTM also reads the client's first−party cookie, updating user tracking information as required. For example, if this is the user's first visit to your site, the UTM will add the campaign tracking information to the cookie. If the user previously found and visited your site, the UTM increments the session counter in the cookie. Regardless of how many sessions or how much time has passed, the UTM "remembers" the original referral. This gives the Campaign Tracking Module true multi−session tracking capability.

Step 4: Adding Goal and CPC Data

For the purposes of campaign tracking and ROI calculation, the Urchin Database receives
♦ a conversion goal via the Urchin Admin interface (optional),
♦ search engine cost−per−click data (optional), and
♦ data from the web log.

Once a page in your web site has been defined as a conversion goal, the Urchin Campaign Tracking Module will be able to calculate metrics indicating how successful your site is at converting visitors. By comparing referrals, sessions, and visitor activity to conversions, the Urchin Campaign Tracking Module can report on the effectiveness of your keywords, mediums, campaigns, and content. The system can also report latency metrics such as time to goal and sessions to goal. To learn how to define a conversion goal, read Step 3: Define a Conversion Goal.
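The cookie handling described in Step 3 can be sketched as follows. This is a Python illustration of the logic only: the real UTM is JavaScript, and the field names used here (`campaign`, `sessions`) are assumptions made for the example, not the UTM's actual cookie format.

```python
def update_visitor_cookie(cookie, campaign_info):
    """Return updated first-party cookie data for one new session.

    cookie: dict, or None when the visitor has no cookie yet (first visit).
    campaign_info: dict of campaign values parsed from the incoming link.
    """
    if cookie is None:
        # First visit: store the original referral and start the counter.
        return {"campaign": dict(campaign_info), "sessions": 1}
    # Returning visitor: keep the original referral, count the new session.
    return {
        "campaign": dict(cookie["campaign"]),
        "sessions": cookie["sessions"] + 1,
    }


first = update_visitor_cookie(None, {"utm_source": "google", "utm_medium": "cpc"})
second = update_visitor_cookie(first, {"utm_source": "newsletter", "utm_medium": "email"})
```

In the sketch, the second session arrives from a different link, yet the stored referral is still the original one. That is the property that makes multi−session attribution possible.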
The Campaign Tracking Module allows you to import your cost−per−click data directly from your Google and Overture spending accounts. This allows the system to report ROI at all levels of granularity, from per−keyword/per−search engine to per−campaign aggregates. To learn how to import spending data from Google, read Import Cost Data from Google. To learn how to import spending data from Overture, read Import Cost Data from Overture. Updates from the web log to the Urchin Database occur according to the schedule that you establish for your profile, as part of your Urchin base product configuration.

Step 5: Closing the Loop: Reporting and ROI

Once the Urchin database has been updated with visitor activity, a conversion goal, and cost−per−click data, the Urchin Reporting Engine is able to create over fifty campaign tracking reports. Among these reports is the following report excerpt, which compares ROI for the keyword "analytics system architecture" for each search engine (both "cpc", cost−per−click, and "organic") on which visitors searched for the keyword.

In this case, the keyword "analytics system architecture" was purchased on Google Adwords (google[cpc]). Visitors clicked on the sponsored link 102 times, for a total cost of $6.34 to the advertiser. A revenue amount of $89.15 resulted from these clicks, for an ROI of 1306.15%. The average value of each click indicates that the advertiser should bid a maximum of 88 cents per click on this keyword. There was also one click on an organic (unpaid) search link, but it did not result in any revenue.

The Next Step

With visitor tracking and referral link parsing by the UTM and E−commerce revenue and cost−per−click data import, the Campaign Tracking Module can accurately correlate conversions to specific campaigns and keywords, provide side−by−side comparisons of paid versus unpaid keywords, and calculate ROI and conversion ratios for keyword buys. To begin realizing these benefits, read Step 1: Track Campaign Data.
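The figures in the google[cpc] example can be checked with a few lines of arithmetic. The sketch below assumes ROI is computed as (revenue − cost) / cost, which reproduces the reported 1306.15 percent, and takes the value of a click to be revenue divided by clicks. The function names are illustrative and are not part of Urchin.

```python
def roi_percent(revenue, cost):
    """Return ROI as a percentage, assuming ROI = (revenue - cost) / cost."""
    return (revenue - cost) / cost * 100


def value_per_click(revenue, clicks):
    """Return the average revenue generated by one click."""
    return revenue / clicks


# Figures from the example: $89.15 revenue, $6.34 cost, 102 clicks.
roi = roi_percent(89.15, 6.34)
per_click = value_per_click(89.15, 102)
```

Under these assumptions each click is worth a little over 87 cents on average, which is in line with the maximum bid of roughly 88 cents cited in the example.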

The Five Dimensions of Campaign Tracking

Effective campaign tracking uses a combination of the following five marketing dimensions:
♦ Source
♦ Medium
♦ Term
♦ Content
♦ Campaign

This article describes how these marketing dimensions are used in the Urchin Campaign Tracking Module to track campaign referrals.

Source

Every referral to a web site has an origin, or source. Examples of sources are the Google search engine, the AOL search engine, the name of a newsletter, or the name of a referring web site.

Medium

The medium helps to qualify the source; together, the source and medium provide specific information about the origin of a referral. For example, in the case of a Google search engine source, the medium might be "cost−per−click", indicating a sponsored link for which the advertiser paid, or "organic", indicating a link in the unpaid search engine results. In the case of a newsletter source, examples of medium include "email" and "print".

Term

The term or keyword is the word or phrase that a user types into a search engine.

Content

The content dimension describes the version of an advertisement on which a visitor clicked. It is used in content−targeted advertising and Content (A/B) Testing to determine which version of an advertisement is most effective at attracting profitable leads.

Campaign

The campaign dimension differentiates product promotions such as "Spring Ski Sale" or slogan campaigns such as "Get Fit For Summer".

To Learn More

To learn how to use Urchin Campaign Tracking Management software to track your referrals along the five dimensions of campaign tracking, read Step 1: Track Campaign Data.
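Each dimension corresponds to one link variable (utm_source, utm_medium, utm_term, utm_content, utm_campaign). As a minimal sketch of how those variables can be read from a link, the Python below uses the standard urllib.parse module. It only illustrates the idea; it is not the UTM's own JavaScript parser, and the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

UTM_VARIABLES = (
    "utm_source",
    "utm_medium",
    "utm_term",
    "utm_content",
    "utm_campaign",
)


def parse_campaign(url):
    """Return the UTM variables present in a link's query string."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per variable; keep the first value of each.
    return {name: query[name][0] for name in UTM_VARIABLES if name in query}


link = "http://www.mycompany.com/?utm_source=google&utm_medium=cpc&utm_term=running+shoes"
campaign = parse_campaign(link)
```

Note that parse_qs decodes the plus sign, so the term comes back as "running shoes" with a space.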

Step 1: Track Campaign Data (Set up UTM−3)

In order to track campaign data, you need to:
♦ copy the UTM files to your web site document root,
♦ reference the UTM in your HTML and enable cookies in your logging,
♦ pass the UTM variables in your links.

Copy the UTM files to your web site document root

Copy the files __utm.js and __utm.gif from the util/utm directory of your Urchin distribution to your web site document root. Important: Do not change the names of these files.

Reference the UTM in your HTML and enable cookies in your logging

In the Visitor Tracking section of the Urchin documentation, find the Quick−Install article that applies to your environment. Follow the instructions in Step 2 and Step 3 of the article to reference the UTM in your HTML and enable cookies in your logging.

Pass the UTM variables in your links

The UTM variables provide a way of tracking your referrals along the five dimensions of campaign tracking by attaching campaign data to your links. The UTM parses this campaign data to determine the referral source, the keywords used, and other campaign tracking information.

To pass the UTM variables in a link, add a question mark (?) to the URL followed by the variables and values you would like to assign, separating the variables with ampersands (&). Values may be any string containing letters, numbers, underscore (_), and plus (+). Use underscores to separate multiple words in a value (e.g. utm_campaign=think_different); your URL may not contain spaces.

Urchin provides a URL builder tool that creates links for you, embedding the campaign information that you specify. Using this tool ensures that your links contain the correct syntax.

The following link indicates that the visitor was referred by a paid Google link and that the visitor had searched on the keywords "running shoes". It also indicates that the medium was cost−per−click.

Example

http://www.mycompany.com/?utm_source=google&utm_medium=cpc&utm_term=running+shoes

How you use the variables in your links will depend upon your campaign tracking objectives.
♦ For recommendations on how to use the UTM variables for Search Engine Marketing, read How To Analyze Keyword Buying.
♦ For recommendations on how to use the UTM variables for A/B testing, read How To Perform A/B Testing.
♦ For recommendations on how to use the UTM variables for content−targeted advertising, read How To Track Content−Targeted Ads.

Learn how to use each variable from the table below.

Variable Name | Description | Example
utm_source | Required. Use utm_source to identify a search engine or other source. | utm_source=google
utm_medium | Recommended. Use utm_medium to identify a medium such as email, cost−per−click (cpc), or cpc−content. | utm_medium=cpc
utm_term | Required for keyword analysis using Urchin 5.5/UTM−3. (Urchin 5.6/UTM−4 and later versions automatically detect keywords from cost−per−click referrals.) Use utm_term to identify the keywords that the visitor searched on to get your link. If you specify a utm_term with Urchin 5.6/UTM−4 and higher, your specified term overrides the detected term. | utm_term=running+shoes
utm_content | Required for content−targeted advertising and A/B testing. Use utm_content to differentiate ads or links that point to the same URL. | utm_content=logolink
utm_campaign | Required for keyword analysis. Use utm_campaign to identify a specific product promotion or strategic campaign. | utm_campaign=spring_sale
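The link syntax described in this step can be sketched in Python as follows. This is a hypothetical helper written for illustration, not the Urchin URL builder tool; it assumes the value rules stated in this article (letters, numbers, underscore, and plus only) and joins the variables with ampersands.

```python
import re

# Allowed value characters per this article: letters, numbers, "_" and "+".
VALID_VALUE = re.compile(r"[A-Za-z0-9_+]+")


def build_tagged_url(base_url, **variables):
    """Append UTM variables to base_url, rejecting values with other characters."""
    pairs = []
    for name, value in variables.items():
        if not VALID_VALUE.fullmatch(value):
            raise ValueError(f"invalid value for {name}: {value!r}")
        pairs.append(f"{name}={value}")
    return base_url + "?" + "&".join(pairs)


url = build_tagged_url(
    "http://www.mycompany.com/",
    utm_source="google",
    utm_medium="cpc",
    utm_campaign="spring_sale",
)
```

A value containing a space, such as "spring sale", raises ValueError in this sketch, mirroring the rule that the URL may not contain spaces.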

Step 2: Install and License Campaign Tracking

You must have an Urchin base product license and a Campaign Tracking Module license in order to use the Campaign Tracking Module. If you wish to perform ROI calculations, you must also have an E−commerce Module License.

If you have not yet installed the Urchin base product, read the Getting Started−−>System Requirements−−>Urchin Setup Requirements article and the Getting Started−−>Installation section now.

If you are already using the Urchin base product, follow these steps to obtain a Campaign Tracking Module License:
1. Sign in to Urchin as Administrator.
2. In the Urchin Admin Interface, click Configuration.
3. Click Settings−−>License. The License Information screen appears with a link entitled Upgrade License.
4. Click Upgrade License. The Upgrade License wizard appears.
5. Follow the steps as indicated in the Wizard to purchase and install the license.

Step 3: Define a Conversion Goal


A goal is a web site page that a visitor reaches after making a purchase or performing some other desired action, such as a download or user registration. Before Urchin can calculate goal conversion metrics, you must define one or more goals within your campaign profile.

What is my campaign profile?

Your campaign profile is the profile from which you intend to run campaign reports.
♦ If you have never used Urchin before, you will first need to create a basic profile. Follow the instructions in Urchin Administration−−>Profiles−−>Working with Profiles, then follow the instructions in this article.
♦ If you have already defined a profile, follow the instructions in this article to enable the profile for campaign tracking and create a conversion goal.

Enable a profile for campaign tracking and create a conversion goal

1. Sign on as Administrator and, in the Admin Interface, click Configuration.
2. Click Urchin Profiles−−>Profiles.
3. Click the Edit key next to the profile you wish to edit.
4. In the Profile Settings tab, click the Campaign Website radio button and click Update.
5. On the Profile Filters tab, make sure that the following filters are applied:
◊ Decode UTM Campaign Content
◊ Decode UTM Campaign Name
◊ Decode UTM Campaign Source (Medium)
◊ Decode UTM Campaign Source (Medium) Term
◊ Decode UTM Campaign Term
If these filters are not applied, click Add. The Filter Wizard appears. Select the Pre−Configured Filter radio button and press Next. Select the filters listed above in the Available Filters area, move them to the Applied Filters area, and click Finish. On the Profile Filters tab, click Update.
6. In the Reporting tab, add a Primary Goal Match and a Primary Goal Field, and click Update. Urchin logs a goal completion each time that the Primary Goal Field matches the value specified in Primary Goal Match. Any POSIX regular expression may be entered in the Primary Goal Match field. For example, if you select "request_stem" from the drop−down menu as the Primary Goal Field and enter "/downloads" as the Primary Goal Match, Urchin logs a goal completion each time the request_stem (i.e. the request URI without query information) matches "/downloads". The field "request_stem" is the most commonly used Primary Goal Field; however, other fields may be used as well. For a complete description of fields, read Reference−−>Regular Field List.

Setting Multiple Goals

It is possible to set multiple goals in the Primary Goal Match field. For example, entering the following in the Primary Goal Match field and "request_stem" in the Primary Goal Field:

/((forms/(downloadarea|registerarea)_confirmation)|special/profile_form)\.asp

will tag the following pages as goals:
♦ /forms/downloadarea_confirmation.asp
♦ /forms/registerarea_confirmation.asp
♦ /special/profile_form.asp

You may enter any POSIX regular expression, up to 255 characters in length, in the Primary Goal Match field.
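Before entering a multiple−goal expression, it can help to try it against a few sample request stems. The sketch below does this in Python with a balanced−parentheses form of the pattern. Two assumptions apply: Python's re module is not a POSIX regular expression engine (this particular pattern uses only syntax common to both), and the sketch treats a match anywhere in the request stem as a goal, which may differ from how Urchin applies the Primary Goal Match.

```python
import re

# Multiple-goal pattern with balanced parentheses.
GOAL_PATTERN = re.compile(
    r"/((forms/(downloadarea|registerarea)_confirmation)|special/profile_form)\.asp"
)


def is_goal(request_stem):
    """Return True when the request stem matches the goal pattern."""
    return GOAL_PATTERN.search(request_stem) is not None


pages = [
    "/forms/downloadarea_confirmation.asp",
    "/forms/registerarea_confirmation.asp",
    "/special/profile_form.asp",
    "/forms/contact.asp",  # hypothetical non-goal page
]
goals = [page for page in pages if is_goal(page)]
```

Here the three confirmation and profile pages are selected and the hypothetical contact page is not.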

Tagging Your Online Links 1−2−3

If you are using the Urchin Profit Suite or the Urchin Campaign Tracking Module, you'll want to make sure that you've got a comprehensive strategy for tagging your online ads. This is an important prerequisite to allowing Urchin to show you which marketing activities are really paying off. Fortunately, the tagging process goes smoothly once you understand how to differentiate your campaigns. Here is a three−step process to help you get started.

1. Tag only what you need to. Generally speaking, you need to tag all of your paid keyword links (such as those on Google Adwords and Overture), your banners and other ads, and the links inside your promotional e−mail messages. There are certain links that you don't need to tag, and often will not be able to tag. You should not attempt to tag organic (unpaid) keyword links from search engines, and it isn't necessary to tag links that come from referral sites, such as portals and affiliate sites. Urchin automatically detects the search engine and keyword from organic (unpaid) keyword referrals, and you'll see metrics for these referrals in your Urchin reports, typically under "Organic" listings. Urchin also detects referrals from other websites and displays them in your reports, whether or not you have tagged them.

2. Create your links using the URL Builder. Campaign links consist of a URL address followed by a question mark and your campaign variables. But you won't need to worry about link syntax if you fill out the URL Builder form and press the Generate URL button. A tagged link will be generated for you and you'll be able to copy and paste it to your ad. If you are asking "which fields should I fill in?", you're ready for Step 3.


3. Use only the campaign variables you need. Urchin's link tagging capabilities allow you to uniquely identify virtually any campaign you can think of. But don't think that you must use all six fields in the URL Builder form in each of your links. On the contrary, you should usually only need to use three: Source, Medium, and Campaign. Let's look at the best ways to tag the three most common kinds of online campaigns: banner ads, email campaigns, and paid keywords.

Field | Banner Ad | E−mail Campaign | Pay Per Click Keywords
Campaign Source | citysearch | newsletter1 | google
Campaign Medium | banner | email | ppc
Campaign Term | (not used) | (not used) | (not used)
Campaign Content | (not used) | (not used) | (not used)
Campaign Name | productxyz | productxyz | productxyz

You'll notice that Campaign Term isn't used for any of these links, even the Pay Per Click Keyword campaign. That's because Term is no longer necessary as long as you are using Urchin 5.6/UTM−4 and later. (Campaign Term IS necessary if you are still using Urchin 5.5 and UTM−3.) What about Campaign Content? We're only covering the most common scenarios in this article, but if you are interested in Campaign Content, read the article How To Perform A/B Testing. What about the Campaign ID/Master Tracking Code? If you want to hide the tagging information that you put in your links, Urchin gives you a way of creating a table that keeps all the information private. To read more about this, see the article How To Use Master Tracking Codes. So get started tagging your links and tracking your way to online success!

Import Cost Data from Google

Importing cost data from Google is easy. Just perform the following steps:
♦ Download your Google AdWords spending into a log file. Download your spending data on a daily or weekly basis, prior to the regularly scheduled run of your profile (or before manually running the profile).
♦ Modify your profile to read the log file. You will only need to modify your profile once, as part of your initial setup.

Download Google AdWords spending into a log file

1. On adwords.google.com, log in to your Google AdWords account.
2. In the Reports tab, click Custom Report.
3. Fill out the URL Report fields and click Create Report.

◊ View − Check the Daily Metrics radio button.
◊ Date Range − If this is the first time you are downloading data for campaign tracking, enter a date range beginning with the date you started tracking campaign data and ending with yesterday's date. (The date you started tracking campaign data is the date you completed implementing the instructions in the article Campaign Tracking Module−−>Step 1: Track Campaign Data.) If you have already downloaded historical data, enter a date range beginning with the day after your previous download end−date, and ending with yesterday's date. For daily downloads (recommended), enter a date range beginning with yesterday's date and ending with yesterday's date.
◊ Detail Level − Click Show options.
⋅ Check Keyword names and Include all keywords.
⋅ Check Campaign names and select All Campaigns.
⋅ Check Ad Group names and select All Ad Groups.
◊ Values −
◊ Ad Text − Check Destination URL.
◊ Conversions − (Optional. Check the following if you have enabled conversion tracking on your Google Adwords account. If you do not have conversion tracking enabled, skip this step.)
◊ Report name − Enter a report name.
◊ Scheduling − Checkmarking this box means that you will not have to define this report again. The report format will be saved and will be automatically run each day, week, or month (according to your selection).
◊ Email − Checkmarking this box tells Google to email you when the report has run.

4. Click the "Create Report" button.
5. Once you have manually run the report, or once it has been automatically scheduled and run by Google, the report will appear in the Download Center. (Click the Download Center link at the top of the page.) Select the report for the day or week you want and download it as a .tsv file.

Modify your profile to read the log file

You will only need to modify your profile once, as part of your initial setup.
1. In the Urchin Admin interface, click Configuration, and then Urchin Profiles−−>Profiles.
2. Click the Edit icon for your campaign profile. Your campaign profile is the profile that you configured as part of Step 2:Configure Urchin−−>Define a Conversion Goal.
3. On the Profile Settings tab, make sure that Profile Type is Campaign with E−Commerce Website.
4. On the Reporting tab, under Campaign Options, make sure that Primary Goal Match and a Primary Goal Field are filled in. If they are not, read Step 2:Configure Urchin−−>Define a Conversion Goal.
5. On the Profile Filters tab, make sure that the following filters are applied:
◊ Decode UTM Campaign Content
◊ Decode UTM Campaign Name
◊ Decode UTM Campaign Source (Medium)
◊ Decode UTM Campaign Source (Medium) Term
◊ Decode UTM Campaign Term
If these filters are not applied, click Add. The Filter Wizard appears. Select the Pre−Configured Filter radio button and press Next. Select the filters listed above in the Available Filters area, move them to the Applied Filters area, and click Finish. On the Profile Filters tab, click Update.
6. On the Log Sources tab, click Add. The Log Source Wizard appears.
7. Select Pre−Configured Log Source and click Next.
8. In the Available Log Sources area, select a log source that contains your Google AdWords spending data, move it to the Log Sources to Process area, and click Finish.
9. On the Log Sources tab, click Update.
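The Date Range rules in the download steps above reduce to a small calculation: the range always ends with yesterday, and it starts either at the date tracking began (first download) or on the day after the previous download ended. A minimal Python sketch follows; the function name and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def download_date_range(today, tracking_start, previous_end=None):
    """Return (start, end) for the next spending-data download.

    tracking_start: date campaign tracking began (used for the first download).
    previous_end: end date of the previous download, or None if there was none.
    """
    yesterday = today - timedelta(days=1)
    if previous_end is None:
        start = tracking_start  # first download: go back to the start
    else:
        start = previous_end + timedelta(days=1)  # resume after last download
    return start, yesterday


# First download, then a daily download on the following day.
first = download_date_range(date(2005, 1, 25), tracking_start=date(2005, 1, 10))
daily = download_date_range(
    date(2005, 1, 26), tracking_start=date(2005, 1, 10), previous_end=date(2005, 1, 24)
)
```

For a daily schedule the start and end dates coincide, matching the recommendation to begin and end the range with yesterday's date.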

Import Cost Data from Overture

Importing cost data from Overture is easy. Just perform the following steps:
♦ Download your Overture spending into a log file. Download your spending data on a daily or weekly basis, prior to the regularly scheduled run of your profile (or before manually running the profile).
♦ Modify your profile to read the log file. You will only need to modify your profile once, as part of your initial setup.

Download Overture spending into a log file

1. On www.overture.com, click Advertiser Login and log in to your Overture account.
2. Click the Reports tab on the top of the page.
3. In the Select a Report Type dropdown menu, select Account Activity Detail (Match Type).
4. Specify a filter and date range, and click Create Report.
◊ Select the Overture Results filter.
◊ Enter a date range. If this is the first time you are downloading data for campaign tracking, enter a date range beginning with the date you started tracking campaign data and ending with yesterday's date. (The date you started tracking campaign data is the date you completed implementing the instructions in the article Campaign Tracking Module−−>Step 1: Track Campaign Data.) If you have already downloaded historical data, enter a date range beginning with the day after your previous download end−date, and ending with yesterday's date. For daily downloads (recommended), enter a date range beginning with yesterday's date and ending with yesterday's date.


5. When the report appears in your browser, scroll to the bottom of the page and click Download as Spreadsheet.
6. Save the file to your logsource directory.
7. Convert the report file to the UTF−8 format, using one of the two methods below.
◊ Open the file in Excel, choose File−−>Save As, and choose tab−delimited format.
or
◊ In the util directory of your Urchin distribution, execute the following command:
iconv -f UTF-16 -t UTF-8 filename > newlogsourcename

Modify your profile to read the log file

You will only need to modify your profile once, as part of your initial setup.
1. In the Urchin Admin interface, click Configuration, and then Urchin Profiles−−>Profiles.
2. Click the Edit icon for your campaign profile. Your campaign profile is the profile that you configured as part of Step 2:Configure Urchin−−>Define a Conversion Goal.
3. On the Profile Settings tab, make sure that Profile Type is Campaign with E−Commerce Website.
4. On the Reporting tab, under Campaign Options, make sure that Primary Goal Match and a Primary Goal Field are filled in. If they are not, read Step 2:Configure Urchin−−>Define a Conversion Goal.
5. On the Profile Filters tab, make sure that the following filters are applied:
◊ Decode UTM Campaign Content
◊ Decode UTM Campaign Name
◊ Decode UTM Campaign Source (Medium)
◊ Decode UTM Campaign Source (Medium) Term
◊ Decode UTM Campaign Term
If these filters are not applied, click Add. The Filter Wizard appears. Select the Pre−Configured Filter radio button and press Next. Select the filters listed above in the Available Filters area, move them to the Applied Filters area, and click Finish. On the Profile Filters tab, click Update.
6. On the Log Sources tab, click Add. The Log Source Wizard appears.
7. Select Pre−Configured Log Source and click Next.
8. In the Available Log Sources area, select a log source that contains your Overture spending data, move it to the Log Sources to Process area, and click Finish.
9. On the Log Sources tab, click Update.
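If iconv is not convenient, the UTF−16 to UTF−8 conversion in step 7 of the download procedure can also be approximated with a short Python script. This is an alternative sketch and not part of the Urchin distribution; it assumes the downloaded report is UTF−16 text, and the function name is hypothetical.

```python
def convert_utf16_to_utf8(source_path, target_path):
    """Rewrite a UTF-16 text file as UTF-8, similar to the iconv command."""
    # newline="" keeps line endings exactly as they appear in the source file.
    with open(source_path, "r", encoding="utf-16", newline="") as source:
        text = source.read()
    with open(target_path, "w", encoding="utf-8", newline="") as target:
        target.write(text)
```

Calling convert_utf16_to_utf8("filename", "newlogsourcename") would play the same role as the iconv command, with the same placeholder file names.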


Adding Cost and Impression Data

The Urchin Campaign Tracking Module (beginning with version 5.6) allows you to add fixed advertising costs and impression data to campaigns. If, for example, you have a cost associated with search engine optimization, website development, or an email campaign, you can enter this cost and see it reflected in Urchin reports, including campaign ROI calculations.

The cost and impression data you enter is aggregated for the date range you specify when viewing reports. For example, if you enter 10,000 impressions for a campaign for January 1 and 5,000 impressions for February 1, Urchin will report 15,000 impressions for the reporting date range of January through February. You may also enter negative numbers, thereby adjusting cost and impression data. Using the same example, if you enter 10,000 impressions for January 1 and −5,000 impressions for February 1, Urchin will report 5,000 impressions for January through February.

How To Add Cost and Impression Data

1. In the Admin interface, click Configuration.
2. Edit the profile to which you wish to add data.
3. In the Storage/DB tab, click the Add Cost Data button. The Add CTM Entry Wizard appears.
4. Enter the date as of which the cost and/or number of impressions should apply.
5. Enter the CTM variable(s) that describe the campaign for which you are entering data. For example, to apply the data towards all organic Google referrals, specify the Source as google and the Medium as organic. To apply the data towards the summer newsletter (assuming that you tag summer newsletter referrals with utm_source=summer_news), specify the Source as summer_news.
6. Enter the cost amount and/or number of impressions that you want to associate with this campaign.
7. Click Add to Next Run. Urchin adds the cost/impression data to the Urchin database the next time that this profile is run.

Example: How To Add Non−Search Engine Specific SEO Costs

If you wish to enter a cost that applies to all organic search (i.e. the cost is not specific to Google or Yahoo, etc.), enter a "-" for Source and "organic" for Medium.
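The aggregation described in this article, including negative adjustments, can be illustrated with a short Python sketch. The entry layout (a date and an impression count) and the year are assumptions made for the example; they do not reflect how Urchin stores this data.

```python
from datetime import date

# Hypothetical manual entries: (date, impressions).
# A negative value acts as an adjustment.
entries = [
    (date(2005, 1, 1), 10000),
    (date(2005, 2, 1), -5000),
]


def total_impressions(records, start, end):
    """Sum impressions for entries dated within [start, end], inclusive."""
    return sum(count for day, count in records if start <= day <= end)


jan_to_feb = total_impressions(entries, date(2005, 1, 1), date(2005, 2, 28))
jan_only = total_impressions(entries, date(2005, 1, 1), date(2005, 1, 31))
```

With these entries, the January through February range nets to 5,000 impressions, while January alone still shows the full 10,000, because the adjustment is dated February 1.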


How To Analyze Keyword Buying

How does keyword buying analysis help me? Which keywords should I invest in? How much should I bid for a keyword? How much do I make on keywords? At which times of the day should I maximize my search engine exposure? How can I identify click fraud? You can answer these and other questions by analyzing your keyword buying with the Urchin Campaign Tracking Module. This article provides a walkthrough of each step, from collecting the data to analyzing the reports. What are the steps to analyze keyword buying? ♦ License and Install Urchin You will need to purchase and install the Urchin base product and the Campaign Tracking Module. If you need keyword ROI metrics, you should license the Profit Suite, which includes the Urchin base product, the Campaign Tracking Module, and the E−Commerce Module. Read Step 2:Install and License Campaign Tracking for more information. ♦ Define a Conversion Goal Read Step 3:Define a Conversion Goal to learn how to specify a goal for your site. ♦ Purchase Your Keywords Chapter 6: Campaign Tracking Module

129

For each purchased keyword from a pay−per−click search engine (such as Google or Overture), you will need to set up a referral link to your site and embed UTM variables. This article describes the best way to use the UTM variables for keyword buying analysis, below. ♦ Track Campaign Data You will need to install the UTM, enable cookie logging, and embed UTM variables in your referral links. The article Step 1:Track Campaign Data contains information on how to do all three of these things, and additional information on how to use the UTM variables for keyword buying analysis is provided in this article, below. ♦ Import Keyword Spending Data(only necessary for ROI reporting) Read Import Cost Data from Google and or Import Cost Data from Overture. ♦ Import E−commerce Data (only necessary for e−commerce ROI analysis reports) ♦ Optimize Your Keyword Buys Information on how to use the keyword reports to optimize your keyword buying is provided in this article, below. Which UTM variables should I use for keyword analysis? ♦ If you are using UTM−4 (Urchin 5.6) and later versions For paid search engine links, such as Google AdWords, use utm_source, utm_medium, and utm_campaign. If you are using broad matching, and you wish to see metrics for the broad matched keyword (rather than letting UTM detect the specific keyword), you may wish to use utm_term. The following link is an example of how you would use the UTM variables in a Google AdWords link. It indicates that the referral came from a paid Google search term and that the medium was cost−per−click. It also indicates that the visitor clicked on your adidas promotion link. The UTM−4 automatically detects the keywords used to find your site. Example http://www.mycompany.com/?utm_source=googletm_campaign=adidas Urchin provides a URL builder tool that creates links for you, embedding the campaign information that you specify. Using this tool ensures that your links contain the correct syntax. 
♦ If you are using UTM-3 (Urchin 5.5 and 5.501)
For paid search engine links, such as Google AdWords, use utm_source, utm_medium, utm_term, and utm_campaign. The following link is an example of how you would use the UTM-3 variables in a Google AdWords link. It indicates that the referral came from a paid Google search term, that the medium was cost-per-click, and that the visitor had searched on the keywords "running shoes". It also indicates that the visitor clicked on your adidas promotion link.
Example
http://www.mycompany.com/?utm_source=google&utm_medium=cpc&utm_term=running+shoes&utm_campaign=adidas


Urchin provides a URL builder tool that creates links for you, embedding the campaign information that you specify. Using this tool ensures that your links contain the correct syntax.

What about un-sponsored links? (aka unpaid, free, or organic listings in search engines)

You only embed variables in sponsored links: links for which you paid on a search engine, or links over which you otherwise have control, such as links in an email that you send to customers. You don’t have to worry about un-sponsored links because Urchin Campaign Tracking automatically determines which search engine the referral came from and which keywords the visitor used.

Be consistent with UTM variables.

It is important that you use consistent names and spellings for all of your campaign variable values. For example, choose a code or name that indicates “cost-per-click” and use it consistently. To Urchin Campaign Tracking, utm_medium=cpc and utm_medium=cost_per_click are different mediums. Beginning with Urchin 5.6, a master tracking code feature is available that significantly reduces the possibility of consistency errors. Read How To Use Master Tracking Codes.
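The link format and the consistency advice above can be combined in a short script. The following Python sketch is not the Urchin URL builder tool; it is an illustration only. The function name, the alias table, and the choice to lowercase values are assumptions made for this example. It appends the UTM variables named in this article to a landing page URL and maps alternate spellings of the medium to one canonical value.

```python
from urllib.parse import urlencode

# Hypothetical alias table: every spelling a team might type maps to one
# canonical medium, so reports do not split "cpc" and "cost_per_click".
MEDIUM_ALIASES = {
    "cpc": "cpc",
    "cost_per_click": "cpc",
    "cost-per-click": "cpc",
    "email": "email",
}


def tagged_link(base_url, source, medium, campaign=None, term=None):
    """Return base_url with UTM variables appended as a query string."""
    # An unknown medium raises KeyError instead of creating a new spelling.
    canonical_medium = MEDIUM_ALIASES[medium.strip().lower()]
    params = [
        ("utm_source", source.strip().lower()),
        ("utm_medium", canonical_medium),
    ]
    if term:
        params.append(("utm_term", term))
    if campaign:
        params.append(("utm_campaign", campaign.strip().lower()))
    separator = "&" if "?" in base_url else "?"
    # urlencode escapes spaces as "+", as in "running+shoes".
    return base_url + separator + urlencode(params)


print(tagged_link("http://www.mycompany.com/", "Google", "cost_per_click",
                  campaign="adidas", term="running shoes"))
```

Rejecting an unknown medium, rather than passing it through, is a design choice for this sketch: a typo fails loudly instead of appearing in the reports as a separate medium.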

Optimize your keyword buys.
♦ Which keywords should I buy?
♦ How much should I pay for a keyword?
♦ How much do I make on a keyword?
♦ At which times of the day should I maximize keyword exposure?
♦ How can I identify click-fraud?

Which keywords should I buy?

You should buy keywords that return the highest number of transactions and/or goals, or yield the highest revenue. Begin by looking at the Keyword Comparison-->Conversion report. Which keywords deliver the highest goal conversion and/or sales conversion rates? In this example, the highest goal conversion and sales conversion rates (1.8% and 1.75%, respectively) come from the third item. (Note: Sales conversion rate appears only if you have licensed the E-commerce module.)


Next, look at the Keyword Analysis-->Conversion report and drill down to see keyword-by-keyword detail inside each of your organic search engines. Keywords that deliver high conversion rates on organic search engines are often good keywords to buy. If you have licensed the E-commerce module, look at the Keyword Comparison-->ROI report and the Keyword Analysis-->ROI report. Which keywords perform the best on each search engine? Again, organic search engine results can often give a good indication of how a keyword will perform as a sponsored link on a particular search engine.

How much should I pay for a keyword?

To answer this question, you will need to have licensed the E-commerce module. Look at the Keyword Comparison-->ROI report and drill down on the keyword you are analyzing. The Avg. Value metric will tell you the average value per click, or total revenue divided by clicks. This is the maximum amount you should bid on the keyword. Note that the average value does not take into account production costs or other business expenses. In the example below, the keyword "analytics system architecture" on the "google [cpc]" (Google cost-per-click) search engine is yielding an average value of 33 cents.
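As a worked illustration of the Avg. Value calculation, the Python sketch below divides revenue by clicks and derives a bid ceiling from the result. The figures are invented for the example, and the margin parameter is an assumption added here to account for the production costs that Avg. Value leaves out; it is not an Urchin metric.

```python
def average_value(revenue, clicks):
    """Average value per click: total revenue divided by clicks."""
    if clicks == 0:
        return 0.0
    return revenue / clicks


def max_bid(revenue, clicks, margin=1.0):
    """Upper bound for a keyword bid.

    margin is a hypothetical fraction of revenue left after production
    costs and other expenses; margin=1.0 reproduces Avg. Value itself.
    """
    return average_value(revenue, clicks) * margin


# Hypothetical figures for one keyword on one cost-per-click search engine.
revenue, clicks = 66.00, 200
print(f"Avg. Value: ${average_value(revenue, clicks):.2f} per click")
print(f"Bid ceiling at a 50% margin: ${max_bid(revenue, clicks, 0.5):.3f}")
```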

How much do I make on a keyword?

If you have licensed the E-commerce module, look at the Keyword Comparison-->ROI report or the Keyword Analysis-->ROI report. Both of these reports show your Return on Investment for each keyword on each search engine, for all keywords across a single search engine, and for all search engines across a single keyword. In the example below, the cost-per-click ROI for "analytics system architecture" on Google is 763%.

At which times of the day should I maximize a keyword's exposure?

Look at the Day Parts Breakdown-->Goal Conversion by Hour or Sales Conversion by Hour. Drill down on the keyword you are analyzing. The report will display the number of goals or transactions and a conversion rate by hour of the day. The timezone is controlled by your administrator using the Time Offset field on the Reporting tab of Configuration-->Urchin Profiles-->Profiles-->Edit. By default, this is set to "Local Time".

How can I identify click-fraud?

Look at the Click Fraud Watch-->Repeat Clicks by IP report. Drill down on any IP-Visitor ID to view search engines. If any cost-per-click search engines appear with Repeat Clicks, click on those search engines to determine the keyword(s) on which Repeat Clicks occurred. You can ignore any Repeat Clicks on organic search engines since these clicks are harmless. Note that Repeat Clicks do not necessarily indicate hostile activity; Repeat Clicks often occur naturally as a result of visitors going back and forth between the referral and your site. Look for a high number of Repeat Clicks (10 or more) per day over a period of several days on specific paid keywords. You can click on an IP-Visitor ID in the Repeat Clicks by IP report to obtain information on the click originator. Note that this data may contain information on the ISP and not the individual visitor.

For additional information on click-fraud, visit Alchemist Media, Inc. Click Fraud Guidelines or contact Jessie Stricchiola ([email protected]).

Acquisition report. In the Filter field, type "referral|none|organic" and click the minus button. The report now displays only referrals that were tagged with UTM variables. For example, the following report displays referrals from three source[medium] combinations.
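The effect of the "referral|none|organic" filter can be pictured with a few lines of Python. This sketch assumes that the Filter field treats its input as a regular expression and that the minus button excludes matching rows; the row labels are invented for the example and are not taken from an actual report.

```python
import re


def exclude_matching(rows, pattern):
    """Keep only rows whose source[medium] label does NOT match pattern."""
    regex = re.compile(pattern)
    return [row for row in rows if not regex.search(row)]


# Hypothetical source[medium] labels as they might appear in a report.
rows = [
    "google[cpc]",
    "google[organic]",
    "(direct)[(none)]",
    "example.com[referral]",
    "newsletter[email]",
    "overture[cpc-content]",
]

print(exclude_matching(rows, "referral|none|organic"))
```

Only rows whose medium was set through UTM variables survive the filter, which is why the remaining rows are the tagged referrals.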

Drill down on the source[medium] for the content you would like to examine. The report now displays the different versions of content within that source[medium]. The CTR (Click Through Rate) tells you the percentage of impressions (ad displays) that resulted in clicks. The %New field tells you the percentage of clicks that are new leads.

Which versions of my advertisements refer the visitors most interested in my site?

Look at the Content (A/B) Testing-->Quality report. In the Filter field, type "referral|none|organic" and click the minus button. The report now displays only referrals that were tagged with UTM variables. Drill down on a source[medium] for the content you would like to examine. The report now displays the different versions of content within that source[medium]. Depth is a measure of interest which tells you the average number of pages on your site that each visitor viewed. Loyalty indicates the average number of times visitors returned to your site.

Which versions of my advertisements refer the visitors most likely to reach a conversion goal on my site?

Look at the Content (A/B) Testing-->Conversion report. In the Filter field, type "referral|none|organic" and click the minus button. The report now displays only referrals that were tagged with UTM variables. Drill down on a source[medium] for the content you would like to examine. The report now displays the different versions of content within that source[medium]. Goal Conv. (Goal Conversion Rate) is the percentage of referrals that reached a goal on your site. If you have licensed the E-commerce Module, Sales Conv. (Sales Conversion Rate), the percentage of referrals that made a purchase on your site, is also displayed.

Which versions of my advertisements provide the biggest return?

If you have licensed the E-commerce module, you will be able to see the revenue associated with each version of content. Look at the Content (A/B) Testing-->ROI report. In the Filter field, type "referral|none|organic" and click the minus button. The report now displays only referrals that were tagged with UTM variables. Drill down on a source[medium] for the content you would like to examine. The report now displays the different versions of content within that source[medium]. Revenue is the gross revenue associated with referrals from the content within the source[medium]. For Content Analysis, the Cost column will always be 0 and the ROI column will be equal to Revenue.

How To Track Content−Targeted Ads

If you currently purchase keywords on Google or Overture, you should also consider participating in the Google and Overture content-targeted advertising programs. These programs place your cost-per-click search ads on content sites that are published by Google and Overture partners. For example, if you sell vacation packages to France, your ad might appear in an article on Parisian restaurants. Participation in the Google and Overture programs is free, and you pay the same cost-per-click that you pay for search engine referrals.

To track your content-targeted ad referrals, you will need to:
♦ sign up for content-targeted advertising on your Google and/or Overture cost-per-click account
♦ edit your links to track content-targeted ad referrals

Once you have signed up for content-targeted advertising, each of your cost-per-click ads will have two links associated with it: one link for search referrals and one link for content-targeted referrals. You will need to edit the link used for content-targeted referrals so that you can track search engine referrals and content-targeted referrals separately.

Which UTM variables should I use to track content-targeted ad referrals?

For content-targeted ad referrals, you should use utm_source, utm_medium, and utm_content.
♦ Use utm_source to indicate the search engine.
♦ Use utm_medium to indicate a cost-per-click content-targeted ad. For example, you might use "utm_medium=cpc-content" to differentiate from your search referrals, which say "utm_medium=cpc".
♦ Use utm_content to specify which specific ad referred the visitor.
♦ If you have multiple types of products or multiple campaigns, you should also use utm_campaign. For example, if you have a “spring sale” campaign and an “adidas” promotion, you should indicate the appropriate campaign in your link.

The following link illustrates how you would use the UTM variables for a content-targeted ad referral:
Example
http://www.mycompany.com/buy_page?utm_source=google&utm_medium=cpc-content
Urchin provides a URL builder tool that creates links for you, embedding the campaign information that you specify. Using this tool ensures that your links contain the correct syntax.

To sign up for content-targeted ad placements on Google
1. Log in to your AdWords account.
2. In the Campaign Summary table, click the appropriate ad campaign.
3. Click Edit Campaign Settings above the Ad Groups table.
4. At the bottom of the Edit Campaign Settings table, locate the distribution preferences checkboxes, and:
5. Click the checkbox next to content sites in Google’s network to check this option. Your ad will be included on additional content sites in the expanded network (if you click again to remove the check, your ad will not be included on these sites).
6. Click Save All Changes at the bottom of the page to finish.
To edit your links to track content-targeted referrals from Google
1. Log in to your AdWords account.
2. Navigate to your campaign and Ad Group.
3. Scroll to the bottom of your keyword list to edit your Ad(s).
4. Click Edit on one of your ads (or click Create New Ad).
5. Create a link according to the guidelines described above, in the section "Which UTM variables should I use to track content-targeted ad referrals?"

To sign up for content-targeted ad placements on Overture
1. Log in to your Overture account.
2. Click the Account Set-Up link on the Account tab.
3. Under Content Match Advertising, select On and click Submit.

To edit your links to track content-targeted referrals from Overture
1. Log in to your Overture account.
2. Click the Manage Products tab. "Pay-For_Performance Search | Content Match" displays on the top margin of the page.
3. Click Content Match and select Manage Listings from the drop-down menu.
4. Click next to the search term you wish to edit and press Edit Listings.
5. Click Modify Listings in the pop-up.
6. Create a link according to the guidelines described above, in the section "Which UTM variables should I use to track content-targeted ad referrals?"

How To Track Email Campaigns

The Urchin Campaign Tracking Module (beginning with version 5.6 and UTM-4) allows you to track email campaign impressions, clickthroughs, and conversions. An email impression is registered when the email recipient opens the email message. A clickthrough is registered when the recipient clicks on a link inside the email message. A conversion is registered when the recipient reaches a goal page on your site or completes a purchase. This article describes how to:
◊ create the email message, and
◊ interpret the email campaign results

Creating the Email Message

You will need to create your email message as described in this section, so that the Urchin Campaign Tracking Module can accurately track impressions (opened emails) and referrals.
1. To track the email impressions, embed the __utm.gif image anywhere in the message as illustrated below.
Example 1 (Tracking the email impressions using a master tracking code)

"get_parameter" allows you to retrieve the value for a specific directive from the record matching ident in the configuration database, and requires the "parameter=parameter" argument.

Modifying Data: the "add", "edit" and "delete" functions provide comprehensive editing ability of records in the Urchin configuration database. A directive list should also be passed along with these actions. The "set_parameter" function sets a particular directive within a record.
• "add" requires both the "table=tablename" and "name=recordname" parameters and inserts a new record into the database with the specified set of "directive=value" pairs.
• "edit" is similar to the "add" command, except that the record directives for the specified id are replaced with the new list of "directive=value" pairs. This function clears all previous directives for id and adds all new directives specified on the command line.
• "delete" deletes the record matching ident from the table.
• "set_parameter" sets the specified "directive=value" directive in the configuration for the record matching ident.
Diagnostics Returned and Exit status

Upon completion of a command, uconf-driver will print out one of the following diagnostics based on the action parameter that was specified:

Action          Output        Description
(any)           [usage msg]   command line not in recognized format
(any)           [no msg]      command didn't perform any action, perhaps due to an out of range entry, incorrect table name, etc.
(any)           -l            command line parameters invalid for request type
add             [recnum]      record number of record that was created
delete          [recnum]      record number of record that was deleted
edit            [recnum]      record number of record that was edited
get             [record]      complete set of name/value directives for specified record
get_parameter   [param]       value of requested directive
list            [records]     complete list of all name/value directives for all records in the specified table
nrecords        [count]       count of records in the specified table
ntotalrecords   [count]       count of all records in all tables
set_parameter   0             always prints 0 on success

Please note that you must parse the runtime output from uconf-driver to determine if the command was successful. At present, the utility always exits with a status code of 0, so the exit status cannot be used to determine if the command succeeded or not.

Examples

Here is a set of example commands using the uconf-driver utility. Please note that all commands are on a single line; line breaks have been added for readability.

# Extract the total number of records from the Urchin configuration database
uconf-driver action=ntotalrecords

# Extract the number of records in the "profile" table
uconf-driver action=nrecords table="profile"

# List records 6-8 in the "logfile" table
uconf-driver action=list table="logfile" start=6 n=3

# Extract the record for user "joe" from the "user" table
uconf-driver action=get table=user name="joe"

# Add a regular non-privileged user to the configuration
uconf-driver action=add table=user name="bob" ct_fullname="Bob Jones"
  ct_password="b0bz@pw" cs_adminlevel=3

# Set the default language for the Urchin reports to English
uconf-driver action=set_parameter table=user name="bob" cs_language=en

# Change the network port that the Urchin webserver uses
uconf-driver action=set_parameter recnum=1 ct_port=1234
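Because the exit status is always 0, a calling program has to inspect what uconf-driver prints. The Python sketch below shows one way this might be done. The helper names are hypothetical, the driver path is passed in by the caller, and the parser assumes that a successful add, edit, or delete prints nothing but the record number, as the diagnostics table describes.

```python
import subprocess


def parse_recnum(output):
    """Return the record number printed by uconf-driver, or None.

    Anything that is not a bare non-negative integer (an empty string,
    a usage message, an error code) is treated as a failure.
    """
    text = output.strip()
    if text.isdigit():
        return int(text)
    return None


def run_uconf(driver_path, **params):
    """Run uconf-driver with key=value arguments and return what it printed."""
    args = [driver_path] + [f"{key}={value}" for key, value in params.items()]
    completed = subprocess.run(args, capture_output=True, text=True)
    # The exit status is deliberately ignored: it is always 0.
    return completed.stdout


# Illustrative use (requires an Urchin installation):
#   out = run_uconf("./uconf-driver", action="add", table="user", name="bob")
#   if parse_recnum(out) is None:
#       print("add failed")
```

Callers should compare the result with None rather than test it for truth, since set_parameter prints 0 on success.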

Sample Bourne Shell script to add a Profile/Task/LogSource/User using uconf-driver

This is a sample working script that demonstrates how uconf-driver could be embedded in a script to automate the creation of an entire new Profile, a Log Source for it to process, a scheduled Task, and a User with rights to view the Profile.

#!/bin/sh
#
# Proof-of-concept Bourne shell script for adding a Profile,
# Task, Log Source and User record set to the Urchin configuration.
# The Profile will be set to run at 01:05am daily.
#
# NOTE: Line breaks have been added for readability
#

# Define the pertinent information here. Obviously, this should really be
# stuff that's parsed from the command line.
domain=mysite.com
logfile=/path/to/webserverlogs/mysite-access.log
username=userjoe
password=joepasswd
language=en
region=us

cd /path/to/urchin/util

# Add Profile
p_recnum=`./uconf-driver action=add table=profile name=$domain \
  ct_name=$domain ct_website=http://www.$domain \
  ct_reportdomains=$domain,www.$domain`

# Add Task
t_recnum=`./uconf-driver action=add table=task name=$domain \
  ct_name=$domain cr_frequency=5 cr_enabled=on cs_hour=01 \
  cs_minute=05 cs_rid=$p_recnum`

# Set proper cross reference from Profile to Task
recnum=`./uconf-driver action=set_parameter recnum=$p_recnum cs_taskid=$t_recnum`
if [ "$recnum" != "$p_recnum" ]; then
  echo "Failed to associate profile with task"
fi

# Add Log Source
l_recnum=`./uconf-driver action=add table=logfile name=$domain \
  cr_action=2 ct_name=$domain cr_type=local ct_loglocation=$logfile \
  cs_logformat=auto cs_rlist=!$p_recnum!`

# Set proper cross reference from Profile to Log Source
recnum=`./uconf-driver action=set_parameter recnum=$p_recnum cs_llist=!$l_recnum!`
if [ "$recnum" != "$p_recnum" ]; then
  echo "Failed to associate profile with log source"
fi

# Add regular non-privileged user with access to this Profile
u_recnum=`./uconf-driver action=add table=user name=$username \
  ct_name=$username ct_password=$password ct_fullname="$domain user" \
  cs_language=$language cs_region=$region cs_adminlevel=3 \
  cs_rlist=!$p_recnum!`

# Set proper cross reference from Profile to User
recnum=`./uconf-driver action=set_parameter recnum=$p_recnum cs_ulist=!$u_recnum!`
if [ "$recnum" != "$p_recnum" ]; then
  echo "Failed to associate user with profile"
fi

exit

##
## END OF SCRIPT
##

Here is the same script, written using DOS commands.

@echo off
REM Proof-of-concept Windows batch file for adding a Profile,
REM Task, Log Source and User record set to the Urchin configuration.
REM The Profile will be set to run at 01:05am daily.
REM
REM This should work on Windows 2000, XP and 2003 Server.
REM
REM NOTE: Line breaks have been added for readability - you will need
REM to ensure that all your commands appear on one line in the script
REM
REM
REM Prompt the user for the information we need. This section could
REM be replaced using values from command line arguments instead.
REM
set/p domain=Enter domain:
set/p logfile=Enter webserver log pathname:
set/p username=Enter username:
set/p password=Enter password:
set/p language=Enter language:
set/p region=Enter region:
cd c:\program files\urchin\util
REM
REM Add Profile
REM
uconf-driver action=add table=profile name=%domain% ct_name=%domain% \
  ct_website=http://www.%domain% \
  ct_reportdomains=%domain%,www.%domain% > #tmp
set/p p_recnum= #tmp set/p t_recnum= #tmp set/p recnum= #tmp set/p l_recnum= #tmp set/p recnum= #tmp set/p u_recnum= #tmp set/p recnum=

e) in same month
force action to occur without confirmation
print this help information
go directly to rebuild-index option
specifies name of profile (required)
go directly to remove option
quiet mode, suppress output except for critical user confirmation
go directly to rebuild-header option
go directly to zero-day option

Note: When udb-sanitizer is called with options that do not completely describe what action to take, it will prompt the user as needed for additional input. You can cause an action to be performed without any user interaction by using the "-d" option in conjunction with any of the -b, -i, -r, -x, or -z options.

Operation

In normal operation, udb-sanitizer is invoked from a command shell and interactively prompts the user for the actions to take. For each month of Urchin reporting data that the utility finds, it will present the following interactive menu:

Options:
1. Rollback data to state before last run
2. Delete this month entirely
3. Rebuild header to match data
4. Rebuild indexes
5. Zero out one or more days
Please choose 1-5 or press return to do nothing:

If no action for the currently selected month is desired, pressing the Enter/Return key will cause the utility to move forward to the next chronological month where data is present and present the same menu choices. Actions associated with the options presented above are:

1. Data rollback
The utility will revert all reporting data for a profile to that contained in a ZIP archive. The user is presented with a list of ZIP archive backups to choose from. The ZIP archives are named with the following convention: "YYYYMM-backup-YYYYMMDDHHMMSS.zip", where the first YYYYMM refers to the month of data being backed up (e.g. 200309 refers to September 2003), and the YYYYMMDDHHMMSS portion is the timestamp of when the ZIP archive was created. This timestamp should be helpful in determining which ZIP archive you want to roll back to. Please note that there is no way to invoke udb-sanitizer to do a rollback based solely on command line arguments; it will always prompt for the ZIP archive to roll back to if any exist. If no ZIP backup archives exist, the utility prints a diagnostic to that effect and exits.

2. Delete monthly data
All data for a particular profile for the specified month is removed. This option is useful for zeroing out the statistics for a month if the data is incorrect, e.g. the wrong filters were applied or the wrong logs were processed; or perhaps some of the advanced profile parameters were changed, such as the click path depth or referral level, and it is desirable to update that month's Urchin reporting data to reflect the change. This action can be performed without user interaction by invoking udb-sanitizer with the "-f", "-r" and "-d" arguments, e.g.
udb-sanitizer -f -r -d 200309 -p mysite.com

3. Rebuild database headers
This causes the utility to read the Urchin database tables directly and rebuild the database headers based on the data found. This should only be done if udb-sanitizer finds a discrepancy between the headers and the data. WARNING: if the database headers do not match the data, this is typically indicative of some type of database corruption; in this case, the prudent course of action is to completely remove the data for that month and reprocess the corresponding webserver logs. This may not be possible for various reasons, so rebuilding the headers may be the only way to resuscitate the databases so that the Urchin log processing and reporting engines are able to work with them, but this is not guaranteed to fix corruption. This action can be performed without user interaction by invoking udb-sanitizer with the "-f", "-x" and "-d" arguments, e.g.
udb-sanitizer -f -x -d 200309 -p mysite.com

4. Rebuild database indexes
This causes the utility to read the Urchin database tables directly and rebuild the database indexes based on the data found. This should only be done if udb-sanitizer finds a discrepancy between the headers and the data. NOTE: the same warning given about corruption in the database headers applies to this option as well. This action can be performed without user interaction by invoking udb-sanitizer with the "-f", "-i" and "-d" arguments, e.g.
udb-sanitizer -f -i -d 200309 -p mysite.com

5. Zero data for one or more days
This option allows data for selected days within the month to be zeroed out, thereby allowing Urchin log processing to be rerun for those days only (e.g. urchin -p profile -d YYYYMMDD). This action can be performed without user interaction to zero out a single day by invoking udb-sanitizer with the "-f", "-z" and "-d" arguments, e.g.
udb-sanitizer -f -z -d 20030907 -p mysite.com

and for multiple days by including the "-e" argument as well to specify an end date, e.g.
udb-sanitizer -f -z -d 20030907 -e 10 -p mysite.com

which will zero out data for September 7th through the 10th. This is more efficient than invoking multiple instances of udb-sanitizer to zero out a single day at a time, as the database indexes and headers are checked only once. The index/header checking operation can require a noticeable amount of time on profiles with a lot of data.

Considerations


• Invoking udb-sanitizer without specific dates on profiles with a lot of historical data can be time-consuming, as the utility must open up the databases for each month, perform sanity checks, and then present the menu of actions.
• Actions that delete daily or monthly data cannot be undone! The only recourse is to reprocess the webserver logs for that time period to repopulate the profile databases. Use these options with care.
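When choosing a rollback target, the archive naming convention described under the data rollback option ("YYYYMM-backup-YYYYMMDDHHMMSS.zip") can be decoded mechanically. The Python sketch below is an illustration only. The function names are hypothetical, and it assumes that archive names follow the convention exactly, with plain ASCII hyphens.

```python
import re
from datetime import datetime

# First group: month of data backed up. Second group: archive creation time.
BACKUP_NAME = re.compile(r"^(\d{6})-backup-(\d{14})\.zip$")


def parse_backup_name(filename):
    """Split an archive name into (data month, creation time), or None."""
    match = BACKUP_NAME.match(filename)
    if match is None:
        return None
    month, stamp = match.groups()
    return month, datetime.strptime(stamp, "%Y%m%d%H%M%S")


def latest_backup(filenames, month):
    """Return the most recently created archive for a YYYYMM month, or None."""
    candidates = []
    for name in filenames:
        parsed = parse_backup_name(name)
        if parsed is not None and parsed[0] == month:
            candidates.append((parsed[1], name))
    return max(candidates)[1] if candidates else None


# Hypothetical archive names for illustration.
names = [
    "200309-backup-20031001010500.zip",
    "200309-backup-20031002010500.zip",
    "200310-backup-20031101010500.zip",
]
print(latest_backup(names, "200309"))
```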

urchinctl: Urchin Services Control Utility

Overview

The Urchin Services Control utility, urchinctl, provides a means of starting and stopping the Urchin Scheduler and Urchin Webserver services. On UNIX-type systems, urchinctl is typically called from one of the system's boot-time scripts to automatically start up or shut down Urchin services. The types of operations that urchinctl can perform are:
• Start, stop, or restart the scheduler or webserver (or both)
• Start the webserver on an alternate port
• Start the webserver with SSL encryption

Usage

urchinctl is located in the bin directory of the Urchin distribution. Usage of the utility is as follows:

urchinctl [-h]    (prints usage message and exits)
urchinctl [-v]    (prints version and exits)
urchinctl [-e] [-p port] [-s | -w] action

where:
-e    activates encryption (SSL) in the webserver
-p    specifies the port for the webserver to listen on
-s    performs the action on the Urchin scheduler ONLY
-w    performs the action on the Urchin webserver ONLY

and action is one of:
start      (starts the service(s))
stop       (stops the service(s))
restart    (stops and then starts the service(s))
status     (displays webserver/scheduler runtime status)


By default, the action is performed on both the webserver and the scheduler unless the "-s" or "-w" command line arguments are specified. Note that these arguments are mutually exclusive.

Considerations
• On UNIX-type systems, urchinctl should be run as the user/UID that Urchin is installed as to ensure that the urchinwebd and urchind processes are started as that UID.
• Starting up the Urchin webserver with SSL encryption initially requires additional configuration steps. Please see the document titled Activating SSL on the Urchin Webserver in the Security Features section of the Advanced Topics area of the Urchin Documentation Library.
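A script that calls urchinctl can enforce the usage rules described in this section before it runs anything. The Python sketch below builds an argument list from the documented options. The function and parameter names are hypothetical, and the sketch only assembles the command; it does not execute it.

```python
VALID_ACTIONS = ("start", "stop", "restart", "status")


def urchinctl_args(action, port=None, ssl=False,
                   scheduler_only=False, webserver_only=False):
    """Build an urchinctl argument list from the documented options."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if scheduler_only and webserver_only:
        raise ValueError("-s and -w are mutually exclusive")
    args = ["urchinctl"]
    if ssl:
        args.append("-e")             # activate SSL in the webserver
    if port is not None:
        args += ["-p", str(port)]     # port for the webserver to listen on
    if scheduler_only:
        args.append("-s")
    if webserver_only:
        args.append("-w")
    args.append(action)
    return args


print(" ".join(urchinctl_args("restart", port=9999, webserver_only=True)))
```

The resulting list can be passed to a process launcher such as subprocess.run, run as the user that Urchin is installed as.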

urchin: Urchin Log Processing Engine

Overview

The Urchin Log Processing Engine, urchin, is the core log processing component of Urchin. Ordinarily, the log processing engine is invoked from the Urchin Scheduler (urchind) when a task is run. However, it is possible to execute urchin directly from a command shell to run a specific profile. This is useful in highly scripted environments where Urchin tasks are run from an external source such as the Windows Task Scheduler or cron on UNIX-type systems. It is also useful for running a profile under special circumstances, such as to process only hits for a particular day, or to do some type of debugging. urchin is not truly a utility; it is documented here because it possesses some limited command-line capabilities that may prove useful in certain environments.

Usage

urchin is located in the bin directory of the Urchin distribution. Usage of the utility is as follows:

urchin [-h]    (prints usage message and exits)
urchin [-v]    (prints version and exits)
urchin [-DHt] [-d YYYYMMDD] -p profile

where:
-D    runs urchin in debug mode
-H    causes the runtime output of urchin to be logged to the standard history directory for the profile
-t    runs urchin in test mode only to output runtime parameters
-d    specifies that Urchin should only process hits from logs that match the specified date (YYYYMMDD format)
-p    specifies the profile to run


Considerations
♦ On UNIX-type systems, urchin should be run as the user/UID that Urchin is installed as to ensure that the databases for the profile are owned by that UID, since urchin will create them if they do not already exist.
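The "-d" option makes it straightforward to reprocess a run of days one at a time, for example from cron or after zeroing those days with udb-sanitizer. The Python sketch below generates one urchin command per day. The function name is hypothetical, and the sketch only builds the argument lists; running them is left to the caller.

```python
from datetime import date, timedelta


def daily_commands(profile, first_day, last_day):
    """Return one 'urchin -p profile -d YYYYMMDD' argument list per day."""
    commands = []
    day = first_day
    while day <= last_day:
        commands.append(["urchin", "-p", profile, "-d", day.strftime("%Y%m%d")])
        day += timedelta(days=1)
    return commands


for command in daily_commands("mysite.com", date(2003, 9, 7), date(2003, 9, 10)):
    print(" ".join(command))
```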

Integration NFS locking requirement

Urchin configuration, which uses data files located in the 'data/conf' folder of the installation, will set read and write locks during setup and administration. If the data/conf folder is mounted over NFS, it is required that the NFS server is running the appropriate locking daemons to handle remote locking. If the locking does not work properly, the Urchin installation may hang indefinitely. Technical Notes: Urchin uses the fcntl() function on UNIX to perform advisory and exclusive (read/write) locking. Some older systems may not support all of the rpc locking requests of different platforms. Be sure you are running recent versions of your OS with the latest patches. Make sure that the NFS server is running 'statd' and 'lockd' or equivalent.
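Before installing onto an NFS mount, it may help to confirm that fcntl() locks can be taken in the target directory at all. The Python sketch below is a rough check under stated assumptions: it uses the standard fcntl module on a UNIX-type system, the function name is hypothetical, and it exercises a single exclusive lock on a scratch file rather than the exact sequence of locks that Urchin sets.

```python
import fcntl
import os
import tempfile


def can_lock(directory):
    """Try to take and release an exclusive fcntl() lock in directory."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # LOCK_NB asks for an immediate error instead of waiting when the
        # lock is held elsewhere; a broken lock daemon may still stall here.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        os.unlink(path)


print(can_lock(tempfile.gettempdir()))
```

If the call itself hangs when pointed at the data/conf directory, that matches the symptom described above and suggests that 'statd' and 'lockd' should be checked on the NFS server.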

Overview of Urchin Integration Capabilities

Introduction The Urchin 5 front end is built on modular components and industry standard techniques to allow administrators to integrate Urchin report access into their existing or planned infrastructures. As part of our commitment to hosting providers and data centers, we test and support three integration points and six different functions which should allow Urchin to integrate into most architectures. The following diagram illustrates the primary components of the Urchin front−end and the three integration points. Both administrative and reporting functions are web−based and delivered from this system.


All content from the Urchin system is delivered via an Apache server that is installed as "urchinwebd" shown in the left side of the diagram. Requests are handled by the Apache server and passed using the CGI interface to a Session Controller application (session.cgi). The Session Controller performs Authentication and depending on the action passes the request on to either the Admin Engine (admin) or the Urchin Report Engine (urchin.cgi). Content with embedded session identifiers is passed back to the user. The three points labeled "A", "B", and "C" are the three integration points mentioned previously. Point "A" allows the administrator to either replace or bypass the Urchin web server. Point "B" allows for external or no authentication. And point "C" allows for direct access to the reporting via a wrapper or portal. Point A: Web Server Integration Many hosting companies will already have a running web server or have a specially compiled web server that runs within their systems. As long as a web server is providing basic CGI operations, you may replace the Apache binary or use an existing web server to provide access to Urchin. This integration point keeps the rest of the Urchin system intact. The Urchin interface will be used for administration, authentication, and report delivery. Users and Profiles will need to be configured within the Urchin system so as a user enters the system, Urchin will know which reports to allow them access to. For complete details on the requirements of replaced binary or an example of using an existing web server, please see the appropriate document for the type of webserver you are running under the Integration section of the Advanced topics area of the Urchin Documentation Center at http://help.urchin.com. Point B: External Authentication or Authentication Bypass Complex hosting environments will often have existing centralized controls for providing user authentication such as LDAP. 
With a simple configuration change, the Session Controller can call an authentication routine of your choice. A simple interface is provided for returning successful or failed logins. Existing authentication routines can be easily wrapped to provide the correct framework for both systems. Using integration at point "B", it is also possible to bypass the authentication with a dummy routine and link directly to the landing page.

By using integration at point "B", the overall Urchin system remains intact, with the exception that authentication is performed by an external application. Users and Profiles will need to be configured within the Urchin system so that when a user enters the system, Urchin knows which reports to allow them access to.

For complete details on the interface for using external authentication and an example of how to bypass authentication, please see the associated Integration document entitled Using External Authentication or Authentication Bypass.

Point C: Link Directly to Reports from Wrapper or Portal

Many of our customers have a one-to-one relation between a customer and a website, and wish to link to reporting for a particular website directly from the website's administration area. This area is typically already authenticated. Point C integration makes it easy to link directly to reporting, either via a wrapper script or from an existing portal. In this scenario, the Urchin authentication and initial report selection screen are bypassed, and users are taken directly to the Urchin reports.

By using integration at point "C", it is assumed that the service provider has control over who gets to view what, either by a one-to-one relation or by an existing configuration database. Urchin will still need to be configured for generating Profile reports, but this can be automated within or outside of the administrative interface.

Urchin provides the capability to link directly to an Urchin report with a specific URL when "Direct Report Linking" is enabled in the administration interface. Access to the Urchin report is controlled via a ".report.conf" file embedded in the directory specified by this URL. For a complex portal integration, the Urchin reporting engine will propagate session and other portal variables in order to keep the session operating.

For details on how to use a wrapper or portal to access Urchin reporting directly, and on proper configuration of the ".report.conf" file, please see the Integration document entitled Linking Directly to Urchin Reports.

Changing the Location of the Urchin Data Directory

Overview

For simplicity, the default Urchin 5 installation is contained in a single umbrella directory that contains all Urchin applications and utilities, library files, documentation, etc. All data processed by Urchin is also stored in this installation directory under the data sub-directory. In many cases, it is desirable to configure Urchin to store its data on another disk partition or drive that is better suited to dynamic data and allows for greater storage capacity. Urchin can easily be configured to do this.

Procedure

The location where Urchin stores its report data can be changed in the urchin.conf file, which is located in the etc directory of the main Urchin installation directory.

For UNIX-type systems:
1. Open a command shell as the user that Urchin was installed as
2. cd to the directory where Urchin is installed
3. Stop the Urchin services with the command:
    bin/urchinctl stop
4. Using a text editor, open the urchin.conf file in the etc directory of the Urchin distribution
5. Uncomment the following line by removing the leading '#' character:
    #dataDirectory: ./data/
   and substitute the full directory path you want to use instead of the default directory, e.g.
    dataDirectory: /bigdisk/urchin/data/
6. If necessary, create the new data directory. Important note: this directory must be readable and writable by the user Urchin runs as.
7. Copy the existing data to the new location. To ensure that all permissions and data are properly preserved, it is recommended that you use the following command:
    cd data; tar cf - . | (cd /bigdisk/urchin/data; tar xpf -)
8. Return to the main Urchin directory and rename the original data directory to data.old. You can remove it completely if you wish, but you may want to ensure that everything is working properly before doing this.
9. For ease of administration, you may want to create a symlink in the main Urchin directory pointing at the new location, e.g.
    ln -s /bigdisk/urchin/data ./data
10. Restart the Urchin services with the command:
    bin/urchinctl start
11. Log in to Urchin as the admin user. You should be presented with the License Urchin screen. Simply click on the Reactivate License link and you are finished.

For Windows systems:
1. Stop the Urchin Services: Start->Programs->Urchin->Disable Urchin Services
2. Open Windows Explorer and navigate to the etc folder of the Urchin distribution. By default, this is C:\Program Files\Urchin\etc
3. Using a text editor, open the urchin.conf file
4. Uncomment the following line by removing the leading '#' character:
    #dataDirectory: ./data/
   and substitute the drive letter and full pathname you want to use instead of the default folder, e.g.
    dataDirectory: E:\Urchin\data
5. If necessary, create the new data folder. Important note: the permissions on this folder must allow read and write access to the Urchin service.
6. Copy the contents of the data folder to the new folder.
7. Rename the data folder to data.old. You can remove it completely if you wish, but you may want to ensure that everything is working properly before doing this.
8. Restart the Urchin Services: Start->Programs->Urchin->Enable Urchin Services

Considerations

Due to the way licensing is implemented on UNIX-type systems to prevent tampering, moving Urchin's data directory will require the Urchin license to be reactivated. However, the relicense operation is extremely simple: you merely need to log in as the Urchin admin user and click Reactivate License.
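The urchin.conf change in the procedure above is a single-line edit, so it can be scripted. The following is a minimal sketch rather than an Urchin-supplied tool: it assumes the default line appears as "#dataDirectory: ./data/" on a line of its own, and the function name set_data_directory is hypothetical.

```python
import re


def set_data_directory(conf_text: str, new_path: str) -> str:
    """Return conf_text with the dataDirectory line uncommented and set to new_path.

    Assumption: urchin.conf holds one "dataDirectory:" line, optionally
    prefixed with '#', as described in the procedure above.
    """
    pattern = re.compile(r"^[ \t]*#?[ \t]*dataDirectory:.*$", re.MULTILINE)
    if not pattern.search(conf_text):
        raise ValueError("no dataDirectory line found in configuration text")
    new_line = f"dataDirectory: {new_path}"
    # A callable replacement keeps backslashes in Windows paths literal.
    return pattern.sub(lambda _match: new_line, conf_text, count=1)


if __name__ == "__main__":
    sample = "# Urchin configuration\n#dataDirectory: ./data/\n"
    print(set_data_directory(sample, "/bigdisk/urchin/data/"))
```

As in the manual procedure, the Urchin services should be stopped before the file is rewritten, and the data still has to be copied to the new location separately.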

Using an Existing Apache Webserver (UNIX-type Platforms)

By default, Urchin 5 administration and reporting are done using a standalone Apache server that is bundled with the Urchin product. In the vast majority of Urchin installations, this is the preferred method for delivering the Urchin admin and reporting interfaces. However, in rare instances it may be necessary to use an existing Apache installation. This may be due to site requirements that a localized version of Apache be used throughout the organization, or that all web services be controlled via a single Apache configuration. The information below describes two different models that can be employed to meet these requirements.

DISCLAIMER: These modifications to the Urchin installation are unsupported and fall outside the scope of the standard Urchin free and paid support plans. Any assistance rendered to set up or debug these configurations will be done at Urchin Software Corporation's standard Hourly Support rate.

Option 1: Utilizing an existing site-specific Apache httpd binary to run Urchin services as a separate instance

Side effects:
⋅ Urchin upgrades that depend on features added to, or configuration changes made for, the bundled Apache may not work properly if the existing Apache binary does not support them.

Configuration changes
⋅ Ensure that your httpd includes support for the following modules:
    mod_access
    mod_cgi
    mod_dir
    mod_mime
⋅ Install Urchin 5 in the normal fashion, choosing the desired port for Urchin's admin and reporting interfaces to run on.
⋅ Once Urchin is installed, do the following:
    cd /path/to/urchin/bin
    ./urchinctl stop
    mv urchinwebd urchinwebd.orig
    ln -s /path/to/your/httpd urchinwebd
    ./urchinctl start

This will start up a separate instance of Apache that uses your Apache binary, but runs independently from your standard web services.

Option 2: Running Urchin services from an existing Apache configuration


Side effects:
⋅ The entire Urchin distribution must be owned by the same UID that httpd runs as.
⋅ The admin GUI cannot be used to change the port Urchin runs on.
⋅ The bin/urchinctl command must be used exclusively with the -s argument, so that it starts and stops only the Urchin scheduler (urchind).

Configuration changes

Add the following lines to your existing httpd.conf file. You will need to supply the IP address for your server, the port number for Urchin to use, and the path to the location where you have installed Urchin 5.

    ## Support for Urchin administration and reporting services
    Listen [port#]
    ErrorLog /path/to/urchin-distribution/var/error.log
    DocumentRoot /path/to/urchin-distribution/htdocs/
    AddHandler cgi-script .cgi
    Options ExecCGI
    DirectoryIndex session.cgi
    AllowOverride None
    Order allow,deny
    Allow from all

Note: AllowOverride, Order, and Allow are directory-level directives in Apache. Place them, together with Options ExecCGI, inside a <Directory> container for the Urchin htdocs path.

Once these configuration changes have been made, perform the following tasks:
◊ Change the ownership of the Urchin distribution to the UID that your Apache webserver runs as:
    chown -R apache-user /path/to/urchin
◊ Ensure that the Apache bundled with Urchin is stopped:
    cd /path/to/urchin/bin
    ./urchinctl -w stop
◊ Restart your Apache server to enable Urchin reporting and administration
◊ Edit your Urchin boot-time startup script(s) and replace any instances of
    urchinctl start
  with
    urchinctl -s start

This will cause only the Urchin scheduler to be run at boot time, rather than both the scheduler and Apache server.
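Before swapping in a site-specific httpd binary as in Option 1, it can help to confirm that the four required modules are present. The sketch below parses the text printed by "httpd -l", which lists the modules compiled into an Apache binary. It is an illustration under stated assumptions: modules loaded dynamically through LoadModule do not appear in that listing, and the function name missing_modules is hypothetical.

```python
# Modules that Option 1 requires the replacement httpd to support.
REQUIRED_MODULES = ("mod_access", "mod_cgi", "mod_dir", "mod_mime")


def missing_modules(httpd_l_output: str, required=REQUIRED_MODULES) -> list:
    """Return the required modules that are absent from `httpd -l` output.

    The listing is expected to contain one source file name per line,
    such as "  mod_cgi.c"; other lines are ignored.
    """
    compiled = set()
    for line in httpd_l_output.splitlines():
        name = line.strip()
        if name.endswith(".c"):
            compiled.add(name[:-2])  # drop the ".c" suffix
    return [module for module in required if module not in compiled]


if __name__ == "__main__":
    # Sample listing for illustration; in practice, capture the output
    # of running your own httpd binary with the -l flag.
    sample = "Compiled in modules:\n  core.c\n  mod_access.c\n  mod_cgi.c\n  mod_mime.c\n"
    print("Missing:", missing_modules(sample))
```

An empty result only shows that the modules are compiled in statically. A module reported as missing may still be available as a shared object in your configuration.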


Using an Existing IIS Webserver (Windows Platforms)

By default, Urchin 5 administration and reporting are done using a standalone Apache server that is bundled with the Urchin product. In the vast majority of Urchin installations, this is the preferred method for delivering the Urchin admin and reporting interfaces. However, in rare instances it may be necessary to use an existing IIS webserver. This may be due to site requirements that disallow the use of a third-party webserver product on the server, or the need to set up Urchin reporting as a virtual host on an existing IIS server.

DISCLAIMER: These modifications to the Urchin installation are unsupported and fall outside the scope of the standard Urchin free and paid support plans. Any assistance rendered to set up or debug these configurations will be done at Urchin Software Corporation's standard Hourly Support rate.

Procedure

Note: this procedure assumes that Urchin has been installed in the default location of C:\Program Files\Urchin. If you have installed Urchin elsewhere, please be sure to substitute the proper location in the examples below.

Step 1: Create a new user for the Urchin web interface
1. Go to "Administrative Tools" -> "Computer Management"
2. On the left-hand side of the Computer Management screen, click "Local Users and Groups"
3. Right-click on the Users folder and select "New User..."
4. Enter "IUSR_URCHIN" in the "User name:" field
5. Uncheck the "User must change password at next logon" box
6. Check the "User cannot change password" box
7. Click "Create" and then "Close"
8. Double-click on the "Users" folder on the left
9. Right-click on IUSR_URCHIN on the right and select "Properties..."
10. Under the "Member of" tab, remove all existing entries, then click the "Add..." button, choose "Guests" in the popup window, click "Add" again, and then click "OK"
11. Click "Apply" and "Close" to save your changes

Step 2: Install Urchin (if not already done)

Step 3: Disable the Urchin Apache web server
1. Go to "Administrative Tools" -> "Services"
2. Under "Services", find the "Urchin Webserver" record
3. Right-click on "Urchin Webserver" and select "Stop"
4. Right-click on "Urchin Webserver" and select "Properties", then change the "Startup type" to "Disabled"
5. Click "OK"

Step 4: Add a new web site to IIS
1. Go to "Administrative Tools" -> "Internet Services Manager"
2. Right-click on your server's name and select "New" and then "Web Site"
3. In the "Description:" field type "Urchin" and click Next
4. Select the IP address and port number (typically 9999) and click Next
5. In the "Path:" field browse to the location where Urchin is installed (typically C:\Program Files\Urchin\htdocs) and click Next
6. Add a check mark in "Execute:" and click Next and then Finish
7. Right-click on the new Urchin web site and go to "Properties"
   a. Under the "Web Site" tab, un-check "Enable Logging"
   b. Click on the "Home Directory" tab and check "Script source access"
   c. Click on the "Documents" tab and Remove both Default entries, then click Add and enter "session.cgi" in the popup window, then click OK.
   d. Click on the "Directory Security" tab, and then click Edit in the "Anonymous access and authentication control" area. Ensure that the "Anonymous access" box is checked, then click Edit... to change the "Account used for anonymous access". In the pop-up window, select "IUSR_URCHIN" for the Username. Click OK and then OK again to get back to the Properties window.
   e. Click OK to save your changes and exit the Properties window

Step 5: Set up directory permissions
1. Right-click on "Start" and select "Explore"
2. Navigate to the location where Urchin is installed (typically C:\Program Files\Urchin)
3. Right-click on the "Urchin" folder and select "Properties"
4. Click on the "Security" tab
5. Un-check "Allow inheritable permissions from parent to propagate to this object" and then click "Remove" in the pop-up window.
6. Click "Add" and select the Administrator user and then click "Add"
7. Click "Add" and select IUSR_URCHIN and then click "Add"
8. In the "Name:" field ensure that only the Administrator and IUSR_URCHIN entries are there
9. Ensure that only the following permissions are allowed for both the Administrator and IUSR_URCHIN users:
   ◊ Read & Execute
   ◊ List Folder Contents
   ◊ Read
10. Click OK to save the permissions.
11. Click into the "Urchin" folder in the Windows Explorer window.
12. Right-click on the "data" folder and select "Properties"
13. Click on the "Security" tab.
14. Un-check "Allow inheritable permissions from parent to propagate to this object" and then click "Remove" in the pop-up window.
15. Click "Add" and select the Administrator user and then click "Add"
16. Click "Add" and select IUSR_URCHIN and then click "Add"
17. In the "Name:" field ensure that only the Administrator and IUSR_URCHIN entries are there
18. Ensure that the following permissions are allowed for both the Administrator and IUSR_URCHIN users:
   ◊ Full Control
   ◊ Modify
   ◊ Read & Execute
   ◊ List Folder Contents
   ◊ Read
   ◊ Write
19. Click "OK"

Step 6: IIS 6 only
1. Go to the "Web Service Extensions Manager"
2. Click on 'Add new web service extension...'
3. Enter "Urchin CGI" in the "Extension Name:" field
4. Click 'Add'
5. Browse to and highlight report.cgi and session.cgi
   ◊ Default location: C:\Program Files\Urchin\htdocs
6. Check "Set Extension Status to Allow"
7. Click "OK"
8. Go to the main IIS entry for your server: "Hostname"
9. Right-click and select "Properties"
10. Click on "Mime Types"
11. Click "New"
12. Enter ".cgi" in the "Extension:" field.
13. Enter "application/octet-stream" in the "Mime Type" field.
14. Click "OK"
15. Click "OK"

Your IIS webserver should now be set up to serve the Urchin web interface when you connect to it using the URL http://my.server.com:port, where port is the port you set Urchin to use when you installed it (the default is 9999).

Using External Authentication or Authentication Bypass

Overview

By default, Urchin authentication is performed when the Urchin Session Controller (session.cgi) calls the “auth” binary located in the “bin” directory of the Urchin installation. This binary queries the configuration database and compares the username and password provided with those stored in the configuration. An exit code signifying either success or failure is returned to the Session Controller. The location of the authentication binary can be controlled with a configuration change. This modular design allows administrators to call an external authentication program instead of the default “auth” binary.


As shown in the diagram above, this external authentication program could perform any desired authentication function, including LDAP and other database calls. As long as the program is executable by the Urchin user and conforms to the input/output requirements, Urchin can be easily modified to use a different form of authentication.

Specifying the Authentication Routine

To configure which authentication routine the Session Controller calls, edit the “etc/session.conf” file located in the Urchin installation. This file contains configurable parameters that control the behavior of the Session Controller, including which routine to call for authentication. Edit the line:

    AUTHENTICATION: ../bin/auth

Replace “../bin/auth” with the path to your authentication routine. Be sure that the authentication routine is executable by the same user that urchinwebd (Urchin’s Apache web server) is running as.

Input/Output Requirements

When the Session Controller calls the authentication routine, it passes the username, password, and remote IP address of the user as command line arguments, such that:

    argv[1] = username
    argv[2] = password
    argv[3] = remote_addr

The external authentication routine may ignore any or all of these parameters, but typical authentication routines will look at the first two at least. After performing any desired authentication, the routine should exit with a code of zero for success and minus one for failure.

    Exit Code
    0 = successful authentication
    -1 = authentication failed

This authentication interface allows administrators to easily customize their own routines for validating user logins.

Bypassing Authentication

Using the above techniques, the Urchin authentication can be purposefully bypassed. In the case where a hosting provider wants to use the entire Urchin System for controlling users and groups, but has already authenticated the user by the time the user arrives at Urchin, bypassing the authentication avoids a double login. As long as the host can guarantee that access to the Urchin System is controlled from an authenticating portal and that the username cannot be tampered with, the host can bypass authentication using the following technique.

To bypass the authentication, create a dummy external authentication routine that always exits with a zero. For example, Perl code might look like:

    #!/usr/bin/perl
    exit(0);

Point the Session Controller at this dummy authentication routine by editing the “etc/session.conf” file as described above. Next, simply provide a link that looks like:

    http://hostname:9999/session.cgi?action=login

Modify the above link to point to your actual hostname and port, and set the user to the desired username or variable. The dummy authentication routine will automatically approve this login. Please use this method with care to avoid security problems.

Note for Windows Users

In order to provide similar functionality in Windows environments where Perl is not installed, a simple noauth.exe binary is available from the Helper Scripts area of the Urchin Support web site. This binary is merely a "no-op": it simply returns a successful status when called. Be sure you understand the security implications of this before implementing this solution.
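The input/output contract described under Input/Output Requirements can be illustrated with a short external routine. The following is a minimal sketch, not Urchin's own "auth" binary: the in-memory USERS table and the function name authenticate are hypothetical stand-ins for a real LDAP or database lookup, and only the argument order and exit codes come from the text above.

```python
#!/usr/bin/env python3
import hmac
import sys

# Hypothetical user store, for illustration only. A real routine would
# query LDAP or another central directory here.
USERS = {"johng": "s3cret"}

AUTH_OK = 0       # exit code for successful authentication
AUTH_FAILED = -1  # exit code for failed authentication, as documented above


def authenticate(argv):
    """Check credentials passed as [program, username, password, remote_addr]."""
    if len(argv) < 3:
        return AUTH_FAILED
    username, password = argv[1], argv[2]
    expected = USERS.get(username)
    if expected is None:
        return AUTH_FAILED
    # Constant-time comparison avoids leaking information through timing.
    if hmac.compare_digest(expected.encode(), password.encode()):
        return AUTH_OK
    return AUTH_FAILED


if __name__ == "__main__":
    sys.exit(authenticate(sys.argv))
```

On UNIX-type systems an exit code of minus one is reported to the calling process as 255. The text does not say how the Session Controller interprets other non-zero values, so the sketch follows the documented codes literally. The script must be executable by the user that urchinwebd runs as, and its path goes on the AUTHENTICATION line of etc/session.conf.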

Linking Directly to Urchin Reports

Overview

In a standard Urchin installation, delivery of Urchin reports is controlled via the embedded session controller and Apache webserver that ship with Urchin. Users view their Urchin reports by authenticating themselves via an Urchin-controlled login process, and are then presented with a list of Urchin reports that they are authorized to view.

It is possible, however, to bypass Urchin's authentication and session controller and provide users with Urchin reports via a direct link from a portal or other external web site. Urchin 5 provides this capability as one of its standard integration points.
• If all Urchin report data is in data/reports, read Basic Scenario, below.
• If Urchin report data is in a location other than data/reports, read Advanced Scenario, below.
• If you are supporting multiple users with multiple profiles, and each user has his/her reports in his/her own directory, read Large Configuration Scenario, below.

Basic Scenario

This section applies to a standard Urchin installation, where all Urchin reporting data is located in the data/reports directory of the Urchin distribution and the Apache webserver supplied with Urchin (urchinwebd) delivers the content for the Urchin reports.

Step 1: Enable "Direct Report Linking"
• Log in to the Urchin administrative interface as the "admin" user.
• Navigate to the Settings -> Access Settings screen.
• Set the "Direct Report Linking" field to "on".

Alternatively, you can enable direct report linking by using the "uconf-driver" utility in the "util" directory of the Urchin distribution:
♦ Start a command shell on the Urchin system
♦ Change directory to the "util" directory of the Urchin distribution
♦ Set the direct report linking by typing:
    uconf-driver action=set_parameter recnum=1 cr_directlink=on

Step 2: Configure links to Urchin reports

For each profile (report) to which you want to provide a link, create a link in the following format:

    (baseurl)/report.cgi?profile=(profilename)&user=(username)

where (baseurl) is everything before session.cgi in the current URL, (profilename) is the URI-escaped name of the profile, and (username) is a user that has access to the report. The user= setting is optional and controls the language and localization preferences; if it is not specified, the admin user is assumed. For example,

    http://www.hollywoodweb.com/report.cgi?profile=www.hollywoodweb.com
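When links are generated by a portal rather than typed by hand, the profile name has to be URI-escaped as noted above. The following sketch shows one way to build such a link. The function name direct_report_url is hypothetical, and the sketch assumes the optional user is passed as a user= query parameter, as the text indicates.

```python
from urllib.parse import quote


def direct_report_url(base_url, profile, user=None):
    """Build a direct link to report.cgi for one profile.

    base_url is everything before session.cgi in the normal Urchin URL.
    The profile (and optional user) are percent-encoded so that spaces
    and other reserved characters survive in the query string.
    """
    url = f"{base_url.rstrip('/')}/report.cgi?profile={quote(profile, safe='')}"
    if user is not None:
        url += f"&user={quote(user, safe='')}"
    return url


if __name__ == "__main__":
    print(direct_report_url("http://www.hollywoodweb.com", "www.hollywoodweb.com"))
    print(direct_report_url("http://hostname:9999/", "My Site", "johng"))
```

Passing safe='' makes quote() escape "/" as well, which matters for profile names that contain a slash.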

Advanced Scenario

Step 1: Enable "Direct Report Linking"
Follow Step 1 in Basic Scenario, above.

Step 2: Set urchin.cgi permissions
Make sure that bin/urchin.cgi and util/uconf-driver in your Urchin installation are accessible and executable by your web server user.

Step 3: Copy htdocs/report.cgi to the directory(ies) from which you wish to run it.

Step 4: Link/alias uicons, ujs, usvg, and ucss folders
The reporting interface needs access to certain javascript files and icons. From the directory that will contain report.cgi, create links or aliases to these folders. This can be done with symbolic links or web server aliases, or you can simply copy the folders into the location. To set up symbolic links:

    cd [report.cgi location]
    ln -s [urchin path]/htdocs/uicons uicons
    ln -s [urchin path]/htdocs/ujs ujs
    ln -s [urchin path]/htdocs/usvg usvg
    ln -s [urchin path]/htdocs/ucss ucss

When using symbolic links, make sure that your web server is configured to allow the following of symbolic links. For Apache, this is the FollowSymLinks directive.

Step 5: Ensure that the web server is configured to execute CGI applications.
For Apache, enable ExecCGI. Do not use script aliases.

Step 6: Edit htdocs/.report.conf
Uncomment the "Profile =" and "User =" lines if they are commented, and specify the profile and user. For example:

    Profile = www.hollywoodweb.com
    User = johng

Large Configuration Scenario

This scenario allows you to easily support multiple users with multiple profiles. Instead of creating a .report.conf file for each user-profile combination, you create a single .report.conf file that allows the report engine to dynamically extract the user and profile name from the URI stem.

The steps for establishing the Large Configuration Scenario are identical to those of the Advanced Scenario, above, except that instead of explicitly setting Profile and User in a .report.conf file, you specify a Regular Expression (as is commonly used in Tcl, Perl, C#, VB.NET, and other languages) in the URIMatch field. For example, if the user johng types in the following URL to get his reports:

    http://mach1.net/johng/www.hollywoodweb.com/report.cgi

then specifying the following fields in the .report.conf file would extract "johng" as the user and "www.hollywoodweb.com" as the profile:

    URIMatch = ^/(.*)/(.*)/report.cgi
    Profile = $2
    User = $1

Explanation: The URI stem is "/johng/www.hollywoodweb.com/report.cgi". The URIMatch field is a Regular Expression that parses the user and profile out of the URI stem.
• The first part of the URIMatch field is "^/", indicating that the URI stem must begin with a "/".
• The first "(.*)/" means "capture the string up to a "/" and save it (argument 1)". Thus, the string "johng" is assigned to $1.
• The next "(.*)/" means "capture the string up to the following "/" and save it (argument 2)". Thus, the string "www.hollywoodweb.com" is assigned to $2.
• The last part of the URIMatch field is "report.cgi", indicating that "report.cgi" must follow the two captured segments.
• "Profile = $2" means assign the contents of argument 2 to Profile.
• "User = $1" means assign the contents of argument 1 to User.

Note that in most Regular Expression engines ".*" is greedy and can itself match "/" characters, so this reading assumes exactly two directory levels in front of report.cgi, as in the example.

Therefore, the steps for the Large Configuration Scenario are:
• Steps 1 - 5: Follow steps 1 through 5 in the Advanced Scenario.
• Step 6: Create a .report.conf file that uses Regular Expressions to dynamically extract the user and profile name from the URL string that the user types in.
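The behaviour of the example URIMatch pattern can be checked outside Urchin before it is deployed. The sketch below applies the same pattern with Python's re module. This is an assumption for illustration only: the text does not state which Regular Expression engine Urchin uses, and the function name extract_user_and_profile is hypothetical.

```python
import re

# The URIMatch pattern from the example, unchanged.
URI_MATCH = re.compile(r"^/(.*)/(.*)/report.cgi")


def extract_user_and_profile(uri_stem):
    """Return (user, profile) as captured by $1 and $2, or None if no match."""
    match = URI_MATCH.match(uri_stem)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for stem in ("/johng/www.hollywoodweb.com/report.cgi",
                 "/a/b/c/report.cgi",
                 "/report.cgi"):
        print(stem, "->", extract_user_and_profile(stem))
```

For the stem from the example, the two captures are "johng" and "www.hollywoodweb.com". With three directory levels, as in "/a/b/c/report.cgi", the greedy first group captures "a/b" and the second captures "c". If that is not wanted, a stricter pattern such as ^/([^/]*)/([^/]*)/report\.cgi$ limits each capture to a single path segment, provided the engine in use supports that syntax.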

Script-based Configuration Management Overview

Overview

Urchin 5 provides utilities and functionality to allow all administrative operations to be performed via unattended scripts. Only report viewing requires the web-based interface to be operational. All configuration and log processing activities can be scripted using the following utilities and techniques. For first-time users, it is helpful to run the web-based administrative interface first, in order to become familiar with the terminology and capabilities of the Urchin administrative system.

Urchin 5 includes several utilities for modifying the Urchin configuration database without using the web-based administrative interface. Located in the util directory of the distribution, these utilities are:

    uconf-driver
    uconf-export
    uconf-import
    uconf-schedule

Each of these utilities must be run from a command shell or a script, as there is no ability to execute them from the web-based Urchin administrative interface. Complete documentation for each of these utilities is available in the Utilities section of the Advanced Topics area of the Urchin Documentation Center at http://help.urchin.com.

Here is a summary of the functional use of these utilities:
1. The uconf-export utility exports the entire configuration into a file, or to standard output (stdout) if no file is specified. The format of the exported data is an XML-type format defined in the documentation for uconf-export. Each record in the exported data corresponds to a configuration record in the configuration database.
2. The uconf-import utility imports the same XML-type formatted data used by uconf-export into the configuration database. This tool provides functionality for importing or editing single records, or for replacing the entire Urchin configuration with the contents of a text-based configuration file.
3. The uconf-driver utility performs specific actions on individual records. All parameters can be passed on one line as arguments to the script, or a file with multiple commands (one per line) can be used.
4. The uconf-schedule utility updates task scheduling directives on a global basis for all configured profiles; for example, to set all Profiles to run at 1:00am daily. It can also run Profiles immediately, with or without permanently changing the scheduled time. See the documentation for uconf-schedule for additional details on these features.

Note that you can use the uconf-export and uconf-import utilities to easily back up or restore your Urchin configuration. This provides a very quick method of recovering an Urchin installation after a disk failure or other system problem. An example of this functionality:
♦ Save configuration for safekeeping:
    uconf-export > /path/to/saved-configurations/urchin-cfg.save

♦ Restore Urchin configuration from a known good backup:
    uconf-import -r -f /path/to/saved-configurations/urchin-cfg.save
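The save step above lends itself to a scheduled job that keeps dated copies of the configuration. The following is a rough sketch under stated assumptions: it relies only on the documented behaviour that uconf-export writes the configuration to standard output when no file is given, the timestamped file naming is a choice made for this illustration, and the function names are hypothetical.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def backup_path(backup_dir, now):
    """Return a timestamped file name such as urchin-cfg-20050123-114126.save."""
    return Path(backup_dir) / f"urchin-cfg-{now:%Y%m%d-%H%M%S}.save"


def export_configuration(util_dir, backup_dir):
    """Run uconf-export from the Urchin util directory and save its output."""
    util = Path(util_dir).resolve()
    target = backup_path(backup_dir, datetime.now())
    with open(target, "wb") as out:
        # check=True raises CalledProcessError if uconf-export exits non-zero.
        subprocess.run([str(util / "uconf-export")], stdout=out,
                       check=True, cwd=str(util))
    return target


if __name__ == "__main__":
    # Both paths are placeholders; substitute your own locations.
    print(export_configuration("/path/to/urchin/util",
                               "/path/to/saved-configurations"))
```

A saved file can later be restored with the uconf-import -r -f command shown above. Because -r resets the database, restoring is better left as a deliberate manual step.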

Intended Usage

The uconf-export and uconf-import utilities are intended to provide a simple method for importing and exporting data from the Urchin configuration database using regular text files in an XML-type format. These utilities also allow you to specify the names of profiles, log sources, filters, users, etc. in configuration directives which specify access lists, rather than the more cryptic record number lists that are used by the uconf-driver utility. The uconf-import utility can be used to add new records or modify existing records, but it cannot remove old records (unless the database is completely reset with the "-r" option).

The uconf-driver utility is very powerful and can be used for very specific scripting operations that change only a few parameters in a database record, as well as for complete record additions, modifications, and deletions. It can also be used to query the configuration database for several parameters. This utility is best suited to environments where scripting all administrative functions of Urchin is desired, such as automated provisioning systems or very large hosting environments where use of the Urchin administrative GUI is impractical.

Note that uconf-driver is a lower-level utility that does not automatically maintain associations between the various database tables when working with directives that maintain cross-reference lists. When using uconf-driver to script configuration operations, please be aware that many of the tables contain directives that refer to other records, or lists of records. These directives are ct_ulist, ct_glist, ct_llist, ct_flist, and ct_rlist, which refer to the user, group, logfile, filter, and profile tables respectively. These lists are represented as exclamation point delimited lists of recnums, as demonstrated by this list of filter records:

    ct_flist="!13!36!56!"

where each entry represents the recnum value of a record and is surrounded by exclamation points. For the uconf-import and uconf-export utilities, this directive would be specified as:

    ct_flist="filter1,filter2,filter3"
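Scripts that drive uconf-driver need to build and read these exclamation point delimited lists. The following is a small sketch of both directions. The function names are hypothetical, and the representation of an empty list is not described in the text, so returning an empty string for it is an assumption.

```python
def encode_recnum_list(recnums):
    """Format record numbers as an exclamation point delimited list.

    For example, [13, 36, 56] becomes "!13!36!56!".
    Assumption: an empty list is written as an empty string.
    """
    items = [str(int(number)) for number in recnums]
    if not items:
        return ""
    return "!" + "!".join(items) + "!"


def decode_recnum_list(value):
    """Parse a list such as "!13!36!56!" back into integers."""
    return [int(part) for part in value.split("!") if part]


if __name__ == "__main__":
    encoded = encode_recnum_list([13, 36, 56])
    print('ct_flist="%s"' % encoded)
    print(decode_recnum_list(encoded))
```

Decoding an existing list, appending a recnum, and encoding it again is one way to add a cross-reference without disturbing the entries already present.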

Important! Regardless of which utility you use to manipulate the configuration, you must be careful to keep cross-references intact. For example, a Filter record has a ct_rlist which details all of the profiles that the filter applies to, and a Profile record has a ct_flist which details all of the filters that apply to the profile. Note that the uconf-import and uconf-export utilities translate and verify the lists specified in the directive for you; uconf-driver does not.

Special Usage Notes

♦ The uconf-driver utility uses exclamation point delimited lists of record numbers for directives that maintain associations with other tables (e.g. ct_flist), whereas the uconf-export and uconf-import utilities use comma-delimited lists of names in these directives. Be sure to use the appropriate list syntax for the utility you are using.
♦ If a Profile is added but no corresponding Task is added, scheduling of the Profile cannot be managed within the Urchin admin GUI. In addition, the Profile cannot be scheduled to run with the Urchin Task Scheduler.
♦ When adding or editing the "ct_password" directive for either a User or Remote Log Source password, uconf-driver and uconf-import automatically encrypt the password before writing it to the Urchin configuration database, so that passwords are not stored in clear text. For portability reasons, the encryption is in a proprietary format that is not compatible with other password encryption formats such as "crypt" on UNIX-type systems.

Examples of pseudo-code scripts to perform tasks

This section gives examples of using uconf-driver and uconf-import in scripting pseudo-code, which could be easily translated into a UNIX-type shell script, a Perl script, or a Visual Basic script, depending on the needs of the application. Additional examples are given in the documentation for each specific utility.

Apply a German language setting to all users

    $user_count = `uconf-driver action=nrecords table=user`;
    for ($i = 1 ; $i

Scheduler screen of the Urchin administration interface. The utility can also be used to import custom entries into the DNS databases. See the section on the geo-update program under Advanced Topics->Utilities for complete instructions on the options to use for creating custom DNS entries.

Custom Lookup Tables

Beginning with version 5.6, Urchin allows you to define custom lookup tables. One useful application of a lookup table is to substitute human-readable text for the often cryptic request parameters used with dynamic URLs. For example, consider a web site in which the Pages & Files-->Page Query Terms report is used to rank the popularity of requested documents. In the report (shown below), the document id is displayed instead of the full document name. The numeric id is shown because the report simply ranks the popularity of requests of the form http://www.hostsite.com/index.cgi?id=1001

Applying a lookup table which maps document names to numeric ids allows us to view the same information in Pages & Files-->Requested Pages, with the full document name displayed.

This article illustrates how to create and apply a lookup table for this example. The details of your lookup table and filters may differ according to your particular application; however, the basic steps will still apply.

Defining Your Lookup Table

To define your table:

1. Create a table in Excel that maps your codes to text labels. An example is shown below. The first row of the file must begin with "#Fields:", followed by "request_stem" in column 2.

2. Save the Excel table as a tab-delimited plain text file in the lib/custom/lookuptables directory of your Urchin distribution. You must save the file with an extension of ".lt".

Applying the Table

1. Apply the following Advanced filter to your profile. This filter tells Urchin to look in the request_uri for the string "id=", extract the id, and write the id into the request_stem. (Note that request_stem is the title of the second column of the lookup table.)

2. Apply the following Lookup Table filter (applied on the request_stem field) to your profile. Select your lookup table in the Table Name drop down list. If your lookup table does not appear as an option in the drop down list, make sure that your lookup table file name ends with .lt and that it has been saved in the lib/custom/lookuptables directory of your Urchin distribution.
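The lookup described in this article can be sketched in Python to show what the two filters accomplish together. This is only an illustration, not Urchin's implementation: the sample ids and document names are hypothetical, the regular expression is an assumed equivalent of the Advanced filter, and the data row layout (code in column 1, label in column 2) is an assumption based on the header description given here.

```python
import re

# Hypothetical sample data: numeric document ids mapped to readable names.
SAMPLE_ROWS = [("1001", "Annual Report"), ("1002", "Product Catalog")]


def build_lookup_table(rows):
    """Return tab-delimited text suitable for saving as a .lt file.

    The header ("#Fields:" in column 1, "request_stem" in column 2) follows
    the description in this article; the data row layout is an assumption.
    """
    lines = ["#Fields:\trequest_stem"]
    lines += [f"{code}\t{label}" for code, label in rows]
    return "\n".join(lines) + "\n"


def parse_lookup_table(text):
    """Parse the tab-delimited text back into a code -> label dictionary."""
    table = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "#Fields:" header row
        code, label = line.split("\t", 1)
        table[code] = label
    return table


def apply_filters(request_uri, table):
    """Mimic the two filters: extract the id, then substitute its label."""
    match = re.search(r"id=([^&]+)", request_uri)
    if match is None:
        return request_uri  # no id parameter: leave the request unchanged
    request_stem = match.group(1)  # the Advanced filter writes the id here
    return table.get(request_stem, request_stem)  # the Lookup Table filter
```

For instance, with the sample rows, a request for /index.cgi?id=1001 would be reported under "Annual Report", while an id that is missing from the table is passed through unchanged.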

Cobranding Urchin

Overview

Urchin accommodates cobranding in the administration interface, the reporting interface, and, beginning with Urchin 5.6, the login screen. There are two files to edit in order to include HTML at the top of the interface (three files with version 5.6). If a complete portal integration is being done, then the Urchin reporting can be delivered within a frameset or table by your application server. (Please see the article on Portal Integration in the Integration Section.) Otherwise, follow the instructions below to cobrand your interface. Please note that your license agreement may prohibit obscuring or changing the Urchin Logo and Reports, beyond what is provided in this article.

Cobranding Instructions

To cobrand your interface, you will need to edit the following files located in the Urchin installation:

[urchin install dir]/lib/custom/cobrands/cobrand_admin.tpl
[urchin install dir]/lib/custom/cobrands/cobrand_report.tpl
[urchin install dir]/lib/custom/cobrands/cobrand_session.tpl (version 5.6 only)

The first file controls the cobranding on the admin interface and the second controls the cobranding on the reporting interface. The third file, available beginning with version 5.6, controls the cobranding on the login screen. Add HTML content to these files as necessary to include your branding. The HTML provided in these files will be placed on top of the Urchin interface as shown in the example below.


Hosting Automation Solutions

How are H-Sphere and Urchin 5 Integrated?

An unlicensed copy of Urchin 5 is now integrated into Positive Software Corporation's H−Sphere. Psoft customers who wish to integrate Urchin 5 can now download H−Sphere to enable Urchin 5: http://www.psoft.net/HSdocumentation/new_features.html#231 Note: Urchin 5 comes unlicensed, so customers may activate the 15 day demo license, then purchase an Urchin license via the standard methods: either in the Urchin 5 admin interface, on the urchin.com website, or by contacting [email protected]. Detailed download and installation information is available here: http://www.psoft.net/HSdocumentation/sysadmin/urchin4.html

Using Urchin with Plesk PSA 5.0

Note: the following information has been provided by Plesk technical personnel.

Instructions

Log into your PSA interface as Admin and select the Extras button. This takes you to MyPlesk.com and allows MyPlesk.com to know that you are the Admin of a PSA license. Under the Server Tools tab you will see the Urchin offering. Below are some additional instructions that you will need when installing Urchin:

For PSA 5.0 only

♦ Use Urchin install instructions found at MyPlesk.com
♦ PSA log rotation feature should be turned off
♦ Configure and use Urchin log archiving/deleting

Everything in Urchin should be set as described in the documentation, with only one difference: the Log File path should point to /path/to/vhosts/domain.com/logs/access_log.processed and, if SSL is enabled, to /path/to/vhosts/domain.com/logs/access_ssl_log.processed. The .processed files are created by the PSA statistics utility (after calculations by internal PSA stats), and processing should be scheduled to run daily at 5:00am.

Here are some examples:

♦ If you are running the standard version of PSA, your path will be /usr/local/psa/home/vhosts/DOMAIN.NAME/logs/access_log.processed
♦ If you are running the RPM version of PSA, your path will be /home/httpd/vhosts/DOMAIN.NAME/logs/access_log.processed
♦ If you began using PSA version 1.3.x and have upgraded to PSA version 5.x, your path will be /usr/local/plesk/apache/vhosts/DOMAIN.NAME/logs/access_log.processed

Affiliate Program Info

Please note that if your server is registered with MyPlesk.com (and you join the Affiliate Program), your MyPlesk.com account will be credited with an amount equal to 10% of the amount you pay for the Urchin Solution.

Ensim Webppliance

How are Ensim Webppliance and Urchin 5 Integrated?

An unlicensed copy of Urchin 5 is currently integrated into Webppliance 3.6 for Windows. Ensim has not announced plans to integrate Urchin into their Webppliance for Linux/Unix products. They are interested in determining demand, though. If you would like Ensim to support this integration, please contact [email protected]. You may attempt to run Urchin 5 outside of your Ensim environment, but it is unsupported.

Sphera's HostingDirector

How are Sphera's HostingDirector and Urchin 5 Integrated?

An unlicensed copy of Urchin 5 is being integrated into HostingDirector. Sphera customers who wish to upgrade to Urchin 5 will be able to do so when HostingDirector 3.8 is available toward the end of 2003. With that release Urchin 5 will become a shared service on the server, so customers will no longer have to license a separate copy of Urchin for each VPS.

Performance & Tuning

Global Filtering of Hits from Monitoring Software

Overview

Most Hosting environments provide some sort of monitoring of customer webservers in order to maintain Service Level Agreements (SLAs). As a side effect, however, the hits from this monitoring can skew the Urchin reporting for the monitored web sites by artificially inflating session, pageview, hit and byte counts.

Recommendation

In Hosting environments that employ such monitoring, it is highly recommended that a standard/global Urchin filter be applied to each customer's configured Urchin profiles to strip out the hits generated by monitoring software. This is easily done in an environment where a centralized Urchin installation (managed by the hosting company) provides reporting for each customer's website(s). In dedicated/colocation environments where the customers themselves maintain an instance of Urchin on their server(s), the Hosting company should provide a sample filter that is appropriate for the monitoring being used. To aid in the implementation of Urchin filtering, the Host and the customer should work together to create a specific page on the customer website that only the monitoring software utilizes, e.g. something like: http://www.customerdomain.com/healthpage.html

Examples

Example 1: Filter out the IP address for the monitoring system

Filter Type: Exclude
Filter Field: IP
Filter Spec: 172\.16\.1\.1

This will strip any hits with the IP address 172.16.1.1 out of the webserver log as Urchin is processing it.

Example 2: Filter out specific page that the monitoring system hits

Filter Type: Exclude
Filter Field: REQUEST
Filter Spec: ^/healthpage.html

This will strip any hits with a request for /healthpage.html out of the webserver log as Urchin is processing it.

Considerations

1. It may be desirable to create additional, non-filtered Profiles for the customers so they can see the actual traffic load (including the monitoring hits) on the webserver(s).
2. The Hosting company may want to provide a Profile that provides reporting exclusively for the monitoring hits, e.g. one that filters in only hits from the monitoring software. This profile could be used to show that the proper monitoring is being done and that SLAs are being met.
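The effect of the two Exclude filters can be tried out with a short Python sketch before they are applied to a profile. It is an illustration under stated assumptions rather than Urchin's filter engine: the hit dictionaries and the is_excluded helper are hypothetical, and the sketch assumes the Filter Spec is matched anywhere in the field unless it is anchored, which this guide does not confirm.

```python
import re

# Filter specs copied from the two examples in this article.
IP_SPEC = r"172\.16\.1\.1"
REQUEST_SPEC = r"^/healthpage.html"


def is_excluded(hit, ip_spec=IP_SPEC, request_spec=REQUEST_SPEC):
    """Return True if either exclude filter matches the hit.

    `hit` is a hypothetical dict with "ip" and "request" keys.
    """
    return bool(
        re.search(ip_spec, hit["ip"]) or re.search(request_spec, hit["request"])
    )


hits = [
    {"ip": "172.16.1.1", "request": "/index.html"},     # monitoring host
    {"ip": "10.0.0.7", "request": "/healthpage.html"},  # monitoring page
    {"ip": "10.0.0.7", "request": "/index.html"},       # ordinary visitor
    {"ip": "172.16.1.10", "request": "/index.html"},    # matched as a substring
]

# Only the ordinary visitor survives both exclude filters.
kept = [hit for hit in hits if not is_excluded(hit)]
```

Under the unanchored-match assumption, the address 172.16.1.10 is excluded as well, because the spec matches its first ten characters. Anchoring the spec as ^172\.16\.1\.1$ would restrict it to the single monitoring address. Likewise, the unescaped dot in ^/healthpage.html matches any character, so ^/healthpage\.html is the stricter form.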

Reducing Disk Storage for Urchin Profile Monthly Databases

Overview

Urchin reporting data is stored in independent monthly databases for each Profile configured within Urchin. These databases typically reside in the data/reports directory of the Urchin distribution. By default, Urchin will keep an unlimited number of these monthly Profile databases. For most small and medium sized sites, the storage requirements are modest. Because Urchin reporting does not require access to the raw webserver logs once they've been processed, there is no need to keep the webserver logs. The processed Urchin monthly databases will be approximately 5-10% of the size of the raw webserver logs that were processed to populate the Urchin databases, and in most cases this will represent a very minimal amount of disk space even if all Urchin databases are kept indefinitely.

For large sites, however, which produce hundreds or thousands of megabytes worth of webserver logs per day, or hosting providers who have a very large number of Profiles configured, it may be desirable to reduce Urchin's ongoing data storage requirement. This can be accomplished in one of the following ways:

1. Set the profile to automatically delete the raw tracking data after processing the logs
2. Set the profile to archive historic data
3. Limit the number of months of historical reporting data that are retained

Instructions for each of these methods are provided at the end of this article.

Technical Overview of Urchin Database Storage

For each Urchin profile, Urchin maintains a set of eleven monthly database files that provide data for the reporting engine. The databases are named after the month for which they store data. The complete list of databases is:

YYYYMM-hdata.und  -->  hash table data
YYYYMM-hdata.uni  -->  hash table index
YYYYMM-hdata.uns  -->  hash table string data
YYYYMM-ldata.und  -->  log tracking data
YYYYMM-ldata.uni  -->  log tracking indexes
YYYYMM-pdata.und  -->  path data
YYYYMM-sdata.und  -->  session data
YYYYMM-tdata.und  -->  totals data
YYYYMM-udata.unf  -->  header for the database
YYYYMM-vdata.und  -->  visitor data
YYYYMM-vdata.uni  -->  visitor index

Each set of databases is complete for the month of data that it contains. Since there is no interdependency between the monthly database sets, archiving and pruning operations can be performed independently on each database set without affecting any other month.

Under normal operation, the entire set of eleven monthly database files is retained for each month. However, four of these database files are used only by the Urchin log processing engine. These database files are:

YYYYMM-pdata.und
YYYYMM-sdata.und
YYYYMM-vdata.und
YYYYMM-vdata.uni

These databases contain information about paths, sessions and visitors and can account for a substantial percentage of the total storage space required for the month, on the order of 10-50%. Thus there can be a significant disk space advantage in setting the Keep Raw Tracking Data option to off in the Storage/DB screen of the Profile configuration.

Important Note: If you plan to upgrade to a future major release of Urchin, this raw tracking data will be used for linking records together. Absence of this data will affect certain new visitor-centric drill down reports that are planned for Urchin. Therefore, it is recommended that raw tracking data be disabled only for extremely high traffic sites where keeping it causes a disk or CPU resource consumption issue.

Other potential disk space savings can be obtained by compressing historic Urchin monthly databases into ZIP archives. The resulting archives are typically only 20-30% the size of the uncompressed database set. While the Urchin reporting engine cannot read the ZIP archives directly, it has the ability to extract the databases it needs from the ZIP archives on the fly. This is completely transparent to a person viewing Urchin reports, other than a slight delay while the databases are being unpacked. The reporting engine does not remove the databases it has unpacked; this allows quicker access to data while the person is viewing the Urchin reports. However, the original ZIP archive is left in place, so a periodic cleanup operation can simply remove the unpacked databases to regain the disk space.

The last avenue for reducing Urchin storage requirements is to establish a policy for the duration of historical reporting that Urchin is to provide. For instance, in environments where Urchin is provided as a reporting service with a hosting package, it is very common to provide Urchin historical reporting for a period of one year. Due to the monthly organization of Urchin databases, it is very easy for scripts to automatically remove old monthly databases that have aged past a certain threshold. When a historical reporting length policy is implemented, Urchin's data storage requirement will typically stabilize or only increase slightly once the historical retention limit has been reached.

Methods for Reducing Data Storage - How To

Method 1: Delete the Raw Tracking Data after Log Processing

You can configure the profile to delete raw visitor and session information after processing. For large sites, this improves performance and reduces the amount of data stored.

Note: When this configuration is selected, sessions that overlap days appear as two sessions (one for each day) instead of one session. The difference in results will be negligible for most sites.

To configure the profile to delete raw visitor and session information after processing:

1. In the Admin interface, click Configuration, then Urchin Profiles-->Profiles.
2. Edit the desired profile.
3. In the Storage/DB tab, turn the Keep Raw Tracking Data field "off".
4. Click Update.

Method 2: Auto-Archive Historic Data

You can configure the profile to compress historic monthly data into an archive. The reports can view the archived data, but no additional hits may be processed for the archived months.

To configure the profile to archive historic data:

1. In the Admin interface, click Configuration, then Urchin Profiles-->Profiles.
2. Edit the desired profile.
3. In the Storage/DB tab, turn the Archive DB field "on".
4. Specify a number of months for the Archive DB After field.
5. Click Update.

Method 3: Limit Retention of Databases for Historical Reporting

For each Urchin Profile, simply remove any databases in the data/reports/profile-name directory that begin with a YYYYMM prefix and have aged past the threshold needed for historical reporting. For example, if you wish to retain a one-year reporting history and the current month is February 2004, you would remove any databases named 200301-*data.un* to delete the reporting data from January 2003 for that Urchin profile. This would be repeated for all databases older than January 2003.

For an example of a ready-to-run Perl script that will automatically prune the Urchin databases after a certain period of time, please see the PruneUrchinData script at http://www.urchin.com/support/scripts/purge_udata.pl
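The selection logic behind Method 3 can also be sketched in a few lines of Python. This is a minimal illustration and not the Perl script referenced above: the function names and the months_to_keep parameter are hypothetical, and the file name pattern is an assumption derived from the 200301-*data.un* example. The sketch only lists candidate files and leaves deletion to the caller.

```python
import re
from pathlib import Path

# Monthly database files are assumed to be named YYYYMM-<x>data.un<y>,
# for example 200301-hdata.und (pattern inferred from 200301-*data.un*).
DB_NAME = re.compile(r"^(\d{4})(\d{2})-[a-z]data\.un[a-z]$")


def month_index(year, month):
    """Convert a year and month into a single comparable month count."""
    return year * 12 + (month - 1)


def expired_databases(profile_dir, current_year, current_month, months_to_keep=12):
    """Return database files older than the retention window, sorted by path.

    With months_to_keep=12 and a current month of February 2004, files for
    January 2003 and earlier are returned.
    """
    cutoff = month_index(current_year, current_month) - months_to_keep
    expired = []
    for path in Path(profile_dir).iterdir():
        match = DB_NAME.match(path.name)
        if match is None:
            continue  # not a monthly database file (e.g. a ZIP archive)
        file_month = month_index(int(match.group(1)), int(match.group(2)))
        if file_month < cutoff:
            expired.append(path)
    return sorted(expired)
```

A caller would review the returned list and then remove each file, for example with path.unlink(). Because only files matching the database pattern are considered, archives and unrelated files in the profile directory are left alone.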

Security Features

Activating SSL on the Urchin Webserver

The Urchin webserver that ships with Urchin 4.100 and later is capable of encrypting communication via SSL. To enable SSL, you will need to have either a valid certificate signed by a certificate authority or a self-signed certificate. The steps for enabling SSL in the Urchin webserver are as follows:

1. Copy your SSL certificate file into the Urchin var directory and name it server.crt
2. Copy your SSL key file into the Urchin var directory and name it server.key
3. Edit the urchinwebd.conf.template file located in the Urchin var directory. Change the ServerName directive from localhost to the name of your webserver. For instance: ServerName: www.urchin.com
NOTE: The ServerName in the urchinwebd.conf.template file needs to match the name of the server that is in the certificate file.
4. Start or restart the webserver using urchinctl with the "-e" option. Urchinctl is located in the Urchin bin directory. The "-e" option instructs urchinctl to enable SSL in the webserver. For example, to restart the webserver with SSL enabled, use: urchinctl -e restart

To start the server without SSL enabled, just remove the "-e" option from the urchinctl command. You should now be able to access your SSL enabled server using https://servername.domain.com:port/

NOTE: Customizing the SSL settings in the urchinwebd.conf.template may result in problems that could prohibit the webserver from starting.
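When many servers need the same change, the ServerName edit in step 3 could be scripted. The following Python sketch is an illustration only: set_server_name is a hypothetical helper, and it assumes the directive sits on a single line beginning with "ServerName" (with or without a trailing colon), which should be checked against your own urchinwebd.conf.template before use.

```python
import re

# Assumption: the directive occupies one line that starts with "ServerName",
# optionally followed by a colon, then whitespace and the host name.
SERVER_NAME = re.compile(r"^([ \t]*ServerName:?[ \t]+)\S+", re.MULTILINE)


def set_server_name(template_text, host):
    """Return the template text with the ServerName value replaced by host."""
    new_text, count = SERVER_NAME.subn(
        lambda match: match.group(1) + host, template_text
    )
    if count == 0:
        raise ValueError("no ServerName directive found")
    return new_text
```

The helper works on text rather than on the file itself, so the caller can keep a backup copy of the template before writing the result. The other settings in the template are left untouched, in line with the note above about customizing SSL settings.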


Chapter 8: Reference

Integer Field List

Overview

When a hit is processed by Urchin, certain integer fields are available, including whether the hit is a pageview, whether it is a new session, how many bytes were transferred, etc. These integer values are used in updating many of the tables. In particular, the Data Map, which maps all of the text-type data tables, references these integer fields by number.

Integer Field List

The following table lists all of the available integer fields and their corresponding id number.

IFIELD id   Field Name
1           Session
2           Pageview
3           Non-Pageviews
4           Hits
5           Valid Hits
6           Error Hits
7           UTM Hits
8           Non UTM Hits
9           Robot Hits
10          Non Robot Hits
11          Bytes
12          Robot Bytes
13          Non Robot Bytes
14          Forms
15          Responses
16          Transactions
17          Items
18          Transaction Revenue
19          Item Revenue
20          Downloads
21          Repeat Responses
22          Cost
23          Primary Goals
24          Clicks
25          Impressions

Regular Field List

Overview

When a hit or entry in a log file is read during processing, the hit is broken down into 'Raw Fields'. Fields are generally separated by spaces, tabs, or commas. The Log Format determines how these Raw Fields are assigned internally. Once the Raw Fields are read, Urchin calculates a number of 'Auto Fields' based on the 'Raw Fields'. Most reports use these Auto Fields for updating. Filters can be applied to either Raw or Auto Fields. The following table lists all available Fields and their purpose.

Regular Field List

id   Field                    Type    Purpose
1    iis_date                 (RAW)   IIS raw date of hit field.
2    iis_time                 (RAW)   IIS raw time of hit field.
3    apache_time              (RAW)   Apache raw date & time of hit field.
4    c_ip                     (RAW)   Client IP Address.
5    cs_username              (RAW)   Client username (if any)
6    cs_request               (RAW)   Apache raw entire request field.
7    cs_method                (RAW)   IIS raw request method field.
8    cs_uristem               (RAW)   IIS raw request stem field.
9    cs_uriquery              (RAW)   IIS raw request query field.
10   sc_status                (RAW)   Return status code from server.
11   sc_bytes                 (RAW)   Number of bytes transferred for request.
12   c_host                   (RAW)   Client hostname (converts to c_ip if necessary).
13   cs_useragent             (RAW)   Browser user-agent information.
14   cs_cookie                (RAW)   Cookies sent by browser.
15   cs_referer               (RAW)   Raw Referral information (could be internal).
16   custom_date              (RAW)   Used for datestamp in Custom Logs.
17   custom_time              (RAW)   Used for timestamp in Custom Logs.
19   cs_host                  (RAW)   Requested virtualhost by Client.
20   s_port                   (RAW)   Server port number.
21   cs_version               (RAW)   IIS Raw HTTP version.
22   s_sitename               (RAW)   IIS Server site name.
23   s_computername           (RAW)   IIS Computer name.
24   s_ip                     (RAW)   IIS Server IP address.
25   elf_orderid              (RAW)   E-commerce order id number.
26   elf_store                (RAW)   E-commerce store name.
27   elf_sessionid            (RAW)   E-commerce session id.
28   elf_total                (RAW)   E-commerce transaction amount.
29   elf_tax                  (RAW)   E-commerce tax amount.
30   elf_shipping             (RAW)   E-commerce shipping amount.
31   elf_billcity             (RAW)   E-commerce customer city.
32   elf_billstate            (RAW)   E-commerce customer state.
33   elf_billzip              (RAW)   E-commerce customer zip code.
34   elf_billcountry          (RAW)   E-commerce customer country.
35   elf_productcode          (RAW)   E-commerce product code.
36   elf_productname          (RAW)   E-commerce product name.
37   elf_variation            (RAW)   E-commerce product variation.
38   elf_price                (RAW)   E-commerce product price.
39   elf_quantity             (RAW)   E-commerce product quantity.
40   elf_upsold               (RAW)   E-commerce upsold variable.
76   referral_protocol        (AUTO)  Referral protocol (http/https/etc.)
77   referral_host            (AUTO)  Referral complete hostname.
78   referral_domain          (AUTO)  Referral domain name.
79   referral_port            (AUTO)  Referral port number (if any).
80   referral_url             (AUTO)  Referral complete URL. (includes host)
81   referral_uri             (AUTO)  Referral complete URI. (no host)
82   referral_stem            (AUTO)  Referral URI stem without query info.
83   referral_query           (AUTO)  Referral Query info by itself.
84   referral_anchor          (AUTO)  Referral information after # tag.
85   referral_directory       (AUTO)  Referral directory up to filename.
86   referral_filename        (AUTO)  Referral filename without directory.
87   referral_mime            (AUTO)  Referral mime type (file extension)
88   referral_keywords        (AUTO)  Referral search engine keywords
89   referral_domainandstem   (AUTO)  Referral domain and URI stem together.
90   referral_errordetail     (AUTO)  Referral error detail information.
91   request_method           (AUTO)  Request method (GET/POST/etc.).
92   request_url              (AUTO)  Request complete URL (if provided).
93   request_version          (AUTO)  Request protocol version.
94   request_protocol         (AUTO)  Request protocol (HTTP/etc.).
95   request_host             (AUTO)  Request hostname (if any).
96   request_port             (AUTO)  Request port number (if any).
97   request_uri              (AUTO)  Request URI with query.
98   request_stem             (AUTO)  Request URI without query.
99   request_query            (AUTO)  Request query information (e.g., after ?)
100  request_anchor           (AUTO)  Request information after # tag
101  request_directory        (AUTO)  Request directory without filename.
102  request_filename         (AUTO)  Request filename without directory.
103  request_mime             (AUTO)  Request mime type (file extension).
104  request_origfilepath     (AUTO)  Request original uri stem if UTM.
105  request_origmime         (AUTO)  Request original mime type if UTM.
106  request_errordetail      (AUTO)  Request detail for error hits.
107  useragent_complete       (AUTO)  Complete user-agent.
108  browser_base             (AUTO)  Browser name (e.g., Netscape).
109  browser_version          (AUTO)  Browser version.
110  platform_base            (AUTO)  Platform (e.g., Windows).
111  platform_version         (AUTO)  Platform version.
112  domain_primary           (AUTO)  First level domain. (e.g. com).
113  domain_complete          (AUTO)  Complete domain. (e.g. urchin.com).
114  sid                      (AUTO)  Session id (if any).
115  utm_cookiea              (AUTO)  UTM-2 cookie-a
116  utm_cookieb              (AUTO)  UTM-2 cookie-b
117  utm_cookiec              (AUTO)  UTM-2 cookie-c
119  utm_cookie1              (AUTO)  UTM-1 cookie-1
120  utm_cookie2              (AUTO)  UTM-2 cookie-2
121  utm_cookie3              (AUTO)  UTM-3 cookie-3
122  utm_unique_id            (AUTO)  UTM unique visitor id.
123  utm_new_campaign         (AUTO)  new campaign variables detected
124  utm_page                 (AUTO)  UTM page variable (used for request_ variables).
125  utm_referral             (AUTO)  UTM Referral (used for referral_ variables).
126  utm_screen_resolution    (AUTO)  Screen resolution (e.g., 800x600).
127  utm_screen_available     (AUTO)  Available screen resolution in pixels.
128  utm_browser_size         (AUTO)  Browser size in pixels.
129  utm_screen_colors        (AUTO)  Screen color bit depth.
130  utm_language             (AUTO)  Browser language code setting.
131  utm_java_enabled         (AUTO)  yes|no if java is enabled.
132  utm_cookies_enabled      (AUTO)  yes|no if cookies are enabled.
133  utm_timezone_offset      (AUTO)  +/-HHMM timezone offset value of browser.
134  utm_js_version           (AUTO)  Javascript version info.
135  utm_session_number       (AUTO)  Number of sessions for this visitor.
136  utm_repeat_campaign      (AUTO)  Repeat campaign detected.
137  utm_campaign             (AUTO)  same as utm_campaign in a link.
138  utm_medium               (AUTO)  same as utm_medium in a link.
139  utm_source               (AUTO)  same as utm_source in a link.
140  utm_term                 (AUTO)  same as utm_term in a link.
141  utm_content              (AUTO)  same as utm_content in a link.
142  utm_campaign_session     (AUTO)  session number of this campaign.
143  utm_campaign_number      (AUTO)  Number of responses in __utmz.
144  utm_campaign_time        (AUTO)  Time in seconds of the current campaign.
145  elf_region               (AUTO)  E-Commerce region drilldown information.
146  utm_campaign_srcmedium   (AUTO)  utm_campaign [utm_medium].
147  utm_campaign_srcmedtrm   (AUTO)  utm_campaign [utm_medium] | utm_term.
148  utm_campaign_sesdelta    (AUTO)  difference in current session number and campaign session number.
149  utm_campaign_daysdelta   (AUTO)  difference in days between the hit viewtime and the campaign time.
150  utm_campaign_hour        (AUTO)  hour of the day the campaign occurred.
151  utm_campaign_goal        (AUTO)  campaign goal that was met.
152  log_source_name          (AUTO)  Log source name in Log Source Wizard.
153  utm_ipandvisitorid       (AUTO)  IP address or host - visitor key.
154  utm_id                   (AUTO)  same as utm_id in a link.
155  utm_type                 (AUTO)  used for email impressions.
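To make the relationship between Raw Fields and Auto Fields more concrete, the following Python sketch derives several request_ style values from an Apache-style cs_request string. It is a rough illustration based on the purposes listed in this section, not Urchin's actual parsing code: the function name is hypothetical, and edge cases such as malformed request lines are not handled.

```python
import posixpath
from urllib.parse import urlsplit


def derive_request_fields(cs_request):
    """Split a raw request line into request_* style values.

    Expects the usual "METHOD target PROTOCOL/VERSION" layout.
    """
    method, target, protocol_token = cs_request.split(" ", 2)
    protocol, _, version = protocol_token.partition("/")
    parts = urlsplit(target)
    directory, filename = posixpath.split(parts.path)
    # The table describes the mime field as the file extension.
    extension = posixpath.splitext(filename)[1].lstrip(".")
    uri = parts.path + ("?" + parts.query if parts.query else "")
    return {
        "request_method": method,
        "request_protocol": protocol,
        "request_version": version,
        "request_uri": uri,                # URI with query
        "request_stem": parts.path,        # URI without query
        "request_query": parts.query,      # information after "?"
        "request_anchor": parts.fragment,  # information after "#"
        "request_directory": directory,
        "request_filename": filename,
        "request_mime": extension,
    }
```

Calling derive_request_fields("GET /docs/index.cgi?id=1001 HTTP/1.1") yields a request_stem of /docs/index.cgi and a request_query of id=1001. Because filters can be applied to either Raw or Auto Fields, a filter can target one of the derived values without having to parse the whole request line.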

Regular Report List

Overview During processing, each hit in the log file is separated and calculated into different fields. These fields are then used to update data tables which are queried for report views. Some reports have special storage and are not included in the data tables. The following table lists all of the predefined reports and which data table is queried for each. Regular Field List View# 1100 1102 1103 1104 1105 1110 1900 1903 1907 1901 1905 1904 1906 1902 1200 1201 1211 1206 1202 1207 1208 1203 1209

Data Table

Report Name Traffic Sessions Graph Pageviews Graph Hits Graph Bytes Graph Summary Visitors &Sessions Visitors by Day Sessions by Day Unique Visitors Unique Sessions Visitor Loyalty Session Frequency Summary Pages &Files Requested Pages Downloads All Files Directory by Pages Drilldown Directory by Files Drilldown Directory by Bytes Drilldown File Types by Hits File Types by Bytes

Chapter 8: Reference

Fields Used

− − − − −

− − − − −

− − − − 37 − −

− − − − utm_session_number vs. sessions − −

7 17 12 7 12 18 13 19

request_stem vs. pageviews request_stem vs. downloads request_origfilepath vs. hits request_stem vs. pageviews request_origfilepath vs. hits request_stem vs. bytes request_origmime vs. hits request_origmime vs. bytes 213

1210 1205 1204 1600 1601 1602 1609 1603 1610 1604 1608 1606 1300 1301 1303 1302 1304 1305 1400 1401 1402 1403 1404 1405 1406 1407 1409 1500 1501 1504 1505 1502 1506 1507 1503 1510 1511 1800 1801

Page Query Terms Posted Forms Status and Errors Navigation Entrance Pages Exit Pages Click Paths Click To and From Length of Pageview Depth of Session Length of Session Click To and From Report Referrals Referrals Referral Drilldown Search Terms Search Engines Referral Errors Domains &Users Domains Domain Drilldown Countries IP Addresses IP Drilldown Usernames by Hits Usernames by Bytes Usernames by Sessions Browsers &Robots Browsers by Sessions Drilldown Browsers by Hits Drilldown Browsers by Bytes Drilldown Platforms by Sessions Drilldown Platforms by Hits Drilldown Platforms by Bytes Drilldown Combos by Sessions Robots by Hits Drilldown Robots by Bytes Drilldown Client Parameters Screen Resolution

Chapter 8: Reference

11 8 14

request_stem|request_query vs. hits request_stem vs. hits sc_status|request_errordetail vs. hits

20 20 21 7 22 − − 20

request_stem vs. pageviews request_stem vs. pageviews request_stem vs. sessions request_stem vs. pageviews request_stem vs. time − − request_stem vs. pages

1 1 2 2 23

referral_domainandstem vs. sessions referral_domainandstem vs. sessions referral_domain|referral_keywords vs. sessions referral_domain|referral_keywords vs. sessions referral_errordetail|referral_domainandstem vs. hits

4 4 4 6 6 10 16 5

domain_primary|domain_complete domain_primary|domain_complete vs. sessions domain_primary|domain_complete vs. sessions c_ip vs. sessions c_ip vs. sessions cs_username vs. hits cs_username vs. bytes cs_username vs. sessions

3 9 15 3 9 15 3 24 25

useragent_complete vs. sessions useragent_complete vs. hits useragent_complete vs. bytes useragent_complete vs. sessions useragent_complete vs. hits useragent_complete vs. bytes useragent_complete vs. sessions browser_base vs. hits browser_base vs. bytes

31

utm_screen_resolution vs. sessions

214

1804 1805 1806 1808 1809 2100 2101 2102 2103 2104 2105 2106 2107 2100 2101 2102 2103 2104 2105 2106 2200 2201 2202 2203 2204 2206 2206 2211 2212 2213 2214 2215 2216 2221 2222 2223

Screen Colors Languages Java Enabled Timezone Offset Javascript Version E−Commerce Revenue Number of Transactions Products by Revenue Products by Quantity Products by Revenue Drilldown Products by Quantity Drilldown E−Commerce Summary Revenue Source Revenue by Region Drilldown Revenue by City Revenue by Referrals Revenue by Search Terms Revenue by Search Engines Drilldown Revenue by Domains Drilldown Campaign Tracking Lead Source−Acquisition Lead Source−Quality Lead Source−Conversion Lead Source−ROI Lead Source−Conversion Lead Source−cost breakdown Keyword Analysis−Acquisition Keyword Analysis−Quality Keyword Analysis−Conversion Keyword Analysis−ROI Keyword Analysis−Conversion Keyword Analysis−cost breakdown Keyword Comparison−Acquisition Keyword Comparison−Quality Keyword Comparison−Conversion

Chapter 8: Reference

32 33 34 35 36

utm_screen_colors vs. sessions utm_language vs. sessions utm_java_enabled vs. sessions utm_timezone_offset vs. sessions utm_js_version vs. sessions

− − 42 41 42 41 −

− − elf_productname|elf_productcode vs. revenue elf_productname|elf_productcode vs. items elf_productname|elf_productcode vs. revenue elf_productname|elf_productcode vs. items −

43 43 44 45

elf_region vs. revenue elf_region vs. revenue referral_domainandstem vs. revenue referral_domain|referral_keywords vs. revenue

45

referral_domain|referral_keywords vs. revenue

46

domain_primary|domain_complete vs. revenue

56, 57, 53 56, 52, 51 56, 55, 81 56, 82, 54 56, 55 54 70, 71, 67 70, 66, 65 70, 69, 85 70, 86, 68 70, 69

source(medium) vs clicks, impressions, new leads source(medium) vs clicks, pages, sessions source(medium) vs clicks, goals, transactions source(medium) vs clicks, revenue, cost source(medium) vs clicks, goals source(medium) vs cost source (medium) vs clicks, impressions, new leads source(medium) | term vs clicks, pages, sessions source (medium) | term vs clicks, goals, transactions source(medium) | term vs clicks, revenue, cost source(medium) | term vs clicks, goals

68

source(medium) | term vs clicks

70, 71, 67 source (medium) vs clicks, impressions, new leads 70, 66, 65 source (medium) | term vs clicks, pages, sessions 70, 69, 85 source (medium) | term vs clicks, goals, transactions

2224  Keyword Comparison-ROI                 70, 86, 68    source(medium) | term vs clicks, revenue, cost
2225  Keyword Comparison-Conversion          70, 69        source(medium) | term vs clicks, goals
2226  Keyword Comparison-cost breakdown      68            source(medium) | term vs clicks
2231  Campaign Comparison-Acquisition        56, 57, 53    campaign name | source(medium) vs clicks, impressions, leads
2232  Campaign Comparison-Quality            56, 52, 51    campaign name | source(medium) vs clicks, pages, sessions
2233  Campaign Comparison-Conversion         56, 55, 81    campaign name | source(medium) vs clicks, goals, transactions
2234  Campaign Comparison-ROI                56, 82, 54    campaign name | source(medium) vs clicks, revenue, cost
2235  Campaign Comparison-Conversion         56, 55        campaign name | source(medium) vs clicks, goals
2236  Campaign Comparison-cost breakdown     54            campaign name | source(medium) vs cost
2241  Medium Comparison-Acquisition          56, 57, 53    campaign name | source(medium) vs clicks, impressions, leads
2242  Medium Comparison-Quality              56, 52, 51    campaign name | source(medium) vs clicks, pages, sessions
2243  Medium Comparison-Conversion           56, 55, 81    campaign name | source(medium) vs clicks, goals, transactions
2244  Medium Comparison-ROI                  56, 82, 54    campaign name | source(medium) vs clicks, revenue, cost
2245  Medium Comparison-Conversion           56, 55        campaign name | source(medium) vs clicks, goals
2246  Medium Comparison-cost breakdown       54            campaign name | source(medium) vs cost
2251  Content Testing-Acquisition            63, 64, 60    campaign name | source(medium) vs clicks, impressions, leads
2252  Content Testing-Quality                63, 59, 58    campaign name | source(medium) vs clicks, pages, sessions
2253  Content Testing-Conversion             63, 62, 83    campaign name | source(medium) vs clicks, goals, transactions
2254  Content Testing-ROI                    63, 84, 61    campaign name | source(medium) vs clicks, revenue, cost
2255  Content Testing-Conversion             63, 62        campaign name | source(medium) vs clicks, goals
2256  Content Testing-cost breakdown         61            campaign name | source(medium) vs cost
2265  Goal Conversion by Hour                73, 72        term | hour vs goals, clicks
2266  Sales Conversion by Hour               87, 72        term | hour vs transactions, clicks
2267  Repeat Clicks by IP                    76            IP-VisitorID | source(medium) | term vs repeat responses
2268  Repeat Clicks by IP                    76            IP-VisitorID | source(medium) | term vs repeat responses
2261  Time To Goal                           75            days delta vs goals
2262  Sessions To Goal                       74            session delta vs goals
2263  Time To Transaction                    89            days delta vs transactions
2264  Sessions To Transaction                88            session delta vs transactions

Configuration Table and Directive List

Overview

The following matrices provide exact details on the table names, directives, and meanings for each database table in the Urchin 5 configuration. The first matrix defines each of the table names and what that table is used for. Then, for each database table, a comprehensive list of directives is provided.

Please note that most records will not specify a value for every possible directive for the table to which the record belongs. In some cases the directives may not be applicable to that particular record. Also, Urchin will use default values if there is no explicit definition for a directive. Directives may be manipulated by the web-based Urchin administration interface, or by scripts that use the uconf-driver or uconf-import utilities.

It should also be noted that this reference guide does not contain verbose descriptions of the directives and how they are to be used. In many cases, the intended usage of the directive may not be immediately obvious from the directive name and description provided. You should consult the appropriate sections in the Urchin Documentation Center at http://help.urchin.com to gain more insight about the capabilities of the product (e.g. filtering, backups, archiving, report view customization, etc.) and how the capabilities can be controlled with the configuration directives detailed below.

Note: Where applicable, default values are printed in bold typeface.

Table Name Definitions

Table Name              Meaning/Purpose
global                  General settings including licensing and remote access
machine                 Process settings including database sizing, memory usage and process priority
filter                  Specifies log and profile runtime filter parameters
logfile                 Specifies the location and format of a log source
profile                 Log Processing and Reporting settings for a particular website
task                    Runtime schedule settings for a particular profile
affiliation             Enterprise-level management of profiles, log sources, filters, groups and users
group                   Group-level management of users including profile access
user                    Individual user settings including password, language and locality

Directive List: global table

Directive               Meaning/Purpose
cr_dcmode               datacenter mode (on|off)
cr_directlink           allow direct web links to Urchin reports (on|off)
cr_remoteaccess         allow remote access (on|off)
cr_remoteadmin          allow remote administration (on|off)
cs_region               two-letter global region code
                        fr=France ge=Germany it=Italy ja=Japan ko=Korea
                        po=Portugal sp=Spain sw=Sweden uk=United Kingdom
ct_license              license code used by Urchin Licensing
ct_name                 identifier for record in global table
ct_port                 port that Apache runs on (default: 9999)
ct_serial               serial code used by Urchin Licensing
ct_schedulers           [internal use only]
cr_setupwizard          run setup wizard first time (on|off)
ct_var                  VAR code used by Urchin licensing
ct_schedulersleep       length of time in seconds that the scheduler waits before checking for the next task (default: 3)

Directive List: machine table

Directive               Meaning/Purpose
cr_priority             run priority of Urchin log processing engine (low|normal|high)
cs_preset               [internal use only]
cs_limitdbtable         maximum number of records allowed in database tables (default: 10000)
ct_dbuffsize            data buffer size in MB (default: 13)
ct_pbuffsize            path buffer size in MB (default: 1)
ct_name                 identifier for record in machine table
ct_sbuffsize            session buffer size in MB (default: 3)
ct_tbuffsize            text buffer size in MB (default: 1)
ct_vbuffsize            visitor buffer size in MB (default: 2)

Directive List: filter table

Unless otherwise noted, directives in this table apply to all filter types.

Directive               Meaning/Purpose
cr_action               [internal use only]
cr_casesensitive        filter is case sensitive (yes|no)
                        applies to advanced|exclude|include|replace filter types
cr_filtertype           type of filter:
                        advanced=Advanced filter built from two other fields
                        decode=Decode URL-encoded characters back to their original form
                        dynamicurl=DynamicURL filter from Urchin 3 and Urchin 4 (deprecated)
                        exclude=Exclude pattern filter
                        include=Include pattern filter
                        jaconv=Convert various Japanese encodings into UTF-8 encoding
                        replace=Pattern search and replace filter
cr_override             overwrite data in the output field if it is already populated (yes|no)
                        applies to advanced filter type
cs_filterfield          ID number of field to apply filter to, from the Regular Field List reference table
                        applies to decode|exclude|include|jaconv|replace filter types
cs_infielda             ID number of first field to apply filter to, from the Regular Field List reference table
                        applies to advanced filter type
cs_infieldb             ID number of second field to apply filter to, from the Regular Field List reference table
                        applies to advanced filter type
cs_outfield             ID number of the field to output filter results to, from the Regular Field List reference table
                        applies to advanced filter type
cs_llist                exclamation-point delimited list of log source recnums to which this filter is applied (uconf-driver)
                        comma delimited list of log source names to which this filter is applied (uconf-import)
cs_rlist                exclamation-point delimited list of profile recnums to which this filter is applied (uconf-driver)
                        comma delimited list of profile names to which this filter is applied (uconf-import)
ct_affiliation          optional affiliation
ct_filter               filter pattern (simple pattern or POSIX regular expression)
                        applies to include|exclude filter types
ct_inexpa               regular expression pattern for first filter
                        applies to advanced filter type
ct_inexpb               regular expression pattern for second filter
                        applies to advanced filter type
ct_name                 identifier for record in filter table
ct_outexp               expression defined explicitly or constructed from saved pattern parts of input expressions, e.g. ($A1, $B2)
                        applies to advanced filter type
ct_replace              replacement string pattern
                        applies to replace filter type
ct_search               search string pattern
                        applies to replace filter type
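The way an advanced filter combines ct_inexpa, ct_inexpb and ct_outexp can be illustrated with a short sketch. This is not Urchin's implementation: the function name, the sample field values and the assumption that $A1 means "first saved group of the first pattern" are all illustrative, and Python's re module is used in place of POSIX regular expressions.

```python
import re


def apply_advanced_filter(field_a, field_b, inexp_a, inexp_b, outexp):
    """Build an output value from two input fields (hypothetical helper).

    Returns None when either pattern fails to match, in which case the
    output field would be left unchanged.
    """
    match_a = re.search(inexp_a, field_a)
    match_b = re.search(inexp_b, field_b)
    if match_a is None or match_b is None:
        return None
    matches = {"A": match_a, "B": match_b}

    def substitute(token):
        # token is something like $A1 or $B2: field letter, then group number
        saved = matches[token.group(1)].group(int(token.group(2)))
        return saved or ""  # an optional group that did not participate

    return re.sub(r"\$([AB])(\d)", substitute, outexp)


# Assumed sample data: field A is a request stem, field B is a hostname.
result = apply_advanced_filter(
    "/catalog/shoes/item42.html",
    "www.example.com",
    r"^/catalog/([^/]+)/",
    r"^(www\.)?(.+)$",
    "$B2:$A1",
)
print(result)  # example.com:shoes
```

In a real configuration the two input fields and the output field are selected by ID number through cs_infielda, cs_infieldb and cs_outfield, and cr_override decides whether an already populated output field is overwritten.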

Directive List: logfile table

Directive               Meaning/Purpose
cr_action               [internal use only]
cr_logdestiny           disposition of log after processing (1=don't touch, 2=archive/compress, 3=delete)
cr_protocol             remote log transfer protocol (ftp|http)
cr_type                 location of log (local|remote)
cr_uristemtolower       convert the URI stem to lower case when reading log (on|off)
cs_flist                exclamation-point delimited list of filter recnums which are applied to this log source (uconf-driver)
                        comma delimited list of filter names which are applied to this log source (uconf-import)
cs_logformat            logging format for logfile (auto|elf|elf2|ncsa|netscape|w3c)
cs_rlist                exclamation-point delimited list of profile recnums using this log source (uconf-driver)
                        comma delimited list of profile names using this log source (uconf-import)
ct_affiliation          optional affiliation
ct_loglocation          local log pathname/location (e.g. /logs/access.log)
ct_name                 identifier for record in logfile table
ct_password             password for ftp/http remote log access and UNC pathnames in Windows environments
ct_pathtimeoffset       offset in hours from local time when using date matching patterns in the logfile specification (e.g. +8)
ct_pathtimegmt          substitute GMT time for local time when using date matching patterns in the logfile specification (on|off)
ct_port                 port number (e.g. 21 for ftp, 80 for http)
ct_querytoken           specify the query token separating the URI stem from the query (default: ?)
ct_remotelocation       remote log pathname/location
ct_separator            single character field separator character (\s, \t are escaped characters for space and tab)
ct_server               fully qualified domain name or IP address of remote host/server for remote log downloads
ct_username             username to use for ftp/http remote log access and UNC pathnames in Windows environments (default: anonymous)
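The ct_separator escapes can be made concrete with a small sketch of how a script might turn the stored value into the actual separator character before splitting a log line. The helper names are hypothetical and the sample log line is invented; only the \s and \t escapes come from the table above.

```python
def decode_separator(ct_separator):
    """Translate a ct_separator value into the literal separator character."""
    escapes = {"\\s": " ", "\\t": "\t"}  # the two escapes listed for ct_separator
    if ct_separator in escapes:
        return escapes[ct_separator]
    if len(ct_separator) != 1:
        raise ValueError("ct_separator must be a single character or \\s / \\t")
    return ct_separator


def split_log_line(line, ct_separator):
    """Split one log line into fields using the decoded separator."""
    return line.rstrip("\r\n").split(decode_separator(ct_separator))


# Invented tab-separated sample line.
fields = split_log_line("2005-01-23\t11:41:26\t/index.html\n", "\\t")
print(fields)  # ['2005-01-23', '11:41:26', '/index.html']
```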

Directive List: profile table

Directive               Meaning/Purpose
cr_archivedata          enable automatic ZIP archiving of older Urchin monthly databases (on|off)
cr_autorollback         enable automatic rollback of Urchin databases after failed log processing (on|off)
cr_cleanbackups         enable automatic removal of outdated Urchin database ZIP backups (on|off)
cr_createbackups        enable automatic creation of Urchin database ZIP backups to allow rollback functionality (on|off)
cr_includemimes         specify whether ct_mimes list of pageview suffix/MIME types should be an include or exclude list (exclude|include)
cr_includeparameters    specify whether ct_parameters list of URI query terms types should be an include or exclude list (exclude|include)
cr_keeprawtrackingdata  specify whether raw tracking data should be retained (on|off)
cr_logtracking          turn log tracking (on|off)
cr_pgoalcasesensitive   campaign primary goal match case sensitive (yes|no)
cr_processpath          turn visitor tracking (on|off)
cr_processvisitors      specify whether to keep visitor information between log processing runs (on|off)
cr_profiletype          profile type (Standard_Website|E-Commerce_Website)
cr_sessionpageview      session requires a pageview (on|off)
cs_archivenmonths       create monthly ZIP archives of Urchin databases after n months (default: 12)
cs_flist                exclamation-point delimited list of filter recnums applied to this profile (uconf-driver)
                        comma delimited list of filter names applied to this profile (uconf-import)
cs_glist                exclamation-point delimited list of group recnums granted access to this profile (uconf-driver)
                        comma delimited list of group names granted access to this profile
cs_keepnbackups         specify number of ZIP backups to keep (0-10, default: 2)
cs_limitdbtable         maximum number of database records to keep for any database table for this profile (overrides cs_limitdbtable global value; default: 10000)
cs_llist                exclamation-point delimited list of log source recnums associated with this profile (uconf-driver)
                        comma delimited list of log source names associated with this profile (uconf-import)
cs_pathlevel            depth of path reporting (default: 3)
cs_pgoalfield           internal numeric id of field in ct_pgoalfield
cs_referrallevel        referral level to report (default: 3)
cs_reportset            report view template for this profile, specified as one of the six built-in templates:
                        Basic All|Basic Lite|Basic IT|UTM-Enabled All|UTM-Enabled Nopaths|UTM-Enabled Webdesign
                        or a User-Specified reporting template that matches a custom ".rs" reporting template file
cs_sidfield             ID number for field where session ID is contained, from the Regular Field List reference table
cs_taskid               recnum for associated task in Task table (uconf-driver only)
cs_timeoffset           time offset (in seconds) for data in log (default: 0=GMT)
cs_ulist                exclamation-point delimited list of user recnums granted access to this profile (uconf-driver)
                        comma delimited list of user names granted access to this profile (uconf-import)
cs_vmethod              visitor tracking (0=IP-UserAgent, 1=Session ID, 2=UTM, 3=IP-Only)
cs_visitortimeout       session timeout in seconds (default: 3600)
ct_affiliation          optional affiliation
ct_defaultpage          default page for site (e.g. index.html)
ct_downloads            comma separated list of download page suffix/MIME-types to match (default: dmg,doc,exe,gz,pdf,pkg,ppt,sh,tar,xls,zip)
ct_keywords             comma separated list of search engine referral keywords to match (default: general,key,kw,mt,p,q,qs,qt,query,search,search_string,text,word,words)
ct_lasthit              time of most recent hit processed for this profile in seconds since 1970 [read-only, set by log processing engine]
ct_mimes                comma separated list of pageview suffixes/MIME types to match or exclude (default: css,cur,gif,ico,ida,jpeg,jpg,js,png)
ct_name                 identifier for record in profile table
ct_parameters           comma separated list of URI query terms to include or exclude in the Page Query Terms report (default: sid)
ct_pgoalexp             campaign primary goal expression to match
ct_pgoalfield           field name to match expression in ct_pgoalexp against
ct_reportdomains        comma delimited list of site domains (e.g. urchin.com,www.urchin.com,quantified.net,www.quantified.net)
ct_sidpre               text pattern that precedes session id pattern being matched
ct_sidpost              text pattern that terminates the session id pattern being matched
ct_utmdomain            domain name to be used for UTM tracking (must match that set in __utm.js file in the document root of the website itself)
ct_website              URL for website associated with this profile (e.g. http://www.urchin.com)
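As an illustration of how ct_sidpre and ct_sidpost bracket a session ID inside the field selected by cs_sidfield, here is a minimal sketch. It assumes both directives are plain text markers rather than regular expressions, and that a missing terminator means the ID runs to the end of the field; the function name and the cookie string are made up for the example.

```python
def extract_session_id(field_value, sid_pre, sid_post):
    """Return the text between sid_pre and sid_post, or None if sid_pre is absent."""
    start = field_value.find(sid_pre)
    if start == -1:
        return None
    start += len(sid_pre)
    # Look for the terminator only after the prefix; fall back to end of field.
    end = field_value.find(sid_post, start) if sid_post else -1
    if end == -1:
        end = len(field_value)
    return field_value[start:end]


# Invented cookie field with ct_sidpre="SESSIONID=" and ct_sidpost=";"
cookie = "lang=en; SESSIONID=abc123; theme=dark"
print(extract_session_id(cookie, "SESSIONID=", ";"))  # abc123
```

Session-ID-based visitor identification corresponds to cs_vmethod=1 in the table above.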

Directive List: task table

Directive               Meaning/Purpose
cd_btime                start time of last run for this task in seconds since 1970 [read-only, set by log processing engine]
cd_etime                finish time of last run for this task in seconds since 1970 [read-only, set by log processing engine]
cd_lastrun              time of last initiation for this task in seconds since 1970 [read-only, set by log processing engine]
cd_nextrun              time of next run for this task in seconds since 1970
cr_dow                  day of week to run task (0=Sun,1=Mon,2=Tue,3=Wed,4=Thu,5=Fri,6=Sat)
cr_enabled              [internal use only]
cr_frequency            task frequency (0=never,3=once,4=hourly,5=daily,6=weekly,7=monthly)
cr_runnow               [internal use only]
cs_dom                  day of month to run task (monthly scheduling option) [1-31]
cs_hour                 hour of day to run task [0-23]
cs_minute               minute of hour to run task [0-59]
cs_rid                  recnum for associated profile in Profile table (uconf-driver only)
ct_affiliation          optional affiliation
ct_application          [internal use only]
ct_completed            percent of log processing completed [read-only, set by log processing engine]
ct_day                  day of month to run task (run-once option) [1-31]
ct_lockid               [internal use only]
ct_month                month to run task (run-once option) [1-12]
ct_pid                  [internal use only]
ct_name                 identifier for record in task table
ct_runstatus            current runtime status of task (0=processing logs,1=processing DNS,2=completed,3=error,4=queued) [read-only, set by log processing engine]
ct_status               current scheduling status of task (0=disabled,1=not scheduled,2=scheduled,3=running,4=completed,5=error) [read-only, set by log processing engine]
ct_year                 year for task to run (run-once option) [4-digit CCYY format]

Directive List: affiliation table

Directive               Meaning/Purpose
ct_browselocation       pathname specification for top-level directory allowed for browsing for logs in Log Source
ct_cachedirectory       pathname specification of directory used to store temporary cache files used in display of reports for an affiliation
ct_contact              descriptive name for the affiliation's contact person
ct_email                email address for affiliation's contact person
ct_name                 identifier for record in affiliation table
ct_reportdirectory      pathname specification for top-level directory where Urchin reporting databases will live for the affiliation

Directive List: user table

Directive               Meaning/Purpose
cr_changelanguage       user may change language preference (no|yes)
cr_changepassword       user may change password (no|yes)
cr_changeregion         user may change region preference (no|yes)
cr_leveltype            affiliation admin privilege level
                        0=manage users/groups/tasks
                        1=manage users/groups/tasks/filters
                        2=manage users/groups/tasks/filters/log sources/profiles
cs_adminlevel           admin level (1=admin, 2=affiliate admin, 3=user)
cs_glist                exclamation-point delimited list of group recnums the user belongs to (uconf-driver)
                        comma delimited list of group names the user belongs to (uconf-import)
cs_language             two-letter report language code for user
                        en=English fr=French ge=German ja=Japanese sp=Spanish
cs_region               two-letter region code for user
                        us=United States ch=China fr=France ge=Germany it=Italy ja=Japan
                        ko=Korea po=Portugal sp=Spain sw=Sweden uk=United Kingdom
cs_rlist                exclamation-point delimited list of profile recnums the user has access to (uconf-driver)
                        comma-delimited list of profile names the user has access to (uconf-import)
cs_rslist               exclamation-point delimited set of "recnum|ReportSetName" pairs that optionally controls the report view for this user for a particular report (e.g. !79|Basic_All!83|Basic_Lite!)
ct_affiliation          optional affiliation for user
ct_fullname             full name of user
ct_name                 identifier for record in user table
ct_password             user password (automatically encrypted on input by uconf-driver or uconf-import)
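The two list encodings used throughout these tables (exclamation-point delimited recnums for uconf-driver, comma delimited names for uconf-import) and the cs_rslist pair format are easy to handle in a script. The sketch below uses hypothetical helper names; it tolerates leading and trailing exclamation points because the cs_rslist example shows them, although whether plain recnum lists also carry them is an assumption.

```python
def parse_recnum_list(value):
    """Parse an exclamation-point delimited recnum list, e.g. '!12!15!'."""
    return [int(part) for part in value.split("!") if part]


def parse_name_list(value):
    """Parse a comma delimited name list, ignoring stray whitespace."""
    return [name.strip() for name in value.split(",") if name.strip()]


def parse_report_sets(value):
    """Parse cs_rslist 'recnum|ReportSetName' pairs into a dict."""
    pairs = {}
    for part in value.split("!"):
        if not part:
            continue
        recnum, _, report_set = part.partition("|")
        pairs[int(recnum)] = report_set
    return pairs


print(parse_report_sets("!79|Basic_All!83|Basic_Lite!"))
# {79: 'Basic_All', 83: 'Basic_Lite'}
```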

Directive List: group table

Directive               Meaning/Purpose
cs_rlist                exclamation-point delimited list of profile recnums the group has access to (uconf-driver)
                        comma-delimited list of profile names the group has access to (uconf-import)
cs_rslist               exclamation-point delimited set of "recnum|ReportSetName" pairs that optionally controls the report view for this group for a particular report (e.g. !79|Basic_All!83|Basic_Lite!)
cs_ulist                exclamation-point delimited list of user recnums assigned to the group (uconf-driver)
                        comma-delimited list of user names assigned to the group (uconf-import)
ct_affiliation          optional affiliation
ct_groupdesc            description of the group
ct_name                 identifier for record in group table
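Several directives in the task table listed earlier in this section store times as seconds since 1970 and states as small integers. The following sketch shows one way a reporting script might make such a record readable. The record is represented as a plain dictionary of strings, which is an assumption about how a script would hold the values, and the sample values are invented; the code-to-label mappings are copied from the task table.

```python
from datetime import datetime, timezone

FREQUENCY = {0: "never", 3: "once", 4: "hourly", 5: "daily", 6: "weekly", 7: "monthly"}
RUN_STATUS = {0: "processing logs", 1: "processing DNS", 2: "completed",
              3: "error", 4: "queued"}
SCHEDULE_STATUS = {0: "disabled", 1: "not scheduled", 2: "scheduled",
                   3: "running", 4: "completed", 5: "error"}


def epoch_to_iso(seconds):
    """Convert seconds since 1970 to an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc).isoformat()


def describe_task(record):
    """Return a readable copy of selected task directives (hypothetical helper)."""
    return {
        "name": record["ct_name"],
        "last_start": epoch_to_iso(record["cd_btime"]),
        "last_finish": epoch_to_iso(record["cd_etime"]),
        "frequency": FREQUENCY.get(int(record["cr_frequency"]), "unknown"),
        "run_status": RUN_STATUS.get(int(record["ct_runstatus"]), "unknown"),
        "schedule_status": SCHEDULE_STATUS.get(int(record["ct_status"]), "unknown"),
    }


# Invented sample record.
task = {"ct_name": "nightly", "cd_btime": "86400", "cd_etime": "86700",
        "cr_frequency": "5", "ct_runstatus": "2", "ct_status": "2"}
summary = describe_task(task)
print(summary["last_start"], summary["frequency"], summary["run_status"])
```

Times are rendered in UTC here; applying a local offset is left to the caller.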

Error code list for failed FTP and HTTP remote webserver log transfers

Overview

An Urchin Log Source can be configured to collect a webserver log from a remote server via FTP or HTTP. Under normal circumstances, the transfer will be successful and no errors appear in the runtime log. However, if an error is encountered during the transfer (e.g. an invalid username/password, remote server unreachable, remote log unreadable, etc.), Urchin will log an error code in the runtime output, as viewable in the Task History for the Profile. This error code appears in parentheses next to the "failed" message after the webserver log transfer is attempted, e.g. (-9)

The error codes are listed below along with a text message explaining the problem that was encountered.

Error Code List

1     Unsupported protocol. This build of curl has no support for this protocol.
2     Failed to initialize.
3     URL malformat. The syntax was not correct.
4     URL user malformatted. The user-part of the URL syntax was not correct.
5     Couldn't resolve proxy. The given proxy host could not be resolved.
6     Couldn't resolve host. The given remote host was not resolved.
7     Failed to connect to host.
8     FTP weird server reply. The server sent data curl couldn't parse.
9     FTP access denied. The server denied login.
10    FTP user/password incorrect. Either one or both were not accepted by the server.
11    FTP weird PASS reply. Curl couldn't parse the reply sent to the PASS request.
12    FTP weird USER reply. Curl couldn't parse the reply sent to the USER request.
13    FTP weird PASV reply. Curl couldn't parse the reply sent to the PASV request.
14    FTP weird 227 format. Curl couldn't parse the 227-line the server sent.
15    FTP can't get host. Couldn't resolve the host IP we got in the 227-line.
16    FTP can't reconnect. Couldn't connect to the host we got in the 227-line.
17    FTP couldn't set binary. Couldn't change transfer method to binary.
18    Partial file. Only a part of the file was transferred.
19    FTP couldn't download/access the given file, the RETR (or similar) command failed.
20    FTP write error. The transfer was reported bad by the server.
21    FTP quote error. A quote command returned error from the server.
22    HTTP not found. The requested page was not found. This return code only appears if --fail is used.
23    Write error. Curl couldn't write data to a local filesystem or similar.
24    Malformat user. User name badly specified.
25    FTP couldn't STOR file. The server denied the STOR operation.
26    Read error. Various reading problems.
27    Out of memory. A memory allocation request failed.
28    Operation timeout. The specified time-out period was reached according to the conditions.
29    FTP couldn't set ASCII. The server returned an unknown reply.
30    FTP PORT failed. The PORT command failed.
31    FTP couldn't use REST. The REST command failed.
32    FTP couldn't use SIZE. The SIZE command failed. The command is an extension to the original FTP spec RFC 959.
33    HTTP range error. The range "command" didn't work.
34    HTTP post error. Internal post-request generation error.
35    SSL connect error. The SSL handshaking failed.
36    FTP bad download resume. Couldn't continue an earlier aborted download.
37    FILE couldn't read file. Failed to open the file. Permissions?
38    LDAP cannot bind. LDAP bind operation failed.
39    LDAP search failed.
40    Library not found. The LDAP library was not found.
41    Function not found. A required LDAP function was not found.
42    Aborted by callback. An application told curl to abort the operation.
43    Internal error. A function was called with a bad parameter.
44    Internal error. A function was called in a bad order.
45    Interface error. A specified outgoing interface could not be used.
46    Bad password entered. An error was signaled when the password was entered.
47    Too many redirects. When following redirects, curl hit the maximum amount.
48    Unknown TELNET option specified.
49    Malformed telnet option.
51    The remote peer's SSL certificate wasn't ok
52    The server didn't reply anything, which here is considered an error.
53    SSL crypto engine not found
54    Cannot set SSL crypto engine as default
55    Failed sending network data
56    Failure in receiving network data
57    Share is in use (internal error)
58    Problem with the local certificate
59    Couldn't use specified SSL cipher
60    Problem with the CA cert (path? permission?)
61    Unrecognized transfer encoding
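When reviewing many Task History entries, it can help to translate the parenthesized code automatically. The sketch below is only an illustration: the exact wording of the runtime log line is an assumption (only the "failed" message followed by a code such as (-9) is described above), the helper name is hypothetical, and the dictionary holds just a few entries from the list.

```python
import re

# A small subset of the error code list above; extend as needed.
TRANSFER_ERRORS = {
    6: "Couldn't resolve host",
    7: "Failed to connect to host",
    9: "FTP access denied",
    10: "FTP user/password incorrect",
    19: "FTP couldn't download/access the given file",
    22: "HTTP not found",
    28: "Operation timeout",
}

# Matches "failed (-9)"; also accepts the typographic minus sign (U+2212).
CODE_PATTERN = re.compile(r"failed\s*\(\s*[-\u2212]?(\d+)\s*\)")


def explain_failure(log_line):
    """Return (code, message) for a failed transfer line, or None if no code is found."""
    match = CODE_PATTERN.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, TRANSFER_ERRORS.get(code, "unknown error code")


# Invented log line for demonstration.
print(explain_failure("Retrieving remote log via ftp ... failed (-9)"))
# (9, 'FTP access denied')
```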