Table of Contents

Initial Install:
- Installation
- Installing Sawmill as a CGI Program Under IIS
- CGI-mode and The Temporary Folder
- Web Server Information
- Using Sawmill with WebSTAR V on MacOS X
- Getting Screen Dimensions and Depth Information

Upgrading:

User Interface:
- The Administrative Menu
- The Config Page
- Reports
- Customize Report Element
- Role Based Access Control (RBAC)
- Using the Sawmill Scheduler
- Users
- Language Modules--Localization and Text Customization

System Info:
- Sawmill Architecture Overview
- High Availability Best Practices
- Server Sizing
- Distributed Parsing
- Pathnames
- Hierarchies and Fields
- Cross-Referencing and Simultaneous Filters
- Regular Expressions
- Security
- File/Folder Permissions

Operation:
- Using Log Filters
- Using Date Filters
- Using Report Filters
- Using Table Filters
- Configuration Options
- All Configuration Options
- The Command Line
- Real Time Log Importing
- Configuration Files
- Setting up Multiple Users (ISP Setup)
- Creating and Editing Profiles by Hand
- Salang: The Sawmill Language

Databases:
- Databases
- Database Detail
- SSQL: Sawmill Structured Query Language (SQL)
- ODBC and SQL configuration under Windows
- Querying the SQL Database Directly
- Memory, Disk, and Time Usage
- Distributed Parsing

Resources:
- Log Files
- Supported Log Formats
- Creating Log Format Plug-ins (Custom Log Formats)
- Newsletters
- Troubleshooting

About:
- Credits
- Copyright
Sawmill Documentation
Quickstart Manual FAQ User Guide www.sawmill.co.uk
Quickstart
This is the Sawmill New Users page. If you are new to Sawmill, this is where you start: this guide walks you through your initial startup of Sawmill and helps you create your first profile.

Welcome to Sawmill

Once you have downloaded and installed Sawmill, you are ready to set up your username and password, create your profile, select your log file format, and view your first reports. The first step is to start Sawmill, then follow the steps to accept the end-user license agreement and either use Sawmill in trial mode or enter a license key.
The Welcome screen is the first screen in the setup. It shows a brief description of Sawmill; your only option is to select the Next button.
The License Agreement step lets you review our license agreement and accept or reject it. To continue with the setup, you must accept the terms; if you cannot, please contact our customer support. You can use Sawmill free for a 30-day trial period, or, if you have a license key, enter it now.
You will be asked to enter a name and password for the Root Administrator. This is the main administrator account for Sawmill; it lets you set up additional usernames and passwords, along with rules based on each user's role. There is more about setting up roles and rules in the RBAC section.
When you select a trial license, you can use any version of Sawmill that you wish; in this example, the Enterprise version has been selected.
Once you select your version of Sawmill, you have the option of signing up for our automated feedback agent. It sends non-specific demographic information to the development group: only your platform, the log formats autodetected, the log formats selected, and whether or not your first database build was successful.
Once you have selected or deselected the automated agent, you are ready to click the Finish button.
Then you will see the login screen, where you will enter your username and password.
Once you have logged in, you can create your first profile using the Profile Wizard.
Select Create New Profile; a new window opens, and you step through the wizard.
In the profile wizard, you will select the location of your log source, along with the pathname to that source.
Once you have found your log files, select them and click the OK button at the top of the menu.
The log format will be detected and named by Sawmill; you then have the choice of continuing with that format or choosing a different source.
The next step in the profile wizard is to select the fields you want in your reports. The default fields are preselected, but you can select as many fields as you wish.
Once your profile fields are set, you can decide what type of database server to use; select the type from the drop-down menu.
The final step is to name your profile and select the Finish button.
Once the profile is done, you have the option of processing the data and viewing the reports. You can also view your profile and do further customization before viewing the reports. If you choose to view the reports, Sawmill builds your database and shows you the summary report.
The very first report is the Overview report: a single-page report showing a summary of the fields you selected. You have now created your first profile and your first report. The User Guide steps through the menu items and shows how you can create customized reports.
FAQ
Sections of the FAQ:
- Licensing, Upgrading, and the Trial Version
- Major Features
- Installation and Setup
- Log Filters
- Reports
- Troubleshooting
- Miscellaneous
Major Features
Q: What platforms does Sawmill run on?
A: Windows 95/98/ME/NT/2000/XP/2003, MacOS, and most versions and variants of UNIX. For full details, see Available Platforms.

Q: How much memory, CPU power, and disk space do I need to run Sawmill?
A: At least 128 MB of RAM (512 MB preferred), 500 MB of disk space for an average database, and as much CPU power as you can get. For full details, see System Requirements.

Q: What sorts of log files can Sawmill process?
A: Sawmill can handle all major log formats and many minor formats, and you can create your own custom formats. For full details, see Supported Log Formats.

Q: How is Sawmill different from other log analysis tools?
A: Among other things, Sawmill does not generate static reports -- it generates dynamic, interlinked reports. For full details, see Sawmill vs. The Competition.

Q: How does a typical company use Sawmill; what does a typical Sawmill setup look like?
A: Installations vary from customer to customer--Sawmill provides enough flexibility to let you choose the model that works best for you. For full details, see Typical Usage Patterns.

Q: How large of a log file can Sawmill process?
A: There are no limits, except those imposed by the limitations of your server. For full details, see Processing Large Log Files.

Q: How can I use a grid (cluster) of computers to process logs faster?
A: Use an internal database, build a separate database on each computer, and merge them. For full details, see Using a grid of computers to process more data.

Q: Does the log data I feed to Sawmill need to be in chronological order?
A: No; your log entries can be in any order. For full details, see Log Entry Ordering.

Q: How can I create many profiles in a batch, from a template?
A: Use the create_many_profiles command-line option. For full details, see Creating many profiles in a batch.
Q: Can Sawmill download log data by FTP?
A: Yes. For full details, see Downloading Log Data by FTP.

Q: Can Sawmill use scp, or sftp, or ssh, or https, to download log data? Can it uncompress tar, or arc, or sea, or hqx, etc.?
A: Not directly, but you can do it by using a command-line log source to run a command line, script, or program that does whatever is necessary to fetch the data, and prints it to Sawmill. For full details, see Using a Command-line Log Source.

Q: Can I run Sawmill as a Service on Windows? Can I run Sawmill while I'm logged out?
A: As of version 8, Sawmill is installed as a service when you run the normal installer. For full details, see Running Sawmill as a Service.

Q: My web site is hosted in another state. Does Sawmill provide browser-based admin tools I can use to configure it and retrieve reports?
A: Yes, Sawmill's interface is entirely browser-based. For full details, see Remote Administration.

Q: Can Sawmill generate separate analyses for all the web sites hosted on my server?
A: Yes, Sawmill includes a number of features for just this purpose. For full details, see Statistics for Multiple Sites.

Q: Can Sawmill process ZIPped, gzipped, or bzipped log data?
A: Yes, all three. For full details, see Processing zipped, gzipped, or bzipped Log Data.

Q: Can Sawmill combine the logs from multiple clustered or load-balanced web servers, so that the user has one view of the data? Can it report separately on the different servers?
A: Yes. For full details, see Clustered Servers.

Q: Can Sawmill be configured to limit access to statistics, so that a customer can only see the statistics associated with their section of my web site?
A: Yes, you can password-protect statistics in several ways. For full details, see Protecting Clients' Statistics.

Q: I want to deploy Sawmill to my customers, but I want it to look like part of my site. I don't want the name Sawmill to appear -- I want my own name to appear. Can I relabel or white-label Sawmill?
A: Yes, but the degree to which you can relabel depends on your license. For full details, see Relabeling/Whitelabeling Sawmill.

Q: What features can I use in Sawmill's regular expressions?
A: You can use whatever's documented (Regular Expressions), and possibly more; how much more you can use depends on your platform. For full details, see Regular Expression Features.

Q: Are Sawmill's regular expressions case-sensitive?
A: Yes. For full details, see Regular Expression Case-sensitivity.

Q: How can I debug my custom log format, or my log filters?
A: Build the database from the command line with the -v option: Sawmill.exe -p profilename -a bd -v egblpfd. For full details, see Using Debugging Output.

Q: Sawmill doesn't work in CGI mode with SELinux enabled; how do I get it to work?
A: Use semodule to allow the operations that Sawmill uses; see the long answer. For full details, see Configuring Sawmill to work with Security Enhanced Linux, in CGI mode.

Q: How can I create a new profile by copying an old one?
A: Take an existing profile and change the first line to the new name. For full details, see How to Copy a Profile.

Q: How can I rename a profile after it has been created?
A: Either recreate it with the new name, or edit the profile .cfg with a text editor and change the label. For full details, see Renaming a profile.
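The copy-a-profile answer above can be sketched as a shell session. The file names and the exact first-line syntax of a profile .cfg are assumptions here (real profiles live under Sawmill's LogAnalysisInfo folder); this only illustrates the copy-and-rename step:

```shell
# Stand-in for an existing profile .cfg; the real first line begins with
# the profile's internal name (assumed syntax: 'name = {').
printf 'oldprofile = {\n  label = "Old Profile"\n}\n' > oldprofile.cfg

# Copy the profile, then change the first line to the new name.
cp oldprofile.cfg newprofile.cfg
sed -i '1s/^oldprofile/newprofile/' newprofile.cfg

head -n 1 newprofile.cfg   # first line now reads: newprofile = {
```

After this, Sawmill should see the copy as a separate profile; changing the label line as well (per the renaming answer) keeps the displayed name consistent.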
Log Filters
Q: How can I exclude hits from my own IP address, or from my organization's domain?
A: Add a Log Filter to exclude those hits. For full details, see Excluding an IP Address or Domain.
Q: How can I throw away all the spider hits, so I only see statistics on non-spider hits?
A: Use a Log Filter to reject all hits from spiders (and worms). For full details, see Discarding hits from spiders.

Q: Can Sawmill generate statistics on just one domain, from a log file containing log data from many domains?
A: Yes. Add a log filter that rejects hits from all other domains. For full details, see Filtering All but One Domain.

Q: How can I remove a particular file or directory from the statistics?
A: Use a Log Filter to reject all hits on that file or directory. For full details, see Excluding a File or Folder.

Q: How can I group my events in broad categories (like "internal" vs. "external" or "monitoring" vs. "actual"), and see the events in each category separately, or see them combined? How can I create content groups? How can I include information from an external database in my reports, e.g. include the full names of users based on the logged username, or the full names of pages based on the logged URL? How can I extract parts of the URL and report them as separate fields?
A: Create a new log field, database field, report, and report menu item to track and show the category or custom value, and then use a log filter to set the log field appropriately for each entry. For full details, see Creating Custom Fields.

Q: How do I remove fields from the database to save space?
A: Delete the database.fields entry from the profile .cfg file, and delete any xref groups and reports that use it. For full details, see Removing Database Fields.

Q: Most of the referrers listed in the "Top referrers" view are from my own site. Why is that, and how can I eliminate referrers from my own site from the statistics?
A: These are "internal referrers"; they represent visitors going from one page of your site to another. You can eliminate them by modifying the default "(internal referrer)" log filter, changing http://www.mydomain.com/ in that filter to your web site URL. For full details, see Eliminating Internal Referrers.

Q: I use parameters on my pages (e.g. index.html?param1+param2), but Sawmill just shows "index.html?(parameters)". How can I see my page parameters?
A: Delete the Log Filter that converts the parameters to "(parameters)". For full details, see Page Parameters.

Q: How can I see just the most recent day/week/month of statistics?
A: Use the Calendar, or the Filters, or use a recentdays filter on the command line. For full details, see Recent Statistics.

Q: How can I combine referrers, so hits from http://search.yahoo.com, http://dir.yahoo.com, and http://google.yahoo.com are combined into a single entry?
A: Create a log filter converting all the hostnames to the same hostname. For full details, see Combining Referring Domains.

Q: How can I debug my custom log format, or my log filters?
A: Build the database from the command line with the -v option: Sawmill.exe -p profilename -a bd -v egblpfd. For full details, see Using Debugging Output.

Q: When I look at the top hosts and top domains, all I see are numbers (IP addresses). How do I get the domain information?
A: Turn on reverse DNS lookup in the Network options (or in your web server), or use Sawmill's "look up IP numbers using DNS" feature. For full details, see Resolving IP Numbers.

Q: Can I configure Sawmill to recognize search engines other than the ones it knows already?
A: Yes -- just edit the search_engines.cfg file in the LogAnalysisInfo folder with a text editor. For full details, see Adding Search Engines.

Q: My server logs times in GMT, but I'm in a different time zone. How can I get the statistics in my own time zone?
A: Set the date_offset option in the profile. For full details, see Changing the Time Zone in Statistics.
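The date_offset change mentioned above is a one-line edit to the profile .cfg. This is a sketch only: the exact placement within the profile, and the assumption that the value is an offset in hours, should be checked against All Configuration Options:

```
# Hypothetical excerpt from a profile .cfg in LogAnalysisInfo/profiles/.
# date_offset shifts logged times into your time zone; -8 here assumes
# an offset of -8 hours (GMT-8), as an example value only.
date_offset = -8
```

Rebuild the database after changing this option so existing entries are re-dated.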
Reports
Q: What are "hits"? What are "page views"? What is "bandwidth"? What are "visitors"? What are "sessions"?
A: Hits are accesses to the server; page views are accesses to HTML pages; visitors are unique visitors to the site; and sessions are visits to the site. For full details, see Hits, Visitors, etc..

Q: My web site uses dynamic URLs instead of static pages; i.e. I have lots of machine-generated URLs that look like /file?param1=value1&param2=value2.... Can Sawmill report on those?
A: Yes, but you need to delete the "(parameters)" log filter first. For full details, see Dynamic URLs.

Q: There's a line above some of the tables in the statistics that says "parenthesized items omitted." What does that mean?
A: It means that some items (probably useless ones) have been omitted from the table to make the information more useful--you can show them by choosing "show parenthesized items" from the Options menu. For full details, see Parenthesized Items Omitted.

Q: In my reports, I see entries for /somedir/, and /somedir/{default}, and /somedir, and /somedir/ (default page). What's the difference? I seem to have two hits for each hit because of this, one on /somedir and then one on /somedir/; what can I do to show that as one hit?
A: /somedir/ is the total hits on a directory and all its contents; /somedir is an attempt to hit that directory which was redirected because it did not have the trailing slash; and the default page entries both indicate the number of hits on the directory itself (e.g., on the default page of the directory). For full details, see Default Page Hits.

Q: How do I see the number of downloads for a particular file (i.e. a newsletter PDF, or a template file PDF)?
A: Select PDF from the 'File Types' table, use the Zoom Menu to zoom to the URLs report, then select the PDF you need to get an overview of that file. For full details, see Zooming on single files.

Q: How do I see more levels of statistics (i.e. how can I zoom in further)?
A: Increase the "suppress below" level for this database field in the profile options. For full details, see Zooming Further.

Q: Can I see the number of hits per week? Can I see a "top weeks" report?
A: Yes, by using the Calendar, and/or creating a database field and a report tracking "weeks of the year." For full details, see Weekly Statistics.

Q: Can Sawmill count unique visitors?
A: Yes, using unique hostnames or using cookies. For full details, see Unique Visitors.

Q: Can Sawmill count visitors using cookies, rather than unique hostnames?
A: Yes -- it includes a built-in log format to do this for Apache, and other servers can be set up manually. For full details, see Counting Visitors With Cookies.

Q: Sawmill shows IP addresses, or hostnames, in the Sessions reports, but I want it to show usernames instead. How can I do that?
A: Edit the profile .cfg, and change sessions_visitor_id_field to the username field. For full details, see Tracking Sessions with Usernames instead of IPs.

Q: Can Sawmill show me the paths visitors took through my web site?
A: Yes; its "session paths (clickstreams)" report is very powerful. For full details, see Clickstreams (Paths Through the Site).

Q: I want to track conversions -- i.e. I want to know which of my ads are actually resulting in sales. Can Sawmill do that?
A: Yes -- encode source information in your URLs and use global filters to show the top entry pages for your "success" page. For full details, see Tracking Conversions.

Q: How can I see the top (insert field here) for each (insert field here)? For instance, how can I see the pages hit by a particular visitor? Or the top visitors who hit a particular page? Or the top referrers for a particular day, or the top days for a particular referrer? Or the top search phrases for a search engine, the top authenticated users for a directory, the top directories accessed by an authenticated user, etc.?
A: Click on the item you're interested in, and choose the other field from "default report on zoom". For full details, see Finding the Top (field) for a Particular (field).
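The sessions_visitor_id_field edit mentioned in the sessions question above is likewise a small profile .cfg change. A sketch only, with the field name and surrounding syntax assumed rather than taken from a real profile:

```
# Hypothetical profile .cfg excerpt: identify session visitors by the
# authenticated username field instead of the client IP/hostname.
# "username" is an assumed field name; use the name of your log's user field.
sessions_visitor_id_field = "username"
```

As with other profile edits, rebuild the database afterward so session reports pick up the new visitor id.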
Q: How can I see only the visitors that entered at a particular page, or only the visitors that hit a particular page at some point in their session?
A: Use the global filters to show only sessions containing that page; reports will only show sessions including that page. For full details, see Sessions For A Particular Page.
Q: How can I see only the visitors that came from a particular search engine?
A: Direct that search engine to a particular entry page, and then use global filters to show only sessions for that page. For full details, see Sessions For A Particular Search Engine.

Q: Why doesn't the number of visitors in the Overview match the number of session users in the "Sessions Overview" report?
A: Session information only shows users contributing page views, and other views show all visitors. Also, long sessions are discarded from the session information. For full details, see Visitors vs. Session Users.

Q: How can I see just the most recent day/week/month of statistics?
A: Use the Calendar, or the Filters, or use a recentdays filter on the command line. For full details, see Recent Statistics.

Q: Can I export the data from Sawmill reports to Excel or other programs?
A: Yes; click the "export" link in the toolbar above reports to export the data from that report's table in CSV format. Many programs, including Excel, can import CSV files. For full details, see Exporting Data From Statistics.

Q: I've heard that statistics like visitors, "sessions," and "paths through the site" can't be computed accurately. Is that true? Are the statistics reported by Sawmill an accurate description of the actual traffic on my site?
A: Sawmill accurately reports the data as it appears in the log file. However, many factors skew the data in the log file. The statistics are still useful, and the skew can be minimized through server configuration. For full details, see Are the Statistics Accurate?.

Q: How does Sawmill compute session information, like total sessions, repeat visitors, paths through the site, entry pages, exit pages, time spent per page, etc.?
A: Sawmill uses the visitor id field to identify unique visitors. It decides that a new session has begun if a visitor has been idle for 30 minutes, and it rejects sessions longer than 2 hours. For full details, see Session Computation.
Q: How do I change the field which is graphed, e.g. from page views to bandwidth?
A: Edit the profile .cfg file, and change the field name in the numerical_fields section of that report element. For full details, see Changing the graph field.

Q: Does Sawmill do "peak period" reports (by weekday, or hour)?
A: Yes. For full details, see Peak Period Reports.

Q: Does Sawmill do time-of-day statistics?
A: Yes. For full details, see Time of Day Statistics.

Q: How can I tell where visitors went when they left the site?
A: Normally, you can't. However, you can set up "reflector" pages if you need this information. For full details, see Tracking Exit URLs.

Q: How can I see all files that were hit on my web site, not just the pages?
A: Delete or disable the 'Strip non-page-views' log filter, and rebuild the database. For full details, see Showing All Files.

Q: Why do I see hits on a file called "robots.txt" in my statistics?
A: robots.txt is a file that tells search engine spiders and robots what they can do, so a hit on robots.txt means that a spider visited your site. For full details, see robots.txt.

Q: Why do I see hits on a file called "favicon.ico" in my statistics?
A: favicon.ico is a special icon file that Internet Explorer looks for when it first visits the site. For full details, see favicon.ico.

Q: How can I add additional columns to report tables, e.g. a single report which shows source IP, destination IP, source port, and destination port?
A: Edit the report in the profile .cfg file to add a new item to the columns group. For full details, see Adding columns to report tables.

Q: Does Sawmill produce reports for HIPAA and Sarbanes-Oxley (SOX) compliance?
A: Yes, run the Single-Page Summary report. For full details, see Support for HIPAA and Sarbanes-Oxley Compliance.

Q: Some of the IP addresses in my data are not resolved properly to country/region/city by Sawmill. I know that Sawmill uses the MaxMind GeoIP database, and when I go to the MaxMind site, their demo resolves these IPs properly. Why isn't Sawmill doing the same as the online GeoIP demo?
A: Sawmill uses the GeoLite City database, a less accurate (and less expensive) version of the GeoIP City database. To get full accuracy, buy GeoIP City from MaxMind. For full details, see GeoIP database in Sawmill is not as accurate as the one on the MaxMind site.

Q: When I export CSV, durations appear as numbers, which Excel doesn't understand. How can I format durations to work with Excel?
A: Add an extra column to the spreadsheet to convert them to fractional days, or use a custom database field in the report element. For full details, see Formatting Durations for Excel.

Q: When I save a report for the first time, what happens to my filters?
A: If you have no filters active, then none will be saved with your report. For full details, see Saving filters during Save as New Report.
Troubleshooting
Q: When I run Sawmill, it tells me that the server is started (it shows me the URL), but when I try to access that URL, the browser says it's not available. How can I fix this?
A: You may be using a proxy server which prevents you from accessing a server running on your own machine. Try reconfiguring the proxy to allow it, or try running Sawmill on IP 127.0.0.1 (the loopback interface). For full details, see Can't Access the Server.

Q: On Windows 2003, I can't access the Sawmill server using Internet Explorer. Why not?
A: The "Internet Explorer Enhanced Security Configuration" may be enabled, blocking access; uninstall it or add 127.0.0.1:8988 to the trusted sites. For full details, see Can't access server with Windows 2003 and IE.

Q: When I try to log in to Sawmill, I get to the Admin page, but the next thing I click takes me back to the login page. Why?
A: Your browser isn't storing the cookie Sawmill needs to maintain the login, or something is blocking the browser from sending the cookie. Make sure cookies are on in the browser, firewalls aren't blocking cookies, and don't use Safari 1.2.1 or earlier as your browser. For full details, see Login Loops Back to Login.

Q: Why can't Sawmill see my mapped drive, share, directory, or mount points when I run it as a Windows Service?
A: The Service must run with the same privileged user account that has the mapped drive, share, directory, or mount point privilege. For full details, see Can't See Network Drives with Sawmill as Service.

Q: Why can't Sawmill see my mapped drive, share, directory, or mount points when I run it under Windows 2003?
A: Windows 2003 has a strict security policy which prevents access to network drives from Sawmill. To make it work, you need to let "everyone" permissions apply to anonymous users, and remove the restriction on anonymous access to named pipes and shares (in Administrative Tools). For full details, see Can't See Network Drives in Windows 2003.

Q: When I build or update my database with Sawmill, it uses a huge amount of memory. Then, when I view statistics, it's very slow. What can I do about that?
A: Decrease the complexity of the database. For full details, see Sawmill uses too much memory for builds/updates, and is slow to view.

Q: I get an error 'Unable to allocate N bytes of memory' while building a database, and Sawmill seems to have used all my available memory. What can I do about it?
A: Use a MySQL database, and/or use a 64-bit computer and operating system, and/or simplify your database. For full details, see Database Memory Usage.

Q: When I try to build a database, or view reports, I get an error, "The total number of locks exceeds the lock table size". How can I fix this?
A: Increase innodb_buffer_pool_size in my.cnf (my.ini) to 256M. For full details, see Error with MySQL: "The total number of locks exceeds the lock table size".
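The MySQL lock-table fix above is an edit to my.cnf (my.ini on Windows). A minimal sketch; 256M is the value suggested in the answer, and the [mysqld] section placement follows standard MySQL configuration:

```ini
# my.cnf / my.ini -- raise the InnoDB buffer pool so large Sawmill
# queries do not exceed the lock table size.
[mysqld]
innodb_buffer_pool_size = 256M
```

Restart the MySQL server after changing this value so it takes effect.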
Q: I can't access Sawmill where I usually do (http://www.xxx.yyy.zzz:8988/) -- is your (Flowerfire's) server down?
A: No -- your server is down. Sawmill runs on your computer, not on ours -- contact your network administrator if you're having problems accessing it. For full details, see Sawmill Server is Down.

Q: Sawmill displays the following error: "The background process terminated unexpectedly, without returning a result." What does that mean, and how can I fix it?
A: Sawmill has probably crashed, so this could be a bug in Sawmill. See the long answer for suggestions. For full details, see The background process terminated unexpectedly.

Q: When I run Sawmill on Windows, I get an error: "A required DLL is missing: WS2_32.DLL." What's going on?
A: You need Winsock 2. For full details, see Winsock 2.

Q: When I run Sawmill on Windows 98, I get an error: "A required DLL is missing: OLEACC.DLL." What's going on?
A: You need to download and install the latest Service Pack for Windows 98. For full details, see Missing DLL: OLEACC.DLL.

Q: When I run Sawmill on Windows, I get an error: "A required DLL is missing: URLMON.DLL." What's going on?
A: Install the latest Internet Explorer, and the problem should go away. For full details, see Missing DLL: URLMON.DLL.

Q: When I run Sawmill, I get an error: './sawmill: error while loading shared libraries: libstdc++.so.5: cannot open shared object file: No such file or directory'. What's going on?
A: Sawmill requires the libstdc++ library. This is available by default on many platforms, and is included in the Sawmill distribution on others (including Solaris). For full details, see libstdc++ missing.

Q: When I try to run Sawmill, I get an error "relocation error: sawmill: undefined symbol: __dynamic_cast_2". How can I fix this?
A: This is a GNU library incompatibility; build Sawmill from source instead of using the binary distribution. For full details, see Relocation error: __dynamic_cast_2.
Q: Sawmill only shows me the IP addresses of my visitors, even when I turn on DNS lookup. Why?
A: Try deleting the IPNumbersCache file in LogAnalysisInfo -- see the long answer for other solutions. For full details, see Problems With DNS Lookup.

Q: I run Sawmill in CGI mode, and all the images in the menus and the reports are missing or broken. Why?
A: You may have set the "temporary folder" incorrectly during installation. Try deleting the preferences.cfg file in LogAnalysisInfo, and access Sawmill to try again. For full details, see No Images in CGI Mode.

Q: The statistics show the wrong years -- when I analyze data from previous years, it appears as this year, or data from this year appears in last year. Why?
A: Your log format does not include year information, so Sawmill has to guess the year. Use a different log format if possible (one which includes year information). See the long answer for a way of manually setting the year for blocks of log data. For full details, see Years are wrong in the statistics.

Q: When I run Sawmill as a CGI program under IIS, I get an error message "CGI Timeout: The specified CGI application exceeded the allowed time for processing. The server has deleted the process." What can I do about that?
A: Set the IIS CGI timeout to a high value, like 999999. For full details, see IIS CGI Timeout.

Q: I've forgotten the password I chose for Sawmill when I first installed; how can I reset it?
A: As of version 8.0.2, there is a custom action reset_root_admin. For full details, see Resetting the Administrative Password.

Q: When I run Sawmill as a CGI, it runs as a special user (nobody, web, apache, etc.). Then when I want to use Sawmill from the command line or in web server mode, the permissions don't allow it. What can I do about this?
A: Loosen the permissions in the Preferences, or run your CGI programs as a different user, or run your command-line programs as the CGI user. For full details, see CGI User Permissions.
Q: How much memory/disk space/time does Sawmill use?
A: It depends on how much detail you ask for in the database. It uses very little if you use the default detail levels. For full details, see Resource Usage.

Q: When I add up the number of visitors on each day of the month and compare it to the total visitors for the month, they're not equal. Why not? Also, why doesn't the sum of visitors on subpages/subdirectories add up to the total for the directory, and why doesn't the sum of visitors on subdomains add up to the total for the domain, etc.? Why are there dashes (-) for the visitor totals?
A: Because "visitors" is the number of unique visitors, a visitor who visits every day will show up as a single visitor in each day's visitor count, but also as a single visitor for the whole month -- not 30 visitors! Therefore, simple summation of visitor numbers gives meaningless results. For full details, see Visitor Totals Don't Add Up.

Q: When I look at my statistics, I see that some days are missing. I know I had traffic on those days. Why aren't they shown?
A: Your ISP may be regularly deleting or rotating your log data. Ask them to leave all your log data, or rotate it over a longer interval. It's also possible that your log data does not contain those days for another reason. For full details, see Days Are Missing from the Log Data.

Q: My log data contains referrer information, but I don't see referrer reports, or search engines, or search phrases. Why not?
A: Sawmill includes referrer reports if the beginning of the log data includes referrers. If your log data starts without referrers and adds them later, you won't see referrer reports. Create a new profile from the latest log file (with referrers), and change the log source to include all log data. For full details, see Referrer Reports Missing.

Q: When I process log data with Sawmill, it uses most or all of my processor; it says it's using 90%, or even 100%, of the CPU. Should it be doing that? Is that a problem?
A: Yes, it should do that, and it's not usually a problem. Any CPU-intensive program will do the same. However, you can throttle it back if you need to, using operating system priorities. For full details, see Sawmill Uses Too High a Percentage of CPU.

Q: How do I build a database from the command line?
A: Run "executable -p profilename -a bd" from the command-line window of your operating system. For full details, see Building a Database from the Command Line.

Q: Sawmill does not recognize my Symantec SGS/SEF log data, because it is binary. How can I export this data to a text format so Sawmill can process it?
A: Use flatten8, or remorelog8. For full details, see Exporting Symantec SGS/SEF data to text format.

Q: How can I track full URLs, or HTTP domains, or resolved hostnames, when analyzing PIX log data?
A: You can't track full URLs or HTTP domains, because PIX doesn't log them; but you can turn on DNS lookup in the PIX or in Sawmill to report resolved hostnames. For full details, see Tracking URLs in Cisco PIX log format.

Q: I installed Sawmill on a 64-bit (x64) Mac, and now it says, "This profile uses a MySQL database, but MySQL is not enabled in this build." Why?
A: MySQL does not currently work on x64 MacOS. For full details, see MySQL and x64 MacOS.

Q: How do I back up and restore my Sawmill installation, or a particular profile and its database?
A: Back up and restore the LogAnalysisInfo folder (or, for one profile, that profile's files) when no update or build is running. For MySQL, also back up and restore the MySQL database. For full details, see Backup and Restore.

Q: On Windows, I sometimes get "permission denied" errors, or "volume externally altered" errors, or "file does not exist" errors when building a database. But sometimes it works. What can cause this sort of sporadic file error?
A: Anti-virus or anti-malware software which is actively scanning your Sawmill installation folder can cause this. Disable scanning of Sawmill's data folders in the anti-virus product. For full details, see Permission Denied Errors.

Q: Will my plug-in work with version 8?
A: Most version 7 plug-ins will work with version 8. For full details, see Using version 7 plug-ins.

Q: Why do my emailed reports from Outlook 2003 not line up; why is everything out of alignment?
A: Change the settings in Outlook to not load automatically. For full details, see Emailed Reports in Outlook 2003.
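The command-line build can be scripted, for example from cron. The sketch below uses a hypothetical helper (`sawmill_cmd`, not part of Sawmill) that just prints the command line described above; the `-p`/`-a` flags and the `bd` (build database) action come from the documentation, while `ud` (update database) is the usual companion action -- check your version's command-line reference before relying on it.

```shell
# Hypothetical helper: print the Sawmill command line for a given
# profile and action, so scheduled-task entries can be generated
# consistently. "./sawmill" stands in for the actual executable path.
sawmill_cmd() {
  profile="$1"; action="$2"
  printf './sawmill -p %s -a %s\n' "$profile" "$action"
}

sawmill_cmd myprofile bd   # prints: ./sawmill -p myprofile -a bd
sawmill_cmd myprofile ud   # prints: ./sawmill -p myprofile -a ud
```

The profile name here is the internal name (the name of the profile's folder in LogAnalysisInfo), not necessarily the label shown in the web interface.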
Miscellaneous
Q: Where did the name "Sawmill" come from? A: A sawmill is a tool that processes logs, and so is Sawmill. For full details, see The Name "Sawmill". Q: Why are new versions of Sawmill released so often? Is it buggy? Do I need to download every new version?
A: We ship new versions to provide our customers with the latest minor features and bug fixes quickly. Sawmill is no buggier than any other software, and you don't need to download a new release unless you're having problems with the current one. For full details, see Frequent New Versions of Sawmill.
Sawmill Documentation
Quickstart Manual FAQ User Guide www.sawmill.co.uk
An Overview of Sawmill:

- What does Sawmill measure?
- Where did my visitors come from?
- How Sawmill counts Visitors
- How Sawmill calculates Sessions
- How Sawmill calculates Durations
- Applying what you have learned to your web site

Reports:

- The Report Header
  - Profiles
  - Logout link
- The Report Toolbar
  - The Date Picker Selector
  - Macros
  - The Date Range Selector
  - Filters
  - Printer Friendly
  - Miscellaneous
- The Report Menu
  - The Calendar
  - The Overview
  - The Date/Time Report
  - The Hit Types Report
  - The Content Report
  - The Demographics Report
  - The Visitor Systems Report
  - Referrers
  - Other Reports
  - The Session Overview
  - The Session Views
  - Single-Page Summary Report
  - Log Detail Report
- The Report Bar
- The Report Graph
- The Report Table
  - Row Numbers
  - Export
  - Customize

Filters:

- Using Log Filters
- Using Date Filters
- Using Report Filters
- Using Table Filters
Installation
Web Server Mode or CGI Mode?
Before you install Sawmill you will need to decide which mode you want to run it in. Sawmill will run as a stand-alone in web server mode or in CGI mode. If you plan to run Sawmill in web server mode, please see the section below called Web Server Mode Installation. If you plan to run it as a CGI program, under an existing web server, please see the section below, called CGI Mode Installation. If you don't know which way you want to install it, this section will help you with that decision. In web server mode, Sawmill runs its own web server, and serves statistics using it. In CGI mode, Sawmill runs as a CGI program under another web server. In both modes, you access Sawmill through a web browser on your desktop. In brief, web server mode is ideal if you want installation to be as easy as possible. CGI mode is the better choice if you want to run Sawmill on a shared web server system, or a system where you have limited access. Here are the specific details of the advantages and disadvantages of web server mode vs. CGI mode: The advantages of Web Server Mode:
- Sawmill uses its own web server in this mode -- there does not need to be an existing web server on the computer where it is running.
- Sawmill is extremely simple to install in web server mode. You just run it, point your browser at it, choose a password, and you're ready to start using it. In CGI mode, the installation and startup process is considerably more involved, though still fairly easy.
- The Sawmill Scheduler is available in web server mode. In CGI mode, the Scheduler is not easily available; an external scheduler must be used to schedule database builds, updates, etc.

The advantages of CGI Mode:

- Sawmill only uses memory and other resources while it's actively in use. In web server mode, Sawmill uses memory even when it isn't being actively used.
- At system boot time there is no extra configuration required to start Sawmill -- it is always available.
- Sawmill can use the services of the web server that's running it. This makes it possible to use HTTPS, server authentication, and other powerful server features with Sawmill.
- In some environments, web server mode may not be possible or permissible, due to restrictions of the server, firewall limitations, and other considerations. For instance, if you have only FTP access to your web server, and you want to run Sawmill on the server, you must use CGI mode.
Please continue your installation with either Web Server Mode Installation or CGI Mode Installation depending upon your choice.
Windows: Sawmill is a standard Windows installer. Just double-click the program to start the installer, and follow the instructions.
Once Sawmill is installed, it will be running as a Windows service. You can access it at http://127.0.0.1:8988/ with a web browser. Sawmill runs as the SYSTEM user by default, which is the most secure approach, but restricts access to network shares or mapped drives. See Can't See Network Drives with Sawmill as Service for instructions for running Sawmill as a different user, to get access to that user's mapped drives and network privileges.
MacOS: Sawmill is a disk image. Mount the image, and drag the Sawmill folder to the Applications folder. Once Sawmill is installed, you can start using it by double-clicking the Sawmill application icon (in the Applications/Sawmill folder). Once it's running, click Use Sawmill to start using it.

UNIX: Sawmill is a gzipped tar archive file. Transfer it to the UNIX machine where you'll be running Sawmill, if it's not already there. Then open a "shell" prompt using telnet, ssh, or however you normally get to the UNIX command line. Next, gunzip and untar the file using the following command:

gunzip -c (sawmill.tgz) | tar xf -

You will need to change (sawmill.tgz) to match the name of the file you have downloaded. Once the archive is uncompressed and extracted, you can run Sawmill by changing to the installation directory, and typing the name of the executable file from the command line:

cd (installation-directory)
./sawmill

You may need to change the filename to match the actual version you downloaded. Sawmill will start running, and it will start its own web server on the UNIX machine (using port 8988, so it won't conflict with any web server you may already be running there). To start using Sawmill, copy the URL printed by Sawmill in your window, paste it into the URL field of your web browser, and press return. You should see Sawmill appear in your web browser window. Note: You can add a single ampersand (&) to the end of the command line that starts Sawmill, to run Sawmill "in the background," which allows you to close your terminal window without killing Sawmill. On some systems, you may also need to add nohup to the beginning of the command line for this to work properly.
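The extraction pipeline can be rehearsed end to end with a stand-in archive, which is useful for checking the command before touching the real download. Everything below the first comment block is fabricated scaffolding (a tiny tarball built on the spot); only the `gunzip -c … | tar xf -` line is the step from the text.

```shell
# Build a small stand-in archive shaped like the real download, in a
# scratch directory, so the extraction command can be verified safely.
work=$(mktemp -d) && cd "$work"
mkdir sawmill-dist && echo 'stand-in' > sawmill-dist/sawmill
tar cf - sawmill-dist | gzip > sawmill.tgz && rm -r sawmill-dist

# The installation step from the text; the trailing "-" tells tar to
# read the archive from standard input (the pipe) rather than a file.
gunzip -c sawmill.tgz | tar xf -
ls sawmill-dist    # prints: sawmill
```

On a real install, substitute the downloaded filename for sawmill.tgz; the extracted directory name depends on the version you downloaded.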
If you have any problems installing Sawmill in web server mode, please see the Troubleshooting Web Server Mode section.
UNIX or MacOS with SSH access. If your server is UNIX or similar, and you have SSH access to your server, then download the file to your server and gunzip/untar it according to the instructions above (in the web server mode installation section). Then copy the executable file and LogAnalysisInfo directory from the installation directory to your cgi-bin directory, using these commands:

cp (installation-directory)/sawmill (cgi-bin)/sawmill.cgi
cp -r (installation-directory)/LogAnalysisInfo (cgi-bin)

You may need to change the name of sawmill to match the version you downloaded, and you will definitely need to change the (cgi-bin) part to match your web server's cgi-bin directory. You can now access Sawmill using the URL http://(yourserver)/cgi-bin/sawmill.cgi, replacing (yourserver) with the actual name of your server. Sawmill should appear in the web browser window.

Make the Sawmill executable file and LogAnalysisInfo directory accessible by the CGI user. The CGI user depends on the configuration of the web server, but it is often a user like "web" or "apache" or "www". If you have root access, you can use this command, after cd'ing to the cgi-bin directory, to change ownership of the files:

chown -R apache sawmill.cgi LogAnalysisInfo

If you do not have root access, you may need to open up permissions completely to allow the CGI user to access these files:

chmod -R 777 sawmill.cgi LogAnalysisInfo

However, please note that using chmod 777 is much less secure than using chown -- anyone logged on to the server will be able to see or edit your Sawmill installation, so in a shared server environment, this is generally not safe. If possible, use chown as root instead. For more information, see Troubleshooting CGI Mode below.
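The copy-and-rename steps can be dry-run in scratch directories before touching the live server. The paths below are stand-ins created with mktemp, not the real installation or cgi-bin locations; `chmod 755` is used here instead of the ownership change, since chown to another user needs root.

```shell
# Stand-ins for the installation directory and the server's cgi-bin.
inst=$(mktemp -d); cgibin=$(mktemp -d)
echo 'stand-in' > "$inst/sawmill"
mkdir "$inst/LogAnalysisInfo"

cp "$inst/sawmill" "$cgibin/sawmill.cgi"   # executable is renamed *.cgi
cp -r "$inst/LogAnalysisInfo" "$cgibin/"   # data folder travels with it
chmod 755 "$cgibin/sawmill.cgi"            # readable/executable by the CGI user
```

On the real server, follow this with the chown (as root) shown above rather than chmod 777, especially in a shared hosting environment.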
UNIX with only FTP access. If you have a UNIX server, and you have only FTP access to your server (you cannot log in and run commands by ssh, or in some other way), then you need to do things a bit differently. Here's how:

1. Download the file to your desktop system. Remember, you need the version that matches your server, not the version that matches your desktop system. Download in BINARY mode.
2. Use a gunzipping and untarring utility on your desktop system to decompress the file: WinZip on Windows, StuffIt Expander on Mac, or gunzip/tar if your desktop is also UNIX.
3. Rename the sawmill file to sawmill.cgi.
4. Upload sawmill.cgi to your server's cgi-bin directory. Make sure you use BINARY mode to do the transfer, otherwise it won't work.
5. Upload the entire LogAnalysisInfo directory, including all the files and directories in it, to your server's cgi-bin directory. To do this conveniently, you will need an FTP client which supports recursive uploads of directories, including all subdirectories and files. Again, make sure you use BINARY mode, or it won't work.
6. Make sawmill.cgi executable on your web server, using "chmod 555 sawmill.cgi" if you're using a command-line FTP program, or your FTP program's permission-setting feature otherwise.

You can now access Sawmill using the URL http://(yourserver)/cgi-bin/sawmill.cgi, replacing (yourserver) with the actual name of your server. Sawmill should appear in the web browser window. For more information, see the section on Troubleshooting CGI Mode below.
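For reference, this is what step 6's `chmod 555` actually grants, rehearsed on a throwaway file: read and execute for owner, group, and others, and write for no one (which is why the file can be served but not modified through the web server).

```shell
# Show the permission string that chmod 555 produces.
f=$(mktemp)
chmod 555 "$f"
ls -l "$f" | cut -c1-10    # prints: -r-xr-xr-x
```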
Windows. If the web server is IIS, this is a difficult installation due to the security features of IIS, which make it difficult to run a binary CGI program -- consider using web server mode instead. If you need CGI mode, it is possible, but not as easy as on other platforms; see Installing Sawmill as a CGI Program Under IIS for more information on installing in CGI mode under IIS. You need to upload the Sawmill.exe file and the LogAnalysisInfo folder to your server's cgi-bin directory (Windows may hide the .exe part of the filename, but that is its actual full filename). The easiest way to get this file and this folder is to install the Windows version of Sawmill on a local Windows desktop machine, and then look in the Sawmill installation directory (C:\Program Files\Sawmill\ by default); the Sawmill.exe file and LogAnalysisInfo folder will be there. If you don't have access to a Windows machine locally, please contact support@flowerfire.com and we will send you this file and folder. Make sure you do the upload in BINARY mode, or Sawmill will not work! Once you've uploaded it to your cgi-bin directory, you can access it using the URL http://(yourserver)/cgi-bin/Sawmill.exe (replace (yourserver) with the name of your domain). Sawmill should appear in the web browser window. If it still doesn't work, see Troubleshooting CGI Mode below.
Installation Procedures:
Sawmill Directories:
You will need the following directories:

{BASE}\inetpub\wwwroot\                   ==> your website
{BASE}\inetpub\sawmill\
{BASE}\inetpub\cgi-bin\
{BASE}\inetpub\cgi-bin\LogAnalysisInfo\   (Sawmill creates this automatically)
Initially, give both cgi-bin and sawmill FULL permission.

IIS Console setup:
Create Virtual Directories (NT4 also calls them web shares) for CGI-BIN and Sawmill in the IIS Management Console.
Both Virtual Directories are given Execute and Write rights (FULL permission). Make sure "Index this directory" is checked OFF (after the installation is completed, we will come back and change this to a more secure setting). Keep in mind that the \cgi-bin and \sawmill directories are in fact virtual directories under your website, and are not physically under your "website" directory.

Execution and Sawmill Installation: Once we have all the subdirectories and virtual directories in place, then:

- Copy "Sawmill.exe" to the {BASE}\inetpub\cgi-bin\ directory.
- Execute (run) "Sawmill.exe".
- Follow the Sawmill installation procedures (see Installation).
- Establish passwords and the Temp Directory: {BASE}\inetpub\sawmill
- Create your first "configuration", add the Log Files, and look at the "Statistics".
- Enjoy your hard work ;-)
{BASE}\inetpub\cgi-bin\ : (File Manager)

I took away FULL permission from the cgi-bin\ directory, and gave it Read/Execute ONLY. Note: when you make the change here, make sure "Replace Permission on Subdirectories" is checked OFF.

{BASE}\inetpub\cgi-bin\LogAnalysisInfo : (File Manager)

Here, Sawmill still needs to create directories for all additional websites, or when there are any changes to the "configuration". However, there is no need to execute any scripts here, so give it Read/Write/Delete permission. Note: when you make the change here, make sure "Replace Permission on Subdirectories" is checked ON.

{BASE}\inetpub\sawmill : (File Manager)

I took away FULL permission from the sawmill\ directory, and gave it Read/Write/Delete permission (no Execute). Note: when you make the change here, make sure "Replace Permission on Subdirectories" is checked ON.

cgi-bin\ : (Virtual Directory)

I took away FULL permission from the cgi-bin\ virtual directory, and gave it Read/Execute permission. Note: make sure "Index this directory" is checked OFF.
The Temporary folder must be within the HTML pages folder of the web server which is serving Sawmill. Web servers serve all their pages from a particular folder on the hard drive; the Temporary folder must be somewhere within this folder. This folder is called different things by different web servers, but some common names are "htdocs", "html", "data", "wwwroot", and "Web Pages". If you do not know where this folder is for your web server, you can find out in your web server's documentation. See Web Server Information for a list of web servers, and where the HTML pages folder is for each of them. The Temporary folder must either exist and be writable by Sawmill (running as a CGI program), or it must be possible for Sawmill to create it. UNIX users may need to use the chmod command to set permissions correctly.

The Temporary folder should be described using your platform's standard pathname format (see Pathnames). The Temporary folder URL should describe the same folder as the Temporary folder, but as a URL (as it might be accessed using a web browser, by browsing the server Sawmill is running under). The following examples illustrate how it might be set for various platforms; see Web Server Information for more specific information. As the hostname part of the URL, you will need to specify the machine Sawmill is running on. In the examples below, this is chosen to be www.mysys.com; you will need to replace this part of the URL with your machine's actual hostname.

Example 1: For MacOS, if the root of your web server's HTML pages is at /Library/WebServer/Documents (i.e. the Documents folder, which is in the WebServer folder, which is in the Library folder), and this is accessible from your web browser as http://www.mysys.com/, then you could enter /Library/WebServer/Documents/sawmill/ as the Temporary folder and http://www.mysys.com/sawmill/ as the Temporary folder URL.

Example 2: For Windows, if the root of your web server is at C:\inetpub\wwwroot\ (i.e. the wwwroot folder, which is in the inetpub folder, which is on the C drive), and this is accessible from your web browser as http://www.mysys.com/, then you could enter C:\inetpub\wwwroot\sawmill\ as the Temporary folder and http://www.mysys.com/sawmill/ as the Temporary folder URL.

Example 3: For UNIX, if the root of your web server is at /home/httpd/html/, and this is accessible from your web browser as http://www.mysys.com/, then you could enter /home/httpd/html/sawmill/ as the Temporary folder and http://www.mysys.com/sawmill/ as the Temporary folder URL.

It is also possible to use relative pathnames, which sometimes makes things easier; e.g. on UNIX, if both the cgi-bin directory and the html directory are in the same directory, you can use ../html/sawmill as the temporary directory, without having to specify the full pathname. See Web Server Information for more information.
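The relative-pathname case can be checked with a stand-in directory layout: as long as cgi-bin and html share a parent directory, `../html/sawmill` resolves the same way from inside cgi-bin.

```shell
# Stand-in layout: a parent directory containing cgi-bin and html
# side by side, with a sawmill folder inside html.
root=$(mktemp -d)
mkdir -p "$root/cgi-bin" "$root/html/sawmill"
cd "$root/cgi-bin"
ls -d ../html/sawmill    # prints: ../html/sawmill
```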
Web Server Information

Xitami (Windows)
CGI folder:           C:\Program Files\Xitami\cgi-bin\
Web pages root:       C:\Program Files\Xitami\webpages\
Temporary folder:     C:\Program Files\Xitami\webpages\sawmill\
Temporary folder URL: http://www.myhost.com/sawmill/

OmniHTTP (Windows)
CGI folder:           C:\httpd\cgi-bin\
Web pages root:       C:\httpd\htdocs\
Temporary folder:     C:\httpd\htdocs\sawmill\
Temporary folder URL: http://www.myhost.com/sawmill/

Microsoft Peer Web Services (Windows)
CGI folder:           C:\inetpub\cgi-bin\
Web pages root:       C:\inetpub\wwwroot\
Temporary folder:     C:\inetpub\wwwroot\sawmill\
Temporary folder URL: http://www.myhost.com/sawmill/

Microsoft IIS 3 (Windows)
Sawmill and IIS version 3 do not get along! Please upgrade to IIS 4, which works great.

Microsoft IIS 4/5 (Windows)
CGI folder:
Web pages root:
Temporary folder:
Log File Location:
Notes:

apache (UNIX)
CGI folder:           /home/httpd/cgi-bin/
Web pages root:       /home/httpd/html/
Temporary folder:     /home/httpd/html/sawmill/
Log File Location:    varies; try /var/log/httpd/access_log
Notes: The Web pages root varies from installation to installation; it may be somewhere else. Consult your apache configuration files (often in /etc/httpd/conf/, but this varies too) for the exact location (search for DocumentRoot).
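The DocumentRoot search suggested in the apache notes looks like this; here it is rehearsed against a stand-in httpd.conf written to a temporary file, since the real configuration path varies by installation.

```shell
# Create a minimal stand-in httpd.conf and search it for DocumentRoot,
# the directive naming the Web pages root.
conf=$(mktemp)
printf '%s\n' '# stand-in httpd.conf' 'DocumentRoot "/home/httpd/html"' > "$conf"
grep -i '^DocumentRoot' "$conf"    # prints: DocumentRoot "/home/httpd/html"
```

On a real system, replace "$conf" with the path to your server's configuration file, e.g. /etc/httpd/conf/httpd.conf on many Linux distributions.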
If you're running Sawmill on some other server, and need help configuring it, please contact us at support@flowerfire.com. If you're using some other server, and have it working, we'd really appreciate it if you could forward the above information for your server to us, so we can add that server to the list, and save time for future users.
Installation
Because of the way WebStar V is installed on MacOS X, the installation process for Sawmill is a little tricky. Part of the Sawmill (for WebStar V) installation process involves command-line work in the MacOS X UNIX "Terminal". This work is done using root user privileges, and should be done with extreme caution, since the root user has write access to all files, including system files. For security reasons, 4D instructs WebStar users to install the WebStar application only when logged in to MacOS X as the admin user. 4D also advises WebStar V users to create a MacOS X user that does NOT have admin privileges; this user is normally called "webstar" and is used when WebStar V is running. This setup protects system files and the WebStar V application against some types of unauthorized intrusions, while still allowing full access to all website files that users upload and download. This poses a problem for Sawmill, since Sawmill must be installed in MacOS X while logged in as the admin user. When Sawmill is installed that way, and you log out and then log in as the "webstar" user to run WebStar V, you will not have access to run Sawmill, since Sawmill will only run when logged in as the MacOS X admin user. You have to change the ownership (or valid user name) of all Sawmill files from the admin user (typically the user that installed Mac OS X) to the "webstar" user, to be sure that Sawmill will run when logged in as the webstar user. To do this, follow these steps (again: be careful, as you will be logged in as the root user as long as you are working in the Terminal):

1. WebStar V and the "webstar" user must already be installed and configured.
2. If you haven't done so already, log in as the admin MacOS X user and install Sawmill. The default installation process will create a "Sawmill" folder in the MacOS X "Applications" folder. All Sawmill files and folders will be installed in the Sawmill folder.
Do not start Sawmill after the installation.
3. In the /Applications/Utilities folder, start the "Terminal" application. At the command prompt, type the following commands, and enter your admin user password when asked (in this example the admin user is "davidw"). Also be sure that the Terminal gives you the same output. If you see any differences, you can always type "exit" to log out from the root user and thereby (hopefully) protect against damage:
Comments on the commands:

- When you start the Terminal and choose "New" in the "Shell" menu, you will see a new terminal window.
- "sudo su" gives you root access. Type your admin password and hit the return key. Note that the command line now says you are "root#".
- Navigate to the Applications folder. Now be careful: key the chown command exactly as shown. This changes the ownership of all files in the sawmill folder, including the sawmill folder itself.
- Navigate into the sawmill folder, and list the files and folders to check that the ownerships have really been changed. Note that the owner is now "webstar" instead of "davidw".
[localhost:/applications] root# cd sawmill
[localhost:/applications/sawmill] root# ls -ls
total 9104
   0 drwxrwxr-x  3 webstar  admin       58 Apr 21 09:43 Extras
  24 -rwxr-xr-x  1 webstar  admin     9348 Apr 21 09:43 License.txt
   8 -rwxrwxr-x  1 webstar  admin     3113 Apr 21 09:43 ReadMe.txt
   0 drwxrwxr-x  3 webstar  admin       58 Apr 21 09:43 Sawmill.app
   0 drwxrwxr-x  6 webstar  admin      160 Apr 21 09:43 Startup
9072 -rwxr-xr-x  1 webstar  admin  4641768 Apr 21 09:43 sawmill
[localhost:/applications/sawmill] root# exit
exit
[localhost:~] davidw%
4. Be sure that you have typed "exit" and that you are now the admin user again (the prompt returns to the admin user). Close Terminal.
5. Log out of Mac OS X (do not shut down, only log out).
6. Log in as the "webstar" user. You can now use Sawmill at the same time as running WebStar V.
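The ownership change itself can be rehearsed without a root shell. On the real system the command is "chown -R webstar sawmill", run as root from /Applications; the sketch below chowns a scratch copy to the current user instead (chown to oneself needs no privileges), then reads back the owner column the same way the ls listing above does.

```shell
# Scratch copy standing in for /Applications/sawmill.
demo=$(mktemp -d)
mkdir "$demo/sawmill"; touch "$demo/sawmill/ReadMe.txt"

# Recursive ownership change; on the real system the target user
# would be "webstar" and this would run as root.
chown -R "$(id -un)" "$demo/sawmill"
ls -ld "$demo/sawmill" | awk '{print $3}'   # prints the new owner's username
```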
Log formats
When choosing log formats in WebStar V's "Admin Client" for a specific website, use "WebSTAR Log Format". Select all tokens, and schedule WebStar V to archive log files once every day. In Sawmill's "Quick Start" you will be asked where the log files are located. WebStar V always creates log files for each defined website in the "logs" folder that is in the "WS_Admin" folder for each website. Sawmill will automatically choose "WebSTAR Log Format", and ONLY that format, if, and only if, the log is written correctly. If Sawmill gives you other log file formats to choose from (in addition to the correct format), then there is an error: Sawmill will only choose the correct log format when the first line of the first log file that you want processed has the correct log file header.

Solution: If you experience this error, do the following:

1. Open the first log file that is supposed to be processed by Sawmill in your favorite text editor on a MacOS X machine (this is important, since WebStar V log files now have file names that can be longer than 32 characters).
2. Find the first instance of the following text, and move your cursor just before it (with your own date and time):

!!WebSTARSTARTUP 25/May/02 01:19:40
!!LOG_FORMAT BYTES_RECEIVED BYTES_SENT C-DNS.....

3. Select all text from that point to the top of the file, and delete it. The above text should now be the very first text in the log. Save the file. Run "Quick Start" again.
Select "Import" and you will see "Import Sawmill 7 data". Once you click on the import link, the Import Wizard opens. From there you can browse to your Sawmill 7 LogAnalysisInfo directory and select the entire directory. Once you select "Next", you will see "Receiving data; please wait...". The Import Wizard is bringing in your data, and will update your newer version of Sawmill.
Profiles
This is the first menu item that is opened, and if you are launching Sawmill for the first time, you will create a new profile here. If you have created profiles, you will see your list, with options to View Reports using that profile, View Config, where you can edit the profile information, or Delete the profile.
Scheduler
Clicking this link will show the Scheduler (see Using the Sawmill Scheduler). You can create, delete, and edit scheduled tasks in this section. For instance, you can create a task to update all your databases every night, or to send a report of the previous month by email on the 1st of each month.
Users
Clicking this link will show the User Editor. In this page, you can add and remove users, and change the options for each user; e.g. you can specify which users have administrative access, and which profiles they are permitted to view.
Roles
Within the Roles section, you can create roles independently of users. Each role defines what its members can or can't do in the areas of Admin, Reports and Config. You can set up different roles, depending on the types of users you will have: some users can only access and view reports; other users will be able to edit, add or delete certain functions. This area gives you the greatest control over your users. Once you set up roles, you can assign specific users to those roles.
Preferences
Clicking this link will show the Preferences editor. This lets you change global preferences, including server IP and port, language, charset, and more.
Tasks
In the Task area, you can see what tasks are being performed and also view the task log. This will let you know when the last build was performed and other activities that may have been scheduled throughout the day.
Licensing
Clicking this link will show the licensing page. In this page, you can add and remove licenses, or change licensing from Professional to Enterprise (if you're using a Trial license).
Import
The Import function is where you import your older Sawmill data. If you have upgraded from an older version, this is where you should start.
My Account
The account information shown here is about your own settings. This is also where the root administrator settings (the username, password and language to be used in Sawmill) are set.
Log Filters
Log Filters perform translations, conversions, or selective inclusion and exclusion ("filtering out") operations. For instance, a log filter could be used to reject (exclude) all log entries from a particular IP, or all log entries during a particular time. Log Filters can also be used to convert usernames to full names, or to simplify a field (for instance by chopping off the end of a URL, which is sometimes necessary to analyze a large proxy dataset efficiently). Log Filters are also used for some log formats (including web log formats) to differentiate "hits" from "page views" based on file extensions. For instance, GIF files are considered "hits" but not "page views", so the default filters set the "hit" field to 1 and the "page view" field to 0 for GIF hits. The same method can be used in any profile to perform any kind of categorization (e.g. external vs. internal hits for a web log). Log Filters are written in Salang: The Sawmill Language, which provides full programming language flexibility, including the use of if/then/else clauses, and/or/not expressions, loops, and more.
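The hit/page-view split that the default web filters perform can be illustrated in shell (not Salang -- this is only a sketch of the logic, and the extension list here is an example rather than Sawmill's exact default set): entries with image-like extensions keep hit=1 but get page_view=0, everything else counts as both.

```shell
# Classify a request path by file extension, mimicking the hit vs.
# page-view categorization described above.
classify() {
  case "${1##*.}" in
    gif|jpg|jpeg|png|css|js) echo 'hit=1 page_view=0' ;;  # served, but not a "page"
    *)                       echo 'hit=1 page_view=1' ;;  # counts as a page view too
  esac
}

classify /images/logo.gif   # prints: hit=1 page_view=0
classify /index.html        # prints: hit=1 page_view=1
```

In an actual profile, the same decision is written as a Salang log filter that sets the "hit" and "page view" fields per entry.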
Reports Editor
This editor lets you create and edit reports which appear in the statistics.
Reports
Reports present log file statistics in an attractive and easily navigable graphical format.
- Logo: The logo for Sawmill.
- Profile Name: The name of the active profile (the profile whose reports are displayed).
- View Config: A link to the configuration area, which shows where your log source is coming from, plus log filters, database info and report options.
- Admin: A link to the administrative interface (profiles list, and other administrative functions).
- Logout: A link to log out of Sawmill, with the user name in brackets.
- Help: A link which opens a new window containing the Sawmill documentation.
- About: Shows you which version of Sawmill you are running, and, if you are running the trial version, how many days you have left.
- Date Picker: Click this to open the Date Picker window, where you can select a range of days to use as the date/time filter. When you have selected a range, all reports will show only information from that time period, until the date/time filter is removed.
- Filters: Click this to open the Filters window, where you can create filters for any fields, in any combination. Filters created here dynamically affect all reports for a particular profile. Once you have set a filter, all reports will show only information for that section of the data. Filters remain in effect until you remove them in the Filters window.
- Macros: You can create a new macro here, or manage an existing one.
- Printer Friendly: Click this icon and a separate page will launch, showing only the report, without the menus, ready to print.
- Miscellaneous: Click this to email the report, save changes, save as a new report, or to get the database info, where you can also update or rebuild the database.
- Visitor Demographics: This section allows you to sort the data according to your visitor demographics, such as hostnames, domains, geographical areas, organizations, ISPs, and unique users. As in the other categories, you can customize these reports, choosing the number of columns you want to report on.
- Visitor Systems: This section refers to the systems your visitors are using: you can see which web browsers they are using, and also which operating systems. As in the previous sections, you can customize these reports as well, adding or eliminating columns.
- Single Page Summary: In the default view, the Single Page Summary shows the Overview Report and all of the other reports included in the Report Menu. The Single Page Summary is fully configurable: you can select Customize Report in Config, and choose which reports you want shown. Once you go into the config area, you can select or deselect the reports along the left menu. The middle section shows the current report elements, which can be added to or deleted from.
- Log Detail: The Log Detail report is the most granular report, because it shows a date/time timestamp, down to the minute, for each log entry. In the default setting, it lists the hit type, page, hostname, referrer, server domain, etc. for each entry. You can customize this report by adding or deleting columns, but it will give you the most complete information from your log files.
The Report
The main portion of the window is occupied by the report itself. This is a view of the data selected by the filters that are preset or that you have selected. This provides one breakdown of the data specified by the filters -- you can select another report in the Reports Menu to break down the same data in a different way. There are several parts of the report:
Filters
There are many levels of filters when viewing reports:
- Log Filters. These remain in effect until they are removed in the Log Filters page.
- Date/Time Filters. These remain in effect until they are removed in the Date Picker.
- Report Filters. These are options set in the Report Options, per report, and cannot be removed from the web interface.
- Report Element Filters. These are set per report element in Customize Report in Config, and cannot be removed from the web interface.
All these filters are "anded" together; i.e. an item is in the total filter set if it is selected by the Filters AND by the Date/Time Filters AND by the Report Filters AND by the Report Element Filters. For instance, if the date/time filter selects the hour 1am-2am, and another filter selects January 1, then the table will show only events from January 1 that occurred during 1am-2am.
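The AND-combination described above can be sketched as a few lines of Python. This is an illustration of the logic, not Sawmill's actual implementation: each filter layer is modeled as a predicate, and an event survives only if every active predicate accepts it.

```python
# A minimal sketch of how the filter layers combine: an event is included
# only if every active filter accepts it (the filters are ANDed together).
def combine_filters(*filters):
    """Return a predicate that accepts an event only if all filters do."""
    return lambda event: all(f(event) for f in filters)

date_filter = lambda e: e["day"] == "Jan/1"   # hypothetical date/time filter
hour_filter = lambda e: e["hour"] == 1        # hypothetical report filter

selected = combine_filters(date_filter, hour_filter)
print(selected({"day": "Jan/1", "hour": 1}))  # True: passes both filters
print(selected({"day": "Jan/1", "hour": 5}))  # False: fails the hour filter
```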
Sawmill Documentation
Quickstart Manual FAQ User Guide www.sawmill.co.uk
In summary:

a.) Customize Report Element gives quick access to some report element options within reports.
b.) Customize Report Element is the only way to customize a report for Sawmill Lite users.
c.) Some reports, like the Overview, Session Paths, and Paths through a Page, have no Customize Report Element option.
d.) A Root Administrator can define View/Access permissions for Customize Report Element, such as access to graph options and pivot tables. This means that non-root administrators may have no access, or limited access, to the tabs.
e.) In order to save changes, you must save under "Miscellaneous" > "Save Report Changes".
You can find the Customize Report Element function by selecting a report; on the far right of the report you will see "Export Table" and "Customize". Once you select "Customize", this dialogue box will open:
Graphs: "Sort by" determines how to sort the graphs. "Show legend": check this to show the graph's legend; if "Sort by" is "All descending", or if no table is shown, you should always check this option. (Chronological graphs have no legend.) Below these is a fields list with checkboxes: check the fields for which a graph should be shown, or click Select/Deselect All.

Table: This tab contains "Sort by", the sort direction, and the table columns in the fields list. There is one checkbox for non-aggregating fields (the text column), and three checkboxes per aggregating field, which can be read as "show numerical column", "show percent column", and "show table bar graph column".

Table Options: Set the number of rows; the default setting is the number of rows shown after a login, or when a filter is applied.

Pivot Table: Select the box for showing the pivot table, the number of rows, and how the fields will be sorted.

Graph Options: Select a bar, line, or pie chart, with the number of rows and variables, including the on-screen size of the graph.
This is where you define Roles for your other users. A role defines the permissions to view/access features and to edit/add/delete objects. A role does not define access to specific profiles; it only specifies which features can be accessed, and the permissions a user has per feature (edit/add/delete). Sawmill provides two default roles after installation, Manager and Statistics Visitor. Both roles can be renamed however you wish, with their appropriate features set. Users are defined by username and password. Select "Users" in the menu, next to "Roles", and you will see:
Select a username and password, and then define the profile and roles associated with that name. These are assigned in pairs, with one or more profiles and one or more roles. You can assign any profile/role pairs; there is no limit. For instance, you can have a user that can access profile A and profile B using Role A. That user could also have access to profile C with Role B. There is no limit on the number of pair combinations; the same profile can be part of several access pairs.

RBAC allows you to hide specific report values in table columns or graphs by setting grants for specific field categories, such as IP address, hostname, user, etc. All field categories are disabled by default, so they are not visible in roles and they are not functional. The reason they are disabled is that there are field categories, such as date/time or day of week, which are unlikely ever to be granted, or a Root Administrator may not use grants for field categories at all. You can enable or disable field categories by opening field_categories.cfg, within the LogAnalysisInfo directory, with a text editor and changing the "active_in_rbac" value to true or false. Once a field category is enabled, it will be visible in roles within the reports tab, and it will hide any report value of that specific field category unless the field category's view/access permission is checked within the role.
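For illustration, enabling RBAC grants for a hostname field category in field_categories.cfg might look like the following. This is a hypothetical excerpt in Sawmill's cfg syntax; the exact node names in your field_categories.cfg may differ:

```
hostname = {
  active_in_rbac = "true"
} # hostname
```

After saving the file, the enabled category appears in the reports tab of each role.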
Users
The first time you run Sawmill through a web browser, it will prompt you to choose a root administrator username and password. This information, which is stored in the users.cfg file in the LogAnalysisInfo folder, specifies the first user. After that, you can create additional users by clicking on the Users menu and selecting New User. Within the new user menu, you name the user, add a password, and select the language. You can also select which profile and role a particular user should have. The root administrator must be logged in to perform any tasks where security is an issue (for instance, when specifying which log data to process, or when creating a profile). If you forget the root administrator username or password, you can reset it by removing the users.cfg file, found in the LogAnalysisInfo folder. For more information on security in Sawmill, see Security.
- lang_stats: The statistics (reports). This includes the names of the reports, the names of the fields, the instructions, the names of the options, the table headings, and more. This small language module is all that must be translated to translate Sawmill's statistics into any language.
- lang_admin: The administrative interface. This includes the text of the profiles list, the configuration page, the Scheduler, and other administrative pages.
- lang_options: The names and descriptions of all of Sawmill's options.
- lang_messages: The error messages and debugging output.
If all you need is a translation of the statistics (i.e. if you can use the English administrative interface, but need to provide statistics in the native language of your clients), then you only need a translation of the lang_stats module. If you need to be able to use the profile list, configuration page, Scheduler, etc. in a language other than English, you will need translations of the lang_admin, lang_options, and lang_messages modules.
information about the language modules it is reading, including the tokens it is processing and the name/value pairs it is finding. It is usually a simple matter to examine the output and see where the erroneous quote or bracket is.
- RUNNING_USERNAME: The name of the user Sawmill is running as.
- PRODUCT_NAME: The name of Sawmill ("Sawmill" by default, but different if the product has been relabelled or whitelabelled).
- PRODUCT_EXECUTABLE: The name of the Sawmill executable file ("Sawmill.exe" in this case).
- PRODUCT_EXECUTABLE_DOCS: The name of the Sawmill executable file, as used in documentation where a generic executable name is more useful than the actual one ("sawmill" in this case).
- PRODUCT_URL: A link to the Sawmill home page ("http://www.sawmill.net/" in this case).
- COPYRIGHT_HOLDER: The holder of the Sawmill copyright ("Flowerfire, Inc." in this case).
- Log Importer: A component which reads log data from a log source and puts it into the Sawmill database.
- Sawmill Database: A database which keeps a copy of the log data, and is queried to generate reports.
- Reporting Interface: A web interface providing access to dynamic reports.
- Administrative Interface: A web interface for administering profiles, schedules, users, roles, tasks, and more.
- Web Server: A built-in web server providing a graphical interface for administration and report viewing.
- Command-line Interface: An extensive command-line interface that can be used to manage profiles, generate reports, and more.
Log Importer
Sawmill is a log analyzer. It reads text log data from a log source (usually log files on the local disk, a network mounted disk, or FTP), parses the data, and puts it in the Sawmill Database. Log Filters can be used to convert or filter the data as it is read, or to pull in external metadata for use in reports.
Sawmill Database
The Sawmill Database stores the data from the log source. The Log Importer feeds data into the database, and the Reporting Interface queries the database to generate reports. The database can be either the internal database, built into Sawmill, or a MySQL database.
Reporting Interface
The Reporting Interface is an HTML interface delivered through the Web Server, to any compatible web browser. Reports are generated dynamically by querying the Sawmill Database. The Reporting Interface allows arbitrary filters to be applied to any report, including Boolean filters. Reports can be created or edited through the Administrative Interface.
Administrative Interface
The Administrative Interface is an HTML interface delivered through the Web Server, to any compatible web browser. The Administrative Interface is used to create and edit profiles, users, scheduled tasks, and more.
Web Server
The Web Server is an HTTP server built into Sawmill. It serves the Reporting Interface and the Administrative Interface. It is also possible to use an external web server like Apache or IIS to serve the Sawmill interface.
- Create profiles
- Build or update databases (process logs)
- Expire data from an existing database
- Email HTML reports
- Generate HTML reports to disk
The most common command-line actions can also be run periodically, using the Sawmill Scheduler.
Server Sizing
In a large installation of Sawmill, it is typically best to have a dedicated Sawmill server. This document helps you decide on a best-practices type and specification for that server.
Disk
You will need between 100% and 200% of the size of your uncompressed log data to store the Sawmill database. For instance, if you want to report on 1 terabyte (TB) of log data in a single profile, you would need up to 2 TB of disk space for the database. This is the total data in the database, not the daily data added; if you have 1 TB of log data per day, and want to track 30 days, then that is a 30 TB dataset, and requires between 30 TB and 60 TB of disk space. If you are using a separate SQL database server, this space is used on the database server; it is the databases which use this disk space, and the rest of Sawmill will fit in a small space; 1 GB should be sufficient.

Sawmill uses the disk intensively during database building and report generation, so for best performance, use a fast disk. Ideally, use a RAID 10 array of fast disks. RAID 5 or RAID 6 will hurt performance significantly (about 2x slower than RAID 10 for database builds), and is not recommended. Write buffering on the RAID controller should be turned on if possible, as it provides an additional 2x performance gain for database builds.
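The disk-sizing arithmetic above can be written out explicitly. The figures below just restate the worked example (1 TB/day, 30 days tracked, 100%-200% overhead):

```python
# Rough disk-sizing arithmetic: the database needs roughly 100%-200% of
# the uncompressed log data it tracks.
daily_log_tb = 1            # 1 TB of new log data per day
days_tracked = 30           # how much history the profile keeps

dataset_tb = daily_log_tb * days_tracked
low, high = dataset_tb * 1.0, dataset_tb * 2.0

print(f"dataset: {dataset_tb} TB; disk needed: {low:.0f}-{high:.0f} TB")
# → dataset: 30 TB; disk needed: 30-60 TB
```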
Memory
We recommend 8 GB of RAM on the Sawmill server for large datasets. If you are using a separate database server, we recommend 2 GB-4 GB on the database server.
Processor(s)
To estimate the amount of processing power you need, start with the assumption that Sawmill processes 2000 log lines per second per processor for Intel or AMD processors, or 1000 lines per second for SPARC or other processors. Note: this is a conservative assumption; Sawmill can be much faster than this on some datasets, reaching speeds of 10,000-20,000 lines per second in some cases. However, it is best to use a conservative estimate for sizing your processors, to ensure that the specified system is sufficient. Compute the number of lines in your daily dataset (200 bytes per line is a good estimate), and that will tell you how many seconds Sawmill will require to build the database. Convert that to hours; if it is more than six hours, you will need more than one processor. Roughly speaking, you should have enough processors that when you divide the number of hours by the number of processors, the result is less than 6. For example:
- 50 gigabytes (GB) of uncompressed log data per day
- divide by 200 -> ~268 million lines of log data
- divide by 2000 -> ~134 thousand seconds
- divide by 3600 -> ~37 hours
- divide by 6 -> ~6 processors
The six-hour number is based on the assumption that you don't want to spend more than six hours per night updating your database to add the latest data. A six-hour nightly build time is a good starting point; it provides some flexibility to modify and tune the database and filters in ways which slow down processing, without exceeding the processing time available each day. The dataset above could be processed in 9 hours on four processors, if a 9-hour nightly build time is acceptable.
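The steps in the worked example above can be condensed into a small calculator, using the same conservative assumptions (~200 bytes per line, ~2000 lines/second per Intel/AMD processor):

```python
# Sizing sketch: estimate single-processor build time, then divide by the
# acceptable nightly window (6 hours) to get a processor count.
def build_hours(daily_log_gb, bytes_per_line=200, lines_per_sec=2000):
    lines = daily_log_gb * 2**30 / bytes_per_line   # lines in the daily dataset
    return lines / lines_per_sec / 3600             # hours on one processor

hours = build_hours(50)          # ~37 hours for 50 GB/day
processors = round(hours / 6)    # rounded, as in the worked example above;
                                 # a cautious deployment might round up instead
print(f"{hours:.0f} hours -> {processors} processors")
# → 37 hours -> 6 processors
```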
Distributed Parsing
Sawmill can improve the performance of log processing (database builds or updates) by distributing log data, in chunks, to parsing servers: special instances of Sawmill providing parsing services. Parsing servers listen on a particular IP and port for a parsing request, receive log data on that socket, and deliver normalized data back to the main process on the same socket. Basically, this means that a "-a bd" task can farm out parsing to multiple other processes, potentially on other computers, using TCP/IP. When used with local parsing servers, this can improve performance on systems with multiple processors. The distributed parsing options can be changed through the web interface in Config -> Log Processing -> Distributed Processing. The section below describes how they are arranged in the internal CFG file for the profile; the meanings of the options in the web interface are analogous.
The log.processing.distributed node in profile .cfg files controls the distribution of log parsing to parsing servers. The simplest value of log.processing.distributed is:

    distributed = {
      method = "one_processor"
    }

In this case, Sawmill never distributes parsing; it does all parsing in the main -a bd task. The default value of log.processing.distributed is:

    distributed = {
      method = "all_processors"
      starting_port_auto = "9987"
    }

In this case, on a 1-processor system, Sawmill does not distribute processing; this acts as "one_processor". On an N-processor system, Sawmill spawns N+1 local parsing servers, distributes processing to them, and terminates them when it is done building the database. The first server listens on the port specified by starting_port_auto; the next one listens on starting_port_auto+1, etc. If only some processors are to be used for distributed processing, it looks like this:

    distributed = {
      method = "some_processors"
      number_of_servers = "4"
      starting_port_auto = "9987"
    }

This is similar to "all_processors", but uses only the number of processors specified in the number_of_servers parameter. The final case is this:

    distributed = {
      method = "listed_servers"
      servers = {
        0 = {
          spawn = true
          hostname = localhost
          port = 8000
        }
        1 = {
          spawn = true
          hostname = localhost
          port = 8001
        }
        2 = {
          spawn = false
          hostname = wisteria
          port = 8000
        }
        3 = {
          spawn = false
          hostname = wisteria
          port = 8001
        }
      } # servers
    } # distributed

In this case, the parsing servers are explicitly listed in the profile. Sawmill spawns those where spawn=true (which must be local), and also shuts those down at completion. Those where spawn=false must be started explicitly with "-p {profilename} -a sps -psh {ip} -psp {port}". In this example, two of them are local (servers 0 and 1), and two are remote (on a machine called "wisteria"). This final case can be used to distribute log processing across a farm of servers.
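For instance, a spawn=false parsing server on a remote machine like "wisteria" would be started on that machine with a command along these lines, using the "-a sps" action described above. The profile name "myprofile" and the listening IP are placeholders; substitute your own values:

```
./sawmill -p myprofile -a sps -psh 192.0.2.10 -psp 8000
```

The main "-a bd" task then connects to that host/port as listed in the servers node of the profile.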
Pathnames
A pathname is a value which fully describes the location of a file or folder on your computer. Pathnames are used in Sawmill to describe the locations of the log data, the server folder, the Database folder, and other files and folders. The leftmost part of a pathname generally describes which hard drive or partition the file or folder is on, and as you move from left to right along the pathname, each successive part narrows the location further by providing the name of an additional subfolder.

It is not generally necessary to type pathnames if you are using the Sawmill graphical web browser interface; the Browse button next to each pathname field provides a friendlier way to specify a pathname, using a familiar folder browsing mechanism. This button is available everywhere except when choosing the server folder in CGI mode, where you must enter the pathname manually.

Pathnames use different formats on different platforms. On Windows, the format is driveletter:\folder1\folder2 ... foldern\filename for files, and the same for folders, except that the final filename is omitted (but not the final \). For instance, a file my.conf inside the folder configs, which is inside the folder sawmill, which is inside the folder web on the C: drive, is represented by C:\web\sawmill\configs\my.conf. The folder containing my.conf is represented by C:\web\sawmill\configs\.

On MacOS X, Linux, or other UNIX-type systems, the format is /folder1/folder2 ... foldern/filename for files, and the same for folders, except that the final filename is omitted (but not the final /). For instance, a file my.conf inside the folder configs, which is inside the folder sawmill, which is inside the folder web (a subfolder of the root / folder), is represented by /web/sawmill/configs/my.conf. The folder containing my.conf is represented by /web/sawmill/configs/.
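The two pathname conventions above can be checked with Python's pathlib, which models both styles; the file and folder names here are the same ones used in the example:

```python
# The same my.conf pathname in Windows and UNIX form; .parent strips the
# final filename, giving the containing folder as described in the text.
from pathlib import PurePosixPath, PureWindowsPath

win = PureWindowsPath(r"C:\web\sawmill\configs\my.conf")
posix = PurePosixPath("/web/sawmill/configs/my.conf")

print(win.name, "|", win.parent)      # → my.conf | C:\web\sawmill\configs
print(posix.name, "|", posix.parent)  # → my.conf | /web/sawmill/configs
```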
This is hierarchical; some pages are above others because they are within them. For instance, a folder is above all of its subfolders and all of the files in it. This hierarchical structure allows you to "zoom" into your log data one level at a time. Every report you see in a web log analysis corresponds to one of the items above. Initially, Sawmill shows the page corresponding to /, which is the top-level folder (the "root" of the hierarchical "tree"). This page will show you all subpages of /; you will see /dir1/ and /dir2/. Clicking on either of those items ("subitems" of the current page) will show you a new page corresponding to the subitem you clicked. For instance, clicking on /dir2/ will show you a new page corresponding to /dir2/; it will contain /dir2/dir3/, the single subitem of /dir2/. Clicking on /dir2/dir3/ will show a page corresponding to /dir2/dir3/, containing the subitems of /dir2/dir3/: /dir2/dir3/file3.html and /dir2/dir3/file4.html.

Sawmill shows the number of page views (and/or bytes transferred, and/or visitors) for each item it displays. For instance, next to /dir2/ on a statistics page, you might see 1500, indicating that there were 1500 page views on /dir2/ or its subitems. That is, the sum of the number of page accesses on /dir2/, /dir2/dir3/, /dir2/dir3/file4.html, and /dir2/dir3/file3.html is 1500. That could be caused by 1500 page views on /dir2/dir3/file4.html and no page views anywhere else, or by 1000 page views on /dir2/dir3/file3.html and 500 page views directly on /dir2/, or some other combination. To see exactly which pages were hit to create those 1500 page views, you can zoom in by clicking /dir2/.

There are many other hierarchies besides the page hierarchy described above. For instance, there is the date/time hierarchy, which might look like this:
The date/time hierarchy continues downward similarly, with days as subitems of months, hours of days, minutes of hours, and seconds of minutes. Other hierarchies include the URL hierarchy (similar to the page hierarchy, with http:// below the root, http://www.flowerfire.com/ below http://, etc.), the hostname hierarchy (.com below the root, flowerfire.com below .com, www.flowerfire.com below flowerfire.com, etc.), and the location hierarchy (countries/regions/cities).

Some terminology: the very top of a hierarchy is called the "root" (e.g. "(the root)" in the date/time hierarchy above, or "/" in the page hierarchy). An item below another item in the hierarchy is called a "subitem" of that item (e.g. 2004 is a subitem of the root, and /dir2/dir3/file4.html is a subitem of /dir2/dir3/). Any item at the very bottom of a hierarchy (an item with no subitems) is called a "leaf" or a "bottom level item" (e.g. Jan/2003 and /dir1/file1.html are leaves in the above hierarchies). An item which has subitems, but is not the root, is called an "interior node" (e.g. 2004 and /dir2/dir3/ are interior nodes in the above hierarchies).
To save database size, processing time, and browsing time, it is often useful to "prune" the hierarchies. This is done using the Suppress Levels Below and Suppress Levels Above parameters of the database field options. Levels are numbered from 0 (the root) downward; for instance 2003 is at level 1 above, and /dir2/dir3/file4.html is at level 4. Sawmill omits all items from the database hierarchy whose level number is greater than the Suppress value. For instance, with a Suppress value of 1 for the date/time hierarchy above, Sawmill would omit all items at levels 2 and below, resulting in this simplified hierarchy in the database:
      (the root)
       /     \
     2003   2004

Using this hierarchy instead of the original saves space and time, but makes it impossible to get date/time information at the month level; you won't be able to click on 2003 to get month information. Sawmill also omits all items from the hierarchy whose level number is less than or equal to the Collapse value (except the root, which is always present). For instance, with a Collapse value of 1 (and Suppress of 2), Sawmill would omit all level 1 items, resulting in this hierarchy:

          (the root)
       /      |       \
  Nov/2003  Feb/2004  ...
All four of the level 2 items are now direct subitems of the root, so the statistics page for this hierarchy will show all four months. This is useful not just because it saves time and space, but also because it combines information on a single page that otherwise would have taken several clicks to access.

Here's an example of "Suppress Levels Above" and "Suppress Levels Below," based on the page field value /dir1/dir2/dir3/page.html. With above=0 and below=999999, this will be marked as a single hit on the following items:

  level 0: /
  level 1: /dir1/
  level 2: /dir1/dir2/
  level 3: /dir1/dir2/dir3/
  level 4: /dir1/dir2/dir3/page.html

With above=3, below=999999, all levels above 3 are omitted (the root level, 0, is always included):

  level 0: /
  level 3: /dir1/dir2/dir3/
  level 4: /dir1/dir2/dir3/page.html

With above=0, below=2, all levels below 2 are omitted:

  level 0: /
  level 1: /dir1/
  level 2: /dir1/dir2/

With above=2, below=3, all levels above 2 are omitted (except 0), and all levels below 3 are omitted:

  level 0: /
  level 2: /dir1/dir2/
  level 3: /dir1/dir2/dir3/

In the last example, zooming in on / will show you /dir1/dir2/ (you will never see /dir1/ in the statistics, because level 1 has been omitted); zooming in on that will show you /dir1/dir2/dir3/, and you will not be able to zoom any further, because level 4 has been omitted.

On a side note, the "show only bottom level items" option in the Table Options menu provides a dynamic way of doing roughly the same thing as using a high value of Collapse. Using the Options menu to show only bottom level items will dynamically restructure the hierarchy to omit all interior nodes. In the case of the page hierarchy above, that would result in the following hierarchy:
              (the root)
           /      |       \
  /dir1/file1.html  ...  /dir2/dir3/file4.html
This hierarchy has all leaf items as direct subitems of the root, so the statistics page for this hierarchy will show all four pages. This is much faster than using Suppress, because after Suppress has been modified, the entire database must be rebuilt to reflect the change.

The Database Structure Options provide a couple of other ways of pruning your hierarchies. The "Always include bottom-level items" option forces Sawmill to include the bottom-level items in the hierarchy, regardless of the setting of the Suppress value. It is useful to include the bottom-level items if you need them for some feature of Sawmill (for instance, visitor information requires that all the bottom-level hosts be present, and session information is most meaningful if all the bottom-level date/times are present), but you want to prune some of the interior of the hierarchy to save space.

Hierarchies can be either left-to-right (with the left part of the item "enclosing" the right part, as in /dir1/dir2/file.html, where /dir1/ encloses /dir1/dir2/, which encloses /dir1/dir2/file.html), or right-to-left (with the right part of the item enclosing the left part, as in www.flowerfire.com, where .com encloses flowerfire.com, which encloses www.flowerfire.com). Hierarchies use one or more special characters to divide levels (e.g. / in the page hierarchy or . in the host hierarchy). These options and more can be set in the log field options.

Some log fields are not hierarchical, for instance the operation field of a web log (which can be GET, POST, or others), or integer fields like the size or response fields. These fields are specified in the log field options as non-hierarchical, or "flat", and all items in those hierarchies appear directly below the root.
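The "Suppress Levels Above"/"Suppress Levels Below" behavior described earlier can be sketched as a small function. This is an illustration of the level arithmetic, not Sawmill's implementation:

```python
def hit_items(page, above=0, below=999999):
    """Return the hierarchy items credited with a hit on `page`, keeping
    only levels in [above, below]; level 0 (the root) is always kept."""
    parts = page.strip("/").split("/")
    items = ["/"]  # level 0
    for i in range(len(parts)):
        tail = "" if i == len(parts) - 1 else "/"   # folders keep a trailing /
        items.append("/" + "/".join(parts[: i + 1]) + tail)
    return [item for level, item in enumerate(items)
            if level == 0 or above <= level <= below]

print(hit_items("/dir1/dir2/dir3/page.html", above=2, below=3))
# → ['/', '/dir1/dir2/', '/dir1/dir2/dir3/']
```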
Regular Expressions
Regular expressions are a powerful method for defining a class of strings (strings are sequences of characters; for instance, a filename is a string, and so is a log entry). Sawmill uses regular expressions in many places, including:
- You can specify the log files to process using a regular expression.
- You can specify the log file format using a "Log data format regular expression".
- You can filter log entries based on a regular expression using log filters.
You can also use wildcard expressions in these cases, using * to match any string of characters, or ? to match any single character (for instance, *.gif to match anything ending with .gif, or Jan/??/2000 to match any date in January, 2000). Wildcard expressions are easier to use than regular expressions, but are not nearly as powerful. Regular expressions can be extremely complex, and it is beyond the scope of this manual to describe them in full detail. In brief, a regular expression is a pattern: essentially the string to match, plus special characters which match classes of strings, plus operators to combine them. Here are the simplest rules:
- A letter or digit matches itself (most other characters do as well).
- The . character (a period) matches any character.
- The * character matches zero or more repetitions of the expression before it.
- The + character matches one or more repetitions of the expression before it.
- The ^ character matches the beginning of the string.
- The $ character matches the end of the string.
- A square-bracketed series of characters matches any one of those characters. Adding a ^ after the opening bracket matches any character except those in the brackets.
- Two regular expressions in a row match any combination where the first half matches the first expression, and the second half matches the second expression.
- The \ character followed by any other character matches that character literally. For example, \* matches the * character.
- A regular expression matches if it matches any part of the string; i.e. unless you explicitly include ^ and/or $, the regular expression will match if it matches something in the middle of the string. For example, "access\.log" matches not only access.log but also old_access.log and access.logs.
- A parenthesized regular expression matches the same thing as it does without parentheses, but is considered a single expression (for instance by a trailing *). Parentheses can be used to group consecutive expressions into a single expression. Each field should be parenthesized when using a "Log data format regular expression"; that is how Sawmill knows where each field is.
- An expression of the form (A|B) matches either expression A or expression B. There can also be more than two expressions in the list.
The list goes on, but is too large to include here in complete form. See the Yahoo link above. Some examples:
- a matches any value containing the letter a.
- ac matches any value containing the letter a followed by the letter c.
- word matches any value containing the sequence "word".
- worda* matches any value containing the sequence "word" followed by zero or more a's.
- (word)*a matches any value containing zero or more consecutive repetitions of "word", where the last repetition is followed by an a.
- \.log$ matches any value ending with .log (good for matching all files in a directory ending with .log).
- ^ex.*\.log$ matches any value starting with ex and ending with .log.
- ^access_log.*1 matches any value starting with "access_log", and containing a 1 somewhere after the leading "access_log" (note that the 1 does not have to be at the end of the string for this to match; if you want to require that the 1 be at the end, add a $ to the end of the expression).
- ^access_log_jan....2004$ matches any value starting with "access_log_jan", followed by four characters (any four characters), followed by "2004", followed immediately by the end of the value.
As you can see, regular expressions are extremely powerful; a pattern can be devised to match almost any conceivable need.
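A few of the patterns above can be checked with Python's `re` module. Python's regex dialect is close to, though not identical with, the one Sawmill uses; `re.search` matches anywhere in the string, mirroring the "matches any part of the string" rule:

```python
# Verifying the example patterns: unanchored patterns match mid-string,
# while ^ and $ anchor the match to the ends of the value.
import re

assert re.search(r"access\.log", "old_access.log")        # matches mid-string
assert re.search(r"access\.log", "access.logs")           # also matches
assert re.search(r"^ex.*\.log$", "ex2004.log")            # anchored: whole value
assert not re.search(r"^ex.*\.log$", "access.log")        # doesn't start with ex
assert re.search(r"^access_log_jan....2004$", "access_log_jan_01_2004")
print("all patterns behaved as described")
```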
Security
Since Sawmill runs as a CGI program or as a web server, it publishes its interface to any web browser which can reach its server. This is a powerful feature, but it also introduces security issues. Sawmill has a number of features which address these issues:

1. Non-administrative users can access Sawmill through the profile list (same as administrative users). When a non-administrator is logged in, the profile list only allows users to view reports of profiles; users cannot create, edit, or delete profiles, and they cannot build, update, or modify the database of any profile. The profile list is available at http://www.myhost.com:8988/ in web server mode, or http://www.myhost.com/cgi-bin/sawmill in CGI mode.

2. If you wish to take it a step further, and not even present the profile list to users, you can refer users to the reports for a particular profile: http://www.myhost.com/cgi-bin/sawmill.cgi?dp=reports&p=profile&lun=user&lpw=password replacing profile with the name of the profile, user with the username, and password with the password (this should all be on one line). Accessing this URL will show the reports for the specified profile, after logging in as the specified user with the specified password.

3. Only authorized administrators (users who know the username and password of a Sawmill administrator, chosen at install time) may create new profiles, and only authorized administrators may modify profiles. Without administrator access, a user cannot create a new profile, modify an existing profile in any way, or perform any of the other tasks available in the administrative interface.

Sawmill also provides detailed control over the file and folder permissions of the files and folders it creates; see File/Folder Permissions.
File/Folder Permissions
Sawmill lets you control very precisely the permissions on the files and folders it creates. This is particularly useful in multi-user environments, where file permissions are a fundamental part of file system security. You can set permissions independently for a number of categories of files and folders Sawmill creates. For each category, you specify a permissions number. The permissions number is a three-digit octal (base-8) number, as accepted by the UNIX chmod command. The first digit controls the permissions for the user running Sawmill, the second digit controls the permissions for the group of the user running Sawmill, and the third digit controls the permissions for all other users. Each digit is the sum of: 4, if read permission should be granted; 2, for write permission; and 1, for execute (or folder search) permission. These values can be added in any combination to provide any combination of permissions. For example, to grant only read permission, use 4. To grant both read and write permission, use the sum of 4 and 2, which is 6. To grant read and execute permission, use the sum of 4 and 1, which is 5. Give execute permission on folders if you want users to be able to view their contents. A complete example of a permissions option value: 754 gives the Sawmill user read, write, and execute permission; gives the group read and execute permission; and gives all other users read permission. See Security for more information on Sawmill's security features.
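As a quick check of the arithmetic above, this sketch (Python, purely illustrative) builds the 754 example from its read/write/execute components and applies it the way "chmod 754" would on a UNIX-style system:

```python
import os
import stat
import tempfile

READ, WRITE, EXECUTE = 4, 2, 1

owner = READ + WRITE + EXECUTE   # 7: read, write, and execute for the Sawmill user
group = READ + EXECUTE           # 5: read and execute for the group
other = READ                     # 4: read only for all other users
mode = owner * 64 + group * 8 + other
assert mode == 0o754             # the three octal digits: 7, 5, 4

# Apply it to a temporary file, as "chmod 754 <file>" would (UNIX-style systems):
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, mode)
print(oct(stat.S_IMODE(os.stat(path).st_mode)))
os.remove(path)
```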
You may not be interested in seeing all the events in your log data. For instance:

- You may not be interested in seeing the hits on files of a particular type (e.g. image files, in web logs).
- You may not be interested in seeing the events from a particular host or domain (e.g. web log hits from your own domain, or email from your own domain for mail logs).
- For web logs, you may not be interested in seeing hits which did not result in separate page views, like 404 errors (file not found) or redirects.
Sawmill's default filters automatically perform the most common filtering (they categorize image files as hits but not page views, strip off page parameters, and more) but you will probably end up adding or removing filters as you fine-tune your statistics.
Now add a filter (to log.filters in the profile .cfg file) to compute DST ranges, and subtract when necessary:

log.filter.dst_adjust = `
  # Make sure v.dst_info exists; that's where we'll store the start and end dates of DST for each year
  v.dst_info = "";

  # Extract the year from the date/time
  int year;
  if (matches_regular_expression(date_time, "^([0-9]+)/([A-Za-z]+)/([0-9]+) ")) then
    year = 3;

  # Get the subnode for this year, with the DST range for this year
  node dst_info_year = subnode_by_name('v.dst_info', year);

  # If we haven't computed the start and end of DST for this year, do it now
  if (!subnode_exists(dst_info_year, 'start_date_time_epoc')) then (

    # Iterate through the year, starting on Jan 1, looking for the daylight savings range.
    # This could start on Mar 7 more efficiently, if we assume DST starts on the first Sunday in March,
    # but this way it can support any daylight savings range.
    # Iterate until we've found the endpoints, or until we've gone 365 iterations (no infinite loops).
    int first_second_of_this_day_epoc = date_time_to_epoc('01/Jan/' . year . ' 00:00:00');
    int second_sunday_of_march = 0;
    int first_sunday_of_november = 0;
    int sundays_found_in_march = 0;
    int sundays_found_in_november = 0;
    int iterations = 0;
    while (((second_sunday_of_march == 0) or (first_sunday_of_november == 0)) and !(iterations > 365)) (

      # Compute the first second of this day in date/time format (dd/mmm/yyyy hh:mm:ss)
      string first_second_of_this_day_date_time = epoc_to_date_time(first_second_of_this_day_epoc);

      # Break it into day, month, and year
      if (matches_regular_expression(first_second_of_this_day_date_time, "^([0-9]+)/([A-Za-z]+)/([0-9]+) ")) then (
        int day = 1;
        string monthname = 2;
        int monthnum = get_month_as_number(first_second_of_this_day_date_time);
        int year = 3;

        # Compute the day of week
        string weekday = get_weekday(year, monthnum, day);

        # If this is the second Sunday in March, it's the starting point of DST
        if ((monthname eq "Mar") and (weekday eq "Su")) then (
          sundays_found_in_march++;
          if (sundays_found_in_march == 2) then
            second_sunday_of_march = first_second_of_this_day_epoc;
        );

        # If this is the first Sunday in November, it's the ending point of DST
        if ((monthname eq "Nov") and (weekday eq "Su")) then (
          sundays_found_in_november++;
          if (sundays_found_in_november == 1) then
            first_sunday_of_november = first_second_of_this_day_epoc;
        );

      ); # if valid date_time

      # Go to the next day
      first_second_of_this_day_epoc += 24*60*60;
      iterations++;

    ); # while haven't found start and end

    # Compute the first and last second of the DST range, in date_time format
    string first_second_of_second_sunday_of_march = epoc_to_date_time(second_sunday_of_march);
    string last_second_of_first_sunday_of_november =
      substr(epoc_to_date_time(first_sunday_of_november), 0, 11) . ' 23:59:59';

    # Remember the endpoints of the range in the node for this year
    set_subnode_value(dst_info_year, 'start_date_time_epoc',
      date_time_to_epoc(first_second_of_second_sunday_of_march));
    set_subnode_value(dst_info_year, 'end_date_time_epoc',
      date_time_to_epoc(last_second_of_first_sunday_of_november));

  ); # if this year not computed

  # Get the endpoints of the range from this year's subnode
  int start_date_time_epoc = node_value(subnode_by_name(dst_info_year, 'start_date_time_epoc'));
  int end_date_time_epoc = node_value(subnode_by_name(dst_info_year, 'end_date_time_epoc'));

  # If this date is within the DST range, subtract an hour
  if ((date_time_to_epoc(date_time) >= start_date_time_epoc) and
      (date_time_to_epoc(date_time) <= end_date_time_epoc)) then
    date_time = epoc_to_date_time(date_time_to_epoc(date_time) - 60*60);
` # <- closing quote of log.filter.dst_adjust

This filter is not optimally fast--it would be better to move the variable declarations to filter_initialization, since they take some time. But it should be reasonably fast written this way (most of the code runs only once per log data year), and it is easier to read.
This filter uses subnode_exists() to check if there is an entry in usernames_to_full_names for the current username (the value of the "username" log field; this assumes that the field is called "username"). If there is, it uses subnode_by_name() to get the node corresponding to that user, in usernames_to_full_names, and then uses node_value() to get the value of the node (the full name). It assigns that value to the username field, overwriting it with the full name.
  v.c_ip = replace_all(c_ip, '.', '_');
  if (!node_exists("previously_seen_ips")) then
    previously_seen_ips = "";
  if (!subnode_exists("previously_seen_ips", v.c_ip)) then (
    new_sessions = 1;
    set_subnode_value("previously_seen_ips", v.c_ip, true);
  );

The first line uses replace_all() to compute a variable v.c_ip from the c_ip field, by replacing dots (.) with underbars (_). This is necessary because configuration nodes use dots as the separator, so dots cannot be used in configuration node names; we're about to add nodes with names like previously_seen_ips.{c_ip}, so we need to make sure c_ip does not contain dots. The next line uses node_exists() to check whether the node previously_seen_ips exists; if it doesn't, it creates it (assigning a value to an undefined node defines it). Without this step, the following line, which checks for subnodes of previously_seen_ips, would fail with an error that previously_seen_ips does not exist. The final part uses subnode_exists() to check if there is a subnode of previously_seen_ips which matches the c_ip field (with dots converted to underbars). For instance, if the log data contains 12.34.56.78 as the IP, this will check for the existence of a node previously_seen_ips.12_34_56_78. If this node exists, then we know that this IP has appeared previously, and this is not a new session. If this node does not exist, then we create it using set_subnode_value(), and set new_sessions to 1, so this event will appear as a new session event in the reports. The code above uses an in-memory node for previously_seen_ips, so it won't remember which IPs it has seen for the next database update; that's fine as long as you rebuild, but if you want to update you'll want to save the node to disk, and reload it for each update. This can be done using the log.filter_finalization node in the profile (or plug-in), which specifies code to run at the end of log processing.
The following example saves previously_seen_ips to disk:

  log.filter_finalization = `save_node('previously_seen_ips')`

The node will be saved to the file previously_seen_ips.cfg, in the LogAnalysisInfo folder. Since nodes are automatically loaded from disk when they are accessed, future updates will automatically load this node into memory when the log filter first accesses it, so there is no need for a "load_node()" operation.
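The same pattern -- an in-memory "seen IPs" table that flags the first hit from each IP and is persisted at the end of the run -- looks like this in Python (a sketch only; the c_ip field name and the JSON state file are stand-ins for Sawmill's log field and .cfg node file):

```python
import json
import os
import tempfile

# Stand-in for previously_seen_ips.cfg in the LogAnalysisInfo folder
state_file = os.path.join(tempfile.gettempdir(), "previously_seen_ips.json")

def load_seen():
    # Like Sawmill auto-loading the node from disk on first access
    if os.path.exists(state_file):
        with open(state_file) as f:
            return set(json.load(f))
    return set()

def count_new_sessions(events, seen):
    new_sessions = 0
    for event in events:
        ip = event["c_ip"]
        if ip not in seen:       # like subnode_exists()
            new_sessions += 1
            seen.add(ip)         # like set_subnode_value()
    return new_sessions

def save_seen(seen):
    # Like save_node() in log.filter_finalization
    with open(state_file, "w") as f:
        json.dump(sorted(seen), f)

seen = load_seen()
events = [{"c_ip": "12.34.56.78"}, {"c_ip": "12.34.56.78"}, {"c_ip": "9.9.9.9"}]
print(count_new_sessions(events, seen))  # 2 on a fresh run: two distinct IPs
save_seen(seen)
```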
The initialization code makes sure the spider_rejection.accessed_js_css_file and spider_rejection.accessed_robots_txt nodes exist. Note: you will need to create a directory called spider_rejection, in the LogAnalysisInfo directory, to hold these nodes. The finalization step saves these nodes, to LogAnalysisInfo/spider_rejection/accessed_js_css_file.cfg and LogAnalysisInfo/spider_rejection/accessed_robots_txt.cfg.

Now add a filter (to log.filters in the profile .cfg file) to detect CSS/JS hits, and to reject spiders:

reject_spiders = `
  # Convert . to _ in the IP
  v.converted_ip = replace_all(c_ip, '.', '_');

  # If this is a JS or CSS file, remember that this IP accessed it
  if ((file_type eq 'JS') or (file_type eq 'CSS')) then (
    set_node_value(subnode_by_name('spider_rejection.accessed_js_css_file', v.converted_ip), true);
  );

  # If this is /robots.txt, remember that this IP accessed it
  if (cs_uri_stem eq '/robots.txt') then (
    set_node_value(subnode_by_name('spider_rejection.accessed_robots_txt', v.converted_ip), true);
  );

  # Reject as a spider any hit from an IP which did not access JS/CSS files, or did access /robots.txt
  if (subnode_exists('spider_rejection.accessed_robots_txt', v.converted_ip) or
      !subnode_exists('spider_rejection.accessed_js_css_file', v.converted_ip)) then (
    'reject';
  );
`

This filter does the following:

1. It creates and sets a variable v.converted_ip to the IP address (which is called c_ip in this IIS profile; it may be called something else for other log formats), with dots converted to underbars. This is necessary because node names cannot contain dots.
2. It checks if the hit is a JS or CSS hit; if it is, it sets spider_rejection.accessed_js_css_file.ip to true, where ip is the converted IP address.
3. It checks if the hit is on /robots.txt (again using IIS field names; this would be called "page" for Apache); if it is, it sets spider_rejection.accessed_robots_txt.ip to true, where ip is the converted IP address.
4. It checks if the current IP address is in the accessed_robots_txt list, or is not in the accessed_js_css_file list; if so, it rejects the hit.

The first time a dataset is processed with these log filters, the accessed_robots_txt and accessed_js_css_file lists are populated. The data must then be processed a second time, since it is not possible to know, the first time an IP address is encountered, whether that IP will ever hit a JS file. The second time the data is processed, all data has been processed once and the lists are complete, so the spider rejection will work properly. So the database needs to be built twice to remove all spiders.
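The two-pass logic is easier to see in a compact sketch. This Python version (field names are illustrative, not Sawmill's) runs pass one over the whole dataset to build the two lists, then uses them to classify IPs:

```python
def spider_ips(events):
    """Return the set of IPs classified as spiders after a full first pass."""
    accessed_js_css, accessed_robots = set(), set()
    for e in events:                          # pass 1: populate the lists
        if e["file_type"] in ("JS", "CSS"):
            accessed_js_css.add(e["c_ip"])
        if e["uri"] == "/robots.txt":
            accessed_robots.add(e["c_ip"])
    all_ips = {e["c_ip"] for e in events}
    # Spider: fetched /robots.txt, or never fetched any JS/CSS file
    return accessed_robots | (all_ips - accessed_js_css)

events = [
    {"c_ip": "1.1.1.1", "file_type": "HTML", "uri": "/robots.txt"},  # spider: robots.txt
    {"c_ip": "2.2.2.2", "file_type": "CSS",  "uri": "/style.css"},   # browser-like
    {"c_ip": "3.3.3.3", "file_type": "HTML", "uri": "/index.html"},  # spider: no JS/CSS
]
print(sorted(spider_ips(events)))  # ['1.1.1.1', '3.3.3.3']
```

In the real filter the second pass (the rejection) happens during the database rebuild, which is why the data must be processed twice.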
The syntax is case insensitive. Words (month names and "quarter") must be written in English, regardless of the localized language. Adjacent date/time units which are both numbers must be separated by a divider; otherwise the divider is optional. A divider is any non-alphanumeric character except dash "-" and comma ",". The dash is reserved for date ranges; the comma is reserved for multiple dates. Mixing absolute date filters with relative date filters is not allowed.
Single dates
A single date is composed of the following date/time units:

  Date/time unit   Unit values
  year             2 digit or 4 digit year
  quarter          q1-q4 or quarter1-quarter4
  month            jan-dec or january-december
  day              1-31 or 01-31
  hour             00-23
  minute           00-59
  second           00-59

Valid date/time unit sequences:

  year
  quarter year
  month year
  day month year
  day month year hour
  day month year hour minute
Date Ranges
Date ranges are the combination of two arbitrary single dates, or of one single date with the "start" or "end" keyword, separated by a dash "-". The dash "-" dividing the single dates may be surrounded by other dividers such as spaces or any other non-alphanumeric character, except dash "-" and comma ",".

  Date range sequence        df example
  start - (single date)      start - 5/jan/2009
  (single date) - end        feb/2009 - end
Note: Start specifies the earliest log date, end specifies the latest log date.
Multiple dates
Multiple dates are an arbitrary sequence of single dates separated by a comma ",". Multiple dates are in particular required when a user zooms to multiple date items. The comma "," dividing multiple dates may be surrounded by other dividers such as spaces or any other non-alphanumeric character, except dash "-" and comma ",".

  Multiple dates sequence                        df example
  (single date), (single date), (single date)   18jul2009, 20jul2009, 17dec2009

Multiple dates with different date/time units are required when zooming on a Pivot Table with different date/time units, e.g.: 10feb2007, jul2007, 4aug2007, 2008. Multiple dates must not overlap, so "2jul2007, 2jul2007, jul2007" will not return a reliable result.
Examples
This specifies the date filters to use when generating a report. It is similar to the Report Filters option, but with a few differences:

- It is intended only for date filtering (use Report Filters for other types of filtering).
- It supports a wider range of syntax options.
- It is displayed attractively in the "Date Filter" section of the report (filters specified with Report Filters will appear below that, and will appear as literal filter expressions).
  df                       Calculated date applied in the report
  2/jan/2009               02/Jan/2009
  2/jan/2009 14            02/Jan/2009 14:00:00 - 02/Jan/2009 14:59:59
  2/jan/2009 14:30         02/Jan/2009 14:30:00 - 02/Jan/2009 14:30:59
  2/jan/2009 14:30:00      02/Jan/2009 14:30:00
  jan2009                  01/Jan/2009 - 31/Jan/2009
  quarter1 2009            01/Jan/2009 - 31/Mar/2009
  jan2008-jan2009          01/Jan/2008 - 31/Jan/2009
  2jan2008-2009            02/Jan/2008 - 31/Dec/2009
  start-2/JAN/09           earliest log date - 02/Jan/2009
  2/jan/2009-end           02/Jan/2009 - latest log date
  08 - 28 feb 2009 14      01/Jan/2008 - 28/Feb/2009 14:59:59
  (comma-separated dates)  14/Dec/2008, 01/Jan/2009, 05/Jan/2009 (comma-separated dates are in particular used internally when zooming date items in a report element)
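To make the month-granularity rows concrete, here is how a value like jan2009 expands to the full-month range shown above (a Python sketch; the parser is illustrative, not Sawmill's actual implementation):

```python
import calendar
from datetime import date

# Map 3-letter month abbreviations ("jan".."dec") to month numbers
MONTHS = {m.lower(): i for i, m in enumerate(calendar.month_abbr) if m}

def month_df_to_range(df):
    """Expand e.g. 'jan2009' into (first day, last day) of that month."""
    month = MONTHS[df[:3].lower()]
    year = int(df[3:])
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)

print(month_df_to_range("jan2009"))  # 01/Jan/2009 - 31/Jan/2009, as in the table
print(month_df_to_range("feb2008"))  # leap year: the range ends on the 29th
```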
* The "first" prefix is reserved in case we require a relative date filter which calculates dates from the earliest log date. Date/time units may be written as single letters (except time units), or as singular or plural words, in lowercase and/or uppercase:

- y = year = years
- q = quarter = quarters
- m = month = months
- w = week = weeks
- d = day = days
- hour = hours
- minute = minutes
- second = seconds
There are three different main types of relative date filters, described here by a "Show 3 months" example:

  df         Description
  3M         Shows the most recent 3 months relative to the latest log date. If we assume a log date range from 1/Jan/2006 until 10/Jul/2007, then Sawmill will calculate a date filter which shows May/2007, Jun/2007 and Jul/2007. This type of date filter doesn't consider the current calendar date!
  recent3M   Shows the most recent 3 months relative to the current calendar date. If we assume a current calendar date of 5/Jan/2008, then Sawmill will calculate a date filter which shows Nov/2007, Dec/2007 and Jan/2008. "recent" includes the current calendar month.
  last3M     Shows the last 3 months relative to the current calendar date. If we assume a current calendar date of 5/Jan/2008, then Sawmill will calculate a date filter which shows Oct/2007, Nov/2007 and Dec/2007. "last" excludes the current calendar month.
Note: The above main type example is valid for all date/time units, such as Y (years), Q (quarters), M (months), W (weeks), D (days), hours, minutes and seconds.
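The recent/last distinction can be checked with a small sketch (Python; an illustrative helper, not Sawmill code). Both windows count back from the current calendar date; only the treatment of the current month differs:

```python
from datetime import date

def month_window(today, n, include_current):
    """Return the n calendar months ending at (or just before) today's month."""
    y, m = today.year, today.month
    if not include_current:                  # "last" excludes the current month
        y, m = (y, m - 1) if m > 1 else (y - 1, 12)
    months = []
    for _ in range(n):
        months.append((y, m))
        y, m = (y, m - 1) if m > 1 else (y - 1, 12)
    return list(reversed(months))

today = date(2008, 1, 5)
print(month_window(today, 3, include_current=True))   # recent3M: [(2007, 11), (2007, 12), (2008, 1)]
print(month_window(today, 3, include_current=False))  # last3M:   [(2007, 10), (2007, 11), (2007, 12)]
```

This reproduces the example above: with a current calendar date of 5/Jan/2008, recent3M covers Nov/2007-Jan/2008 and last3M covers Oct/2007-Dec/2007.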
You can apply a report filter in the Professional and Enterprise versions:

a.) from the command line or in the Scheduler with the -f option
b.) for all reports in a profile in Config/Report Options
c.) for all reports in a profile in Reports via the Filters window
d.) per report in Config/Reports
e.) per report element in Config/Reports
Below are the filters to use when generating a report; data not matching the filter expression is filtered out, so only part of the data is reported. A filter can be used in the Report Filter field of a report or a report element when editing a report, and it can be used from the command line. The value of this option is an expression using a subset of the Salang: The Sawmill Language syntax (see examples below). Only a subset of the language is available for this option. Specifically, the option can use:
- within: e.g. "(page within '/directory')" or "(date_time within '__/Jan/2004 __:__:__')"
- <, >, <=, >=: for the date/time field only, e.g. "(date_time < '01/Jan/2004 00:00:00')"
- and: between any two expressions, to perform the boolean "and" of those expressions
- or: between any two expressions, to perform the boolean "or" of those expressions
- not: before any expression, to perform the boolean "not" of that expression
- matches: wildcard matching, e.g. "(page matches '/index.*')"
- matches_regexp: regular expression matching, e.g. "(page matches_regexp '^/index\\..*')"
Date/time values are always in the format dd/mmm/yyyy hh:mm:ss; underscores are used as wildcards, to match any value in that position. For instance, '15/Feb/2003 __:__:__' refers to a single day, '__/Feb/2003 __:__:__' refers to a month, and '__/___/2003 __:__:__' refers to a year.
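The underscore wildcard is easy to emulate: each "_" stands for exactly one character in that position. Here is a Python sketch of the matching rule (illustrative only; Sawmill performs this matching internally):

```python
import re

def within(date_time, pattern):
    """True if date_time matches pattern, where each '_' matches any one character."""
    regex = "^" + re.escape(pattern).replace("_", ".") + "$"
    return re.match(regex, date_time) is not None

print(within("15/Feb/2003 08:30:00", "15/Feb/2003 __:__:__"))  # True: within that day
print(within("15/Feb/2003 08:30:00", "__/Feb/2003 __:__:__"))  # True: within that month
print(within("15/Mar/2003 08:30:00", "__/Feb/2003 __:__:__"))  # False: wrong month
```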
Examples
NOTE: To use these examples in a command line, or in the Extra Options of the Scheduler, use -f "filter" where filter is one of the examples below, e.g., -f "(date_time within '__/Feb/2005 __:__:__')" Use double quotes (") around the entire filter expression; use single quotes (') within the filter if necessary.
Example: To show only events from February, 2005 (but it's easier to use date filters for this; see Using Date Filters):

  (date_time within '__/Feb/2005 __:__:__')

Example: To show only events within the page directory /picts/:

  (page within '/picts/')

Example: To show only events from January, 2004, and within the page directory /picts/:

  ((date_time within '__/Jan/2004 __:__:__') and (page within '/picts/'))

Example: To show only events from last month (the previous full month, i.e. the 1st through end-of-month, in the month containing the second 30 days before now) (but it's easier to use date filters for this; see Using Date Filters):

  (date_time within ("__/" . substr(epoc_to_date_time(now() - 30*24*60*60), 3, 8) . " __:__:__"))

Example: To show only events from last month (more sophisticated than the one above, this works at any time in the month, for any month, by computing the first second of the current month, subtracting one second to get the last second of the previous month, and using that to compute a filter for the previous month) (but it's easier to use date filters for this; see Using Date Filters):

  (date_time within ("__/" . substr(epoc_to_date_time(date_time_to_epoc("01/" . substr(epoc_to_date_time(now()), 3, 8) . " 00:00:00") - 1), 3, 8) . " __:__:__"))

Example: To show only events from January 4, 2004 through January 10, 2004 (but it's easier to use date filters for this; see Using Date Filters):

  ((date_time >= '04/Jan/2004 00:00:00') and (date_time < '11/Jan/2004 00:00:00'))

Example: To show only events in the past 30 days (but it's easier to use date filters for this; see Using Date Filters):

  (date_time >= date_time_to_epoc(substr(epoc_to_date_time(now() - 30*24*60*60), 0, 11) . " __:__:__"))

Example: To show only events with source port ending with 00:

  (source_port matches '*00')

Example: To show only events with source port ending with 00, or with destination port not ending in 00:

  ((source_port matches '*00') or not (destination_port matches '*00'))

Example: To show only events with server_response 404, and on pages whose names contain three consecutive digits:

  ((server_response within '404') and (page matches_regexp '[0-9][0-9][0-9]'))

Example: To show only events with more than 100 in a numerical "bytes" field (this works only for numerical, aggregating fields):

  (bytes > 100)

Advanced Example: Arbitrary Salang expressions may appear in comparison filters, which makes it possible to create very sophisticated expressions to do any type of date filtering. For instance, the following filter expression selects everything in the current week (starting on the Sunday before today). It does this by pulling in some Salang utility functions to compute the weekday from the date, and the month name from the month number, and then iterating backward one day at a time until it reaches a day where the weekday is Su (Sunday). Then it uses that date to construct a date_time value in the standard format "dd/mmm/yyyy hh:mm:ss" which is then used in the filter.

  date_time >= (
    include 'templates.shared.util.date_time.get_weekday';
    include 'templates.shared.util.date_time.get_month_as_number';
    int t = now();
    string weekday = '';
    while (weekday ne 'Su') (
      bool m = matches_regular_expression(epoc_to_date_time(t), '^([0-9]+)/([A-Za-z]+)/([0-9]+) ');
      int day = 1;
      int month = get_month_as_number(epoc_to_date_time(now()));
      int year = 3;
      weekday = get_weekday(year, month, day);
      t -= 24*60*60;
    );
    t += 24*60*60;
    string first_second_of_week = date_time_to_epoc(substr(epoc_to_date_time(t), 0, 11) . ' 00:00:00');
    first_second_of_week;
  )
Show only table rows with more than 20 visitors:

  cell_by_name('visitors') > 20

Show only table rows where the page names contain the string "product", with more than 20 page views and more than 10 visitors:

  contains(cell_by_name('page'), 'product') and (cell_by_name('page_views') > 20) and (cell_by_name('visitors') > 10)
A table filter expression can use:

- the Salang syntax
- the cell_by_name() or cell_by_row_number_and_name() subroutine, to get the value from other table cells
- the global variables row_number and number_of_days
cell_by_name() returns the value of the table cell with the given report field ID (each report field is identified by an ID; the ID is equal to the report field node name). For instance:

  (cell_by_name('hits') * cell_by_name('page_views'))

cell_by_row_number_and_name(row_number, report field ID) returns the value of the table cell with the given row number and report field ID. The row_number argument allows you to get the table cell value from other rows, i.e. the previous row or next row.
The global variables row_number and number_of_days

These two variables may be useful for very specific expression requirements:

- row_number is an integer giving the currently processed row. It is useful when using cell_by_row_number_and_name() to reference a previous or next row.
- number_of_days is an integer giving the number of days in the filtered date range.
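As a concrete illustration of how such an expression filters rows, here is the "product pages" example above replayed over plain Python dictionaries (the row data and the cell_by_name helper are illustrative stand-ins for Sawmill's report table):

```python
rows = [
    {"page": "/products/a.html", "page_views": 50, "visitors": 30},
    {"page": "/products/b.html", "page_views": 15, "visitors": 12},
    {"page": "/about.html",      "page_views": 80, "visitors": 40},
]

def cell_by_name(row, name):
    # Stand-in for Sawmill's cell_by_name(), which looks up the current row
    return row[name]

# The table filter expression, applied row by row
filtered = [r for r in rows
            if "product" in cell_by_name(r, "page")
            and cell_by_name(r, "page_views") > 20
            and cell_by_name(r, "visitors") > 10]
print([r["page"] for r in filtered])  # ['/products/a.html']
```

Only the first row passes: the second has too few page views and visitors, and the third does not contain "product" in the page name.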
Using Sawmill
This page provides simple instructions and guidelines for using Sawmill, especially for the first time. Common uses are described, with examples and more in-depth links for creating filters, reports, and power-user techniques.
Creating a profile
When you create a profile, Sawmill asks you for the log source. The log data can come from a file, a collection of files, or from the output of an arbitrary command (this last option is available only on UNIX and Windows). The files can be local, or Sawmill can download them from an FTP site (you can use a standard FTP URL to specify the location of the files). They can also come from a single file downloaded by HTTP. Sawmill supports and automatically detects a wide range of log formats.

Once you have told Sawmill where your log data is, Sawmill will create a profile for you using reasonable values for all the options. You can then click "View Reports" to see the reports -- Sawmill will read and process the log data to build its database, and then will display the reports. You can also click "View Config" to customize any of the profile settings.

Sawmill's options can be confusing at first because there are so many of them, but you don't need to change any of them if you don't want to. A good way to start is to leave them all alone at first, and just look at the reports, using all the default settings. Once you're familiar with your statistics using the default settings, you can go back and start changing them if necessary. Options you may eventually decide to change include the Log Filters, which let you include or exclude log data from your database (see Using Log Filters), and the Reports, which let you customize existing reports and create new ones. For performance tuning, you may also want to edit the cross-references or the database fields (see Cross-Referencing and Simultaneous Filters).

You can use the Log Filters, within the View Config menu, to specify a complex set of filters to control exactly which log entries you are interested in, and which you are not. Some common filters are pre-defined, but you may want to enhance the pre-defined filter set with your own filters, or you may want to remove pre-defined filters.
See Using Log Filters for more information and examples.
Viewing Reports
You can view reports at any time by clicking View Reports next to the name of a profile in the administrative profile list. You can also switch to the reports when you're editing the profile options by clicking the View Reports link in the upper right. For information on using and navigating the statistics pages, see Reports.
This profile is named "myprofile", and the first group shown is the database group, containing all database options for the profile. Within that group, there are groups for database options ("options") and database tuning ("tuning"). You can edit this file with a text editor to change what the profile does -- all options available in the graphical interface are also available by editing the text file. Some advanced users do most of their profile editing with a text editor, rather than using the graphical interface. Advanced users also often write scripts which edit or create profile files automatically, and then call Sawmill from the command line to use those profiles. Of course, you can still edit the profile from the graphical interface whenever you want, even to make modifications to profiles you have changed with a text editor. However, Sawmill's web browser interface will recreate the profile file using its own formatting, so don't use it if you've added your own comments or changed the text formatting!
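For orientation, a profile .cfg file of this shape might begin like the following (a hedged sketch only: the exact option names vary by Sawmill version, and the "..." stand for the many options not shown):

```
myprofile = {
  database = {
    options = {
      ...
    }
    tuning = {
      ...
    }
  }
  ...
}
```

The group nesting mirrors what the graphical interface shows: the database group at the top level of the profile, with its "options" and "tuning" subgroups inside.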
Configuration Options
Sawmill is highly configurable. Global options are stored in the preferences; options specific to a particular profile are stored in each profile file. For a full list of available options, see All Options. Each profile has its own set of options which controls what that profile does; it specifies the location of the log data, the log format, the database parameters, the reports to display, and more. A profile's options can be changed from the web browser interface using simple graphical forms and controls. Options can also be used on The Command Line, or in Configuration Files.
All Options
Command Line Options
Option name LogAnalysisInfo folder location Description A folder where Sawmill can store profiles and other information
Command-line name: command_line.log_analysis_info_directory Command-line shortcut: laid Internal option used to track sessions in the graphical interface Command-line name: command_line.session_id Command-line shortcut: si The command-line action to perform Command-line name: command_line.action Command-line shortcut: a Generate HTML report files into a folder Command-line name: command_line.generate_html_to_directory Command-line shortcut: ghtd
Session ID
Action
Output directory
Command-line name: command_line.output_directory Command-line shortcut: od The name of a database field Command-line name: command_line.field_name Command-line shortcut: fn The cross-reference table number Command-line name: command_line.cross_reference_table Command-line shortcut: crt Whether to force a rebuild Command-line name: command_line.force_rebuild Command-line shortcut: fr This option controls whether Sawmill starts its built-in web server when it starts (whether it runs in web server mode) Command-line name: command_line.web_server Command-line shortcut: ws Generate a report
Field name
Cross-reference table
Force rebuild
Generate report
Command-line name: command_line.generate_report Command-line shortcut: gr The language to use for report generation Command-line name: command_line.language Command-line shortcut: l
Language
The password required to view the statistics for this profile, passed on the command line Statistics viewing password
Command-line name: command_line.password Command-line shortcut: clp The name of the profile to use for this command.
Profile to use
Command-line name: command_line.profile Command-line shortcut: p The types of command-line output to generate Command-line name: command_line.verbose Command-line shortcut: v Process ID of the master web server thread (used internally) Command-line name: command_line.master_process_id Command-line shortcut: mpi The interface page to display Command-line name: command_line.display_page Command-line shortcut: dp The name of the report to generate Command-line name: command_line.report_name Command-line shortcut: rn The return address of an email message Command-line name: command_line.return_address Command-line shortcut: rna The recipient address of an email message Command-line name: command_line.recipient_address Command-line shortcut: rca The carbon copy address of an email message Command-line name: command_line.cc_address Command-line shortcut: ca The blind carbon copy address of an email message Command-line name: command_line.bcc_address Command-line shortcut: ba The email subject to use when sending a report by email Command-line name: command_line.report_email_subject Command-line shortcut: res Directory of database to merge into this one Command-line name: command_line.merge_database_directory Command-line shortcut: mdd Directory to import a database from, or export a database to Command-line name: command_line.directory Command-line shortcut: d Directory to convert imported database to Command-line name: command_line.destination_directory Command-line shortcut: dd
Master process ID
Display page
Report name
Return address
Recipient address
Cc address
Bcc address
Directory
Destination directory
Filters (command_line.filters, shortcut f)
Date filter (command_line.date_filter, shortcut df): The date filter to use when generating a report
Date breakdown (command_line.date_breakdown, shortcut db): Zooms to the date which is specified in "Date filter"
Update to version (command_line.update_to_version, shortcut utv): The version to update to, using web update
Generate PDF-friendly files (command_line.generate_pdf_friendly_files, shortcut gpff): Generate HTML report files in a PDF-friendly format
Ending row (command_line.ending_row, shortcut er): Controls the ending row of a generated or exported report
Sort by (command_line.sort_by, shortcut sb): The field to sort by, in a generated or exported report
Sort direction (command_line.sort_direction, shortcut sd): The direction to sort in, in a generated or exported report
Export average (command_line.export_average, shortcut ea): Adds an average row when used with export_csv_table
Export min (command_line.export_min, shortcut emin): Adds a minimum row when used with export_csv_table
Export max (command_line.export_max, shortcut emax): Adds a maximum row when used with export_csv_table
Export total (command_line.export_total, shortcut et): Adds a total row when used with export_csv_table
Output filename (command_line.output_file, shortcut of): Exports a report to the specified filename when used with export_csv_table
Output format type (command_line.output_format_type, shortcut oft): The format of an exported report
End of line (command_line.end_of_line, shortcut eol): The end-of-line marker in a CSV file
SQL query (command_line.sql_query, shortcut sq)
Resolve SQL itemnums (command_line.resolve_sql_itemnums, shortcut rsi): Whether to resolve SQL itemnums when displaying the results of a command-line SQL query
Parsing server hostname (command_line.parsing_server_hostname, shortcut psh): The hostname to bind to, when running as a parsing server
Parsing server port (command_line.parsing_server_port, shortcut psp): The port to bind to, when running as a parsing server
Preferences

Never look up IP numbers using domain name server (preferences.miscellaneous.never_look_up_ip_numbers, shortcut nluin): Whether to ever try to look up hostnames of IP-numbered hosts
Only look up log IP numbers (preferences.miscellaneous.only_look_up_log_ip_numbers, shortcut olulin): Look up IP numbers only when they appear in logs, not for the local server or remote browsing computer
Logout URL (preferences.miscellaneous.logout_url, shortcut lu): The URL to go to on logout; if empty, goes to the login screen
Temporary files lifespan (preferences.miscellaneous.temporary_files_lifespan, shortcut tfl): Amount of time to keep temporary files before deleting them (in seconds)
Language (preferences.miscellaneous.language, shortcut l): The language module to use to generate language-specific text
Charset (preferences.miscellaneous.charset, shortcut c): The HTML charset to use when displaying pages
Prompt for trial tier (preferences.miscellaneous.prompt_for_trial_tier, shortcut pftt): Whether to prompt for a Professional/Enterprise switch during the trial period
Talkback (preferences.miscellaneous.talkback): Whether to send information about devices analyzed to Flowerfire
SMTP server: The SMTP server to use to send email
SMTP username (preferences.email.smtp_username, shortcut su)
SMTP password (preferences.email.smtp_password, shortcut sp): The SMTP password to use to send email
Return address (preferences.email.return_address): The default return address of an email message when sending a report by email from the command line
Recipient address (preferences.email.recipient_address): The default recipient address of an email message when sending a report by email from the command line
Cc address (preferences.email.cc_address): The default carbon copy address of an email message when sending a report by email from the command line
Bcc address (preferences.email.bcc_address): The default blind carbon copy address of an email message when sending a report by email from the command line
Report email subject: The default email subject to use when sending a report by email from the command line
Global support email address (preferences.email.global_support_email_address, shortcut gsea)
Global actions email address (preferences.email.global_actions_email_address, shortcut gaea): The address(es) that Sawmill should send email to whenever an action completes; e.g., the database is built
Global actions return address (preferences.email.global_actions_return_address, shortcut gara): The return address when email is sent upon actions
Session timeout (preferences.security.server_session_timeout, shortcut sst): The number of seconds a Sawmill web session can be active before it is automatically logged out
Security mode (preferences.security.security_mode, shortcut sm): The level of security to use
Trusted hosts (preferences.security.trusted_hosts, shortcut th): The hostnames of computers which are "trusted," and do not need to enter passwords
Show full operating system details in errors (preferences.security.show_full_operating_system_details_in_errors, shortcut sfosdie): Show full operating system version details in the text of error messages
Authentication command line (preferences.security.authentication_command_line, shortcut acl): The command line to run to authenticate users
Default permissions (preferences.security.default_permissions): The permissions Sawmill uses when creating a file or folder (chmod-style)
Database file permissions (preferences.security.database_file_permissions): The permissions Sawmill uses when creating a file as part of a database
Database folder permissions (preferences.security.database_directory_permissions): The permissions Sawmill uses when creating a folder as part of a database
Profile file permissions (preferences.security.profile_permissions): The permissions Sawmill uses when creating a profile file
Profiles folder permissions: The permissions Sawmill uses when creating a folder containing profile files
Temporary profile permissions (preferences.security.temporary_profile_permissions)
Default profile permissions (preferences.security.default_profile_permissions): The permissions Sawmill uses when creating the default profile file
Password file permissions (preferences.security.password_file_permissions): The permissions Sawmill uses when creating the password file
Image file permissions (preferences.security.image_file_permissions): The permissions Sawmill uses when creating an image file
Image folder permissions (preferences.security.image_directory_permissions): The permissions Sawmill uses when creating a folder containing image files
Server folder permissions (preferences.security.server_directory_permissions): The permissions Sawmill uses when creating the server folder
LogAnalysisInfo folder permissions (preferences.security.log_analysis_info_directory_permissions): The permissions Sawmill uses when creating the LogAnalysisInfo folder
Show only the profiles matching REMOTE_USER (preferences.security.show_only_remote_user_profiles, shortcut sorup): Whether to show only the profiles whose names start with the value of REMOTE_USER
Administrative REMOTE_USER (preferences.security.administrative_remote_user, shortcut aru): The value of REMOTE_USER which marks that user as administrator
Password expires (preferences.password.password_expires): True if passwords expire after a specific amount of time
Days until password expiration (preferences.password.days_until_password_expiration): The number of days before passwords expire
Minimum length (preferences.password.minimum_length): The minimum number of characters in a password
Prevent use of previous passwords (preferences.password.prevent_use_of_previous_passwords): Whether to prevent a user's previous passwords from being reused by that user
Number of previous passwords to check (preferences.password.number_of_previous_passwords_to_check): The number of passwords to check when looking for password reuse
Require letter (preferences.password.requires_letter): Whether to require a letter (alphabetic character) in passwords
Require mixed case: Whether to require both uppercase and lowercase letters in passwords
Requires symbol (preferences.password.requires_symbol)
Temporary folder (preferences.server.temporary_directory_pathname, shortcut tdp): A folder on the web server running Sawmill as a CGI program, from which images can be served
Temporary folder URL (preferences.server.temporary_directory_url, shortcut tdu): The URL of a folder on the web server running Sawmill as a CGI program, from which images can be served
Web server port (preferences.server.web_server_port, shortcut wsp): The port to listen on as a web server
Maximum number of threads (preferences.server.maximum_number_of_threads, shortcut mnot): Maximum number of simultaneous connections that Sawmill will accept on its web server
CGI folder (preferences.server.cgi_directory, shortcut cd): The folder containing Sawmill, relative to the server root
Server hostname (preferences.server.server_hostname, shortcut sh): The IP address to run Sawmill's web server on
Profile Options

Automatically update database when older than: Automatically update the database when the statistics are viewed and the database has not been updated in this many seconds
Database name (database.options.server.database_name)
SQL table name prefix (database.options.server.sql_table_name_prefix): A prefix to add to the beginning of every SQL table name
SQL table name suffix: A suffix to add to the end of every SQL table name
Database directory (database.options.server.database_directory, shortcut dd)
Server socket (database.options.server.server_socket): The socket file to use to access MySQL
Bulk import method (database.options.server.bulk_import_method, shortcut bim): The method used to import data in bulk into databases
Load data directory (database.options.server.load_data_directory): The directory used to store files for loading into the database
Load data directory on database server (database.options.server.load_data_directory_on_server): The directory used by the database server to read temporary files for loading into the database
Method for splitting SSQL queries (database.tuning.split_queries.method, shortcut sqm): Specifies the method for splitting SSQL queries
Number of threads for splitting SSQL queries (database.tuning.split_queries.number_of_threads, shortcut not): Specifies the number of threads for splitting SSQL queries
Update/build xref tables after updating/building database (database.tuning.update_xrefs_on_update, shortcut uxou): Update or build all cross-reference tables automatically, after completing a database build or update
Build all indices simultaneously (database.tuning.build_all_indices_simultaneously, shortcut bais): Build all indices simultaneously after processing log data, for better performance
Build all cross-reference tables simultaneously (database.tuning.build_all_xref_tables_simultaneously, shortcut baxts): Build all cross-reference tables simultaneously after processing log data, for better performance
Build indices during log processing (database.tuning.build_indices_during_log_processing, shortcut bidlp): Build indices on the fly while log data is read, rather than in a separate stage
Build cross-reference tables and indices simultaneously (database.tuning.build_xref_tables_and_indices_simultaneously, shortcut bxtais): Build cross-reference tables and indices simultaneously after processing log data, for better performance
Build cross-reference tables during log processing (database.tuning.build_xref_tables_during_log_processing, shortcut bxtdlp): Build cross-reference tables on the fly while log data is read, rather than in a separate stage
Build indices in threads (database.tuning.build_indices_in_threads, shortcut biit): Build indices in threads, and merge them at the end
Build indices in memory (database.tuning.build_indices_in_memory, shortcut biim): Build indices in memory, rather than using memory-mapped files on disk
Keep itemnums in memory (database.tuning.keep_itemnums_in_memory, shortcut kiim): Keep the itemnum tables in memory, rather than keeping them on disk
Itemnums cache size (database.tuning.itemnums_cache_size, shortcut ics): The size of the in-memory cache for itemnums, used to speed up access to itemnum tables on disk
Xref tables cache size (database.tuning.xref_tables_cache_size, shortcut xtcs): The size of the in-memory cache for cross-reference tables, used to speed up access to cross-reference tables on disk
Build xref tables in threads (database.tuning.build_xref_tables_in_threads, shortcut bxtit): Build cross-reference tables in threads, and merge them at the end
Hash table expansion factor (database.tuning.hash_table_expansion_factor, shortcut htef): Factor by which a hash table expands when necessary
Hash table starting size (database.tuning.hash_table_starting_size, shortcut htss): Initial size of a database hash table
Hash table surplus factor (database.tuning.hash_table_surplus_factor, shortcut htsf): Number of times larger a hash table is than its contents
List cache size (database.tuning.list_cache_size, shortcut lcs): Maximum memory used by the list cache
Maximum main table segment size to merge (database.tuning.maximum_main_table_segment_merge_size, shortcut mmtsms): Maximum size of a main table segment to merge; larger segments will be copied
Maximum main table segment size (database.tuning.maximum_main_table_segment_size, shortcut mmtss): The maximum size of one segment of the main database table
Maximum xref segment merge size (database.tuning.maximum_xref_segment_merge_size, shortcut mxsms): Maximum size of a cross-reference table segment to merge; larger segments will be copied
Maximum cross-reference table segment size (database.tuning.maximum_xref_table_segment_size, shortcut mxtss): The maximum size of one segment of a cross-reference database table
Maximum caching buffer memory usage (database.tuning.maximum_paging_caching_buffer_memory_usage, shortcut mpcbmu): The maximum memory to be used by a paging caching buffer
Maximum caching buffer full load (database.tuning.maximum_paging_caching_buffer_full_load, shortcut mpcbfl): The maximum size of a file to be loaded fully into memory
Allow newlines inside quotes (log.format.allow_newlines_inside_quotes, shortcut aniq): Allow newlines (return or line feed) inside quotes, in log lines
Apache description string (log.format.apache_description_string, shortcut ads): A string which describes the log format, Apache-style
Autodetect lines (log.format.autodetect_lines, shortcut al): Number of lines to examine for this log format while auto-detecting
Autodetect regular expression (log.format.autodetect_regular_expression, shortcut are): A regular expression used to auto-detect the log format
Autodetect expression (log.format.autodetect_expression, shortcut ae): An expression used to auto-detect the log format
Blue Coat description string (log.format.blue_coat_description_string, shortcut bcds): A string which describes the log format, Blue Coat style
Format is Common Log Format (log.format.common_log_format, shortcut clf): Log format is similar to Common Log Format
Log field separator: The character or string that separates one log field from the next
Date format (log.format.date_format, shortcut ldf)
Default log date year (log.format.default_log_date_year, shortcut dldy): The year to use, e.g., 2004, if the date format in the log data has no year information
Format label (log.format.format_label, shortcut fl): The format of the log data
Global date regular expression (log.format.global_date_regular_expression, shortcut gdre): A regular expression which, if matched in the log data, determines the date for all subsequent entries
Global date filename regular expression (log.format.global_date_filename_regular_expression, shortcut gdfre): A regular expression which, if matched in the log filename, determines the date for all entries in that log file
Ignore format lines (log.format.ignore_format_lines, shortcut ifl): Ignore format lines in the log data
Ignore quotes (log.format.ignore_quotes, shortcut iq): Ignore quotes in log data
Parse only with filters (log.format.parse_only_with_filters, shortcut powf): Use only the parsing filters to parse the log (and not the log format regexp, index/subindex, etc.)
Log data format regular expression (log.format.parsing_regular_expression, shortcut pre): A regular expression describing the log format
Time format (log.format.time_format, shortcut ltf): Format of times in the log data
Treat brackets as quotes (log.format.treat_brackets_as_quotes, shortcut tbaq): Treat square brackets as quotes
Treat apostrophes (') as quotes (log.format.treat_apostrophes_as_quotes, shortcut taaq): Treat apostrophes (') as quotes
Allow empty log source (log.processing.allow_empty_log_source, shortcut aels): True if Sawmill should allow databases to be created from log sources which contain no data
Date offset (log.processing.date_offset, shortcut do): The number of hours to add to each date in the log file
Default log date year (log.processing.default_log_date_year): The year to use, e.g., 2004, if the date format in the log data has no year information
Log entry pool size (log.processing.log_entry_pool_size, shortcut eps): The number of log entries Sawmill can work on simultaneously
Look up location with GeoIP (log.processing.look_up_location_with_geoip): Look up geographic locations, from IP addresses, with the GeoIP database
Read block size (log.processing.read_block_size, shortcut rbs): Size in bytes of the blocks which are read from the log
Maximum read block size (log.processing.maximum_read_block_size, shortcut mrbs): Maximum size in bytes of the blocks which are read from the log
Real-time (allow viewing of reports during log processing) (log.processing.real_time): Use real-time processing mode to allow viewing of reports during log processing
Parsing server distribution method (log.processing.distributed.method, shortcut psdm): Method for connecting to and/or spawning parsing servers
Starting port (log.processing.distributed.starting_port_auto, shortcut spa): Starting port to use for parsing servers
Number of local parsing servers (log.processing.distributed.number_of_servers, shortcut nos): The number of local parsing servers to start
Distribute log data to parsing servers file-by-file (log.processing.distributed.file_by_file, shortcut fbf): Whether to distribute log data to parsing servers one file at a time
Skip most recent file (log.processing.skip_most_recent_file, shortcut smrf): Whether to skip the most recent log file in the log source, or process it
Skip processed files on update (by pathname) (log.processing.skip_processed_filenames_on_update, shortcut spfod): Skip files which have already been processed (judging by their pathnames) during a database update or add operation
Ignore regexp for skip checksum (log.processing.ignore_regexp_for_skip_checksum, shortcut irfsc): When checksumming log data for skipping, ignore anything matching this regular expression
Thread data block size (log.processing.thread_data_block_size, shortcut tdbs): The size of data chunks to feed to log processing threads, during a multiprocessor build or update
Log processing threads (log.processing.threads, shortcut lpt): The number of simultaneous threads to use to process log data
Convert log data charset (log.processing.convert_log_data_charset, shortcut cldc): Whether to convert log data into a different charset while processing it
Convert log data from charset (log.processing.convert_log_data_from_charset, shortcut cldfc): The charset to convert from, when converting input log data
Convert log data to charset (log.processing.convert_log_data_to_charset, shortcut cldtc): The charset to convert to, when converting input log data
Output field delimiter (log.processing.output.field_delimiter, shortcut fd): The delimiter used between fields when using the "process logs" action
Suppress output header (log.processing.output.suppress_output_header, shortcut soh): Whether to suppress the field names header in "process logs" output
Output date/time format (log.processing.output.output_date_time_format, shortcut odtf): The format of date/time values in "process logs" output
Empty output value (log.processing.output.empty_output_value, shortcut eov): The value to output for an empty field in "process logs" output
Actions email address (network.actions_email_address, shortcut aea): The address(es) that Sawmill should send email to whenever an action completes; e.g., the database is built. This option overrides Global actions email address.
Actions return address (network.actions_return_address, shortcut ara): The return address when email is sent upon actions. This option overrides Global actions return address.
DNS server (network.dns_server, shortcut ds): The hostname or IP address of the DNS server to use to look up IP addresses in the log data
DNS timeout (network.dns_timeout, shortcut dt): Amount of time to wait for a DNS response before timing out
IP numbers cache file (network.ip_numbers_cache_file, shortcut incf): The file in which to cache IP numbers after they're looked up
Look up IP numbers (network.look_up_ip_numbers, shortcut luin): Whether to look up IP numbers using a domain nameserver (DNS), to try to compute their hostnames
Look up IP numbers before filtering (network.look_up_ip_numbers_before_filtering, shortcut luinbf): Whether to look up IP numbers before filtering (rather than after)
Maximum simultaneous DNS lookups (network.maximum_simultaneous_dns_lookups, shortcut msdl): The maximum number of IP addresses that Sawmill will attempt to look up at the same time
Running Sawmill URL (network.running_statistics_url, shortcut rsu): The URL of a running copy of Sawmill, used to insert live links into HTML email
Secondary DNS server (network.secondary_dns_server, shortcut sds): The hostname or IP address of the DNS server to use to look up IP addresses in the log data, if the primary DNS server fails
Support email address (network.support_email_address, shortcut sea): The email address where bug and error reports should be sent. This option overrides Global support email address.
Use TCP for DNS (network.use_tcp_for_dns, shortcut utfd): True if Sawmill should use TCP (rather than the more standard UDP) to communicate with DNS servers
Number thousands divider (output.number_thousands_divider, shortcut ntd): A divider to separate thousands in displayed numbers
Number decimal divider (output.number_decimal_divider, shortcut ndd): A divider to separate the integer part from the decimal (fractional) part in displayed numbers
Number of seconds between progress pages (output.progress_page_interval, shortcut ppi): The number of seconds between progress pages
Use base 10 for byte displays (output.use_base_ten_for_byte_display, shortcut ubtfbd): Use base 10 (multiples of 1000, e.g., megabytes) for byte displays, rather than base 2 (multiples of 1024, e.g., mebibytes)
Convert export charset (output.convert_export_charset, shortcut cec): Whether to perform charset conversion when exporting CSV data
Convert export from charset (output.convert_export_from_charset, shortcut cefc): The charset to convert from, when converting a final exported CSV file
Convert export to charset (output.convert_export_to_charset, shortcut cetc): The charset to convert to, when converting a final exported CSV file
Allow viewers to rebuild/update database (security.allow_viewers_to_rebuild, shortcut avtr): Allow all statistics viewers to rebuild/update the database
Cache reports (statistics.miscellaneous.cache_reports, shortcut cr): True if reports should be cached for faster repeat display
Entry name (statistics.miscellaneous.entry_name, shortcut en): The word to use to describe a log entry
First weekday (statistics.miscellaneous.first_weekday, shortcut fw): The first weekday of the week (1=Sunday, 2=Monday, ...)
Hidden views URL (statistics.miscellaneous.hidden_views_url, shortcut hvu): The URL to link view buttons to when the views are not visible
Marked weekday (statistics.miscellaneous.marked_weekday, shortcut mw): The weekday which appears marked in calendar month displays (1=Sunday, 2=Monday, ...)
Maximum session duration (statistics.miscellaneous.maximum_session_duration, shortcut msd): The maximum duration of a session; longer sessions are discarded from the session information
Footer text (statistics.miscellaneous.page_footer, shortcut pf): HTML code to place at the bottom of statistics pages
Footer file (statistics.miscellaneous.page_footer_file, shortcut pff): An HTML file whose contents go at the bottom of statistics pages
Header text (statistics.miscellaneous.page_header, shortcut ph): HTML code to place at the top of statistics pages
Header file (statistics.miscellaneous.page_header_file, shortcut phf): An HTML file whose contents go at the top of statistics pages
Page frame command (statistics.miscellaneous.page_frame_command, shortcut pfc): A command which is executed to generate HTML to frame Sawmill's statistics
Show HTTP link (statistics.miscellaneous.show_http_link, shortcut shl): Shows table items which start with "http://" as links
Server root (statistics.miscellaneous.server_root, shortcut sr): The root URL of the server being analyzed
Session timeout (statistics.miscellaneous.session_timeout, shortcut st): The interval after which events from the same user are considered to be part of a new session
User agent for emails (statistics.miscellaneous.user_agent_for_emails, shortcut uafe): Specifies the target user agent when sending emails
User agent for report files (statistics.miscellaneous.user_agent_for_files, shortcut uaff): Specifies the target user agent when generating report files
CSV delimiter (statistics.miscellaneous.csv_delimiter, shortcut cds): The delimiter to use between fields in a CSV export
Use Overview for totals (statistics.miscellaneous.use_overview_for_totals, shortcut uoft): True if Overview numbers should be used for total rows
Maximum continuous text length (statistics.sizes.table_cell.maximum_continuous_text_length, shortcut mctl): The maximum number of characters per table item per line
Maximum continuous text length offset (statistics.sizes.table_cell.maximum_continuous_text_length_offset, shortcut mctlo): The minimum number of characters to break the last table item line
Maximum text length (statistics.sizes.table_cell.maximum_text_length, shortcut mtl): The maximum number of characters per table item
Session path maximum continuous text length (statistics.sizes.session_path.maximum_continuous_text_length, shortcut spmctl): The maximum number of characters per line in a session path and path-through-a-page report
Session path maximum continuous text length offset (statistics.sizes.session_path.maximum_continuous_text_length_offset, shortcut spmctlo): The minimum number of characters to break the last line in a session path and path-through-a-page report
Session path maximum text length (statistics.sizes.session_path.maximum_text_length, shortcut spmtl): The maximum number of characters of page names in the session path and path-through-a-page report
Configuration Files

Note: you can usually avoid ever dealing with profile files by using Sawmill through the web browser interface. You only need to know about profile files if you want to edit them directly (which is usually faster than using the web interface), use them from the command line, or change options which are not available through the web interface.

All Sawmill options are stored in text files called configuration files (or profile files, if they contain the options of a particular profile). Configuration files use a very simple format: each option is given in the form

  name = value

and options can be grouped like this:

  groupname = {
    name1 = value1
    name2 = value2
    name3 = value3
  } # groupname

Within this group, you can refer to the second value using the syntax groupname.name2. Groups can appear within groups, like this:

  groupname = {
    name1 = value1
    subgroupname = {
      subname1 = subvalue1
      subname2 = subvalue2
    } # subgroupname
    name3 = value3
  } # groupname

In this case, subvalue2 can be referred to as groupname.subgroupname.subname2. Hash characters (#) are comment markers; everything after a #, until the end of the line, is ignored. Multiline comments can be created using {# before the comment and #} after it. In the examples above, the group name is repeated in a comment on the closing bracket; this is customary, and improves legibility, but is not required. There are no limits on the number of levels, the number of values per level, the length of names or values, or anything else.

In addition to groupings within a file, groupings also follow the directory structure on disk. The LogAnalysisInfo folder is the root of the configuration hierarchy, and files and directories within it function exactly as though they were curly-bracket groups like the ones above. For instance, the preferences.cfg file (cfg stands for "configuration group") can be referred to as preferences; the server group within preferences.cfg can be referred to as preferences.server; and the web_server_port option within the server group can be referred to as preferences.server.web_server_port. So, for instance, on the command line you can change the default web server port like this:

  sawmill -ws t -preferences.server.web_server_port 8111

If you happen to know that the shortcut for web_server_port is wsp, you can shorten this to:

  sawmill -ws t -wsp 8111

(-ws t is required to tell Sawmill to start its web server.) Through this type of hierarchical grouping, by directories within LogAnalysisInfo and by curly-bracket groups within each configuration file, every configuration option in the hierarchy can be uniquely specified by a sequence of group names, separated by dots and ending with an option name. All options in Sawmill are specified in this way, including profile options, preferences, language module (translation) variables, users, scheduling options, documentation, spider/worm/search engine information, command-line and internal options, and more.

Sawmill creates a profile file in the profiles subfolder of the Sawmill folder when you create a profile from the graphical interface (see The Administrative Menu). Profile files can also be created by hand with a text editor, though the large number of options makes this a difficult task to do manually; it is best scripted, or done by copying an existing profile file and editing it. To use files as profile files, put them in the profiles folder. Any profile which can be specified in a profile file can also be specified on The Command Line using the same profile options. Command-line syntax is longer if full option names are used, because each option on the command line must be specified as the full group1.group2.group3.option, while in the profile file it appears only as option (within its groups). However, most options have shortcuts which can be used instead; see the documentation for each option's shortcut (All Options). To see a sample profile file, use the web browser interface to create a profile, and then examine the file in the profiles folder.
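The bracket-group syntax described above is simple enough to parse mechanically. The following is a minimal sketch of such a parser in Python; it is an illustration only, not Sawmill's own parser, and it ignores quoting and escaping details that real profile files may contain:

```python
import re

def parse_config(text):
    """Parse 'name = value' lines and nested 'name = { ... }' groups
    into nested dicts (a simplified sketch of the syntax above)."""
    # Strip {# ... #} block comments, then '#' line comments.
    text = re.sub(r"\{#.*?#\}", "", text, flags=re.DOTALL)
    lines = [re.sub(r"#.*", "", line).strip() for line in text.splitlines()]

    root = {}
    stack = []          # enclosing groups, innermost last
    current = root
    for line in (l for l in lines if l):
        if line == "}":                      # close the current group
            current = stack.pop()
        else:
            name, _, value = (part.strip() for part in line.partition("="))
            if value == "{":                 # open a nested group
                current[name] = {}
                stack.append(current)
                current = current[name]
            else:
                current[name] = value
    return root

def lookup(config, dotted_name):
    """Resolve a dotted name like 'groupname.subgroupname.subname2'."""
    node = config
    for part in dotted_name.split("."):
        node = node[part]
    return node
```

Given a parsed file, lookup(cfg, "groupname.subgroupname.subname2") resolves the same dotted names used throughout this chapter.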
Sawmill Documentation
Quickstart Manual FAQ User Guide www.sawmill.co.uk
When you're done setting up the create_many_profiles.cfg file, you can create all the profiles in a single step with the following command line:

  sawmill -dp util.create_many_profiles

or, on Windows:

  Sawmill.exe -dp templates.admin.profiles.create_many_profiles

This will create all the profiles you've specified, apply the changes you indicated to them, and save them to the profiles list. In the future, if you want a change to affect all profiles, modify the template profile and rerun the command above to recreate the profiles.
  sawmill -scheduler

Alternatively, you can leave a copy of Sawmill running in web server mode, just to handle the schedules.
Operators
Each entry below lists the operator, followed by its purpose:

==    Compares two numbers; true if they are equal; e.g. 1 == 1 is true.
!=    Compares two numbers; true if they are not equal; e.g. 1 != 1 is false.
<=    Compares two numbers; true if the left number is less than or equal to the right; e.g. 1 <= 2 is true, and so is 1 <= 1.
>=    Compares two numbers; true if the left number is greater than or equal to the right; e.g. 2 >= 1 is true, and so is 1 >= 1.
<     Compares two numbers; true if the left number is less than the right; e.g. 1 < 2 is true, but 1 < 1 is false.
>     Compares two numbers; true if the left number is greater than the right; e.g. 2 > 1 is true, but 1 > 1 is false.
eq    Compares two strings; true if they are equal; e.g. "a" eq "a" is true.
ne    Compares two strings; true if they are not equal; e.g. "a" ne "a" is false.
le    Compares two strings; true if the left string is lexically less than or equal to the right; e.g. "a" le "b" is true, and so is "a" le "a".
ge    Compares two strings; true if the left string is lexically greater than or equal to the right; e.g. "b" ge "a" is true, and so is "a" ge "a".
lt    Compares two strings; true if the left string is lexically less than the right; e.g. "a" lt "b" is true, but "a" lt "a" is false.
gt    Compares two strings; true if the left string is lexically greater than the right; e.g. "b" gt "a" is true, but "a" gt "a" is false.
or    True if either the left or right value, or both, are true; e.g. true or true is true; true or false is true.
and   True if both the left and right values are true; e.g. true and true is true; true and false is false.
+     Adds the right value to the left value; e.g. 1 + 2 is 3.
-     Subtracts the right value from the left value; e.g. 2 - 1 is 1.
*     Multiplies the right value and the left value; e.g. 2 * 3 is 6.
%     Modulo (division remainder) operation on the left value by the right value; e.g. 5 % 2 is 1 and 6 % 2 is 0.
/     Divides the left value by the right value; e.g. 12 / 4 is 3.
+=    Adds the right value numerically to the left variable; e.g. x += 1 adds 1 to x.
-=    Subtracts the right value numerically from the left variable; e.g. x -= 1 subtracts 1 from x.
++    Adds 1 numerically to the left variable; e.g. x++ adds 1 to x.
--    Subtracts 1 numerically from the left variable; e.g. x-- subtracts 1 from x.
.     Concatenates the right string to the end of the left string; e.g. "a"."b" is "ab".
.=    Concatenates the right value to the left variable; e.g. x .= "X" concatenates "X" to the end of x.
=     Assigns the right-hand side to the left-hand side; e.g. x = 1 assigns a value of 1 to the variable x.
!     Performs a boolean negation ("not" operation) of its unary parameter; e.g. !true is false, and !false is true.
not   Same as !.
?     Checks for node existence of its unary parameter; e.g. ?"n" is true if the node "n" exists. ?"n" is exactly equivalent to node_exists("n"); this is a syntactic shortcut.
@     Returns the value of the node which is its unary parameter; e.g. @n is the value of the node in the variable n. @n is exactly equivalent to node_value(n); this is a syntactic shortcut.
{}    Used as n{s}; returns the subnode of n whose name is s. n{s} is exactly equivalent to subnode_by_name(n, s); this is a syntactic shortcut.
{= =} Any Salang expression enclosed in {= =} is automatically expanded in several cases: when it is in the Extra Options section of the Scheduler, when it is on the command line, and when it is in the "pathname" field of a local log source.
[]    Used as n[i]; returns the ith subnode of n. n[i] is exactly equivalent to subnode_by_number(n, i); this is a syntactic shortcut.
?{}   Used as n?{s}; returns true if there is a subnode of n whose name is s. n?{s} is exactly equivalent to subnode_exists(n, s); this is a syntactic shortcut.
matches         True if the left value matches the wildcard pattern specified by the right value.
matches_regexp  True if the left value matches the regular expression specified by the right value.
within          Used in report filtering; true if the value of the database field specified by the left value is within the value specified by the right value; e.g. "page within '/directory/'".
$     Treats its unary string parameter as a variable name, and evaluates the value of the variable; e.g. if the value of the variable named "somevariable" is 1, then the value of the expression $("somevariable") is 1. Important: this uses the value of the expression immediately after it as the name of the variable, so if the variable x has the value "valueX", then $x means the same as $("valueX"); i.e. it is the value of the variable valueX, not the value of the variable x. To get the value of the variable x, just use x, not $x.
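A short sketch showing a few of these operators in combination (variable names are illustrative):

```
int n;
string s;
n = 10 % 3;           # modulo: n is 1
n += 4;               # numeric add-assign: n is 5
s = "ab";
s .= "cd";            # string concatenate-assign: s is "abcd"

# Numeric vs. string comparison: == compares numbers, eq compares strings.
if (n == 5 and s eq "abcd") then
  echo("both comparisons are true");
```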
above is searched to see if they are present. If they are, they are automatically loaded into memory. It is therefore possible to treat on-disk nodes as though they were in memory, and to refer to them without any explicit loading operation.

Variables and other data are stored in the configuration hierarchy, a collection of data which exists on disk and partially in memory. The configuration hierarchy is a group of "nodes", each with a single value. Any node can also contain other nodes (subnodes). Each node has a type ("bool", "int", "float", "string", "table", "data", or "node") and a value. Salang expressions can get values from the hierarchy (e.g. "internal.verbose" is an expression which gets the value of the "verbose" subnode of the "internal" node), or set them (e.g. "internal.verbose = 1;").

Configuration nodes are referenced using a dot-separated (.) sequence of names, which functions like a pathname. Configuration nodes are grouped hierarchically either using subfolders, or using .cfg files ("cfg" stands for "configuration group"). .cfg files have additional hierarchical groupings within them, specified by curly-bracketed groups. For instance, the node name "rewrite_rules" refers to the rewrite_rules folder of LogAnalysisInfo, and the node name "rewrite_rules.server_responses" refers to the server_responses.cfg file in the rewrite_rules folder of LogAnalysisInfo. The server_responses.cfg file looks like this:

  server_responses = {
    100 = {
      regexp = "100"
      result = "Continue"
    }
    ...
    200 = {
      regexp = "200"
      result = "Successful Transfer"
    }
    ...
  } # server_responses

(ellipses [...] above indicate places where part of the file has been omitted for brevity). There is a "100" node inside the server_responses node, and there is also a "200" node there. Therefore, the node name "rewrite_rules.server_responses.100" refers to the 100 node, and "rewrite_rules.server_responses.200" refers to the 200 node. Within the 100 node, there are two nodes, "regexp" and "result"; so "rewrite_rules.server_responses.100.regexp" refers to the regexp node in 100, and "rewrite_rules.server_responses.200.regexp" refers to the regexp node in 200.

Directories are treated equivalently to .cfg files in the configuration hierarchy. In both cases, they represent nodes, and can contain other nodes. In the case of directories, subnodes appear as subdirectories, as .cfg files in the directory, or as .cfv files in the directory; in the case of .cfg files, subnodes appear within the file using curly-bracket syntax, as above.

As mentioned above, every node has a value. In the example above, the value of rewrite_rules.server_responses.100.regexp is "100", and the value of rewrite_rules.server_responses.100.result is "Continue". Even group nodes like rewrite_rules.server_responses.100, rewrite_rules.server_responses, and rewrite_rules can have values, but their values are typically empty (""); in practice, a node has either a value or subnodes.

Nodes can be used like variables (they are variables). Nodes can be referred to by name, in log filters or other Salang code; for instance, this:

  echo(rewrite_rules.server_responses.100.result)

will print the value "Continue" to the standard output stream. They can also be assigned values:

  rewrite_rules.server_responses.100.result = "Something Else"

Nodes are read from disk into memory automatically when they are accessed; the in-memory version is used for the remainder of the process, and then discarded. Therefore, it is not generally necessary to read data from disk explicitly; instead, put it in a node and access it by name. In the example above, the value of rewrite_rules.server_responses.100.result will be temporarily changed to "Something Else"; only the in-memory version of rewrite_rules.server_responses.100.result will change, and the change will not be propagated to disk unless save_node() is used to write the changes back out to disk.
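For instance, a minimal sketch of making an in-memory change persistent, using the server_responses example above (the new value is illustrative):

```
# Change the in-memory value of the node; without a save, this change
# is discarded when the process exits.
rewrite_rules.server_responses.100.result = "Something Else";

# save_node() writes the node back to server_responses.cfg on disk,
# making the change a permanent part of the configuration hierarchy.
save_node(rewrite_rules.server_responses);
```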
.cfv files ("cfv" stands for "configuration value") are straight text files whose entire content is considered to be the string value of the node. For example, the file LogAnalysisInfo/somedir/somefile.cfv, whose contents are "HELLO", could be referred to as somedir.somefile; the value of that node is "HELLO".

The node built-in type of Salang refers to a configuration node; it is a pointer to that node. You can get the value of the node using node_value(). You can also get a particular subnode using subnode_by_name(), and you can check for the existence of a subnode using subnode_exists() (or use node_exists() with the full nodename as the parameter).

In addition to the LogAnalysisInfo directory, Salang also looks in other "search paths" to find nodes specified by name:
- If the node name begins with lang_, Sawmill searches for it in LogAnalysisInfo/languages/{current}, where {current} is the current language. For instance, if the current language is "english", then the node "lang_stats.menu.groups.date_time_group" refers to the date_time_group node, in the groups node, in the menu node, in LogAnalysisInfo/languages/english/lang_stats.cfg.

- If a profile is active (e.g., was specified with -p on the command line), Sawmill searches for the node in the profile .cfg file. For instance, if Sawmill is called with -p myprofile, then the node "database.fields.date_time" refers to the date_time database field, and "database.fields.date_time.label" refers to the label of the date_time database field.

- In log filters, if the node name is exactly the name of a log field, then the log field value is used. This is not actually a node, but it functions as one for the purpose of reading and writing its value; you can refer to it like a normal Salang variable in most cases (e.g., "file_type = 'GIF'").

- If a local variable has been declared in Salang, it can be referred to by name. For instance, you can write:

    int i;
    i = 3;
    echo(i);

  to define and refer to a local variable node i. Local variables do not have full nodenames; they are not "rooted" in LogAnalysisInfo like most other nodes. They are free-floating in memory, and cannot be saved to disk; they can only be referred to by their short nodenames (e.g., "i").

- Inside subroutines, the subroutine parameters can be referred to by name. For instance, the following subroutine, which adds two numbers, refers to its parameters x and y directly:

    subroutine(add(int x, int y), (
      x + y
    ));

  Subroutine parameters are similar to local variables (above) in that they are not rooted, and cannot be written to disk.
The special variables $0, $1, $2, etc. refer to the matched subexpressions of the last call to matches_regular_expression().
Type data
Nodes of type data represent chunks of binary data in memory. They can be assigned to and from strings, if they do not contain null characters. They are currently used only in networking, to read data from a socket, or write it to a socket.
In log filters, Salang can refer to log fields by name, so a reference to date_time in a log filter is a reference to the value of the date_time field in the log entry that is currently being processed. This can be used either to get or to set values; e.g. "if (page eq '/index.html') then 'reject'" checks the current log entry's page field to see if it is "/index.html", and rejects the log entry if it is; and "page = '/index.html'" sets the page field of the current log entry to "/index.html". Log filters can also use the special function current_log_line(), whose value is the entire current line of log data.
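As a sketch combining these features in a log filter (the field names and values here are hypothetical, and depend on the profile's log format):

```
# Reject hits on GIF images from a hypothetical internal host;
# otherwise, normalize the page field to lowercase.
if (matches_wildcard_expression(page, "*.gif") and
    (hostname eq "internal.example.com")) then
  "reject";
else
  page = lowercase(page);
```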
Types
Each configuration node has a type, which specifies how its data is stored internally. Possible types are:
- string: The data is an arbitrary string value, like "hello" or "Bob".
- int: The data is an integer value, like 15 or 49.
- float: The data is a floating point value, like 1.4 or 18874.46567.
- bool: The data is a boolean value (true or false).
- node: The data is a reference (pointer) to a node in the configuration hierarchy.
Types are primarily useful for performance optimization. Salang is a weakly typed language, so it will allow any value to be assigned to any variable without warning or complaint; for instance, assigning 134 to a string variable will result in "134" in the string variable, and assigning "134.5" to a float variable will result in the floating point value 134.5 in the float variable. However, these conversions can be slow if they are performed many times, so if you know a variable is only going to hold and manipulate floating point values, you should use a float type rather than a string type. The node type is particularly useful for performance optimization, since it avoids the need for node lookups; e.g. setting the value of a node explicitly with "a.b.c.d = 0;" requires a series of expensive lookups, as node "a" is looked up in the configuration root, then "b" inside that, then "c", and then "d" -- but if a node variable N already points to a.b.c.d, then "N = 0;" is a very fast operation that does the same thing. node variables are particularly useful in foreach loops, and with functions like subnode_value(), where they can be used to iterate over all subnodes without requiring any expensive string-to-node lookups.
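A minimal sketch of the weak-typing conversions described above (variable names are illustrative):

```
string s;
float f;
s = 134;      # weak typing: s now holds the string "134"
f = "134.5";  # f now holds the floating point value 134.5

# Each such assignment performs a conversion; inside a loop that runs
# millions of times, declaring f as float (rather than string) avoids
# repeated string-to-float conversions.
```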
Statements
if A then B else C
  This statement evaluates the A section; if the value of A is true, then the value of the entire statement is the B section; otherwise, the value of the statement is the C section.

subroutine(A(param1, param2, ..., paramN), B)
  This statement defines a subroutine A with N parameters. The subroutine can be called with the statement "A(p1, p2, ..., pN)". The value of the subroutine call will be the value of B, with the parameters p1...pN plugged into the variables $param1...$paramN before B is evaluated. The value of a subroutine declaration is empty.

foreach I N B
  This statement is used to iterate over the subnodes of a particular node in the Salang hierarchy. It iterates over all subnodes of node N, setting I to the full nodepath on each iteration, and then evaluating expression B. The value of this expression is the concatenation of the values of all the evaluations of B.

for (I; C; E)
  This repeats an expression zero or more times. The expression I is evaluated to initialize the loop. The expression C is evaluated before each iteration; if the result is false, the loop terminates. If C is true, then E is evaluated. The value of this expression is the concatenation of the E values.

while (C) E;
  This repeats an expression zero or more times. The expression C is evaluated before each iteration; if the result is true, then E is evaluated. This continues until C is false. The value of this expression is the concatenation of the E values.

next
  This statement goes immediately to the next iteration of the immediately enclosing loop.

last
  This statement immediately terminates execution of the immediately enclosing loop, and continues execution after the end of the loop.
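A short sketch combining several of these statements (variable names are illustrative, and the loop-body parenthesization is assumed to follow the subroutine example above):

```
int i;
int total;
total = 0;
for (i = 0; i < 10; i += 1) (
  # Skip odd numbers; stop the loop entirely at 8.
  if ((i % 2) != 0) then
    next;
  if (i == 8) then
    last;
  total += i;
);
# total accumulates 0 + 2 + 4 + 6
```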
Built-in subroutines
get_file_type_from_url(string U)
  The value of this statement is the file extension of the URL U; e.g. get_file_type_from_url("http://something/file.gif") is "GIF".

get_log_field(string N)
  Returns the value of the log field N, of the current log entry.

node_exists(string N)
  Returns true if the node specified by the nodename N exists; false if the node does not exist.

node_name(node N)
  Returns the name of the node N.

num_subnodes(node N)
  Returns the number of subnodes of the node N.

subnode_by_number(node N, int I)
  Returns the Ith subnode of node N.

subnode_by_name(node N, string M)
  Returns the subnode of node N whose name is M. If there is no subnode by that name, it creates one.

subnode_exists(node N, string M)
  Returns true if there is a subnode named M in node N, false otherwise.

set_subnode_value(node N, string M, anytype V)
  Sets the subnode named M of node N to the value V. Creates the subnode if it does not exist.

node_value(node N)
  Returns the value of node N.

set_node_value(node N, anytype V)
  Sets the value of node N to V; no value is returned.

node_type(node N)
  Returns the type of node N as a string (e.g. "int", "float", "bool", "string", or "node").

set_node_type(node N, string T)
  Sets the type of node N to T ("int", "float", "bool", "string", or "node").

delete_node(node N)
  Deletes the node N. If the node is a profile, this also deletes the profile file.

insert_node(node P, string N, int I)
  Inserts the node specified by nodename N into the node specified by nodename P, so that N ends up as the Ith subnode of P (i.e. it inserts N into P at position I). N is removed from its current location and moved to the new one; for instance, if N is a subnode of O before this is called, it will no longer be a subnode of O afterwards. I.e., it moves N rather than copying it.
clone_node(node original_node, string clone_nodepath)
  This makes a clone of the node original_node, and puts it at the node specified by clone_nodepath (which is created if it doesn't exist). If no second (clone_nodepath) parameter is specified, this returns the clone node. The original node is unchanged, and after completion the clone is an exact replica of the original, except for the name (which is computed from clone_nodepath). This returns an empty value if clone_nodepath is specified, or the clone node if clone_nodepath is not specified.

overlay_node(node node1, node node2)
  This overlays node2 on node1. After it's done, node1 will have all the subnodes it had before, plus any that were in node2. If a subnode exists in both, the one from node2 will replace the original one.

new_node()
  This creates a new node, and returns it. The node is unrooted; it has no parent.

delete_node(node n)
  This deletes the node n (frees the memory it uses). The node should no longer be used after this call.

rename_node(node thenode, string newname)
  This renames the node thenode, so its name is newname.

node_parent(node thenode)
  This returns the parent node of thenode, or NULL if thenode has no parent.

add_subnode(node parent, node subnode)
  This adds subnode as a subnode of parent.

sort(node N, string M, bool E)
  Sorts the subnodes of the node N. The subnodes are sorted in an order specified by method M. M is a string of the format "field:F,T,A", where F is a field name (the name of a subnode found in each subnode of N; use "value" to use the value of each subnode of N directly), T is "integer", "float", "alphabetical", or "chronological" (which determines the sort type), and A is "ascending" or "descending" (which determines the direction of the sort). If E is true, then variables are expanded before sorting; if E is false, then the sort is done on the literal values, without variable expansion.

create_profile(string N)
  This creates a new profile with name N. It pulls various variables from the node hierarchy to set up the profile; the nodes are set by the "create profile" interview.

format(anytype V, string T)
  This formats the value V according to the format type T. T can be "integer", "page", "float", "two_digit_fixed", "bandwidth", "date_time", "hostname", "duration", "duration_compact", or "duration_milliseconds". If the first character of T is a % character, the result will be formatted as a double-precision floating point number using printf-style formatting, e.g.
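For instance, a sketch of sort() using the server_responses node described earlier (the method string follows the "field:F,T,A" format above):

```
# Sort the server response entries by their "result" subnode,
# alphabetically, in descending order, without variable expansion.
sort("rewrite_rules.server_responses",
     "field:result,alphabetical,descending",
     false);
```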
'%.1f' will print the value in floating point format with one digit after the decimal place.

image(string N, int W, int H)
  The value of this subroutine is an HTML image tag which displays the image specified by filename N, with width W and height H. The image file should be in the temporary directory in CGI mode, or in the picts folder (in the WebServerRoot folder of the LogAnalysisInfo folder) in web server mode.

create_image(int width, int height)
  This creates a new image canvas of the specified width and height, and returns the image ID for use in drawing functions.

allocate_image_color(string imageid, int red, int green, int blue)
  This allocates a new color for use in drawing on image imageid (created by create_image()), with the specified red, green, and blue components (0 to 255). It returns the color value for use in drawing functions.

add_text_to_image(string imageid, string color, int x, int y, string text, string direction)
  This draws text on image imageid (created by create_image()), with the specified color (created with allocate_image_color()), at the specified x and y position, in the direction specified by direction ("up" or "right"). The upper left corner of the text will be anchored at (x, y) when drawing horizontally; the lower left corner will be anchored there when drawing vertically.

add_line_to_image(string imageid, string color, int x1, int y1, int x2, int y2)
  This draws a line on image imageid (created by create_image()), with the specified color (created with allocate_image_color()), from the point (x1, y1) to the point (x2, y2).

add_polygon_to_image(string imageid, string color, node points, bool filled)
  This draws a polygon on image imageid (created by create_image()), with the specified color (created with allocate_image_color()). The vertices of the polygon are specified in order in the subnodes of points; each subnode must have an x and a y value specifying the location of that point. The polygon will be a single-pixel border if filled is false, or filled if it is true.

add_rectangle_to_image(string imageid, string color, int xmin, int ymin, int xmax, int ymax, bool filled)
  This draws a rectangle on image imageid (created by create_image()), with the specified color (created with allocate_image_color()) and the specified minimum/maximum x/y dimensions. The rectangle will be a single-pixel border if filled is false, or filled if it is true.

write_image_to_disk(string imageid, string pathname, string format)
  This writes the image imageid (created by create_image()) to disk as an image file (GIF or PNG, as specified by format, whose value can be "GIF" or "PNG"), to the location specified by pathname.

int read_image_from_disk(string pathname)
  This reads an image from disk, from the file specified by pathname. The format, which may be PNG or GIF, is inferred from the filename extension. This returns the image ID of the image read.

int get_image_width(string imageid)
  This returns the width, in pixels, of the image imageid.

int get_image_height(string imageid)
  This returns the height, in pixels, of the image imageid.

int get_image_pixel(string imageid, int x, int y)
  This returns the pixel at position (x, y) in the image imageid. The pixel is represented as a 32-bit integer: the top 8 bits are 0; the next 8 bits represent red; the next 8 bits represent green; and the last 8 bits represent blue.

int get_image_pixel_transparency(string imageid, int x, int y)
  This returns the transparency of the pixel at position (x, y) in the image imageid. The returned value is between 0 (transparent) and 255 (opaque).
img_src(string N)
  The value of this subroutine is the value of an HTML src section of an image tag; i.e., it is intended to be used right after src= in an image tag. The image file should be in the temporary directory in CGI mode, or in the picts folder (in the WebServerRoot folder of the LogAnalysisInfo folder) in web server mode.

fileref(string F)
  This converts a local filename from the WebServerRoot directory to a URL that refers to that file. The filename should be in partial URL syntax, e.g. "picts/somefile.gif" to refer to the file somefile.gif in the picts directory of WebServerRoot. If necessary, it may copy the file (e.g. in CGI mode) to the temporary directory. All references to files in WebServerRoot should be done through this function, to ensure correct functioning in CGI mode.

save_changes()
  This causes any changes made to the configuration hierarchy since the last save to be saved to the disk version of the hierarchy.

save_node(node N)
  This saves node N to disk. For instance, if N is "somenode.somechild.somevar", this will be saved to a file called somevar.cfg, in the somechild folder of the somenode folder of the LogAnalysisInfo folder. The format used is the normal configuration format. This makes the node persistent, because any future access to that node (e.g. referring to somenode.somechild.somevar in an expression), even from a later process after the current one has exited, will automatically load the node value from disk. So by using save_node(), you can ensure that any changes made to that node will be a permanent part of the configuration hierarchy, and any values you have set will be available to all future processes and threads.

logout()
  This logs the current user out (clearing the cookies that maintained their login), and takes them back to the login page (or the logout URL, if one is specified).
expand(string M)
  This expands variables and expressions in M, and returns the expanded value. For instance, expand("$x") will return the value of the variable x, and expand("$x > $y") will return "12 > 10" if x is 12 and y is 10.

convert_escapes(string M)
  This converts percent-sign escape sequences in M (e.g. converting %2520 to a space), and returns the converted value. For instance, convert_escapes("some%2520string") will return "some string".

set_active_profile(string profile_name)
  This sets the active profile to profile_name. The specified profile will be searched when looking up variables; e.g., the variable database.fields.date_time.label (which would normally be invalid, if no profile were active) will be treated as a local nodename within the specified profile, and be resolved as though it were profiles.profile_name.database.fields.date_time.label.

debug_message(string M)
  This expands variables and expressions in M, and displays the resulting value on the standard debug stream (i.e., on the console). This is useful for debugging Salang code.

echo(string M)
  This outputs the string M (without any conversion) directly to the standard output stream. This is useful for debugging Salang code.

error(string M)
  This throws an error exception, with error message M. The error will be reported through the standard Sawmill error reporting mechanism.

node_as_string(node N)
  This converts the node N to a string value (as you would see it in a profile file). The value of this expression is the string value of N.

node string_to_node(string s)
  This converts the string s to a node, and returns the node. This is similar to writing the string to a .cfg file and reading it in as a node. For instance, string_to_node("x = { a = 1 b = 2 }") will return a node whose name is x, which has two subnodes a and b, with values 1 and 2.
autodetect_log_format(node log_source, string result, string id)
  This autodetects the log format from the log source log_source, and puts the result in the node specified by result. id is an identifier which appears in the task info node for this process, allowing another process to tap into the progress of this process. This is used internally by the profile creation wizard.

get_directory_contents(string P, string R)
  This gets the contents of the directory at pathname P, and puts the result in the node specified by R. R will contain a subnode for each file or subdirectory; within each subnode there will be a name node listing the name of the file or directory, and an is_directory node which is true for a directory or false for a file. There will also be a size node if it's a file, listing the file size in bytes. If is_directory is true, there will be an is_empty value indicating whether the directory is empty, and a has_subdirectories value indicating whether the directory has subdirectories.

get_file_info(string P, string R)
  This gets information about the file or directory at pathname P, and puts the result in the node specified by R. R.exists will be true or false depending on whether P exists; R.parent will be the full pathname of the parent of P; R.type will be "file" or "directory" depending on whether P is a file or directory; R.filename will be the filename or directory name of P; R.modification_time will be the epoch time of the last modification of P; R.size will be the size of P in bytes.

file_exists(string P)
  The value of this is true or false, depending on whether the file or directory at pathname P exists.

string create_file_lock(string pathname)
  This creates a file lock on the file whose pathname is pathname. This does not actually acquire the lock; to acquire the lock, use acquire_file_lock().
string acquire_file_lock(string file_lock)
  This acquires a lock on the file specified by file_lock, which is a file locking object returned by create_file_lock(). This function will wait until the file is not locked, then atomically acquire the lock, and then return.

string release_file_lock(string file_lock)
  This releases a lock on the file specified by file_lock, which is a file locking object returned by create_file_lock(). The lock must have been previously acquired with acquire_file_lock().

string check_file_lock(string pathname)
  This checks whether the file whose pathname is pathname is locked. If the file is locked, this returns true; otherwise, it returns false. This returns immediately; it does not block or wait for the lock to be released. To block until the lock is released, and then acquire it, use acquire_file_lock().

delete_file(string pathname)
  This deletes the file whose location is specified by pathname.

move_file(string source_pathname, string destination_pathname)
  This moves the file or folder whose location is specified by source_pathname to the location specified by destination_pathname.

delete_directory(string pathname)
  This deletes the directory whose location is specified by pathname, including all subdirectories and files.

get_files_matching_log_source(node L, string R)
  This gets a list of files matching the log source in node L, and puts the result in the node specified by R. R will contain a subnode for each matching file; the value of the subnode will be the filename. This is used by the interface to implement Show Matching Files.

length(string S)
  The value of this expression is the length of the string S.

substr(string V, int S, int L)
  The value of this expression is the substring of the string V, starting at index S and of length L. The L parameter is optional; if it is omitted, the value of the expression is the substring of V starting at S and continuing to the end of V.
split(string s, string divider, string resultnode)
  This splits the string s on the divider specified in divider, and puts the resulting sections into the node specified by resultnode. For instance, split("Hello,you,there", ",", "volatile.splitresult") will set volatile.splitresult.0 to "Hello", volatile.splitresult.1 to "you", and volatile.splitresult.2 to "there".

starts_with(string S, string T)
  The value of this expression is true if the string S starts with the value of the string T.

ends_with(string S, string T)
  The value of this expression is true if the string S ends with the value of the string T.

contains(string S, string T)
  The value of this expression is true if the string S contains the value of the string T.

replace_all(string S, string T, string R)
  The value of this expression is the value of S after all occurrences of T have been replaced with R.

replace_first(string S, string T, string R)
  The value of this expression is the value of S after the first occurrence of T has been replaced with R. If T does not occur in S, the value of this expression is S.

replace_last(string S, string T, string R)
  The value of this expression is the value of S after the last occurrence of T has been replaced with R. If T does not occur in S, the value of this expression is S.
lowercase(string S)
The value of this expression is the value of S after all uppercase letters have been converted to lowercase.

uppercase(string S)
The value of this expression is the value of S after all lowercase letters have been converted to uppercase.

convert_base(string value, int frombase, int tobase)
This converts the string value from the base frombase to the base tobase, i.e., it treats it as an integer in base frombase, and returns a string which represents an integer in base tobase. It returns the converted version of value. Currently, this only supports base 16 (hexadecimal) to base 10 conversion, and base 10 to base 16 conversion.

convert_charset(string value, string fromcharset, string tocharset)
This converts the string value from the charset fromcharset to the charset tocharset. It returns the converted version of value. Charset names are as documented for the GNU iconv conversion utility.

matches_regular_expression(string S, string R)
The value of this expression is true if the string S matches the regular expression R. If it matches, the variables $0, $1, $2, ... are set to the substrings of S which match the parenthesized subexpressions of R.

matches_wildcard_expression(string str, string wildcardexp)
The value of this expression is true if the string str matches the wildcard expression wildcardexp.

index(string S, string T)
The value of this expression is the index (character position) of the substring T in the string S. If T is not a substring of S, the value of this expression is -1.

last_index(string S, string T)
The value of this expression is the index (character position) of the final occurrence of substring T in the string S. If T is not a substring of S, the value of this expression is -1.

md5_digest(string S)
The value of this expression is the MD5 digest of the string S, as a 32-digit hexadecimal number.
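As a hypothetical illustration of several of these string functions together (the variable names and values are invented; the results follow directly from the definitions above):

```
# Normalize case, then test and transform the result.
volatile.agent = lowercase("MOZILLA/5.0");                  # "mozilla/5.0"
volatile.is_moz = starts_with(volatile.agent, "mozilla");   # true
volatile.dashed = replace_all(volatile.agent, "/", "-");    # "mozilla-5.0"
```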
get_license_info(node licensenode, node resultnode)
This looks at the Sawmill license key contained in licensenode, and extracts information about it, which it puts in the node pointed to by resultnode. The subnodes of the result are type, valid, valid64, valid65, addon, expiration, profiles, and users.

authenticate(string username, string password, bool verify_only)
This attempts to authenticate a user username with password password. If verify_only is true, it only verifies the validity of the username and password, returning true if they are valid or false otherwise. If verify_only is false, it verifies the validity of the username and password, and if they are valid, logs them in.

string create_trial_license()
This creates and returns a 30-day trial license key. Only one trial key can be generated per installation; after the first time, it will return an empty string.

display_statistics_filters(node F, string D)
The value of this expression is HTML which describes the Filters in node F, for database field D. This is used internally by the statistics interface to display Filters.

query_one_db_item(node F, string R)
This queries the numerical database field totals for the Filters F, and puts the result in the node R.

query_db_for_view(node V, string R)
This queries the database for statistics view V, and puts the result in R. The format of R depends on the type of the view V. Also, if volatile.csv_export is true, this computes a CSV version of the query result, and puts it in volatile.csv_export_result.

query_db_calendar(string R)
This queries the database to compute calendar information (which days are in the database), and puts the result in R. R contains a numerical year node for each year, and within that a numerical month node for each month, and within that a numerical day node for each day in the database.

create_table(string name, node header)
This creates a new table object. The name of the new table is name, and the columns are described in header. Column subnode values include:
- database_field_name: the name of the database field for this column, if any.
- table_field_type: the type of the field, which is one of the following:
  - custom_itemnum: a field capable of holding arbitrary string values, with automatic normalization internally to integers.
  - database_itemnum: a field containing the integer item number for a database field.
  - int8: an 8-bit integer field.
  - int16: a 16-bit integer field.
  - int32: a 32-bit integer field.
  - int64: a 64-bit integer field.
  - int: an integer field using the maximum efficient number of bits for the platform.
  - float: a floating point field.
  - set: a field containing a set of integers.
The table created is an SSQL table, available for querying with SSQL.

load_table(string name)
This gets a table variable from a table which exists on disk (previously created with create_table() or query_db_for_view()). This returns the table variable.

unload_table(string table)
This frees memory usage, and closes files, used by the table object table, which was loaded by an earlier call to load_table().

node get_table_header(table t)
This returns the header node for the table t.

string get_table_name(table t)
This returns the name of the table t.

table_get_num_rows(table t)
This returns the number of rows in the table t.

table_get_cell_value(table t, int rownum, int colnum)
This returns the value in the cell at row number rownum and column number colnum in the table t.

table_get_cell_string_value(table t, int rownum, int colnum)
This returns the value in the cell at row number rownum and column number colnum in the table t, as a string. If it is an itemnum cell, it resolves the itemnum, returning its string value.

table_set_cell_value(table t, int rownum, int colnum, (int|float) value)
This sets the value in the cell at row number rownum and column number colnum in the table t, to the value value. For best performance, the type of value should match the type of the table column.

table_set_cell_string_value(table t, int rownum, int colnum, string value)
This sets the value in the cell at row number rownum and column number colnum in the table t, as a string. The cell must be an itemnum cell. It converts value to an itemnum, and puts that value in the table.

ssql_query(string query)
This performs an SSQL query. SSQL is a limited subset of SQL; see SSQL: Sawmill Structured Query Language (SQL). This returns the table result of the query, or the empty string ("") if the query does not return a result.
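As a sketch of how the header node for create_table() might look, using the node-literal style that appears elsewhere in this chapter (the table name, column names, and exact subnode layout here are assumptions for illustration, not taken from this manual):

```
header = {
  page = {
    database_field_name = "page"
    table_field_type = "custom_itemnum"
  } # page
  hits = {
    table_field_type = "int32"
  } # hits
} # header
volatile.t = create_table("example_page_totals", header);
```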
get_db_field_hierarchy(string dbfieldname, string nodename)
This gets the hierarchy of the database field dbfieldname, and puts the result in nodename. The names and values of nodes in nodename match the itemnums of items for that database field, and the hierarchical structure of nodename exactly matches the structure of the field.

db_item_has_subitems(string D, string I)
This checks whether item I of database field D has subitems in the current database. It returns true if it does, and false if it does not (i.e. if it's a bottom-level item).

database_sql_query(string query, bool temporary, bool update_tables)
This performs a SQL query of the current profile's database. This differs from ssql_query() in that ssql_query() always queries the internal SSQL database (which always exists, mostly for internal bookkeeping tables, even if a profile uses an external database), but this function queries the profile's main database. For instance, if you're using an Oracle database with the profile, this will query the Oracle database for the current profile, not the internal SSQL database. The result will be returned as an SSQL table. The table will be a temporary table (deleted when the process completes) if temporary is true. If update_tables is true, the query will also be scanned for table names, and all xref tables, hierarchy tables, and session tables found will be updated with the latest data in the main table, before the query is run.

ssql_query_upload_to_database(string query, string table_name)
This performs an SSQL query on the SSQL database (which is not the profile's database, unless the profile uses the internal SQL database). It uploads the result of the query to the profile's database, putting it in the table specified by table_name.
This basically does the opposite of database_sql_query(); where database_sql_query() queries the profile's SQL database and writes it to an internal database table, this queries the internal SSQL database, and writes it to a table in the profile's SQL database. The two can be used together to download a table from the SQL database, edit it, and upload it.

database_item_to_itemnum(int dbfieldnum, string item)
This returns the internal item number of item item of database field dbfieldnum. If there is no item by that name, it returns 0.

database_itemnum_to_item(int dbfieldnum, int itemnum)
This returns the item value associated with the internal item number itemnum of database field dbfieldnum.

set_log_field(string N, string V)
This sets the value of the log field N of the current log entry to V.

include(node N)
This loads and processes the Salang code in the node specified by nodename N.

discard(anytype V)
The value of this expression is always empty (""). This is useful if you want to evaluate an expression V, but not use its value.

capitalize(string V)
This capitalizes the value V, using the capitalization rules in the language module.

pluralizes(string V)
This pluralizes the value V, using the pluralization rules in the language module.

char_to_ascii(string c)
This returns the integer ASCII code of the character c. c should be a one-character string.

ascii_to_char(int i)
This returns a one-character string containing the ASCII character whose code is i.

ip_address_to_int(string ipaddr)
This converts the IP address ipaddr to an integer (a 32-bit representation with 8 bits per octet).

convert_field_map(string F, string M)
This converts the value of the log field F in the current log line we're processing, using the map M. M is a |-divided list of A->B mappings; e.g. if M is "1->cat|2->dog|3->ferret", then a field value "2" will be converted to "dog".
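For instance, the mapping described above could be applied to a hypothetical log field in a log filter (the field name "animal_code" is invented for illustration):

```
# Map numeric codes in the animal_code log field to readable names.
convert_field_map("animal_code", "1->cat|2->dog|3->ferret");
```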
collect_fields_using_regexp(string R, string F)
This matches the current log line we're processing against the regular expression R, and if it matches, it extracts the parenthesized values in R and puts them in the fields specified by F. F is a comma-separated list of fields, and should include *KEY*, which specifies the key of the collected log entry to modify. E.g. if F is "page,*KEY*,date_time,host", the second parenthesized subexpression will be used as the key, and that key will specify which collected entry to modify; then the page, date_time, and host fields of that collected entry will be set to the first, third, and fourth parenthesized sections.

collect_listed_fields_using_regexp(string regexp, string divider, string separator, string field_names_map)
This matches the current log line we're processing against the regular expression regexp, and if it matches, it uses the first parenthesized section in regexp as the key, and the second parenthesized section as the name/values list. Then it uses those, and its other parameters, to do the same as collect_listed_fields().

collect_listed_fields(string key, string name_values_list, string divider, string separator, string field_names_map)
This extracts log field values from name_values_list, which is a list of name/value pairs, and puts them in the collected entry specified by key. Names and values are listed in name_values_list in the format "name1separatorvalue1dividername2separatorvalue2dividername3separatorvalue3", i.e., pairs are separated from each other by the value of divider, and each name is separated from its value by the value of separator. In addition, field_names_map can be used to convert field names; field_names_map is a pipe-separated (|-separated) list of values like "fieldname=newfieldname", and if any extracted field matches a fieldname value from field_names_map, it will be put in the newfieldname log field.
If there is no map, or if nothing matches, then values will be put in the log field specified by the field name.

accept_collected_entry_using_regexp(string R)
This matches the current log line we're processing against the regular expression R, and if it matches, it extracts the first parenthesized value in R; using that value as a key field, it accepts the log entry corresponding to that key (as created by collect_fields_using_regexp()) into the database.

accept_collected_entry(string key, bool carryover)
This accepts the log entry corresponding to the key key (as created by set_collected_field(), collect_listed_fields(), collect_fields_using_regexp(), or collect_listed_fields_using_regexp()) into the database. If the carryover parameter is true, the keyed entry will remain in memory, and can be collected to, and accepted again; if it is false, the keyed entry will be removed from memory.

set_collected_field(string key, string log_field_name, string set_value)
This sets the collected field log_field_name, in the collected log entry specified by key, to the value set_value.

get_collected_field(string key, string log_field_name)
This returns the value of the collected field log_field_name, in the collected log entry specified by key.

rekey_collected_entry(string F, string T)
This changes the key of the collected entry with key F so its key is T instead.

generate_report_id(string P, string A)
This creates a new report ID, which can be used to generate a report with generate_report(). P is the profile name, and A is a list of any configuration options that need to be changed from the defaults in the report. The value of this function is the generated report ID.

generate_report(string P, string I)
This begins generating the report with id I, for profile P. Generation occurs in a separate task, so this returns immediately, while report generation continues in the background.
Calls to the functions below can tap into the status of the report, and the final generated report. The value of this function is empty.

get_progress_info(string taskid, string resultnode)
This gets progress information for a running task with task ID taskid. It populates the node specified by resultnode with detailed progress information. If taskid is not a valid task or there is no progress available for that task, resultnode will have a subnode exists with value false. The value of this function is empty.

cached_report_exists(string P, string I)
This checks if the report from profile P with id I has been completely generated, and is now cached. The value of this function is true if the report is cached, and false if it is not cached (never generated, or generation is in progress).

display_cached_report(string P, string I)
This displays the cached report from profile P with id I. This will fail if the report is not cached; call cached_report_exists() to check. The value of this function is the complete HTML page of the report.

delete_cached_report(string P, string I)
This deletes a cached report from profile P with id I. Future calls to cached_report_exists() will be false until this is regenerated with generate_report(). The value of this expression is empty.

verify_database_server_connection(node server_info)
This verifies that a connection can be made to the database server specified by server_info. server_info contains subnodes type (mysql, odbc_mssql, or odbc_oracle), dsn, hostname, username, password, and database_name, which specify how to connect to the server. This function returns true if the connection succeeds, or false otherwise.

get_database_info(string P, string N, bool get_date_info)
This gets various information about the database for profile P, and puts it in node N. If get_date_info is true, this includes date range parameters earliest_date_time and latest_date_time. Including these parameters requires hierarchy tables and xref tables to be up to date, so it can be a very slow operation for a large database (if the tables are not up to date); use false if the date range information is not required.

build_database(string P, string R)
This builds the database for profile P. If the database build is occurring as part of an attempt to view a report, the report ID should go in R; otherwise, R should be empty ("").

update_database(string P, string R)
This updates the database for profile P. If the database update is occurring as part of an attempt to view a report, the report ID should go in R; otherwise, R should be empty ("").
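Taken together, the report functions in this section suggest a generate/poll/display workflow. A schematic Salang sketch (the profile name is invented, the options argument is left empty, and the polling step is only indicated in a comment):

```
# Create a report ID, start background generation, then display when cached.
volatile.report_id = generate_report_id("myprofile", "");
generate_report("myprofile", volatile.report_id);
# ... wait until cached_report_exists("myprofile", volatile.report_id) is true ...
volatile.html = display_cached_report("myprofile", volatile.report_id);
```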
exec(string executable, node options, bool wait)
This runs the command-line program specified by executable, from the command line. If the executable option is an empty string, the main program executable is run (e.g. Sawmill runs itself from the command line). The options node contains the command line options; for each subnode of the options node, the value of that option (not the name, which is ignored) is a command line option. The wait option specifies whether to wait for completion; if it is true, exec() will not return until the command is done; if it is false, exec() will return immediately, leaving the command running in the background. If the wait parameter is false, the return value is the PID of the process; if the wait parameter is true, the return value is the return value of the spawned process.

get_task_info(string N)
This gets various information about currently active tasks, and puts it in node N.

current_task_id(string current_task_id)
This returns the Task ID (PID) of the current task.

cancel_task(string task_id)
This cancels an active task with ID task_id. When this returns, the task has been cancelled.

run_task(node task_node)
This runs a task at the node specified by task_node. Typically, task_node would be a subnode of the "schedule" node, and this would run that scheduled task immediately. This returns a node with information about the return values of the actions in the task; each subnode corresponds to an action, and contains a retval subnode whose value is the return code of that action.

sleep_milliseconds(int M)
This delays for M milliseconds before evaluating the next expression.

save_session_changes()
This saves any changes to the configuration hierarchy to the sessions file, so they will carry over to future pages.

int connect(string hostname, int port)
Connect via TCP to the specified hostname and port; return the socket.

disconnect(int socket)
Disconnect from the specified socket.

read_from_socket(int socket, data d, int length)
Read length bytes from socket, and put them in d.

write_to_socket(int socket, data d, int length)
Write length bytes to socket, from the start of d.

open_file(pathname, mode)
This opens the file at pathname pathname, using the "C" style mode specifier mode. Possible modes include "r" to open a file for reading at the beginning, "r+" to open a file for reading and writing at the beginning, "w" to create the file (deleting it if it exists) for writing, "w+" to create the file (deleting it if it exists) for reading and writing, "a" to open the file (creating it if it does not exist) for writing at the end, and "a+" to open the file for reading and writing (creating it if it does not exist). This returns a stream handle which can be passed to write_string_to_file().

write_string_to_file(handle, string)
This writes a string to an open file. handle is the handle returned by open_file(); string is the string to write to the file.

read_line_from_file(handle)
This reads a line from an open file. handle is the handle returned by open_file(). This returns the line read, including end-of-line characters.

bool end_of_file(handle)
This returns true if handle is at the end of the file it refers to, or false if it is not. handle is the handle returned by open_file().

close_file(handle)
This closes a file opened with open_file(). handle is the handle returned by open_file().

write_file(P, S)
This writes string S to a file at pathname P. If the file does not exist, it is created; if it exists, it is overwritten. P is a local pathname in LogAnalysisInfo, using / as pathname dividers; for instance, to write to a file test.html in WebServerRoot in LogAnalysisInfo, use "WebServerRoot/test.html".

read_file(P)
This reads the contents of the file at pathname P. It returns the contents of the file as a string.
P is either a full pathname, or a pathname local to the Sawmill executable location, using / as pathname dividers; for instance, if your LogAnalysisInfo folder is in the same folder as the Sawmill binary (which is typical), then to read from a file test.html in WebServerRoot in LogAnalysisInfo, use "LogAnalysisInfo/WebServerRoot/test.html".

send_email(string sender, string recipient, string message, string smtp_server)
This sends an email from sender to recipient, using SMTP server smtp_server. The message variable is the entire contents of the message, including mail headers. In the event of an error, the error message will be in volatile.send_email_error_message after returning.

now()
This returns the current time, as the number of seconds since January 1, 1970, Coordinated Universal Time, without including leap seconds.

now_us()
This returns the current time, as the number of microseconds since system startup.

localtime()
This returns the current time, as the number of seconds since January 1, 1970, in the GMT time zone.

normalize_date(string date, string format)
This computes the "normal" format (dd/mmm/yyyy) of the date in date, and returns the date in normal format. date is in the format specified by format, which may be any of the date formats in the list at Date format.

normalize_time(string time, string format)
This computes the "normal" format (hh:mm:ss) of the time in time, and returns the time in normal format. time is in the format specified by format, which may be any of the time formats in the list at Time format.

date_time_to_epoc(string D)
This converts the string D, which is a date/time of the format 'dd/mmm/yyyy hh:mm:ss' (GMT), to an epoc time (seconds since January 1, 1970). It returns the epoc time.

epoc_to_date_time(int E)
This converts the integer E, which is a date/time value in EPOC format (seconds since January 1, 1970), to a string of the format 'dd/mmm/yyyy hh:mm:ss'. It returns the string date/time.

date_time_duration(string D)
This computes the duration in seconds of the date_time unit value D, which is of the format 'dd/mmm/yyyy hh:mm:ss', where any value may be replaced by underbars to indicate the unit size. E.g. '01/Feb/2005 12:34:56' indicates a single second (so the value returned will be 1); '01/Feb/2005 12:34:__' indicates a single minute (so the value returned will be 60); '01/Feb/2005 12:__:__' indicates a single hour (so the value returned will be 3600); '01/Feb/2005 __:__:__' indicates a single day (so the value returned will be 86400); '__/Feb/2005 __:__:__' indicates a single month (so the value returned will be the duration of the month, in seconds); and '__/___/2005 __:__:__' indicates a single year (so the value returned will be the duration of the year, in seconds). Values of this format are used frequently in report filters; this function is useful for computing durations of filtered data.

clear_session_changes()
This clears any changes to the configuration hierarchy saved using save_session_changes(), i.e., this reverts to the last saved version of the configuration hierarchy saved using save_changes().

start_progress_meter_step(string O)
This starts a progress meter step with operation name O.
finish_progress_meter_step(string O)
This finishes a progress meter step with operation name O.

set_progress_meter_maximum(float M)
This sets the maximum progress meter value to M.

set_progress_meter_position(float M)
This sets the current position of the progress meter to M (out of a total value specified by set_progress_meter_maximum()). This also causes the progress meter to be updated if necessary; call this function regularly during long operations to ensure that progress occurs properly.

set_progress_meter_description(string description)
This sets the sub-operation description of the current progress step to the string description. The description is displayed as part of the progress display in the web interface, and on the command line.

set_progress_info(node info)
This sets all information about the progress display for the current task. It is used for progress prediction, i.e., to let the progress system know what steps to expect during the current action. Here is an example node:

info = {
  label = "Test Label"
  number_of_steps = "2"
  current_step = "0"
  step = {
    0 = {
      operation = "0"
      label = "Step 0 Label"
      value = "0"
      max_value = "5"
    } # 0
    1 = {
      operation = "1"
      label = "Step 1 Label"
      value = "0"
      max_value = "5"
    } # 1
  } # step
} # info

After calling this to define future progress steps, you would then typically run the steps specified, calling start_progress_meter_step() and finish_progress_meter_step() at the beginning and end of each step, and calling set_progress_meter_position() periodically through the step as the value ranges from 0 to the maximum. You might also call set_progress_meter_description() to give a more detailed description of each step.

current_log_pathname()
Returns the pathname of the log file currently being processed (for log filters).

current_log_line()
Returns the entire current line of log data being processed (for log filters).

sin(float X)
This returns the sine of X.

cos(float X)
This returns the cosine of X.

tan(float X)
This returns the tangent of X.

asin(float X)
This returns the arc sine of X.

acos(float X)
This returns the arc cosine of X.

atan(float X)
This returns the arc tangent of X.

atan2(float X, float Y)
This returns the arc tangent of X/Y, using the signs of X and Y to compute the quadrant of the result.

compile(string expression)
This compiles the Salang expression expression, and returns a node suitable for passing to evaluate(), to evaluate it.

evaluate(node compiled_expression)
This evaluates an expression compiled by compile(), and returns its value.

get_parsed_expression(node compiled_expression)
This gets the "parsed expression" node from a compiled expression returned by compile(). The parsed expression is a syntax tree of the expression, which can be useful for doing expression verification (for instance, checking for the presence of required fields).
Databases
Sawmill uses a database on the disk to store information about log data. The database contains a compact version of the log data in the "main table", and a series of secondary tables which provide hierarchy information and improve performance of some queries. Every time a new log entry is read, the information contained in that entry is added to the database. Every time a statistics page is generated, the information needed is read from the database.

Reports can query data from the database based on multiple filters. For instance, it is possible in a virus log to filter to show only the source IPs for a particular virus, and for a web log it's possible to see the pages hit by a particular visitor. In general, any combination of filters can be used; it is possible to create complex and/or/not expressions to zoom in on any part of the dataset.

For large datasets, it can be slow to query data directly from the main table. Query performance for some types of tables can be improved using cross-reference tables, which "roll up" data for certain fields into smaller, fast-access tables. For instance, for a web log, you can create a cross-reference table containing page, hit, and page view information; the table will precompute the number of hits and page views for each page, so the standard Pages report can be generated very quickly. See Cross-Referencing and Simultaneous Filters for more information.

The Database folder option specifies the location of the database on disk; if the option is blank, Sawmill stores the database in the Databases folder, in the LogAnalysisInfo folder, using the name of the profile as the name of the database folder.

New log data can be added to the database at any time. This allows a database to be quickly and incrementally updated, for instance, every day with that day's new log entries. This can be done from the web browser interface by using the Update Database option in The Config Page.
A command line (see The Command Line) which would accomplish the same thing is:

  sawmill -p config-file -a ud

If your log files are very large, or if your database is extensively cross-referenced, building a database can take a long time, and use a lot of memory and disk space. See Memory, Disk, and Time Usage for information on limiting your memory and disk usage, and increasing the database build speed.

A number of advanced options exist to fine-tune database performance. To get the most out of the database feature, you may want to adjust the values of the database parameters.
Database Detail
When you first create a profile, Sawmill will ask you what kind of information you want to track in your profile. The values you choose determine what your initial profile settings are, including the database cross-references and the available views. The options available depend on your log format, but may include some of the following:
- Track all fields day-by-day. Turning this option on cross-references the date/time field to all other fields. This improves performance when the Calendar or Date Range controls are used. Day-by-day information will still be available in the database if this option is not checked, but full table scans will be required to query it, which may make the queries much slower.
- Track hosts individually. Turning this option on structures the "host," "browsing host," or "source IP" field so that all IPs are tracked fully. If you leave this off, Sawmill will track only the top level domains (e.g. yahoo.com) and subnets (e.g. 127.128) of the IPs, and you will not be able to get information about the activities of a particular IP. If you turn this on, every IP address in the log data will be tracked separately, so information will be available about individual IP addresses and hostnames. Turning this on can significantly increase the size and memory usage of the database.
In addition, there will be a checkbox for each available numerical field in the log data. Checking one of these boxes will add another field to the database, providing information about that numerical field, and will add that numerical field to every report. This will slightly increase the size of the database for most fields, but tracking a "unique" field like visitors may be much more expensive. Turning on unique host (visitor) tracking will result in the visitor id information being tracked for all database items, which will significantly slow log processing and increase database size, but it is necessary if you need visitor information. For web and web proxy logs, you can greatly increase processing speed (as much as four times) by checking only the "page views" box (and not tracking hits or bandwidth).
Query Types
The following query types are supported:
- SELECT
- DELETE FROM
- CREATE TABLE
- INSERT
- DROP TABLE
- SHOW TABLES
- CREATE INDEX
- USE
SELECT
Syntax:

SELECT column1, columnN FROM tablename
  [ (LEFT|RIGHT|INNER) JOIN jointablename ON tablecolumn = jointablename [ AND condition ] ]
  [ WHERE condition ]
  [ GROUP BY column1, ..., columnN ]
  [ ORDER BY column, ..., columnN ]
  [ LIMIT startrow, endrow ]

This selects data from one or more tables. The columnN values in the SELECT clause may be column names, or may use the aggregation operators SUM, MAX, MIN, AVG, COUNT (including COUNT(DISTINCT F)), or SET. The SET operator is an SSQL extension which creates a "set" column, where each cell contains a set of integer values. Sets can be of arbitrary size; duplicate values are removed, and sets are merged if a SET column is grouped using a SET aggregation operator. The integer value of a SET column is the number of items in the set. COUNT(DISTINCT) is implemented using SET. There may be any number of JOIN clauses. WHERE conditions, and JOIN conditions, support the following SQL operators: AND, OR, NOT, =, and !=.
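A hypothetical query using the syntax above (the table and column names are invented for illustration):

```sql
SELECT page, SUM(hits), COUNT(DISTINCT visitor_id)
FROM logtable
WHERE page != '/index.html'
GROUP BY page
ORDER BY page
LIMIT 0, 10
```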
CREATE TABLE
Syntax:
CREATE TABLE tablename (fieldname1 fieldtype1, ..., fieldnameN fieldtypeN) [ SELECT ... ]

This creates a table, optionally from the result of a query. If a SELECT clause is present, it can use any syntax of a normal SELECT clause; the result of the SELECT is inserted into the new table.
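For instance, a summary table could be created and populated in one statement. The table, field, and type names here are illustrative; consult the SSQL documentation for the exact field type names it accepts:

```sql
-- Create a summary table and fill it from a query in one step
CREATE TABLE page_summary (page string, total_hits int)
SELECT page, SUM(hits) FROM logdata GROUP BY page
```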
INSERT
Syntax:

INSERT INTO tablename [ SELECT ... ]

This inserts the results of a query into tablename. The SELECT portion can use any syntax of a normal SELECT clause; the result of the SELECT is inserted at the end of tablename.
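A sketch, again with illustrative table and field names:

```sql
-- Append selected rows from one table onto the end of another
INSERT INTO logdata_archive
SELECT page, hits FROM logdata WHERE hits != 0
```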
DELETE FROM
Syntax:

DELETE FROM tablename WHERE condition

This deletes rows from tablename which match condition. See SELECT for more information about WHERE conditions.
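For example (table and field names illustrative), remembering that conditions are limited to the AND, OR, NOT, =, and != operators:

```sql
-- Remove rows for two uninteresting pages
DELETE FROM logdata WHERE page = '/robots.txt' OR page = '/favicon.ico'
```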
DROP TABLE
Syntax:

DROP TABLE tablename

This drops the table tablename from the database.
SHOW TABLES
Syntax:

SHOW TABLES LIKE expression

This shows all tables in the database whose names match the wildcard expression expression.
CREATE INDEX
Syntax:

CREATE [ UNIQUE ] INDEX indexname ON tablename (fieldname)

This creates an index on the field fieldname in the table tablename. Indices can make queries and joins faster if the conditions of the queries or the columns of the joins match the indices available in the table.
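For example, if many queries filter or join on a particular field (names illustrative):

```sql
-- Index a frequently-queried field so WHERE conditions and joins on it can use the index
CREATE INDEX page_index ON logdata (page)
```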
USE
Syntax:

USE DATABASE databasename

This has no effect; it is allowed for compatibility reasons only.
Click the "Add..." button. Choose your ODBC driver, e.g., choose "SQL Server" to connect to a Microsoft SQL Server database.
Click Finish. Choose a name for the DSN in the Name field, e.g., "sawmill". Choose a description for the DSN in the Description field, e.g., "My DSN for connecting to MS SQL with Sawmill". Select the SQL server from the Server menu, or enter the hostname. Click Next.
Choose your authentication type. If you're not sure which type to use, talk to your database administrator. If the server allows access by login ID and password, you can check the "With SQL Server authentication using login ID and password entered by the user" box, and then enter your SQL Server username and password in the fields below. If no special username has been set up, you may be able to use the standard "sa" username, with whatever password has been configured in the server.
Click through the remaining pages, leaving all options at their default values. Once this is done, you can enter "sawmill" as the DSN name in the Create Profile Wizard of Sawmill, and it should connect to your SQL Server.
1. The log data is processed, creating the main table.
2. Then the cross-reference tables are built from the main table.
3. Then the main table indices are created.
One way to speed up all of these processes is to use multiple processors. Depending on the version of Sawmill you're using, it may have the ability to split database builds across multiple processes, building a separate database with each processor from part of the dataset, and then merging the results. This can provide a significant speedup -- a 65% speedup using two processors is fairly typical. Increasing the speed of your processor is also a very good way to increase the speed of database building -- database builds are primarily CPU-bound, so disk speed, memory speed, and other factors are less important than the speed of the processor.

If you've configured Sawmill to look up IP numbers using a domain nameserver (DNS), the database building process will be slower than usual, as Sawmill looks up all the IP numbers in your log file. You can speed things up by turning off DNS lookup, by decreasing the DNS timeout, and/or by improving Sawmill's bandwidth to the DNS server.

You can also speed up all three stages by simplifying the database structure, using log filters, or by eliminating database fields. For instance, if you add a log filter which converts all IP addresses to just their first two octets, the IP field will be much simpler than if you use full IPs. Cross-reference tables can be eliminated entirely to improve database build performance; by eliminating cross-reference tables, you will slow query performance for those queries which would have used a cross-reference table. See Cross-Referencing and Simultaneous Filters for more details.
If a 64-bit OS isn't an option, you will need to simplify your database fields using log filters. For example, a filter which chops off the last octet of each IP address will greatly reduce the number of unique IPs, probably dropping a huge 1GB item list to under 100MB. Also, you may want to simply eliminate the troublesome field if there is no need for it -- for instance, the uri-query field in web logs is sometimes not needed, but tends to be very large. To determine which field is the problem, build the database until it runs out of memory, and then look at the database directory (typically in LogAnalysisInfo/Databases) to see which files are large. Pay particular attention to the 'items' folder -- if files in the xyz folder are particularly huge, then the xyz field is the problem. Finally, if you need to use less disk space or memory due to a quota on your web server, you may be able to get around this problem by running Sawmill on a local machine, where you dictate the disk space constraints, and setting it to fetch the log data by FTP.
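The space saving from octet truncation is easy to see in miniature. The snippet below is plain Python (not Sawmill's Salang log filter syntax, and the sample addresses are made up); it shows how chopping the last octet collapses many unique IPs into far fewer distinct items:

```python
def truncate_ip(ip: str, keep_octets: int = 3) -> str:
    """Keep only the first keep_octets octets of a dotted-quad IP address."""
    return ".".join(ip.split(".")[:keep_octets])

# Four distinct full IPs...
ips = ["10.0.0.1", "10.0.0.2", "10.0.1.7", "192.168.5.9"]

# ...collapse to three distinct truncated values
truncated = {truncate_ip(ip) for ip in ips}
print(sorted(truncated))  # → ['10.0.0', '10.0.1', '192.168.5']
```

In a real log, where thousands of hosts share each subnet, the reduction in unique items (and therefore in item-list size) is far more dramatic than in this toy example.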
Log Files
Sawmill is a log file analysis tool. It reads one or more log files, and generates a graphical statistical report of some aspect of the contents of the log data. Sawmill can handle a wide range of log formats. Hundreds of formats are supported by default (see Supported Log Formats), and others can be supported by creating a new log format description file (see Creating Log Format Plug-ins (Custom Log Formats)). If the format of the log data you wish to analyze is not supported, and the log format is the standard format of a publicly-available device or software, please send mail to support@flowerfire.com and we will create the log format description file for you. If it's a custom format, we can create the plug-in for you, for a fee.
q Apache/NCSA Combined Log Format
q Apache/NCSA Combined Format (NetTracker)
q Apache/NCSA Combined Format With Server Domain After Agent
q Apache/NCSA Combined Format With Server Domain After Date
q Apache/NCSA Combined Format With Server Domain After Host
q Apache/NCSA Combined Format With Server Domain After Size (e.g. 1&1, Puretec)
q Apache/NCSA Combined Format With Server Domain Before Host
q Apache/NCSA Combined Log Format with Syslog
q Apache/NCSA Combined Format With Cookie Last
q Apache/NCSA Combined Format With Visitor Cookie
q Apache/NCSA Combined Format With WebTrends Cookie
q Apache Custom Log Format
q Apache Error Log Format
q Apache Error Log Format (syslog required)
q Apache SSL Request Log Format
q BeatBox Hits Log Format (default)
q BEA WebLogic
q BEA WebLogic Log Format
q Blue Coat W3C Log Format (ELFF)
q ColdFusion Web Server Log Format
q Common Access Log Format
q Common Access Log Format (Claranet)
q Common Access Log Format (WebSTAR)
q Common Access Log Format, with full URLs
q Apache/NCSA Common Agent Log Format
q Common Error Log Format
q Common Referrer Log Format
q Domino Access Log Format
q Domino Agent Log Format
q Domino Error Log Format
q Flash Media Server Log Format
q Flex/JRun Log Format
q W3C Log Format
q IBM HTTP Server Log Format
q IBM Tivoli Access Manager Log Format
q IIS Log Format
q IIS Log Format (dd/mm/yy dates)
q IIS Log Format (dd/mm/yyyy dates)
q IIS Extended Log Format
q IIS Log Format (mm/dd/yyyy dates)
q IIS Extended (W3C) Web Server Log Format
q IIS Extended (W3C) Web Server Log Format (logged through a syslog server)
q IIS Log Format (yy/mm/dd dates)
q IIS (ODBC log source) Log Format
q Miva Access Log Format
q Miva Combined Access Log Format
q msieser HTTP Log Format
q Netscape Extended Log Format
q NetPresenz Log Format
q NetPresenz Log Format (d/m/y dates)
q NetPresenz Log Format (24-hour times, d/m/y dates)
q PeopleSoft AppServer Log Format
q PHP Error Log Format
q Sambar Server Log Format
q Sawmill Tagging Server Log Format
q SecureIIS Log Format
q SecureIIS Binary Log Format (SUPPORTED ONLY AFTER TEXT EXPORT)
q SmartFilter (Bess Edition) Log Format
q Squarespace Log Format
q Symantec Web Security CSV Log Format
q Know-how Log Format
q IBM Tivoli Access Manager WebSEAL Log Format
q Tomcat Log Format
q TomcatAlt
q Trend Micro InterScan Web Security Suite Access Log Format
q URLScan Log Format
q URL-Scan (W3C) Log Format
q Web Logic 8.1 Log Format
q WebSTAR Log Format
q WebSTAR W3C Web Server Log Format
q WebSTAR Proxy Log Format
q Wowza Media Server Pro Log Format
q Zeus Log Format (Alternate Dates)
q Zeus Extended Log Format
Uncategorized
q q
Syslog Server
q q q q q q q q q q q q q q q q q q q q q q
Citrix Firewall Manager Syslog Cron Log Format Datagram Syslog Format GNAT Box Syslogger (v1.3) Syslog Imail Header IPCop Syslog Kiwi CatTools CatOS Port Usage Format Kiwi (dd-mm-yyyy dates) Kiwi Syslog (ISO/Sawmill) Kiwi (mm-dd-yy dates, with type and protocol) Kiwi (mm-dd-yyyy dates) Kiwi (mmm/dd dates, hh:hh:ss.mmm UTC times) Kiwi Syslog (UTC) Kiwi (yyyy/m/d hh:mm, tab separated) Syslog Kiwi YYYYMMDD Comma Syslog Kiwi (yyyy/mm/dd, space-separated) Syslog Minirsyslogd Log Format MM/DD-HH:MM:SS Timestamp Network Syslog Format No Syslog Header (use today's date, or use date/time from message) NTsyslog Log Format Passlogd Syslog Format
q q q q q q q q q q q q q q q q q q q q q q q q q q q q
Passlogd Syslog (Full Messages) PIX Firewall Syslog Server Format Seconds since Jan 1 1970 Timestamp Syslog SL4NT Log Format SL4NT (dd/mm/yyyy) SL4NT (dd.mm.yyyy, commas without spaces) SLNT4 Log Format Snare Log Format Solar Winds Syslog Symantec Mail Security Syslog Format Complete Syslog Messages (report full syslog message in one field) Syslog NG Log Format Syslog NG Messages Log Format Syslog NG Log Format (no timezone) Syslog NG Log Format (date with no year) Syslog NG (tab separated) Log Format Syslog NG Log Format (no date in log data; yyyymmdd date in filename) Syslog (yyyymmdd hhmmss) The Dude Syslog Timestamp (mm dd hh:mm:ss) Unix Auth Log Format Unix Daemon Syslog Messages Log Format Unix Syslog Unix Syslog With Year Wall Watcher Log Format Windows NT Syslog Windows Syslog Format WinSyslog
Proxy Server
q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q
Astaro SMTP Proxy Log Format Blue Coat Log Format Blue Coat Log Format (Alternate) Blue Coat Custom Log Format Blue Coat Squid Log Format Cisco Wide Area Application Services (WAAS) TCP Proxy (v4.1+) Log Format Cisco Wide Area Application Services (WAAS) TCP Proxy (v4.0) Log Format Combined Proxy Log Format Common Proxy Log Format CP Secure Content Security Gateway EZProxy Log Format Microsoft Port Reporter Log Format Microsoft Proxy Log Format Microsoft Proxy Log Format (d/m/yy dates) Microsoft Proxy Log Format (d/m/yyyy dates) Microsoft Proxy Log Format (m/d/yyyy dates) Microsoft Proxy Packet Filtering Log Format Microsoft Proxy Log Format (Bytes Received Field Before Bytes Sent) ProxyPlus Log Format Proxy-Pro GateKeeper Log Format SafeSquid Combined/Extended Log Format Squid Common Log Format Squid Common Log Format - Syslog Required Squid Event Log Squid Log Format With Full Headers Squid Guard Log Format Squid Log Format With ncsa_auth Package Squid Log Format Useful Utilities EZproxy Log Format VICOM Gateway Log Format Vicomsoft Internet Gateway Log Format
q q q q q q q q q q q q
Visonys Airlock Log Format Winproxy Log Format Winproxy Log Format (2-digit years) Winproxy Common Log Format Kerio Winroute Firewall Log Format WinGate Log Format (no Traffic lines, dd/mm/yy dates) WinGate Log Format (no Traffic lines, mm/dd/yy dates) WinGate Log Format (with Traffic lines) Winproxy 5.1 Log Format (yyyy-mm-dd dates) WinProxy Alternate Log Format WinRoute Web Log Format Youngzsoft CCProxy Log Format
Other
q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q
3Com NBX 100 Log Format Array 500 Combined Log Format Ascend Log Format Atom Log Format Autodesk Network License Manager (FlexLM) Log Format (Enhanced Reports) Autodesk Network License Manager (FlexLM) Log Format BitBlock Log Format Blue Coat Instant Messenger Log Format Borderware runstats Log Format CFT Account Log Format Click To Meet Log Format Cumulus Digital Asset Management Actions Log Format CWAT Alert Log Format Dade Behring User Account Format (With Duration) Dade Behring User Log Format Datagram SyslogAgent Log Format Digital Insight Magnet Log Format Dorian Event Archiver (Windows Event Log) Format du Disk Usage Tracking Format (find /somedir -type f | xargs du) Eventlog to Syslog Format Event Reporter Logs (version 7) Event Reporter v6 FastHosts Log Format FedEx Tracking Log Format CSV (Generic Comma-Separated Values) Log Format Google Log Format GroupWise Post Office Agent Log Format Groupwise Web Access Log Format (dd/mm/yy) Groupwise Web Access Log Format (mm/dd/yy) GroupWise Internet Agent Accounting Log Format (2-digit years) GroupWise Internet Agent Accounting Log Format (4-digit years) Hosting.com Log Format HP UX Audit Log Format htdig Log Format Novell iChain W3C Log Format InfiNet Log Format INN News Log Format INN News Log Format (Alternate) IOS Debug IP Packet Detailed (Using Syslog Server) ipchains Log Format IPEnforcer IPMon Log Format (Using Syslog Server) IST Log Format Novell iChain Extended (W3C) Web Server Log Format iPlanet Error Log Format Java Administration MBEAN Log Format Lava2 Log Format
q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q
Sawmill Task Log Format Metavante CEB Failed Logins Log Format Microsoft Elogdmp (CSV) Log Format (CSV) Nessus Log Format Netscape Messenger Log Format Netstat Log Format (uses script generated timestamp from log or GMT time) NetKey Log Format nmap Log Format nnBackup Log Format Norstar PRELUDE and CINPHONY ADC Log Format Nortel Meridian 1 Automatic Call Distribution (ACD) Log Format OpenVPN Log Format Optima Log Format Oracle Express Authentication Log Format Order Log Format O'Reilly Log Format Planet-Share InterFax Log Format praudit Log Format PsLogList Log Format RACF Security Log Format RAIDiator Error Log Format Redcreek System Message Viewer Format Servers Alive Log Format Servers Alive (Statistics) Log Format SIMS Log Format Snare for AIX Log Format Sourcefire IDS Symantec Antivirus Log Format Symantec System Console Log Format Sysreset Mirc Log Format tcpdump Log Format (-tt) tcpdump Log Format tcpdump Log Format (-tt, with interface) tcpdump Log Format (-tt, with interface) Alternate Tellique Log Format Trend Micro Control Manager Unicomp Guinevere Log Format Unicomp Guinevere Virus Log Format Unreal Media Server Log Format User Activity Tracking Log Format WAP WebSphere Business Integration Message Brokers User Trace Log Format WebSEAL Audit Log Format WebSEAL Authorization (XML) Log Format WebSEAL Error Log Format WebSEAL Security Manager Log Format WebSEAL Wand Audit Log Format WebSEAL Warning Log Format WebSEAL CDAS Log Format Welcome Log Format Whatsup Syslog WhistleBlower (Sawmill 6.4) Whistle Blower Performance Metrics Log Windows Performance Monitor Windows 2000/XP Event Log Format (export list-CSV) ddmmyyyy Windows 2000/XP Event Log Format (save as-CSV) dd/mm/yyyy Windows Event Log Format (24 hour times, d/m/yyyy dates) Windows Event Log Format (ALTools export) Windows Event (Comma Delimited, m/d/yyyy days, h:mm:ss AM/PM times) Log Format Windows Event (Comma Delimited) Log Format Windows Event (Comma Delimited) dd.mm.yyyy Log Format Windows 
Event Log (comma or tab delimited, no am/pm, 24h & ddmmyyyy) Log Format Windows Event Log Format (dumpel.exe export)
q Windows Event Log Format (dumpevt.exe export)
q Windows Event .evt Log Format (SUPPORTED ONLY AFTER CSV OR TEXT EXPORT)
q Windows XP Event Log (Microsoft LogParser CSV Export)
q Windows Event (Tab Delimited) Log Format
q Windows NT4 Event Log Format (save as-CSV)
q Windows NT Scheduler Log Format
q X-Stop Log Format
q Yamaha RTX Log Format
Network Device
q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q
3Com Office Connect / WinSyslog Log Format Annex Term Server Apple File Service Log Format AppleShare IP Log Format Aruba 800 Wireless LAN Switch Log Format Aventail Web Access Log Format Bind 9 Query Log Format Bind 9 Log Format (Syslog required) BIND 9 Query Log Format (with timestamp) Bindview Reporting Log Format Bindview User Logins Log Format Bindview Windows Event Log Format Bind Query Log Format Bind Query Log Format With Timestamp Bind Response Checks Log Format Bind Security Log Format Bind 9 Update Log Format (with timestamp) Bintec VPN 25 or XL Bluesocket Log Format bpft4 Log Format bpft4 Log Format (with interface) bpft traflog Log Format Cisco 827 Log Format (Kiwi, Full Dates, Tabs) CiscoWorks Syslog Server Format Cisco 3750 Log Format Cisco Access Control Server Log Format Cisco Access Register Cisco ACNS log w/ SmartFilter Cisco As5300 Log Format Cisco CE Log Format Cisco CE Common Log Format Cisco EMBLEM Log Format Cisco IDS Netranger Log Format Cisco IPS Log Format Cisco NetFlow Cisco NetFlow (version 1) Cisco NetFlow Binary (DAT) Log Format (SUPPORTED ONLY AFTER ASCII EXPORT) Cisco NetFlow (FlowTools ASCII Export) Cisco NetFlow (flow-export) Cisco NetFlow (no dates) Cisco Router Log Format (Using Syslog Server) Cisco Router Log Format (no syslog) Cisco SCA Log Format Cisco Secure Server (RAS Access) Log Format Cisco SOHO77 Cisco Voice Router Cisco VPN Concentrator CiscoVPNConcentratorAlt Cisco VPN Concentrator (Comma-delimited) Cisco VPN Concentrator (Comma separated - MMDDYYYY) Cisco VPN Concentrator Syslog Log Format
q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q
Citrix NetScaler Log Format Clavister Firewall Binary Log Format (SUPPORTED ONLY AFTER FWLoggqry.exe EXPORT) DLink DI-804HV Ethernet Broadband VPN Router Log Format DNSone DHCP Log Format Wireshark (previously Ethereal) Wireshark/Ethereal/tcpdump Binary Log Format (SUPPORTED ONLY AFTER -r -tt CONVERSION) F5 Load Balancer Firepass Log Format Checkpoint Firewall-1 Binary Log Format [SUPPORTED ONLY AFTER TEXT EXPORT] Foundry Networks Log Format Foundry Networks BigIron Free Radius Log Format honeyd Log Format IBM Tivoli NetView Log Format Intel NetStructure VPN Gateway Log Format Intermapper Event Log Format Intermapper Outages Log Format Intermapper Outages Log Format (dd mmm yyyy dates, 24-hour times) Intermapper Outages Log Format (mmm dd yyyy dates, AM/PM times) Internet Security Systems Network Sensors Intersafe HTTP Content Filter Log Format Interscan E-mail Log Format Interscan E-mail Viruswall Log Format Interscan Proxy Log Format (dd/mm/yyyy dates) Interscan Proxy Log Format (mm/dd/yyyy dates) Interscan Viruswall Virus Log Format InterScan Viruswall Log Format iptables Log Format IPTraf Log Format IP Traffic LAN Statistics Log IPTraf TCP/UDP Services Log Format ISC DHCP Log Format ISC DHCP Leases Log Format Jataayu Carrier WAP Server (CWS) Log Format Kerio Network Monitor Log Format Kerio Network Monitor HTTP Log Format KS-Soft Host Monitor log format Lancom Router LinkSys VPN Router LinkSys Router Log Format Mikrotik Router Log Format MonitorWare MonitorWare (Alternate) Nagios Log Format Neoteris Log Format Netgear FVL328 Log Format (logging to syslog) Netgear FVS318 Netgear Security Log Format Netgear Security Log Format (logging to syslog) Netopia 4553 Log Format Net-Acct NetForensics Syslog Format NetGear Log Format NetGear DG834G Log Format NetGear FR328S Log Format Nortel Contivity (VPN Router and Firewall) Log Format Nortel Networks RouterARN Format (SUPPORTED ONLY AFTER TEXT EXPORT) Piolink Network Loadbalance Log Format Radius Accounting Log Format 
Radius Accounting Log Format II Radius ACT Log Format Radware Load Balancing (Using Syslog Server) Simple DNS
q q q q q q q q q q q q q q q q q q q q q q
SiteMinder WebAgent Log Format SNMP Manager Log Format Snort Log Format (syslog required) Snort 2 Log Format (syslog required) SNORT Portscan Log Format Snort Log Format (standalone, mm/dd dates) Snort Log Format (standalone, mm/dd/yy dates) Socks 5 Log Format Steel Belted Radius ACT Log Format TACACS+ Accounting Log Format tinyproxy Tipping Point Log Format TrendMicro/eManager Spam Filter Log Format Trend Micro InterScan Messaging Security Suite eManager Log Format Trend Micro ScanMail For Exchange Log Format Trend ServerProtect CSV Admin Log Format Trend Webmanager Log Format Vidius Combined Log Format Watchguard Binary (WGL) Log Format (SUPPORTED ONLY AFTER TEXT EXPORT) 4ipnet WHG Log Format Windows 2003 DNS Log Format ZyXEL Communications Log Format
Media Server
q Blue Coat RealMedia Log Format
q Blue Coat Windows Media Log Format
q Helix Universal Server Log Format
q Helix Universal Server (Style 5) Log Format
q IceCast Log Format
q IceCast Alternate Log Format
q Limelight Flash Media Server Log Format
q Microsoft Media Server Log Format
q Quicktime/Darwin Streaming Server Log Format
q Quicktime Streaming Error Log Format
q RealProxy Log Format
q RealServer Log Format
q RealServer Log Format, Alternate
q RealServer Error Log Format
q Shoutcast 1.6 Log Format
q Shoutcast 1.8+ Log Format
q SHOUTcast W3C Log Format
q VBrick EtherneTV Portal Server Log Format
Mail Server
q q q q q q q q q q q q q q q q
Aladdin Esafe Gateway Log Format Aladdin eSafe Mail Log Format Aladdin eSafe Sessions Log Format Aladdin eSafe Sessions Log Format v5/v6 Aladdin eSafe Sessions (with URL category) Log Format Amavis Log Format Anti-Spam SMTP Proxy (ASSP) Log Format Argosoft Mail Server Log Format Argosoft Mail Server Log Format (with dd-mm-yyyy dates) AspEmail (Active Server Pages Component for Email) Log Format Barracuda Spam Firewall - Syslog Centrinity FirstClass Log Format Centrinity FirstClass (m/d/yyyy) Log Format ClamAV Communigate Log Format Communigate Pro Log Format
q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q
Critical Path Mail Server POP/IMAP Log Format Critical Path Mail Server SMTP Log Format Declude SPAM Declude Virus DeepMail IMAP/POP3/SMTP Server Log Format Dovecot Secure IMAP/POP3 Server Log Format EIMS Error Log Format EIMS SMTP (12 hour) Log Format EIMS SMTP (24 hour) Log Format EmailCatcher Microsoft Exchange Internet Mail Log Format Exim Log Format Exim 4 Log Format FirstClass Server Log Format GFI Attachment & Content Log Format GFI Spam Log Format GMS POP Log Format GMS POST Log Format GMS SMTP Log Format GW Guardian Antivirus Log Format GW Guardian Spam Log Format hMailserver Log Format IIS SMTP Common Log Format IIS SMTP W3C Log Format IIS SMTP Comma Separated Log Format IMail Log Format Interscan Messaging Security Suite Integrated Log Format Interscan Messaging Security Suite Log Format Interscan Messaging Security Suite (emanager) Log Format Interscan Messaging Security Suite (virus) Log Format Iplanet Messenger Server 5 Log Format Ironmail AV Log Format (Sophos) Ironmail CSV Log Format Ironmail SMTPO Log Format Ironmail SMTP Proxy Log Format Ironmail Sophosq Log Format Ironmail Spam Log Format IronPort C-Series Log Format IronPort Bounce Log Format iMail Log Format iMail Log Format, Alternate iPlanet Messaging Server 5/6 MTA Log Format Kaspersky Log Format Kaspersky Labs for Mail Servers (linux) Log Format Kerio Mailserver Mail Log Format LISTSERV Log Format LogSat SpamFilterISP Log Format B500.9 Lucent Brick (LSMS) Admin Log Format LSMTP Log Format LSMTP Access Log Format Lyris MailShield Log Format Mailer Daemon Log Format Mailman Post Log Format Mailman Subscribe Log Format mailscanner Log Format Mail Enable W3C Log Format Mail Essentials Log Format MailMax SE Mail POP Log Format MailMax SE SMTP Log Format MailScanner Log Format (testfase) MailScanner Virus Log Format (email messages sent) MailStripper Log Format MailSweeper (AM/PM) Log Format
q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q
MailSweeper (24 Hour) Log Format MailSweeper (long) Log Format McAfee E1000 Mail Scanner MDaemon 7 Log Format MDaemon 7 (All) Log Format MDaemon 8 (All) Log Format Merak POP/IMAP Log Format Merak SMTP Log Format Microsoft Exchange Server Log Format Microsoft Exchange Server 2000/2003 Log Format Microsoft Exchange Server 2000 Log Format (comma separated) Microsoft Exchange Server 2007 Log Format (comma separated) Mirapoint SMTP Log Format Mirapoint SMTP Log Format (Logged To Syslog) msieser SMTP Log Format MTS Professional Log Format NEMX PowerTools for Exchange Novell NetMail Log Format Novell NetMail 3.5 Log Format Openwave Intermail Log Format Postfix Log Format Post Office Mail Server Log Format PostWorks IMAP Log Format PostWorks POP3 Log Format PostWorks SMTP Log Format qmail-scanner Log Format qmail (Syslog Required) Log Format qmail Log Format (TAI64N dates) RaidenMAILD Log Format Scanmail For Exchange Log Format Sendmail Log Format Sendmail (no syslog) Log Format Sendmail for NT Log Format SmarterMail Log Format SmartMaxPOP Log Format SmartMaxSMTP Log Format Sophos Antispam Message Log Format Sophos Antispam PMX Log Format Sophos Mail Monitor for SMTP spamd (SpamAssassin Daemon) Log Format SpamAssassin Log Format Symantec Gateway Security 2 (CSV) Log Format Symantec Mail Security Log Format TFS MailReport Extended Log Format uw-imap Log Format InterScan VirusWall (urlaccesslog) Web Washer Log Format WinRoute Mail Log Format XMail SMTP Log Format XMail Spam Log Format
Internet Device
q DansGuardian 2.2 Log Format
q DansGuardian 2.4 Log Format
q DansGuardian 2.9 Log Format
q Guardix Log Format (IPFW)
q ISS Log Format
q iPrism Monitor Log Format
q iPrism-rt Log Format
q iPrism (with syslog)
q McAfee Web Shield Log Format
q McAfee Web Shield XML Log Format
q Message Sniffer Log Format
q N2H2 Log Format
q N2H2 / Novell Border Manager Log Format
q N2H2 Sentian Log Format
q Netegrity SiteMinder Access Log Format
q Netegrity SiteMinder Event Log Format
q Netilla Log Format
q NetApp Filers Audit Log Format
q NetApp NetCache Log Format
q NetApp NetCache 5.5+ Log Format
q Packet Dynamics Log Format
q Privoxy Log Format
q Vircom Log Format
q Websweeper Log Format
FTP Server
q BDS FTP Log Format
q Bulletproof/G6 FTP Log Format (dd/mm/yy dates, 24-hour times)
q Bulletproof/G6 FTP Log Format (dd/mm/yyyy dates)
q Bulletproof/G6 FTP Log Format (dd/mm/yyyy dates, 24 hour times)
q Bulletproof/G6 FTP Log Format (mm/dd/yy dates)
q Bulletproof/G6 FTP Log Format (mm/dd/yyyy dates)
q Bulletproof/G6 FTP Sessions Log Format
q Bulletproof/G6 FTP Log Format (yyyy/mm/dd dates)
q FileZilla Server (d/m/yyyy) Log Format
q FileZilla Server (m/d/yyyy) Log Format
q Flash FSP Log Format
q Gene6 FTP Server Log Format
q Gene6 FTP W3C Log Format
q IIS FTP Server Log Format
q MacOS X FTP Log Format
q NcFTP Log Format (Alternate)
q NcFTP Xfer Log Format
q ProFTP Log Format
q PureFTP Log Format
q Raiden FTP Log Format
q Rumpus Log Format
q Serv-U FTP Log Format
q UNIX FTP Log Format
q War FTP Log Format
q War FTP Log Format (Alternate)
q WebSTAR FTP Log Format
q WS_FTP Log Format
q WU-FTP Log Format
q WU-FTP Log Format (yyyy-mm-dd Dates, Server Domain)
Firewall
q q q q q q q q q q q q
3Com 3CRGPOE10075 Log Format 8e6 Content Appliance Log Format AboCom VPN Firewall FW550 Applied Identity WELF Log Format Argus Array SPX Log Format AscenLink Log Format Astaro Security Gateway Log Format Barracuda Spyware Firewall / Web Filter Log Format Barrier Group Log Format BigFire / Babylon accounting Log Format Bomgar Box Log Format
q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q
Borderware Log Format Novell Border Manager Log Format BroadWeb NetKeeper Log Format Cell Technology IPS Log Format Check Point SNMP Log Format Cisco PIX/ASA/Router/Switch Log Format Cisco PIX/IOS Log Format Clavister Firewall Log Format Clavister Firewall Log Format (CSV) Clavister Firewall Syslog Log Format Coradiant Log Format (object tracking) Coradiant TrueSight Log Format (object tracking) v2.0 Cyberguard WELF Log Format Cyberguard Firewall (non-WELF) Audit Log Format Cyberguard WELF Log Format Radware DefensePro Log Format Enterasys Dragon IDS Log Format Firebox Log Format FirePass SSL VPN Log Format Firewall-1 (fw log export) Log Format Firewall-1 (fw logexport export) Log Format Firewall-1 (fw log -ftn export) Log Format Firewall-1 Log Viewer 4.1 Export Log Format Firewall-1 NG (text export) Log Format Firewall-1 Next Generation Full Log Format (text export) Firewall-1 Next Generation General Log Format (text export) Firewall-1 Text Export Log Format Firewall1 Webtrends Log Format Fortinet Log Format (syslog required) FortiGate Log Format FortiGate Comma Separated Log Format FortiGate Space Separated Log Format FortiGate Traffic Log Format Gauntlet Log Format Gauntlet Log Format (yyyy-mm-dd dates) GNAT Box Log Format (Syslog Required) GTA GBWare Log Format IAS Log Format IAS Alternate Log Format IAS Comma-Separated Log Format Ingate Firewall Log Format Instagate Access / Secure Access Log Format Interscan Web Security Suite ipfw Log Format IPTables Config Log Format Microsoft ISA WebProxy Log Format (CSV) Microsoft ISA Server Packet Logs Microsoft ISA 2004 IIS Log Format Juniper/Netscreen Secure Access Log Format Juniper Secure Access SSL VPN Log Format Kernun DNS Proxy Log Format Kernun HTTP Proxy Log Format Kernun Proxy Log Format Kernun SMTP Proxy Log Format Lucent Brick McAfee Secure Messaging Gateway (SMG) VPN Firewall Microsoft ICF Log Format Microsoft ISA Server Log Format (W3C) Microsoft Windows Firewall Log Format iPlanet/Netscape Log Format 
Netscreen IDP Log Format Neoteris/Netscreen SSL Web Client Export Log Format Netscreen SSL Gateway Log Format
Netscreen Web Client Export Log Format
Netwall Log Format
NetContinuum Application Security Gateway Log Format
NetScreen Log Format
NetScreen Traffic Log Format (get log traffic)
Juniper Networks NetScreen Traffic Log Format
Nokia IP350/Checkpoint NG (fw log export) Log Format
Nortel SSL VPN Log Format
Norton Personal Firewall 2003 Connection Log Format
Novell Border Manager Log Format
OpenBSD Packet Filter (tcpdump -neqttr) Firewall Log Format
Palo Alto Networks Firewall Threat Log Format
Palo Alto Networks Firewall Traffic Log Format
portsentry Log Format
Rapid Firewall Log Format
Raptor Log Format
Raptor Log Format (Exception Reporting)
SafeSquid Log Format (logging to syslog server)
SafeSquid Log Format (Orange)
SafeSquid Standalone Log Format
SAS Firewall
Separ URL Filter Log Format
Symantec Gateway Security 400 Series Log Format
Sharetech Firewall Log Format
Sharewall Log Format
Sidewinder Log Format
Sidewinder Firewall Log Format
Sidewinder Raw Log Format (SUPPORTED ONLY AFTER acat -x EXPORT)
Sidewinder Syslog Log Format
SmoothWall Log Format
SmoothWall SmoothGuardian 3.1 Log Format
SonicWall or 3COM Firewall
SonicWall 5
Sonicwall TZ 170 Firewall
Sophos Web Appliance
Stonegate Log Format
Symantec Enterprise Firewall Log Format
Symantec Enterprise Firewall 8 Log Format
Symantec Security Gateways Log Format (SGS 2.0/3.0 & SEF 8.0)
Symantec Gateway Security Binary Log Format (SUPPORTED ONLY WITH TEXT EXPORT)
Symantec Gateway Security Log Format (via syslog)
Symantec Web Security Log Format
Tiny Personal Firewall Log Format
Tipping Point IPS Log Format
Tipping Point SMS Log Format
UUDynamics SSL VPN
Watchguard Log Format
Watchguard Firebox Export Log Format (y/m/d format)
Watchguard Firebox Export Header
Watchguard Firebox Export Log Format (m/d/y format)
Watchguard Firebox v60 Log Format
Watchguard Firebox V60 Log Format
Watchguard Firebox X Core e-Series Log Format
Watchguard Historical Reports Export Log Format
Watchguard SOHO Log Format
Watchguard WELF Log Format
Watchguard WSEP Text Exports Log Format (Firebox II & III & X)
Watchguard XML Log Format
Webtrends Extended Log Format (Syslog)
Webtrends Extended Log Format
WELF Log Format (stand-alone; no syslog)
WELF date/time extraction (no syslog header)
WinRoute Connection Log Format
XWall Log Format
Zone Alarm Log Format
Zyxel Firewall Log Format
Zyxel Firewall WELF Log Format
Application
Active PDF Log Format
Arcserve NT Log Format
AutoAdmin Log Format
Backup Exec Log Format
BroadVision Error Log Format
BroadVision Observation Log Format
Cognos Powerplay Enterprise Server
Cognos Ticket Server Log Format
ColdFusion Application Log Format
ColdFusion Application Log Format (CSV)
Fiserv Financial Easy Lender - Unsuccessful Login Audit
Easy Lender - Login Audit - Comma Separated
Oracle Hyperion Essbase Log Format
Filemaker Log Format
Filemaker 3 Log Format
FusionBot Log Format
Java Bean Application Server Log Format
JBoss Application Server Log Format
JIRA Log Format
Kaspersky Labs AVP Client (Spanish) Log Format
Kaspersky Labs AVP Server (Spanish) Log Format
log4j Log Format
LRS VPSX Accounting Log Format
Microsoft Office SharePoint Server Log Format
Microsoft SQL Profiler Export
Microtech ImageMaker Error Log Format
MicroTech ImageMaker Media Log Format
Mod Gzip Log Format
MPS Log Format
iPlanet/Netscape Directory Server Format
NVDcms Log Format
Oracle Application Server (Java Exceptions)
Oracle Audit Log Format
Oracle Listener Log Format
Oracle Failed Login Attempts Log Format
Performance Monitor Log Format
Plesk Server Administrator Web Log
Policy Directory Audit Log Format
Policy Directory Security Audit Trail Log Format
PortalXPert Log Format
Samba Server Log Format
Sawmill messages.log Log Format
ShareWay IP Log Format
SiteMinder Policy Server Log Format
SiteCAM Log Format
SiteKiosk Log Format
SiteKiosk 6 Log Format
SNARE Epilog Collected Oracle Listener Log Format
Software602 Log Format
Sun ONE Directory Server Audit Log Format
Sun ONE Directory Server Error Log Format
Sun ONE / Netscape Directory Server Log Format
Sybase Error Log Format
Symantec AntiVirus Corporate Edition
Symantec AntiVirus Corporate Edition (VHIST Exporter)
TerraPlay Accounting Log Format
Tivoli Storage Manager TDP for SQL Server Format
Vamsoft Open Relay Filter Enterprise Edition Log Format
WebNibbler Log Format
Web Sense Log Format
Wipro Websecure Audit Log Format
Wipro Websecure Auth Log Format
Wipro Websecure Auth (Alternate Dates)
Wipro Websecure Debug Log Format
Sawmill automatically detects all of these formats, and arranges your profile options intelligently based on your log format. If your format is not supported, we can create a plug-in for it for a fee. If you're interested in having us create the plug-in, please send a sample of your log data (1 MB is ideal, but anything more than ten lines will do) to support@flowerfire.com and we will send you a quote. Alternatively, you can create your own plug-in; see Creating Log Format Plug-ins (Custom Log Formats).
Intellectual Property Rights
If you create a plug-in, you own all rights to that plug-in, and you can decide whether to release it to us for inclusion in the standard Sawmill distribution. If you release it to us, you agree to give us unlimited rights to reproduce and distribute it at no cost. If we create the plug-in (for free or for a fee), we own all intellectual property rights to the plug-in.
We Create the Plug-in
If you would like us to create your plug-in, submit your request to us at support@flowerfire.com. If the format is publicly available, support may be provided at no charge; if so, it will be put into our log format queue. There is continuous demand for new log file formats, so there may be a significant delay before the plug-in is complete. For a faster response, you can pay us to create the plug-in, in which case it can usually be delivered quickly.
Additional topics:
Parsing the Log Data: Parsing with the Parsing Regular Expression
There are three ways of extracting field values from the log data: a parsing regular expression, a delimited approach (a.k.a. "index/subindex"), and parsing filters. Typically, only one of these three approaches is used in a plug-in. This section describes parsing with a parsing regular expression. For a description of parsing with delimiters, see "Parsing With Delimited Fields". For a description of parsing with parsing filters, see "Parsing With Parsing Filters: Lines With Variable Layout" and "Parsing With Parsing Filters: Events Spanning Multiple Lines".

The parsing regular expression is matched against each line of log data as the data is processed. It is easily confused with the autodetection regular expression, but the purposes of the two are very different. The autodetection regular expression is used only during the Create Profile Wizard, to show a list of formats which match the log data. The parsing regular expression is not used at all during profile creation; it is used only when log data is processed (e.g., during a database build), to extract field values from each line of the log data.
The expression should include subexpressions in parentheses to specify the field values. The subexpressions are extracted from each matching line of data and put into the log fields. For instance, this expression:

log.format.parsing_regular_expression = "^([0-9]+/[0-9]+/[0-9]+) ([0-9]+:[0-9]+:[0-9]+),([0-9.]+),([^,]*),([^,]*),([^,]*)"

extracts a date field (three integers separated by slashes); then, after a space, a time field (three integers separated by colons); then a series of four comma-separated fields. The first of the comma-separated fields is an IP address (any series of digits and dots), and the rest can be anything, as long as they don't contain commas. Notice how similar the layout of the parsing regular expression is to the autodetection regular expression; it is often possible to derive the parsing regular expression just by copying the autodetection expression and adding parentheses.

As each line of log data is processed, it is matched against this expression; if the expression matches, the fields are populated from the subexpressions, in order. So in this case, the log fields must be date, time, ip_address, fieldx, fieldy, and duration, in that order. If the parsing regular expression is the one listed above, and this line of log data is processed:

2005/03/22 01:23:45,127.0.0.1,A,B,30

then the log fields will be populated as follows:

date: 2005/03/22
time: 01:23:45
ip_address: 127.0.0.1
fieldx: A
fieldy: B
duration: 30
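As a sanity check, the extraction above can be reproduced outside Sawmill. This is a minimal sketch in Python, assuming only that the parsing regular expression behaves like a standard regex; the field names come from the log.fields order described above:

```python
import re

# The parsing regular expression from the plug-in, in Python syntax.
parsing_re = re.compile(
    r'^([0-9]+/[0-9]+/[0-9]+) ([0-9]+:[0-9]+:[0-9]+),([0-9.]+),([^,]*),([^,]*),([^,]*)'
)

# Log field names, in the same order as the parenthesized subexpressions.
fields = ['date', 'time', 'ip_address', 'fieldx', 'fieldy', 'duration']

line = '2005/03/22 01:23:45,127.0.0.1,A,B,30'
m = parsing_re.match(line)
entry = dict(zip(fields, m.groups())) if m else None
print(entry)
# -> {'date': '2005/03/22', 'time': '01:23:45', 'ip_address': '127.0.0.1',
#     'fieldx': 'A', 'fieldy': 'B', 'duration': '30'}
```

A line that does not match the expression simply produces no entry, which mirrors how a non-matching line yields no field values.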
Each event must appear on a separate line; events may not span multiple lines. Each line must contain a list of fields separated by a delimiter. For instance, it might be a list of fields separated from each other by commas, or by spaces, or by tabs.
If the log data matches these criteria, you can use delimited parsing. The example we've been using above:

2005/03/22 01:23:45,127.0.0.1,A,B,30

meets the first condition, but not the second, because the date and time fields are separated from each other by a space, while the other fields are separated by commas. But suppose the format were this:

2005/03/22,01:23:45,127.0.0.1,A,B,30

This is now a format which can be handled by delimited parsing. To use delimited parsing, you must omit the log.format.parsing_regular_expression option from the plug-in; if that option is present, regular expression parsing will be used instead. You must also specify the delimiter:

log.format.field_separator = ","

If the delimiter (the comma, in this case) is not specified, whitespace will delimit fields; any space or tab will be considered the end of one field and the beginning of the next.

When using delimited parsing, you also need to set the "index" value of each log field. The index value tells the parser where that field appears in the line. Contrast this with regular expression parsing, where the position of the field in the log.fields list specifies its position in the line; delimited parsing pays no attention to where the field appears in the log.fields list, and populates fields based solely on their index (and subindex; see below). So for the comma-separated format described above, the log fields would look like this:

log.fields = {
  date = {
    index = 1
  }
  time = {
    index = 2
  }
  ip_address = {
    type = "host"
    index = 3
  }
  fieldx = {
    index = 4
  }
  fieldy = {
    index = 5
  }
  duration = {
    index = 6
  }
}

For brevity, this is usually represented with the following equivalent syntax; any group G which has only one parameter p, with value v, can be written as "G.p = v":

log.fields = {
  date.index = 1
  time.index = 2
  ip_address = {
    type = "host"
    index = 3
  }
  fieldx.index = 4
  fieldy.index = 5
  duration.index = 6
}

For instance, fieldx will be populated from the fourth comma-separated field, since its index is 4.

When whitespace is the delimiter, quotation marks can be used to allow whitespace within fields. For instance, this line:

2005/03/22 01:23:45 127.0.0.1 A "B C" 30

would extract the value "B C" into the fieldy field, even though there is a space between B and C, because the quotes set it off as a single field value. In such cases, the "subindex" parameter can be used; subindex tells the parser to split the quoted field one level deeper, splitting B and C apart into subindex 1 and subindex 2. So for the line above, if the fieldy field also had a "subindex = 1" parameter, fieldy would be set to "B"; if instead it had "subindex = 2", fieldy would be set to "C". If it has no subindex, or if subindex is 0, it will be set to the whole quoted field, "B C". This is used in Common Log Format (a web log format) to split apart the quoted request field:

... [timestamp] "GET /some_page.html HTTP/1.0" 200 123 ...

In this case, the operation (GET) is extracted as subindex 1 of the quoted field, the page (/some_page.html) as subindex 2, and the protocol (HTTP/1.0) as subindex 3.
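To make the index/subindex behavior concrete, here is a small Python model of it. This is an illustration of the splitting rules described above, not Sawmill's actual parser; shlex.split stands in for the whitespace-plus-quotes tokenizer, and the field() helper is an invented name for this sketch:

```python
import shlex

line = '2005/03/22 01:23:45 127.0.0.1 A "B C" 30'

# Top-level split: whitespace-delimited, with quotes keeping "B C" together.
tokens = shlex.split(line)
# tokens == ['2005/03/22', '01:23:45', '127.0.0.1', 'A', 'B C', '30']

def field(index, subindex=0):
    """Look up a value the way index/subindex parameters select it."""
    value = tokens[index - 1]               # index is 1-based, as in log.fields
    if subindex == 0:
        return value                        # whole (possibly quoted) field
    return value.split()[subindex - 1]      # split the quoted field one level deeper

print(field(3))     # 127.0.0.1
print(field(5))     # B C   -- no subindex: the whole quoted field
print(field(5, 1))  # B
print(field(5, 2))  # C
```

The same two-level lookup is what splits "GET /some_page.html HTTP/1.0" into operation, page, and protocol in Common Log Format.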
The parsing filter is enclosed in backtick (`) quotes, and the regular expression is within that, in single or double quotes. Because of this, if you want to use a backslash to escape something in a regular expression, it has to be quadrupled; e.g., you need to use \\\\[ to get \[ (a literal left square bracket) in a regular expression. The extreme case is a literal backslash in a regular expression: that is represented as \\ in regular expression syntax, and each of those backslashes has to be quadrupled, so to match a literal backslash in a regular expression in a parsing filter, you need to use \\\\\\\\. For instance, to match this line:

[12/Jan/2005 00:00:00] Group\User

you would need this parsing filter:

if (matches_regular_expression(current_log_line(), '^\\\\[([0-9]+/[A-Z][a-z]+/[0-9]+) ([0-9:]+)\\\\] ([^\\\\]+)\\\\\\\\(.*)')) then
  ...
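The two levels of halving can be simulated in Python, using unicode_escape decoding as a stand-in for each unescaping layer. This is an analogy only: it assumes, as described above, that the backtick layer and the inner quote layer each halve the backslashes before the regex engine sees the string:

```python
import re

source = '\\' * 8                                  # the 8 backslashes typed in the plug-in
layer1 = source.encode().decode('unicode_escape')  # backtick-quote layer: 4 backslashes left
layer2 = layer1.encode().decode('unicode_escape')  # inner-quote layer: 2 backslashes left

print(len(source), len(layer1), len(layer2))       # 8 4 2

# The 2 remaining characters are the regex escape \\ , which matches
# exactly one literal backslash.
print(bool(re.fullmatch(layer2, '\\')))            # True
```

Working backwards like this (one literal backslash, doubled by the regex layer, then doubled twice more by the two string layers) is the easiest way to count the backslashes a parsing filter needs.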
This represents a single event: the mail client at 12.34.56.78 connected to the server, and sent mail from bob@somewhere.com to sue@elsewhere.com. Note the sessionid <156>, which appears on every line; log data where events span lines almost always has this type of "key" field, so you can tell which lines belong together. This is essential because otherwise the information from two simultaneous connections, with their lines interleaved, could not be separated. A parsing filter to handle this format could look like this:

log.parsing_filters.parse = `
if (matches_regular_expression(current_log_line(), '^[0-9/]+ [0-9:]+: connection from ([0-9.]+); sessionid=<([0-9]+)>')) then
  set_collected_field($2, 'source_ip', $1);
else if (matches_regular_expression(current_log_line(), '^[0-9/]+ [0-9:]+: <([0-9]+)> sender: <([^>]*)>')) then
  set_collected_field($1, 'sender', $2);
else if (matches_regular_expression(current_log_line(), '^[0-9/]+ [0-9:]+: <([0-9]+)> recipient: <([^>]*)>')) then
  set_collected_field($1, 'recipient', $2);
else if (matches_regular_expression(current_log_line(), '^([0-9/]+) ([0-9:]+): <([0-9]+)> mail delivered')) then (
  set_collected_field($3, 'date', $1);
  set_collected_field($3, 'time', $2);
  accept_collected_entry($3, false);
)
`

This works similarly to the variable-layout example above, in that it checks the current line of log data against four regular expressions to determine which of the line types it is. But instead of simply assigning 12.34.56.78 to source_ip, it uses the function set_collected_field(), with the session id as the "key" parameter, to assign the source_ip field of the collected entry with key 156 the value 12.34.56.78. Effectively, there is now a virtual log entry, which can be referenced by the key 156, and which has a source_ip field of 12.34.56.78. If another interleaved connection immediately occurs, a second "connection from" line could be next, with a different key (because it's a different connection); that would result in another virtual log entry, with a different key and a different source_ip field. This allows both events to be built up, a field at a time, without any conflict between them. Nothing is added to the database when a "connection from" line is seen; the filter just files away the source_ip for later use, in log entry 156 (contrast this with all other parsing methods discussed so far, where every line results in an entry added to the database).

Parsing then continues to the next line, which is the "sender" line. The "connection from" regular expression doesn't match, so the filter checks the "sender" regular expression, which does match; it sets the value of the sender field to bob@somewhere.com, for log entry 156. On the next line, the third regular expression matches, and it sets the recipient field to sue@elsewhere.com, for log entry 156. On the fourth line, the fourth regular expression matches, and it sets the date and time fields for log entry 156. At this point, log entry 156 looks like this:

source_ip: 12.34.56.78
sender: bob@somewhere.com
recipient: sue@elsewhere.com
date: 2005/03/22
time: 01:23:50

We have now populated all the fields, so it's time to put this entry into the database. That's what accept_collected_entry() does: it puts entry 156 into the database. From there, things proceed just as they would if this had been a log format with all five fields on one line, extracted with a regular expression or with delimited parsing. So by using five "collect" operations, we have effectively put together a single virtual log entry which can now be put into the database in the usual way.

In order to use parsing filters with accept/collect, you must also set this option in the plug-in:

log.format.parse_only_with_filters = "true"

If you don't include this line, the parser will use delimited or regular expression parsing first, and then run the parsing filters; typically, you want the parsing filters alone to extract the data from each line.
The parsing filter language is a fully general language; it supports variables, nested if/then/else constructs, loops, subroutines, recursion, and anything else you would expect to find in a language. Therefore there is no limit to what you can do with parsing filters; any log format can be parsed with a properly constructed parsing filter.
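The collect/accept mechanism above can be modeled in a few lines of Python: a dict of partial entries keyed by session id, flushed to a list when the final line of the event arrives. The sample log lines here are reconstructed to fit the format described above, and set_collected_field()/accept_collected_entry() correspond to the dict update and the pop/append:

```python
import re

collected = {}   # virtual log entries, keyed by session id
database = []    # stands in for the Sawmill database

def process(line):
    if m := re.match(r'^[0-9/]+ [0-9:]+: connection from ([0-9.]+); sessionid=<([0-9]+)>', line):
        collected.setdefault(m.group(2), {})['source_ip'] = m.group(1)
    elif m := re.match(r'^[0-9/]+ [0-9:]+: <([0-9]+)> sender: <([^>]*)>', line):
        collected.setdefault(m.group(1), {})['sender'] = m.group(2)
    elif m := re.match(r'^[0-9/]+ [0-9:]+: <([0-9]+)> recipient: <([^>]*)>', line):
        collected.setdefault(m.group(1), {})['recipient'] = m.group(2)
    elif m := re.match(r'^([0-9/]+) ([0-9:]+): <([0-9]+)> mail delivered', line):
        key = m.group(3)
        entry = collected.setdefault(key, {})
        entry['date'], entry['time'] = m.group(1), m.group(2)
        database.append(collected.pop(key))   # accept_collected_entry

for line in [
    '2005/03/22 01:23:45: connection from 12.34.56.78; sessionid=<156>',
    '2005/03/22 01:23:46: <156> sender: <bob@somewhere.com>',
    '2005/03/22 01:23:48: <156> recipient: <sue@elsewhere.com>',
    '2005/03/22 01:23:50: <156> mail delivered',
]:
    process(line)

print(database[0])
```

Because entries are keyed by session id, interleaved lines from a second connection would simply build up a second dict entry, exactly as described for a second "connection from" line.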
automatically. This almost always works for time, but some date formats cannot be automatically determined; for instance, there is no way to guess whether 3/4/2005 means March 4, 2005 or April 3, 2005 (in this case, "auto" assumes the month is first). In cases where "auto" cannot determine the format, it is necessary to specify the format in the log.format.date_format option in the plug-in. Available formats are listed in Date format and Time format.

Usually, the date and time fields are listed separately, but sometimes they cannot be separated. For instance, if the date/time format is "seconds since January 1, 1970" (a fairly common format), then the date and time information is integrated into a single integer, and the date and time fields cannot be extracted separately. In this case, you will need to use a single date_time log field:

log.fields = {
  date_time = ""
  ip_address = ""
  ...
} # log.fields

and both the log.format.date_format and log.format.time_format options should be set to the same value:

log.format.date_format = "seconds_since_jan1_1970"
log.format.time_format = "seconds_since_jan1_1970"

Fields which are listed without any parameters, like the ones above, are assigned default values for various parameters. Most importantly, the label of the field is set to "$lang_stats.field_labels.fieldname", where fieldname is the name of the field. For instance, the label of the fieldx field will be set to "$lang_stats.field_labels.fieldx". This allows you to create plug-ins which are easily translated into other languages, but it also means that when you create a plug-in with no label value specified, you need to edit the file LogAnalysisInfo/languages/english/lang_stats.cfg to add field labels for any new fields you have created. In that file there is a large section called field_labels, to which you can add your own values.
For instance, in the case above, you would need to add these:

fieldx = "field x"
fieldy = "field y"

The other fields (date, time, ip_address, and duration) are already in the standard field_labels list. The standard list is very large, so you may find that all your fields are already there. If they aren't, you either need to add them, or explicitly override the default label by defining the log field like this:

log.fields = {
  ...
  fieldx = {
    label = "field x"
  }
  ...
} # log.fields

This extended syntax specifies the label in the plug-in itself, which means you don't need to edit lang_stats.cfg, but it also means that the plug-in will not be translated into other languages. Plug-ins created for distribution with Sawmill should always use default field labels, and the field names should always be added to lang_stats, to allow for localization (translation to the local language).
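Both date issues above are easy to demonstrate in Python. The first half shows why 3/4/2005 is ambiguous; the second converts a seconds_since_jan1_1970 value (a hypothetical sample timestamp) back into the separate date and time used elsewhere in this example:

```python
from datetime import datetime, timezone

# Ambiguity: the same string parses to two different dates.
print(datetime.strptime('3/4/2005', '%m/%d/%Y').date())   # 2005-03-04 (month first)
print(datetime.strptime('3/4/2005', '%d/%m/%Y').date())   # 2005-04-03 (day first)

# A single "seconds since January 1, 1970" value carries both date and time.
stamp = 1111454625   # hypothetical sample value
dt = datetime.fromtimestamp(stamp, tz=timezone.utc)
print(dt.strftime('%Y/%m/%d %H:%M:%S'))                   # 2005/03/22 01:23:45
```

This is why "auto" has to fall back on an assumption for slash-separated dates, while an epoch-seconds field needs no guessing at all, only a single date_time field.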
different if: 1) some of the log fields are not interesting, and are therefore not tracked in the database; or 2) a database field is based on a derived field (see below), which is not actually in the log data; or 3) the log field is numerical, and needs to be aggregated using a numerical database field (see next section).

Derived fields are "virtual" log fields which are not actually in the log data, but which are computed from fields which are in the log data. The following derived fields are available:

Log Field       Derived Log Field    Notes
date and time   date_time            A field of the format dd/mmm/yyyy hh:mm:ss, e.g., 12/Feb/2006 12:34:50, computed from the date and time values in the log data.
date_time       hour_of_day          The hour of the day, e.g., 2AM-3AM.
date_time       day_of_week          The day of the week, e.g., Tuesday.
date_time       day_of_year          The day of the year, e.g., 1 for January 1, through 365 for December 31.
date_time       week_of_year         The week of the year, e.g., 1 for January 1 through January 8.
host            domain_description   A description of the domain for the host, e.g., "Commercial" for .com addresses. See the "host" note below.
host            location             The geographic location of the IP address, computed by GeoIP database. See the "host" note below.
agent           web_browser          The web browser type, e.g., "Internet Explorer/6.0". See the "agent" note below.
agent           operating_system     The operating system, e.g., "Windows 2003". See the "agent" note below.
agent           spider               The spider name, or "(not a spider)" if it's not a spider. See the "agent" note below.
page            file_type            The file type, e.g., "GIF". See the "page" note below.
page            worm                 The worm name, or "(not a worm)" if it's not a worm. See the "page" note below.
Note: "host" derived fields are not necessarily derived from a field called "host"; the source field can have any name. These fields are derived from the log field whose "type" parameter is "host". So to derive a "location" field from the ip_address field in the example above, the log field would have to look like this:

ip_address = {
  type = "host"
} # ip_address

You will sometimes see this shortened to the equivalent syntax:

ip_address.type = "host"

Note: "agent" derived fields are similar: the "agent" field can have any name, as long as its type is "agent". However, if you name the field "agent", it will automatically have type "agent", so it is not necessary to list the type explicitly unless you use a field name other than "agent".

Note: "page" derived fields are also similar: the "page" field can have any name, as long as its type is "page". However, if you name the field "page" or "url", it will automatically have type "page", so it is not necessary to list the type explicitly unless you use a field name other than "page" or "url".

Never include a date or time field in the database fields list! The database field should always be the date_time field, even if the log fields are separate date and time fields.

When creating the database fields list, it is often convenient to start from the log fields list. Then remove any log fields you don't want to track, add any derived fields you do want to track, and remove any numerical fields (like bandwidth, duration, or other "counting" fields), which will be tracked in the numerical fields (next section). For the example above, a reasonable set of database fields is:

database.fields = {
  date_time = ""
  ip_address = ""
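The date/time-derived fields in the table above are straightforward to compute. This Python sketch mirrors them; the exact bucket labels Sawmill produces (such as "2AM-3AM") are assumed to differ in detail:

```python
from datetime import datetime

# A date_time value in the dd/mmm/yyyy hh:mm:ss layout described above.
dt = datetime.strptime('12/Feb/2006 12:34:50', '%d/%b/%Y %H:%M:%S')

day_of_week = dt.strftime('%A')            # Sunday
day_of_year = dt.timetuple().tm_yday       # 43 (31 days of January + 12)
week_of_year = (day_of_year - 1) // 7 + 1  # 7, counting January 1-8 as week 1
hour_of_day = dt.hour                      # 12, i.e. the 12PM-1PM bucket

print(day_of_week, day_of_year, week_of_year, hour_of_day)
```

The host-, agent-, and page-derived fields need lookup data (GeoIP, browser and spider signatures, file-type maps), which is why Sawmill computes them rather than the plug-in.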
} # mark_entry
} # log.filters

This log filter has a label and a comment, so it will appear nicely in the log filter editor, but the real value of the filter is 'events = 1'; all the filter really does is set the events field to 1. Many plug-ins do not require any other log filters, but this one is almost always present. Make sure you always set the events field to 1! If you omit it, some or all entries will be rejected because they have no non-zero field values. Since the events field is always 1, summing it counts the number of events: if you have 5,000 lines in your dataset, the Overview will show 5,000 for events, the sum of events=1 over all log entries.

The parameters for numerical fields are:

label
    How the field appears in the reports, e.g., the name of its column in tables. This is typically $lang_stats.field_labels.fieldname; if it is, this must be defined in the field_labels section of LogAnalysisInfo/languages/english/lang_stats.cfg, or it will cause an error when creating a profile.

default
    "true" if this field should be checked in the Numerical Fields page of the Create Profile Wizard.

requires_log_field
    "true" if this field should only be included in the database if the corresponding log field exists. If this is "false", the log field does not have to exist in the log.fields list; it will be automatically added. If this is "true", and the field does not exist in log.fields, it will not be automatically added; instead, the numerical field will be deleted, and will not appear in the database or in reports.

type
    "int" if this is an integer field (signed, maximum value of about 2 billion on 32-bit systems); "float" if this is a floating point field (fractional values permitted; effectively no limit to size).

display_format_type
    Specifies how a numerical quantity should be formatted. Options are:
        integer: display as an integer, e.g., "14526554"
        duration_compact: display as a compact duration, e.g., 1y11m5d 12:34:56
        duration_milliseconds: display as a duration in milliseconds, e.g., 1y11m5d 12:34:56.789
        duration_microseconds: display as a duration in microseconds, e.g., 1y11m5d 12:34:56.789012
        duration_hhmmss: display as a duration in h:m:s format, e.g., "134:56:12" for 134 hours, 56 minutes, 12 seconds
        duration: display as a fully expanded duration, e.g., "1 year, 11 months, 5 days, 12:34:56"
        bandwidth: display as bandwidth, e.g., 5.4MB, or 22kb

aggregation_method
    "sum" if this field should be aggregated by summing all values; "average" if it should be aggregated by averaging all values; "max" if it should be aggregated by computing the maximum of all values; "min" if it should be aggregated by computing the minimum of all values. If not specified, this defaults to "sum". See below for more about aggregation.

average_denominator_field
    The name of the numerical database field to use as the denominator when performing the "average" calculation, when aggregation_method is "average". This is typically the "entries" (events) field. See below for more information about aggregation.

entries_field
    "true" if this is the "entries" (events) field; omit if it is not.
The numbers which appear in Sawmill reports are usually aggregated. For instance, in the Overview of a firewall analysis, you may see the number of bytes outbound, which is an aggregation of the "outbound bytes" fields in every log entry. Similarly, if you look at the Weekdays report, you will see a number of outbound bytes for each day of the week; each of these numbers is an aggregation of the "outbound bytes" field of each log entry for that day. Usually, aggregation is done by summing the values, and that's what happens in the "outbound bytes" example. Suppose you have 5 lines of log data with the following values for the outbound bytes: 0, 5, 7, 9, 20. In this case, if the aggregation method is "sum" (or if it's not specified), the Overview will sum the outbound bytes to show 41 as the total bytes. In some cases it is useful to do other types of aggregation. The aggregation_method parameter provides three other types, in addition to "sum": "min", "max", and "average". "Min" and "max" aggregate by computing the minimum or maximum single
value across all field values. In the example above, the Overview would show 0 as the value if aggregation method was "min", because 0 is the minimum value of the five. Similarly, it would show 20 as the value if the aggregation method was "max". If the aggregation method was "average", it would sum them to get 41, and then divide by the average_denominator_field value; typically this would be an "entries" (events) field which counts log entries, so its value would be 5, and the average value shown in the Overview would be 41/5, or 8.2 (or just 8, if the type is "int").
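The worked example can be checked directly. This shows the four aggregation_method choices applied to the five sample values:

```python
values = [0, 5, 7, 9, 20]     # "outbound bytes" from five log entries
entries = len(values)         # the events field (always 1) sums to 5

print(sum(values))            # 41   -> aggregation_method = "sum" (the default)
print(min(values))            # 0    -> "min"
print(max(values))            # 20   -> "max"
print(sum(values) / entries)  # 8.2  -> "average": sum / average_denominator_field
```

Note that "average" is computed from two sums (the field's sum and the denominator field's sum), which is why average_denominator_field must itself be a numerical database field.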
Log Filters
In the log filters section, you can include one or more log filters. Log filters are extremely powerful, and can be used to convert field values (e.g., convert destination port numbers to service names), or to clean up values (e.g. to truncate the pathname portion of a URL to keep the field simple), or to reject values (e.g. to discard error entries if they are not interesting), or just about anything else. There aren't many general rules about when to use log filters, but in general, you should create the "events" filter (see above), and leave it at that unless you see something in the reports that doesn't look quite right. If the reports look fine without log filters, you can just use the events filter; if they don't, a log filter may be able to modify the data to fix the problem.
the default error_message report will include only a number_of_errors column, and not a hits or page_views column. Since cross-reference groups are tables which optimize the generation of particular reports, the associated cross-reference groups for those reports will also not have the non-associated fields. Fields which are missing from database_field_associations are assumed to be associated with all numerical fields, so the date_time reports would show all three numerical fields in this case (which is correct, because all log entries, from both formats, include date_time information). If the database_field_associations node itself is missing, all non-numerical fields are assumed to be associated with all numerical fields.
so the web interface knows what to call the menu group, e.g.:

menu = {
  groups = {
    fieldxy_group = "Field X/Y"
    ...
There is always one report per database field; it is not possible to omit the report for any field.
There is only one report per database field; it is not possible to create several reports for a field with different options.
You cannot add filters to reports.
It is not possible to customize which columns appear in reports; reports always contain one non-numerical field, and all associated numerical fields (see Database Field Associations).
Reports cannot be created with graphs, except date/time reports; and it is not possible to create date/time reports without graphs.
It is not possible to override the number of rows, sort order, or any other report options.
The manual report creation approach described in this section overcomes all these limitations, because all reports, and their options, are specified manually. However, reports use default values for many options, so it is not necessary to specify very much information per report; in general, you only need to specify the non-default options. Here is an example of very basic manual report creation, for the example above:

create_profile_wizard_options = {
  # How the reports should be grouped in the report menu
  manual_reports_menu = true
  report_groups = {
    overview.type = "overview"
    date_time_group = {
      items = {
        date_time = {
          label = "Years/months/days"
          graph_field = "events"
          only_bottom_level_items = false
        }
        days = {
          label = "Days"
          database_field_name = "date_time"
          graph_field = "events"
        }
        day_of_week = {
          graph_field = "events"
        }
        hour_of_day = {
          graph_field = "events"
        }
      }
    } # date_time_group
    ip_address = true
    location = true
    fieldxy_group = {
      items = {
        fieldx = true
        fieldy = true
      }
    }
    log_detail = true
    single_page_summary = true
  } # report_groups
} # create_profile_wizard_options

This has the same effect as the automatic report grouping described above. Note:

1. The option "manual_reports_menu = true" specifies that manual report generation is being used.
2. The date_time group has been fully specified, as a four-report group.
3. The first report in the date_time group is the "Years/months/days" report, with label specified by lang_stats.miscellaneous.years_months_days (i.e., in LogAnalysisInfo/languages/english/lang_stats.cfg, in the miscellaneous group, the parameter years_months_days), which graphs the "events" field and shows a hierarchical report (in other words, a normal "Years/months/days" report).
4. The second report in the date_time group is the "Days" report, with label specified by lang_stats.miscellaneous.days, which graphs the "events" field and shows a hierarchical report (in other words, a normal "Days" report).
5. The third report in the date_time group is a day_of_week report, which graphs the "events" field.
6. The fourth report in the date_time group is the hour_of_day report, which graphs the "events" field.
7. The ip_address, location, fieldx, and fieldy reports are specified the same as in automatic report creation, except for the addition of an "items" group within each group, which contains the reports in the group. Nothing is specified within the reports, so all values are default.
8. The log_detail and single_page_summary reports are specified manually (they will not be included if they are not specified here).

To simplify manual report creation, many default values are selected when nothing is specified:

1. If no label is specified for a group, the label in lang_stats.menu.group.groupname is used (i.e., the value of the groupname node in the "group" node of the "menu" node of the lang_stats.cfg file, which is in LogAnalysisInfo/languages/english).
If no label is specified and the group name does not exist in lang_stats, the group name is used as the label.
2. If no label is specified for a report, and the report name matches a database field name, the database field label is used as the report label. Otherwise, if lang_stats.menu.reports.reportname exists, that is used as the label. Otherwise, if lang_stats.field_labels.reportname exists, that is used as the label. Otherwise, reportname itself is used as the label.
3. If a "columns" group is specified in the report, it determines the columns; the column field names are taken from the field_name value in each listed column. If it contains both numerical and non-numerical columns, it completely determines the columns in the report. If it contains only non-numerical columns, it determines the non-numerical columns, and the numerical columns are those associated with the database_field_name parameter (which must be specified explicitly in the report, unless the report name matches a database field name, in which case that database field is used as the database_field_name); see Database Field Associations. If no columns node is specified, the database_field_name parameter is used as the only non-numerical column, and all associated numerical fields are used as the numerical columns.
4. If a report_menu_label is specified for a report, that value is used as the label in the reports menu; otherwise, the report label is used.
5. If a "filter" is specified for a report, that filter expression is used as the report filter; otherwise, no report filter is applied.
6. If only_bottom_level_items is specified for a report, the report shows only bottom-level items if the value is true, or a hierarchical report if it is false. If it is not specified, the report shows only bottom-level items.
7. If graph_field is specified for a report, a graph is included which graphs that field.
8. Any other options specified for a report are copied over to the final report. For instance, graphs.graph_type can be set to "pie" to make the graph a pie chart (instead of the default bar chart), or "ending_row" can be set to change the number of rows from the default 20.

For instance, here is an advanced example of manual report grouping, again using the example above:
create_profile_wizard_options = {
  # How the reports should be grouped in the report menu
  manual_reports_menu = true
  report_groups = {
    overview.type = "overview"
    date_time_group = {
      items = {
        days = {
          label = "Days"
          database_field_name = "date_time"
          graph_field = "duration"
        }
        day_of_week = {
          graph_field = "duration"
        }
      }
    } # date_time_group
    ip_address = true
    location = true
    fieldxy_group = {
      label = "XY"
      items = {
        fieldx = {
          label = "Field X Report (XY Group)"
          report_menu_label = "Field X Report"
        }
        fieldy = {
          sort_by = "events"
          sort_direction = "ascending"
          graph_field = "events"
          graphs.graph_type = "pie"
        } # fieldy
        fieldx_by_fieldy = {
          label = "FieldX by FieldY"
          ending_row = 50
          columns = {
            0.field_name = "fieldx"
            1.field_name = "fieldy"
            2.field_name = "events"
            3.field_name = "duration"
          } # columns
        } # fieldx_by_fieldy
      } # items
    } # fieldxy_group
    log_detail = true
  } # report_groups
} # create_profile_wizard_options

Notes About the Example Above:
- The date_time group has been simplified; the hierarchical years/months/days report has been removed, as has the hours of day report.
- The label of the fieldxy group has been overridden to "XY".
- The fieldx report has a custom label, "Field X Report (XY Group)", but it has a different label, "Field X Report", for its entry in the reports menu.
- The fieldy report is sorted ascending by events, and includes an "events" pie chart.
- A new report has been added, fieldx_by_fieldy (with label "FieldX by FieldY"), which is a 50-row table showing both fieldx and fieldy. This report will aggregate totals for each fieldx/fieldy pair, and will show the number of events and total duration for each pair, in an indented two-column table.
- The single-page summary has been omitted.
Debugging
Log format plug-ins are almost always too complex to get right the first time; there is almost always a period of debugging after you've created one, where you fix the errors. By far the most useful debugging tool available is the command-line database build with the -v option. Once you've created a profile from the plug-in, build the database from the command line like this:

sawmill -p profilename -a bd -v egblpfdD | more

(use Sawmill.exe on Windows). That will build the database, and while it's building, it will print detailed information about what it's doing. Look for the lines that start with "Processing" to see Sawmill examining each line of log data. Look for the lines that start with "Marking" to see where it's putting data into the database, and check whether the values it's putting into the database look right. In between, look at the values it's extracting from the log data into the log fields, to be sure the field values are what they should be. If you're using regular expressions to parse, Sawmill will show you what each expression is, what it's matching it against, whether it actually matched, and if it did, what the subexpressions were. Careful examination of this output will turn up any problems in the plug-in. When you've found a problem, fix it in the plug-in, then run this to recreate your profile:

sawmill -p profilename -a rp

Then rerun the database build with the command above, and repeat until everything seems to be going smoothly. If the data seems to be populating the database properly, switch to a normal database build, without debugging output:

sawmill -p profilename -a bd

When the build is done, look at the reports in the web interface; if you see any problems, you can return to the debugging-output build to see how the data got in there. When you have your log format plug-in complete, please send it to us! We'd be delighted to include your plug-in as part of Sawmill.
example:

set_collected_field('', 'date', $1)

See above for information on using set_collected_field() to collect fields into entries. Syslog plug-ins must set the variable volatile.syslog_message to the message field. Syslog plug-ins should not accept entries; that is the responsibility of the syslogging device plug-in. Syslog plug-ins should always use "auto" as the date and time format; if the actual format is something else, they must use normalize_date() and normalize_time() to normalize the date and time into a format accepted by "auto". Other than that, syslog plug-ins are the same as other plug-ins.

A syslogging device plug-in (a.k.a. a syslog_required plug-in) is also slightly different from a normal plug-in. First, the log.miscellaneous.log_data_type option is set to "syslog_required":

log.miscellaneous.log_data_type = "syslog_required"

Second, the plug-in should define only log fields and database fields which are in the syslog message. These vary by format, but do not include date, time, logging device IP, or syslog priority. Syslogging device plug-ins must always use log parsing filters. Since the syslog plug-in collected date and time into the empty key entry, syslogging device plug-ins must copy those over to another key if they use keyed collected entries. If they do not use keys, they can just collect all their fields into the collected entry. The syslogging device plug-in should accept the collected entries.
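The pieces above can be sketched as a single fragment. This is a hypothetical illustration, not a shipped plug-in: the regular expression, the field names ('action' and 'source_ip'), and the parsing filter body are assumptions invented for the example; only log.miscellaneous.log_data_type, volatile.syslog_message, and set_collected_field() come from the description above.

```
# Hypothetical sketch of a syslogging-device plug-in fragment.
# The device message format "action=... src=..." is invented for illustration.
log.miscellaneous.log_data_type = "syslog_required"

log.format.parsing_filter = `
  # volatile.syslog_message was set by the syslog plug-in
  if (matches_regular_expression(volatile.syslog_message,
      '^action=([a-z]+) src=([0-9.]+)$')) then (
    # Collect the device's own fields into the collected entry
    set_collected_field('', 'action', $1);
    set_collected_field('', 'source_ip', $2)
  )
`
```

The plug-in would also define matching log fields and database fields for action and source_ip, but not for date, time, or device IP, which the syslog plug-in supplies.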
Getting Help
If you have any problems creating a custom format, please contact support@flowerfire.com -- we've created a lot of formats, and we can help you create yours. If you create a log format file for a popular format, we would appreciate it if you could email it to us, for inclusion in a later version of Sawmill.
Sawmill Documentation
Quickstart Manual FAQ User Guide www.sawmill.co.uk
Newsletters
Sawmill newsletters:
- 2006-12-15: Using .cfg Maps to Embed External Metadata in Sawmill Reports
- 2007-01-15: Ignoring Spider Traffic
- 2007-02-15: Creating and Populating Custom Fields
- 2007-03-15: Emailing Reports From Environments with SMTP Authentication
- 2007-04-15: Adding Calculated Columns to a Report
- 2007-05-15: Sending Email Alerts Based on Real-Time Log Data Scanning
- 2007-06-15: Tracking Conversions with a "Session Contains" Filter
- 2007-07-15: Sequential Scheduling
- 2007-08-15: Using Database Merges
- 2007-09-15: Improving the Performance of Database Updates
- 2007-10-15: Creating Custom Reports
- 2007-11-15: Using "Create Many Profiles" to Create and Maintain Many Similar Profiles
- 2007-12-15: Customizing the Web Interface
- 2008-01-15: Creating A Rolling 30-day Database
- 2008-02-15: Using Zoom To Get More Detail
- 2008-03-15: Excluding Your Own Traffic With Log Filters
- 2008-04-15: Showing Correct Totals and Percents for Unique Values
- 2008-05-15: Using CFGA Files to Incrementally Override CFG Files
- 2008-06-15: Showing Usernames, Instead of IP Addresses, in Reports
- 2008-07-15: Adding Users Automatically With Salang Scripting
- 2008-08-15: Using Sawmill To Query Log Data With SQL
- 2008-09-15: Using Sawmill To Report On Custom Log Data
- 2008-10-15: Converting Log Data With process_logs
- 2008-12-15: What's New In Sawmill 8
- 2009-01-15: Migrating Sawmill 7 Data to Sawmill 8
- 2009-02-15: Detecting And Alerting On Intrusion Attempts With Sawmill
- 2009-03-15: Date Range Filtering
- 2009-04-15: Creating Custom Fields, Revisited
- 2009-05-15: Creating Custom Actions
- 2009-06-15: Using Cross-Reference Tables
Troubleshooting
You should consult this page when something goes wrong with the installation or use of Sawmill.
Sawmill crashes...
Verify that it was really Sawmill, and not your web browser, that crashed. The most common cause of crashes is Sawmill running out of memory while building a database. Try watching Sawmill's memory usage while it is processing, to see if it is consuming all available memory; you can check memory usage using top on UNIX, the Task Manager on Windows, or the About This Computer window on MacOS. Sawmill will often generate an error when it runs out of memory, but for technical reasons this is not always possible, and sometimes running out of memory can cause Sawmill to crash. See "Sawmill runs out of memory," below, for suggestions on limiting Sawmill's memory usage. Barring out-of-memory problems, Sawmill should never crash; if it does, it is probably a significant bug in Sawmill. We do our best to ensure that Sawmill is bug-free, but all software has bugs, including Sawmill. If Sawmill is crashing on your computer,
we would like to hear about it -- please send email to support@flowerfire.com, describing the type of computer you are using and the circumstances surrounding the crash. We will track down the cause of the crash, fix it, and send you a fixed version of Sawmill.
Credits
Sawmill was created by Flowerfire, Inc. Original inspiration for Sawmill came from Kuck & Associates, Inc, and from Seized by the Tale, two sites which needed such a tool. Thanks to the makers of gd, an excellent GIF creation library which is used to create all images displayed by Sawmill, including the pie charts, line graphs, table bars, legend boxes, and icons. gd was written by Thomas Boutell and is currently distributed by boutell.com, Inc. gd 1.2 is copyright 1994, 1995, Quest Protein Database Center, Cold Spring Harbor Labs. Thanks to the makers of zlib, which Sawmill uses to process gzip and ZIP log data. Thanks to Ken Brownfield for his long-term and continued system administration support. Thanks to Jason Simpson, Ken Brownfield, and Wayne Schroll for important feedback on the 1.0 alpha versions. Thanks to Stephen Turner for his experienced input on the early 1.0 versions. Thanks to the 1.0 beta testers for help on the beta versions, especially to Gary Parker, Glenn Little, and Phil Abercrombie. Thanks to the 2.0 beta testers for help on the beta versions, especially to Gary Parker, Vincent Nonnenmach, and Glenn Little. Thanks to all the 3.0 beta testers, especially Vincent Nonnenmach. Thanks to all the 4.0 beta testers, especially (yet again) Vincent Nonnenmach, and many others. Thanks to all the 5.0 beta testers, especially Fred Hicinbothem, Yuichiro Sugiura, and Peter Strunk. Thanks to all the 6.0 beta testers, especially Ed Kellerman, Noah Webster, Johnny Gisler, Morgan Small, Charlie Reitsma, James K. Hardy, Alexander Chang, Richard Keller, Glenn Little, Eric Luhrs, and Yann Debonne. Thanks to all the 7.0 beta testers, too numerous to name here. Sawmill 8.0 used a new quality paradigm for software testing: a collaboration between Flowerfire and Softworker, a software consulting firm for quality assurance and project management.
Strict controls were put into place to ensure features were completed and technically correct. Flowerfire's development tracking system was improved to allow better communication between team members, and to allow real-time data on the state of the product to be widely disseminated among need-to-know engineers and managers. The automated build system was improved with a grammatical analysis of its syntax and semantics, allowing consistent builds across multiple platforms, along with other modern software quality testing techniques. Sawmill is a much better product thanks to the help of these and other beta testers.
Copyright
Sawmill is copyright 1996-2009 by Flowerfire, Inc. This is a commercial product. Any use of this product without a license is a violation of copyright law. Please don't use an illegal or pirated copy of Sawmill!
Short Answer: The Trial version is identical to the full version, except that it expires after 30 days.
Long Answer
Sawmill Trial is a free trial version, intended to let you evaluate the program without having to buy it. It is identical to the full version, except that it expires 30 days after it is first used. After the trial period is over, the trial version will no longer work, but it can be unlocked by purchasing a license, and all settings, profiles, and databases will remain intact.
Short Answer: Enterprise supports MySQL, RBAC, multithreaded database builds, real-time reporting, and full interface customization.
Long Answer
Sawmill Enterprise is intended for large organizations with very large datasets and advanced customization needs. Sawmill Enterprise has all the features of Sawmill Professional, and the following additional features.
- MySQL Server Support. Support for MySQL as a back-end database. This allows the data collected by Sawmill to be queried externally, and provides much greater scalability through the use of multi-computer database clusters.
- Role-Based Access Control (RBAC). The Enterprise version has the capability of using RBAC to the fullest extent, with full customization of roles related to users, with multiple roles assigned in any configuration.
- Oracle Database Support. Only the Enterprise version has support for Oracle, and can be configured to gather information directly from the database.
- Real-time Reporting. Enterprise has the capability to process data as needed, in real time, to give you up-to-the-minute reports.
- Interface Customization. The web interface for Sawmill is written entirely in its internal language (somewhat similar to Perl). With Enterprise licensing, these files can be edited, providing complete customization of the entire user interface, both administrative and non-administrative.
Short Answer: You can unlock your trial installation by entering your license key in the Licensing page.
Long Answer
You don't have to download again. When you purchase, you get a license key by email. You can enter that key into the Licensing page (which you can get to by clicking Licensing on the Administrative menu) to unlock a trial installation, converting it into a fully licensed installation.
Short Answer: Go to the Licensing page, delete your expired license, and click "Try Sawmill For 30 Days."
Long Answer
Sawmill's trial license allows you to use it for evaluation purposes only. However, if after 30 days you still have not had a chance to fully evaluate Sawmill, you can extend your trial for another 30 days by doing the following:

1. Go to the Licensing page.
2. Delete your current trial license.
3. Click the "Try Sawmill for 30 Days" button.

This will work only once -- after that, you will need to contact us at support@flowerfire.com if you want to extend your trial period further.
Short Answer: When upgrading from an older 7.x version to a newer 8.x version, start with the new version and use the Import Wizard.
Long Answer
For all users: Installing a newer 8.x version of Sawmill over your existing Sawmill 7 installation will not result in data loss. The installer installs only what is necessary, and will not overwrite or remove your existing profiles, databases, or any user configuration data. Once the install is complete, the upgrade is complete, and you are ready to continue using Sawmill. Use the Import Wizard to import your data from your version 7 LogAnalysisInfo directory.

If you're upgrading from an older 8.x to a newer 8.x, start with the new LogAnalysisInfo. To preserve profiles, settings, databases, and more, you need to copy them from the old LogAnalysisInfo folder. Here are the parts you may want to copy:

1. Profiles. Copy all files except default_profile.cfg from your existing profiles folder in the LogAnalysisInfo folder, to the new profiles folder.
2. Databases. Copy folders from your existing LogAnalysisInfo folder to the new one.
3. Schedules. Copy the file schedules.cfg from your existing LogAnalysisInfo folder to the new one.
4. Users. Copy the file users.cfg from your existing LogAnalysisInfo folder to the new one.
5. User Settings. Copy the directory user_info from your existing LogAnalysisInfo folder to the new one.
6. Licenses. Copy the file licenses.cfg from your existing LogAnalysisInfo folder to the new one.
NOTE: the default profile (default_profile.cfg) should not be copied! The new file may have options that the old one did not. If you have changed default_profile.cfg and need to keep your changes, you will need to merge them, by manually cutting and pasting specific option changes from the old file to the new file with a text editor.

In some cases, for simple upgrades, you can simply copy the entire LogAnalysisInfo folder to the new installation. But in general, there are changes to files in the LogAnalysisInfo folder from one version to the next, so you may not get the latest features and bug fixes, and Sawmill may not even work, if you use an older LogAnalysisInfo. Therefore, it is best to use the new LogAnalysisInfo, and copy just the pieces you need from the old one.
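As a sketch, the copy steps above might look like this on a UNIX-like system. All paths here are hypothetical (the script even creates mock folders so it can run standalone); substitute your real old and new LogAnalysisInfo locations, and note that default_profile.cfg is deliberately skipped.

```shell
#!/bin/sh
# Hypothetical locations; point these at your real installations.
OLD=/tmp/sawmill_upgrade_demo/old/LogAnalysisInfo
NEW=/tmp/sawmill_upgrade_demo/new/LogAnalysisInfo

# Mock setup so this sketch runs standalone; a real upgrade has these already.
rm -rf /tmp/sawmill_upgrade_demo
mkdir -p "$OLD/profiles" "$OLD/user_info" "$NEW/profiles"
touch "$OLD/profiles/my_profile.cfg" "$OLD/profiles/default_profile.cfg"
touch "$OLD/schedules.cfg" "$OLD/users.cfg" "$OLD/licenses.cfg"

# 1. Profiles: copy everything except default_profile.cfg.
for f in "$OLD"/profiles/*.cfg; do
  [ "$(basename "$f")" = "default_profile.cfg" ] && continue
  cp "$f" "$NEW/profiles/"
done

# 2. Databases: copy your database folders similarly (names vary by profile).
# 3, 4, 6. Schedules, users, licenses.
cp "$OLD/schedules.cfg" "$OLD/users.cfg" "$OLD/licenses.cfg" "$NEW/"

# 5. User settings.
cp -R "$OLD/user_info" "$NEW/"

ls "$NEW/profiles"   # → my_profile.cfg (default_profile.cfg was skipped)
```

If you have changed default_profile.cfg, merge those changes by hand as described above rather than copying the file.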
Short Answer: Windows 95/98/ME/NT/2000/XP/2003, MacOS, most versions and variants of UNIX.
Long Answer
Sawmill runs on Windows 95/98/ME/NT/2000/XP/2003, MacOS X, and most popular flavors of UNIX (Linux, Solaris, FreeBSD, OpenBSD, NetBSD, BSD/OS, Tru64 UNIX (Digital Unix), IRIX, HP/UX, AIX, OS/2, and BeOS). Binary versions are available for the most popular platforms; on less common platforms, it may be necessary to build Sawmill yourself from the source code (which is available for download in encrypted/obfuscated format). That's just the server; once you have the server running, you can configure Sawmill, generate reports, and browse reports from any computer, using a normal web browser.
Short Answer: At least 128 Meg RAM, 512 Meg preferred; 500 Meg disk space for an average database; and as much CPU power as you can get.
Long Answer
Sawmill is a heavy-duty number crunching program, and can use large amounts of memory, CPU, and disk. You have some control over how much it uses of each, but it still requires a reasonably powerful computer to operate properly. Sawmill uses around 30 Meg of memory when it processes a small to medium size log file, and it can use considerably more for very large log files. The main memory usage factors are the "item lists", which are tables containing all the values for a particular field. If a field in your data is very complex, with many unique values (the URL query field for web log data is a common example of this), the item list can be very large, requiring hundreds of megabytes of memory. This memory is mapped to disk to minimize physical RAM usage, but still contributes to the total virtual memory usage by Sawmill. So for databases with very complex fields, large amounts of RAM will be required. For large datasets, it is possible for Sawmill to use more than 2GB of address space, exceeding the capabilities of a 32-bit system; in this situation, it is necessary to use a 64-bit system, or a MySQL database, or both (see Database Memory Usage and Sawmill uses too much memory for builds/updates, and is slow to view). This typically will not occur with a dataset smaller than 10GB, and it is often possible to process a much larger dataset on a 32-bit system with 2GB. A dataset over 100GB will often run across this issue, however, so a 64-bit system is recommended for very large datasets. If your system cannot support the RAM usage required by your dataset, you may need to use log filters to simplify the complex database fields. The Sawmill installation itself takes less than 50 Meg of disk space, but the database it creates can take much more.
A small database may be only a couple megabytes, but if you process a large amount of log data, or turn on a lot of cross-references and ask for a lot of detail, there's no limit to how large the database can get. In general, the database will be somewhere on the order of 50% the size of the uncompressed log data in it, perhaps as much as 100% in some cases. So if you're processing 100G of log data, you should have 100G of disk space free on your reporting system to hold the database. If you use an external (e.g. SQL) database, the database information will take very little space on the reporting system, but will take a comparable amount of space on the database server. Disk speed is something else to consider when designing a system to run Sawmill. During log processing, Sawmill makes frequent use of the disk, and during statistics viewing it uses it even more. Many large memory buffers are mapped to disk, so disk speed can have a very large impact on database performance, both for processing log data and querying the database. A fast disk will improve Sawmill's log processing speed, and the responsiveness of the statistics. SCSI is better than IDE, and SCSI RAID is best of all. During log processing, especially while building cross-reference tables, the CPU is usually the bottleneck -- Sawmill's number crunching takes more time than any other aspect of log processing, so the rest of the system ends up waiting on the CPU most of the time. This means that any improvement in CPU speed will result in a direct improvement in log processing speed. Sawmill can run on any system, but the more CPU power you can give it, the better. Large CPU caches also significantly boost Sawmill's performance, by a factor of 2x or 3x in some cases.
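As a quick worked example of the 50%-100% rule of thumb above (the 100 GB figure is simply the example size used in the text):

```shell
# Back-of-the-envelope database sizing, per the 50%-100% rule of thumb.
logs_gb=100                 # uncompressed log data, in GB (example value)
low=$(( logs_gb / 2 ))      # ~50% of the log size
high=$logs_gb               # up to ~100% in some cases
echo "plan for ${low}-${high} GB of free disk for the database"
# → plan for 50-100 GB of free disk for the database
```

Treat the result as a planning estimate only; the actual size depends heavily on cross-references and the level of detail configured.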
Short Answer: Sawmill can handle all major log formats and many minor formats, and you can create your own custom formats.
Long Answer
Sawmill is not just for web server logs, though it's well suited to that task. Sawmill also supports firewall logs, proxy logs, mail logs, antivirus logs, network logs, FTP logs, and much more. See Supported Log Formats for the full list. Sawmill automatically detects all the formats it supports, and chooses appropriate settings for the format. We're continually adding new log formats, so the list will keep growing. However, due to the large number of format requests, we cannot add all the formats that are requested. If your log format is not recognized by Sawmill, and you need support for a format, we can add it to Sawmill for a fee; contact support@flowerfire.com for details. If you want to analyze a log in a different format, Sawmill also lets you create your own format description file; once you've done that, your format becomes one of the supported ones--Sawmill will autodetect it and choose good options for it, just like any built-in format. Sawmill's format description files are very flexible; almost any possible format can be described. If you have an unsupported format and you'd like help writing a format file, please contact support@flowerfire.com, and we'll write a format file for you, at no charge.
Short Answer: Among other things, Sawmill does not generate static reports -- it generates dynamic, interlinked reports.
Long Answer
There are many areas in which Sawmill beats the competition, but one major one is that Sawmill's statistics are dynamic, and its statistics pages are interlinked. Most other log analysis programs are report-based -- you specify certain criteria (like, "give me all hits on my web site on January 14, broken down by page") and it generates a single report, and it's done. If you want more detail about something, it's not available, or it's only available if you reprocess the log data with different settings. Sawmill generates HTML reports on the fly, and it supports zooming, filtering, and many other dynamic features. You can zoom in on a certain directory, for instance, and then see the events for that directory broken down by date, or by IP, or by weekday, or in any other way you like. You can create arbitrary filters, for instance to zoom in on the events for a particular address on a particular day, or to see the search terms that were used from a particular search engine on a particular day, which found a particular page. Sawmill lets you navigate naturally and quickly through hierarchies like URLs, pages/directories, day/month/years, machine/subnets, and others. Of course, there are many other features that set Sawmill apart from the competition -- see our web site for a complete list.
Short Answer: Installations vary from customer to customer--Sawmill provides enough flexibility to let you choose the model that works best for you.
Long Answer
There are quite a lot of different "models" that different customers use. For web server analysis, it is common to have Sawmill running on the active web server, either stand-alone or in web server mode, accessing the growing log files directly; this works well as long as the dataset is not too large and the server is not too heavily loaded. For very large datasets, however, many customers have dedicated Sawmill machines, which pull the logs over the network from the server(s). Databases are generally updated regularly; it's common to have them updated in the middle of the night, every night, using the Sawmill Scheduler or an external scheduler like cron. In terms of the database layout, some common models include:
- A single database. Most customers use a single large database that contains all their data. This works well if you have a lot of disk space and a fast computer (or computers) to process your logs with, or if your log data is not too large. You can use Sawmill's normal filtering features to zoom in on particular parts of the data, but it's all stored in a single database. Sawmill has other features that can be used to limit certain users to certain parts of the database; this is particularly useful for ISPs who want to store all their customers' statistics in a single large database, but only let each customer access their own statistics.
- A "recent" database and a long-term database. In cases where log data is fairly large (say, more than 10 gigabytes), or where disk space and/or processing power is limited, some customers use two databases: one in detail for the recent data (updated and expired regularly to keep a moving 30-day data set, for instance), and the other less detailed for the long-term data (updated regularly but never expired). The two databases combined are much smaller than a single one would be because they use less overall information, so it takes less time to process the logs and to browse the database. This is often acceptable because fine detail is needed only for recent data.
- A collection of specialized databases. Some customers use a collection of databases, one for each section of their statistics. This is particularly useful for log data in the multi-terabyte range; a tightly-focused database (for instance, showing only hits on the past seven days on a particular directory of the site) is much smaller and faster than a large all-encompassing database.
This is also useful if several log files of different types are being analyzed (for instance, an ISP might have one database to track bandwidth usage by its customers, another to track internal network traffic, another to track usage on its FTP site, and another to track hits on its own web site).
There are a lot of options, and there's no single best solution. You can try out different methods, and change them if they're not working for you. Sawmill provides you the flexibility to choose whatever's best for you.
Short Answer: There are no limits, except those imposed by the limitations of your server.
Long Answer
There is no fundamental limit -- given enough memory, disk space, and time, you can process the world. We've processed log files terabytes in size, billions of lines long, and been able to browse their statistics at full complexity in real time, with no troubles.
Short Answer: Use an internal database, build a separate database on each computer, and merge them.
Long Answer
Sawmill has several features which can be used to script this sort of approach. Specifically, if you are using the internal database, there is a "database merge" operation which can be performed from the command line, using "-a md". This merges two existing databases into a single database. Using this feature, it becomes possible to write a script to 1) split your dataset into 100 equal parts, 2) build 100 databases on 100 Sawmill installations on 100 computers, and 3) run 99 merge operations in sequence on a single computer (e.g. "sawmill -p profilename -a md -mdd /database/from/installationN" to merge in the data from the Nth installation), to combine them all into a single database. Since merge operations are generally much faster than log processing, this approach can be used to parallelize a huge log processing operation, greatly accelerating it.
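The merge stage above can be scripted; here is a minimal sketch. The profile name and database paths are placeholders, and the commands are only echoed so the sketch can run without a Sawmill installation -- drop the echo to perform real merges on your reporting machine.

```shell
#!/bin/sh
# Sketch of step 3: merge N separately built databases into one profile's
# database. "myprofile" and the /db/workerN paths are hypothetical.
merge_all() {
  profile=$1
  shift
  for db in "$@"; do
    # Remove "echo" to actually run the merge against a real Sawmill binary.
    echo sawmill -p "$profile" -a md -mdd "$db"
  done
}

merge_all myprofile /db/worker1 /db/worker2 /db/worker3
```

Each merge folds one worker's database into the local profile's database, so running the loop over all workers leaves a single combined database.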
Long Answer
Unlike many other log analysis tools, Sawmill doesn't care what order your log data is in. It can be in any order you like. That means that if you have several log files from several servers in a cluster, you can dump the data from all of them into the same database, without worrying that their date ranges overlap.
Long Answer
To create many profiles in a batch, all based on a particular "template" profile, you can use the create_many_profiles command-line feature. To do that, start by editing the file LogAnalysisInfo/miscellaneous/create_many_profiles.cfg, using a text editor. Do the following:
Change template_profile_name to the internal name of the profile you want to use as a template. The internal name is the name of the file in LogAnalysisInfo/profiles, but without the .cfg extension, so this might be:

template_profile_name = "my_profile"
Change "clone1 = {" to the internal name of the first profile, the one you want to create:

derived_profile_1 = {
Change the label to the "human readable" name of the profile, e.g.:

label = "Derived Profile 1"
Make changes inside the "changes" section of the clone1 group, to change any options that you want changed in the profile (whatever should be different from the template profile). One common change is the log source pathname; the following line changes the "pathname" option in the "0" group of the "source" group of the "log" group of the profile .cfg file; i.e., it changes the pathname of log source 0 (typically the first log source), so it looks for its log data in /logs/for/clone1:

log.source.0.pathname = "/logs/for/clone1"

Another common change is to add a filter to reject all but a certain class of events in this profile's database; for instance, this rejects all hits in IIS logs where the page doesn't start with "/abc", resulting in a profile showing only hits on the "abc" directory of the web site:

log.filters.2 = "if (!starts_with(cs_uri_stem, '/abc')) then 'reject';"
Repeat this for as many profiles as you need, by duplicating the clone1 section for each profile, choosing a new internal name (replacing clone1) and label (replacing "Clone 1") for each new profile.

Then run Sawmill from the command line. On Windows:

Sawmill.exe -dp templates.admin.profiles.create_many_profiles

On non-Windows:

sawmill -dp templates.admin.profiles.create_many_profiles
This step will create all profiles. At any time, you can recreate all profiles without affecting their databases. So by editing only the template profile, and using create_many_profiles to propagate changes to all clones, you can maintain all profiles with only one template profile.
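Assembled, a create_many_profiles.cfg edited along these lines might look roughly like the sketch below. The profile name, label, and log path are illustrative, and the exact surrounding structure may differ between Sawmill versions, so use the shipped file as your starting point rather than this sketch:

```
template_profile_name = "my_profile"

derived_profile_1 = {
  label = "Derived Profile 1"
  changes = {
    log.source.0.pathname = "/logs/for/derived_profile_1"
    log.filters.2 = "if (!starts_with(cs_uri_stem, '/abc')) then 'reject';"
  } # changes
} # derived_profile_1
```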
Sawmill Documentation
Quickstart Manual FAQ User Guide www.sawmill.co.uk
Short Answer: Log files are text files created by your server, recording each hit on your site. Sawmill generates its
statistics by analyzing log files.
Long Answer
Log files are large, ugly text files generated by web servers, proxy servers, FTP servers, and just about every other kind of server. Every time something happens on the server (it serves a file, delivers a message, someone logs in, or something else), the server logs that information to the file, which continues to grow as new events occur. Log files are not particularly human-readable, and do not generally contain summarizing information, which is why Sawmill exists -- Sawmill processes your log files, summarizes and analyzes them in many ways, and reports the results back to you in a much friendlier format: graphs, tables, etc.

You need to have access to your log files to use Sawmill. If you don't have log files, Sawmill can't do anything for you. If you don't know where your log files are, ask your server administrator (hint: they are often stored in a directory called "logs"). In some cases, servers are configured so they do not keep log files, or the logs are hidden from users; in these situations, you will not be able to use Sawmill. Again, your server administrator can help you find your log files, or tell you why they're not available. If you're trying to analyze a web site, and your ISP does not provide logs for you, you may want to consider switching to one that does.
FAQ: Scheduling
Question: Can Sawmill be configured to automatically analyze the access log for my site on a shared server once a day at
a given time?
Short Answer: Yes, if you run it stand-alone, or if your server has a scheduling program.
Long Answer
It depends on your web server. If you run Sawmill as a stand-alone program (rather than as a CGI program) on your server, then you can use Sawmill's built-in Scheduler to do this. If you can't run it stand-alone or don't want to, then you can still set up automatic database builds if your server has its own scheduling program (like cron or Windows Scheduler).
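For instance, with cron as the external scheduler, a single crontab entry is enough to rebuild a profile's database nightly. The install path and profile name below are placeholders; the -p/-a options are the same ones used elsewhere in this documentation:

```
# crontab entry: update the database for "myprofile" at 00:30 every night
30 0 * * * /full/path/to/sawmill -p myprofile -a ud
```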
Short Answer: Set the Server Hostname option and the Web Server Port option in the Network section of the
Preferences.
Long Answer
By default, Sawmill binds to all available IP addresses, so if there's an IP address where it is allowed to listen on port 8988, it already is (it's also listening on 127.0.0.1). If you want it to listen only on a specific IP address, you can do that from the Preferences. Go to the Preferences, click the Network category, change the "Server hostname" option to the IP address you want to use, and change the "Web server port" option to the port number you want to use. The next time you start Sawmill, it will automatically bind to the IP address you specified.

If you're using the command-line version of Sawmill (sawmill), you can either do the same as above, or you can give Sawmill command-line options to tell it which IP address and port to use:

sawmill -ws t -sh 128.129.130.131 -wsp 8888

When you use these options, Sawmill will immediately start up its web server on the port you specify.
Short Answer: Use "extended" or "combined" log format to see referrer and agent information, or analyze the log files
with a separate profile. For error logs, analyze them with a separate profile.
Long Answer
Different log formats contain different types of information. All major web log formats include page, date/time, and browsing host information, but not all contain referrer and agent information. If your log format does not include referrer or agent information, Sawmill will not include that information in its database. The easiest way to get referrer or agent information is to change your web server's log format to an "extended" or "combined" format, which includes referrer and agent information; then Sawmill will automatically include referrer and agent information in the database and in the statistics.

If it's not possible to change your log format, and you have a separate referrer log (often called referer_log), then you can analyze that log directly with Sawmill. Just point Sawmill at the log, and Sawmill should recognize it as a referrer log. Sawmill will show statistics with referrer and page information. Host and date/time information are not available in a standard referrer log, so referrer and page is all Sawmill can extract. By using an extended or combined log format, you will be able to do more powerful queries, for instance to determine the referrers in the most recent week.

Similarly, if you can't configure your server to use extended or combined, but you have a separate agent log, you can analyze it with Sawmill by creating a separate profile that analyzes the agent (web browser and operating system) information only. Since an agent log contains only agent information, you won't be able to cross-reference the agent information with page, date/time, host, referrer, or anything else; to do that, you'll need an extended or combined format log.

To analyze error information, you'll need an error log (often called error_log). Just point Sawmill at your error log when you create the profile.
Since the error log contains only error messages, you won't be able to cross-reference the errors against page, date/time, or any other fields; if you need cross-referencing of errors, you may be able to get what you need by cross-referencing the "server response" field of your normal web log to the fields you need cross-referenced; then apply "404" as a filter on the server response field and you'll see only those web site hits that generated 404 (file not found) errors.
Short Answer: Sawmill is currently available in English, German, and Japanese, and can be translated into any language
fairly easily. Customization of output text is also easy.
Long Answer
Sawmill has a feature designed for just this purpose, called Language Modules. Language modules are text files which contain all of the text that Sawmill ever generates. You can translate part or all of Sawmill into any language by modifying the language modules. English, German, and Japanese translations already exist. Language modules can also be used to customize the output of Sawmill in almost any conceivable way. For full details, see Language Modules--Localization and Text Customization in the online manual.
Short Answer: Yes; run it as a Service on Windows; use StartupItems under MacOS X; use the /etc/rc.d mechanism on
UNIX systems that support it.
Long Answer
Sawmill can be configured to run at startup in the same way any other program can, and the exact method depends on your operating system. Here's how:

On Windows: Sawmill is automatically installed as a Service, and will be running as soon as installation is complete. The Service is set to start automatically when the system starts up. You can edit Service parameters, for instance to have it run as a different user, or to have it start manually, in the Services control panel.

On MacOS X:

1. Install Sawmill in its default location at /Applications/Sawmill.
2. If the folder /Library/StartupItems does not exist, create it.
3. Copy the Sawmill folder from /Applications/Sawmill/Startup to /Library/StartupItems.

On RedHat 9 Linux or later, do the following as root:

1. Move the sawmilld file from the Extras/RH9 directory of your Sawmill installation to /etc/rc.d/init.d. Type:

   chkconfig --add sawmilld
   chkconfig --level 2345 sawmilld on

   to install it and turn it on.

2. Rename the Sawmill executable to sawmill (or change the name of the executable in the script) and put it in /etc/sawmill.

3. Put a symbolic link to LogAnalysisInfo in /etc/sawmill/LogAnalysisInfo (or you can put the actual directory there), using the ln -s command, e.g.:

   ln -s /usr/home/sawmill/LogAnalysisInfo /etc/sawmill/LogAnalysisInfo

   (you'll need to create the directory /etc/sawmill first).
On other Linux or other UNIX-type operating systems:

1. Install a script to start Sawmill in /etc/rc.d (or /etc/init.d, or however your UNIX variant does it). A sample script, based on the Apache script, is available here. The method varies from UNIX to UNIX, but to give one specific example, in RedHat Linux 7.0 you should call the script sawmilld and put it in /etc/rc.d/init.d, and then make symbolic links to it from the rc0.d - rc6.d directories, encoding into the name of the link whether to Start Sawmill at that runlevel, or to Kill it. A good sequence of links is the following:

   ln -s /etc/rc.d/init.d/sawmilld /etc/rc.d/rc0.d/K15sawmilld
   ln -s /etc/rc.d/init.d/sawmilld /etc/rc.d/rc1.d/K15sawmilld
   ln -s /etc/rc.d/init.d/sawmilld /etc/rc.d/rc2.d/K15sawmilld
   ln -s /etc/rc.d/init.d/sawmilld /etc/rc.d/rc3.d/S85sawmilld
   ln -s /etc/rc.d/init.d/sawmilld /etc/rc.d/rc4.d/S85sawmilld
   ln -s /etc/rc.d/init.d/sawmilld /etc/rc.d/rc5.d/S85sawmilld
   ln -s /etc/rc.d/init.d/sawmilld /etc/rc.d/rc6.d/K15sawmilld
   If you're not sure where to put the Sawmill links or what to call them, and you have Apache installed on your system, look for files with names containing httpd in /etc/rc.d or /etc/init.d, and use the same names and locations for Sawmill, replacing httpd with sawmilld.

2. Rename the Sawmill executable to sawmilld (or change the name of the executable in the script) and put it in /bin or somewhere else in your default path.

3. Put a symbolic link to LogAnalysisInfo in /etc/sawmill/LogAnalysisInfo (or you can put the actual directory there), using the ln -s command, e.g.:

   ln -s /usr/home/sawmill/LogAnalysisInfo /etc/sawmill/LogAnalysisInfo

   (you'll need to create the directory /etc/sawmill first).
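If you have no Apache script to copy, the overall shape of such a control script is simple. The sketch below only echoes what it would do (so it runs anywhere); the binary path /etc/sawmill/sawmilld is an assumption taken from the steps above, and a real script would actually launch and kill the process where the comments indicate:

```shell
#!/bin/sh
# Skeleton of an init-style control script for Sawmill (illustrative only).
SAWMILL=/etc/sawmill/sawmilld

sawmilld_ctl() {
  case "$1" in
    start) echo "starting $SAWMILL" ;;  # a real script would run: $SAWMILL &
    stop)  echo "stopping $SAWMILL" ;;  # a real script would kill the saved PID
    *)     echo "Usage: sawmilld {start|stop}" ; return 1 ;;
  esac
}

sawmilld_ctl start
```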
Short Answer: Add an ampersand (&) to the end of the command line to run it in the background.
Long Answer
When you run Sawmill from the command line in UNIX by just typing the name of the program, it runs in the foreground. That means that you don't get your prompt back until Sawmill exits, and it also means that if you close your terminal window, the Sawmill server will terminate and you will not be able to use it anymore until you open another terminal window and restart Sawmill. Often, that's not what you want--you want Sawmill to keep running after you close the window. You can do that by running Sawmill in the background.

To run Sawmill (or any other UNIX program) in the background, add an ampersand (a & character) to the end of the command line; for instance, you might use the following command line:

./sawmill &

if the name of your Sawmill program is sawmill. When you type this, you will see one line of output as Sawmill is backgrounded, and a few lines from Sawmill describing the running web server, and then you will have your shell prompt back, so you can type more commands. At this point, Sawmill is running in the background. You can type exit at the prompt to close the shell, or you can just close the window, and Sawmill will continue to run in the background.

On some rare occasions, Sawmill may generate output to the shell console. This is not usually a problem, but on some systems, background programs that generate output may be suspended, and that can make Sawmill inaccessible. To prevent this from happening, you may want to use this command line instead:

nohup ./sawmill &

The "nohup" part of the command line stands for "no hang-up" and prevents this sort of output-related suspension problem. Unfortunately nohup doesn't exist on all systems. If you don't know if your system supports nohup, try including nohup on the command line--if it doesn't run that way, don't use it. You can see current background jobs started from the current terminal using the jobs command (with most shells).
You can terminate a background job by bringing it to the front using the fg command and then using control-C, or using the kill command together with its process ID. You can find the process ID (pid) of any background process (including ones started in other windows) using the ps command. For more information about any of these commands, use the man command (e.g. type man ps), or consult your UNIX documentation.
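The lifecycle described above can be rehearsed safely with a stand-in command; 'sleep' below plays the role of the Sawmill server, and the same '&', '$!', and 'kill' steps apply to the real binary:

```shell
# Background a job, capture its PID, then terminate it.
sleep 60 &                       # '&' puts the job in the background (like ./sawmill &)
pid=$!                           # $! holds the PID of the most recent background job
kill "$pid"                      # send SIGTERM, as you would to a backgrounded Sawmill
wait "$pid" 2>/dev/null || true  # reap it; suppresses the shell's job notice
```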
Short Answer: Install Sawmill somewhere else, or make a symbolic link to LogAnalysisInfo, or put the pathname of the
new location in the file LogAnalysisInfoDirLoc.
Long Answer
Sawmill stores most of its data, including all internal databases, in a folder called LogAnalysisInfo. By default, this is in the same folder as the Sawmill binary. If you want it to be somewhere else, there are several options:
- Relocate the databases. Instead of relocating the whole LogAnalysisInfo folder, relocate only the databases, by changing the database folder location in the Database section of the Config section of the profile. Since most of the disk usage of the LogAnalysisInfo folder is due to databases, this provides most of the benefit of relocating LogAnalysisInfo, if disk usage is the issue.

- Create a symbolic link (non-Windows only). If Sawmill is installed in a directory called /some/dir/sawmill, and you want LogAnalysisInfo to be at /some/other/dir/LogAnalysisInfo, you can do this from the command line:

  mv /some/dir/sawmill/LogAnalysisInfo /some/other/dir
  ln -s /some/other/dir/LogAnalysisInfo /some/dir

  This creates a symbolic link from the installation location to the new location of LogAnalysisInfo, which Sawmill will automatically follow. By the way, you can also just move LogAnalysisInfo to /var/sawmill/LogAnalysisInfo, and Sawmill will look for it there.

- Create a file LogAnalysisInfoDirLoc (for Windows). This is a text file containing the location of the LogAnalysisInfo folder. This is most useful on Windows; on other platforms you can use a symbolic link, as described above. For instance, if you installed Sawmill in C:\Program Files\Sawmill 8, and want LogAnalysisInfo to be at E:\LogAnalysisInfo, you can move it from C:\Program Files\Sawmill 8\LogAnalysisInfo to the E: drive, and then create a text file (with Notepad) called LogAnalysisInfoDirLoc, with no .txt extension, containing E:\LogAnalysisInfo. If Sawmill does not find a LogAnalysisInfo folder in the folder where it is running, it will look for that file, LogAnalysisInfoDirLoc, and it will look for LogAnalysisInfo in the location specified by the contents of that file.
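The symlink option can be tried end-to-end with throwaway directories before touching a real installation; the /tmp/demo paths below stand in for the actual install and destination directories:

```shell
# Move LogAnalysisInfo out of the "install" directory and link it back.
mkdir -p /tmp/demo/install/LogAnalysisInfo /tmp/demo/newhome
mv /tmp/demo/install/LogAnalysisInfo /tmp/demo/newhome/
ln -s /tmp/demo/newhome/LogAnalysisInfo /tmp/demo/install/LogAnalysisInfo
ls -ld /tmp/demo/install/LogAnalysisInfo   # shows the symlink
```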
Short Answer: Use an external Scheduler to run jobs or to call the Sawmill Scheduler, or run Sawmill in both CGI and
web server modes.
Long Answer
Sawmill's built-in scheduler can only run scheduled jobs if Sawmill is actually running when the job's time comes. That's fine if you're running Sawmill in web server mode, where it runs all the time. But in CGI mode, Sawmill only runs when someone is actively using it, so scheduled tasks will not run. There are three main solutions to this problem: use an external scheduler to call Sawmill's scheduler, use an external scheduler to run the jobs directly, or run Sawmill in both CGI and web server modes, with the CGI mode doing everything but the scheduled jobs, and web server mode handling those.
UNIX
On UNIX, the most common scheduler is cron. You can set up cron to call Sawmill's scheduler by running the command (as root):

crontab -e

from the UNIX command line, and then adding:

* * * * * sudo -u apache /full/path/to/sawmill -scheduler

to the resulting file. You will need to replace "apache" with the name of your CGI user, and you will need to replace /full/path/to/sawmill with the full pathname of your Sawmill executable. This tells cron to run Sawmill every minute, as the CGI user, with the -scheduler option (which tells Sawmill to run any scheduled jobs, and exit).

Another option is to run your Sawmill database builds and other jobs directly with cron; for instance you could add a line like this:

0 0 * * * sudo -u apache /full/path/to/sawmill -rfcf configname -cm ud

(replace "apache" with the name of your CGI user) to update the profile specified by configname every night at midnight (the first number is the minute of the hour when the job should be run; the second number is the hour when the job should be run, and * * * means to run it every day).

Yet another option is to run Sawmill in web server mode as well as CGI mode, with the web server mode instance running only for the purpose of running jobs. The two will not interfere with each other; just start Sawmill from the command line using:

/full/path/to/sawmill &

and it will continue to run until the next reboot. If you want Sawmill to automatically restart itself at system startup, see Running Sawmill at System Startup.
Windows
Unfortunately, Windows Scheduler does not let you run jobs every minute (like UNIX cron does), so you cannot use it to call the Sawmill Scheduler directly. However, other options are available. You can use the Windows Scheduler to run your Sawmill jobs directly. For instance, to set Sawmill to update the database for a particular profile every night, do this:
1. Open the Scheduled Tasks control panel.
2. Double-click Add Scheduled Task, and click Next.
3. Choose "Sawmill (CGI)" from the list (if it does not appear, Browse and locate the SawmillCL.exe file, which is usually at c:\Program Files\Sawmill 8\SawmillCL.exe), and click Next.
4. Click Daily and click Next.
5. Choose a time to run the build; sometime in the middle of the night (like midnight) is a good choice, and click Next.
6. Enter your username and password, and click Next.
7. Click "Open advanced properties for this task when I click Finish," and click Next.
8. Add "-p profilename -a ud" to the end of the Run field, and click OK.
Now Windows Scheduler is configured to update your database automatically every day. Another option is to run Sawmill in web server mode as well as CGI mode, with the web server mode instance running only for the purpose of running jobs. The two will not interfere with each other; just start Sawmill by double-clicking its icon (you can also configure it to start whenever your computer restarts, using Windows Scheduler), and scheduled jobs will run as long as Sawmill is running. If you need Sawmill to be running while you are logged out, see Running Sawmill as a Service.
Long Answer
Yes; just select one of the FTP log sources when Sawmill asks you where your data is. Sawmill can FTP one or more log files from any FTP server, anonymously or with a username/password.
Short Answer: Not directly, but you can do it by using a command-line log source to run a command line, script, or
program that does whatever is necessary to fetch the data, and prints it to Sawmill.
Long Answer
Sawmill supports many different methods of acquiring log data, including direct access to local files, and FTP or HTTP access to remote files; it can also decompress the major compression formats on the fly, including zip, gzip, and bzip2. If you need to use a different method to fetch the log data, like scp, sftp, or ssh, or if you need to read the log data from a database, or if you need to uncompress, decode, or decrypt a format that is not directly supported by Sawmill, you can do it using a command-line log source.

Command-line log sources are very simple in concept. You give Sawmill a command line; it runs the command line whenever it needs to get the log data; the command, script, or program you specify "prints" the log data (i.e. generates it to stdout, the standard command-line output stream), and Sawmill reads the output of the command to get the log data. This provides you with unlimited flexibility in how you feed your data to Sawmill.

For instance, suppose Sawmill didn't support gzip (it does). Then you could use the following (UNIX) command-line log source:

/bin/gunzip -c /logs/mylog.gz

Since the -c flag tells gunzip to dump the output to stdout, Sawmill will read the log data directly from this command, without needing to use its built-in gunzipper. More usefully, any decompression utility with a similar flag can be used to allow Sawmill to read any compressed, archived, or encrypted log directly, even if it doesn't know anything about the format.

Even if you don't have a program that will dump the data to stdout, you can still use this approach by writing a tiny script. Consider the following (UNIX) shell script, which copies files from a remote server with scp and feeds them to Sawmill:

scp user@host:/logs/mylog.txt /tmp/templog
cat /tmp/templog
rm /tmp/templog

This script copies a log file from a remote machine (securely, using scp), prints it to stdout using "cat", and deletes it when it's done.
The same script, with slight modifications, could copy multiple files, or use a different method than scp to fetch the files (like sftp). A simpler (and better) example which does the same thing is this command:

scp -qC user@host:/logs/mylog.txt > /dev/stdout

This explicitly scps the file to stdout, which sends it straight into Sawmill without the intermediate step of being stored on the disk or deleted. Since it's just one line, there's no need to use a script at all; this single line can be the command for the log source.
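The stdout principle is easy to demonstrate without Sawmill: create a small compressed "log", then stream it the way a command-line log source would (the /tmp path and sample lines are illustrative):

```shell
# gzip a tiny sample "log", then decompress it to stdout with -c.
printf 'line one\nline two\n' | gzip > /tmp/demo.log.gz
out=$(gunzip -c /tmp/demo.log.gz)   # Sawmill would read exactly this stream
rm /tmp/demo.log.gz
echo "$out"
```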
Short Answer: As of version 8, Sawmill is installed as a service when you run the normal installer.
Long Answer
Earlier versions of Sawmill required extra steps to run them as a service, but this is no longer a problem-- the normal Windows installer automatically installs Sawmill as a service when you run it.
Long Answer
Sawmill's interface is entirely web browser based. Sawmill runs either as a stand-alone program (in which case it uses its own built-in web server to serve its interface), or as a CGI program (in which case it uses the normal web server on the machine). In either case, Sawmill is configured by running a web browser on any machine you choose, and accessing Sawmill as though it were a web site. Statistics are also served through a web browser interface. You do not need to be physically present at the server to configure it or to view reports; all you need is a web browser.
Short Answer: Yes, Sawmill includes a number of features for just this purpose.
Long Answer
Absolutely. This is one of our core design goals -- to make Sawmill a good choice for web hosting providers, ISPs, and others who serve multiple sites from a single server. Sawmill's profiles provide an excellent mechanism for generating different statistics for each customer or web site. If each site has its own log file(s), this is trivial; you can just make a profile that analyzes the appropriate log file. If all sites share a single log file, it's not much harder -- Sawmill's advanced filtering mechanism lets you easily ignore all log entries except those of interest to a particular web site. The technique you use depends on your situation. In general, you will need to have a separate profile for each user (you can quickly create all of your profiles using the Create/Update Many Profiles feature). For maximum flexibility, each profile can have its own database, and each profile can be password-protected or secured in some other way, to prevent unauthorized users from accessing it. See Security for a discussion of some of the ways profiles can be secured. If each profile has its own database, then the log filters can be used to filter out all statistics except those belonging to the user. If you don't care if users can access each others' statistics, you can use a single profile with a single database, and give each user a bookmark URL pointing to their statistics in the database; this is the simplest approach, but it makes it possible for one user to see another's statistics, which is usually undesirable. Advantages of using a single database:
- Faster log processing -- log data is read only once. This is particularly important when using an FTP log source with a log file containing the data for all profiles, because otherwise the log data will be fetched once per profile, so if you have 1000 profiles, this will use 1000 times more bandwidth. For local log files, this is not much of an issue, because Sawmill skips quickly over log entries it doesn't need, so it will only spend real time on each log entry once.

Advantages of using multiple databases:

- Smaller databases. Though Sawmill has to create many databases instead of one, generally the total disk usage will be smaller, because each database is tightly focused on its site, and does not need to keep around information that applies only to other sites. In one real-world example, the total database size shrank by a factor of 200 when the customer switched from one database to many.

- Faster statistics browsing. A small database is generally faster to browse than a large database, so using multiple small databases will make the statistics faster.

- More flexibility. Each profile can be configured separately, so you can have different cross-references, filters, database fields, etc. for different profiles. Using a single database locks you into a single database structure for all profiles.
In summary, you'll usually want to use multiple databases for multiple servers or sites. The main situation you'd want to use a single database for is if you're using FTP over a metered line to fetch the data; a single database will fetch it just once. Even then, though, you could set up an external script to fetch the log data to the local disk once, and then process it locally with Sawmill.
Long Answer
Yes. Any file whose name ends with .gz, .zip, .bz, or .bz2 will be treated as a compressed file by Sawmill. It will uncompress it "on the fly" (not modifying the original file and not creating any new files), and process the uncompressed data the same way it reads normal log files.
Long Answer
Sawmill can read any number of log files, from any number of servers, into a single database to show a single aggregate set of reports of all the data. If the logs also contain information about which server handled each request (or if each server has a separate log file, or a set of separate log files), then Sawmill can also show per-server statistics, if desired. Unlike many log analysis tools, Sawmill does not care if the files are in order, or if their date ranges overlap -- any combination of any number of files with data in any order is possible.

To see per-server statistics, look in the reports for a report which breaks down the overall events by server. This might be called "Server domains" or "Server hosts" or "Server IPs" or something else, depending on the log data. Click on a particular server in that report; that zooms you in on that server. Now choose any other report from the "Default report on zoom" dropdown menu, to see a breakdown of the statistics for that server only. Alternatively, you can use the global filters to zoom "permanently" in on a particular server, and then all reports will automatically show numbers for that server only.

If you don't have a field that tracks the server, you may still be able to get per-server statistics, by using the current_log_pathname() function to detect which server each hit came from. You'll need to create a custom field in that case, with a log field to track the server, a filter to compute the field from the log pathname, and a database field and report for the field. For information on creating custom fields, see Creating Custom Fields.
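As a sketch of that last approach, a log filter could derive the server from the log pathname. The field name "server" and the directory layout here are assumptions for illustration only (and, as described above, the real setup also needs the corresponding log field, database field, and report created per Creating Custom Fields):

```
# Hypothetical log filter: each server's logs live in their own directory,
# e.g. /logs/server1/access.log, so the pathname identifies the server.
if (starts_with(current_log_pathname(), '/logs/server1/')) then server = 'server1';
if (starts_with(current_log_pathname(), '/logs/server2/')) then server = 'server2';
```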
Short Answer: Yes, you can password protect statistics in several ways.
Long Answer
Yes. Sawmill provides several ways to do this. In general, you will create a separate user for each client, and a separate profile for each client. Then you will configure their user to be non-administrative, and to have permission to access only their own profile. Finally, you will set up their profile to show only their data, either by pointing it only at their files, or (if their data is interleaved with other clients' data), by using log filters to discard all events from the log which don't belong to them.
Short Answer: Yes, but the degree to which you can relabel depends on your license.
Long Answer
You can relabel Sawmill, and it's not very difficult; however, the extent to which you can relabel depends on the license purchased (i.e. Professional, Enterprise, etc.).

Sawmill Professional allows easy modification of certain screen attributes within the standard End User License, attributes such as colors, fonts, etc. Sawmill Professional also allows the language used on-screen to be modified or translated, and it allows the addition of a graphic item by use of the custom HTML headers and footers; however, the license does not allow the removal or replacement of any Sawmill logos or credits. Should you require Professional Edition with even more customization ability, and you are a qualifying user or integrator, then we may be able to assist you. Under these circumstances you should forward your detailed proposal to us, containing precise descriptions (preferably diagrams) of how you would wish the screens to look, and we will respond.

Sawmill Enterprise allows very considerable customization of the user interface and statistics screens, to the point that just about every on-screen item can be modified, deleted, or replaced, or new items introduced. This ability is allowed within the standard license, which should be consulted prior to making any final changes. You can view the Sawmill End User License here.

Note that under all circumstances, and for each product, the License requires that you leave the Flowerfire copyright notice untouched and visible, together with a visible reference to Sawmill on every page. Please contact support@flowerfire.com if our standard licensing does not meet your needs.
Short Answer: You can use whatever's documented (Regular Expressions), and possibly more. How much more you
can use depends on your platform.
Long Answer
Regular expressions are not fully standardized -- different programs that support "regular expressions" may support slightly different features. For instance, some will let you use {N} to repeat the preceding expression N times, and some will not (they will require you to write the expression N times yourself). Some will let you use \d to match any digit, and others will not (they will require you to use [0-9]). So the point of this question is: which of these "non-standard" features does Sawmill support?

The answer depends on the platform you're running Sawmill on. Sawmill's regular expressions vary depending on platform -- it uses the built-in regular expression library on some platforms, and the Boost library on others. Anything that is documented in Regular Expressions is available on all platforms. Anything that is not documented there may not be available. The easiest way to find out if something is available is to try it -- add a regular-expression filter to your Log Filters and see if it works. But if you want to make sure your profile is portable, and will work on other platforms, you should stick to the documented choices.
Long Answer
Yes -- the regular expression Dog matches Dog, but not dog or DOG. If you need to match case-insensitively in a log filter, you can convert the field to lowercase first (copy it to another temporary field if you don't want to modify the original), or you can explicitly list upper and lower case values for every letter, e.g. [Dd][Oo][Gg] matches "dog" case-insensitively.
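Both approaches can be sketched with Python's re module (used here purely for illustration; the same idea applies inside a Sawmill log filter):

```python
import re

# Approach 1: explicitly list upper and lower case for every letter.
pattern = re.compile(r"[Dd][Oo][Gg]")
assert pattern.search("My DOG barks") is not None
assert pattern.search("hot diggity") is None

# Approach 2: lowercase the field first, then match a lowercase pattern.
field = "My DOG barks"
assert re.search(r"dog", field.lower()) is not None
```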
Short Answer: Build the database from the command line with the -v option: Sawmill.exe -p profilename -a bd -v egblpfd
Long Answer
Custom log formats and log filters can be difficult to debug from the graphical interface, because there is little feedback about what Sawmill is doing as it processes the log. Fortunately, Sawmill has a powerful feature called "debugging output" that makes debugging custom log formats and filters much easier.

To see the debugging output, you need to use a command-line version of Sawmill. On Windows, that means using the SawmillCL.exe program, and running it from the command prompt. On Unix, you can use the normal Sawmill executable, since it works on the command line. On MacOS, you need to use the MacOS X command-line version of Sawmill. Using the command shell, go to the Sawmill installation directory (using the "cd" command). Then rebuild the database like this:

sawmill -p profilename -a bd -v egblpfd | more

This command rebuilds the database for the profilename profile, and -v egblpfd tells Sawmill to report a great deal of information about what it's doing (other -v options are available, but egblpfd are the seven options which are most useful for debugging profiles and filters). The results are piped through the "more" program, so you can page through the output using the space bar. Lines starting with "Processing line" show when Sawmill is processing a new log line. Lines starting with "Marking hits" show the end results that are being put into the database. Other lines provide information about log parsing and filtering that can be very useful when you're trying to debug a problem in the parsing of your custom format, or in your custom log filter.
FAQ: Configuring Sawmill to work with Security Enhanced Linux, in CGI mode
Question: Sawmill doesn't work in CGI mode with SELinux enabled; how do I get it to work?
Short Answer: Use semodule to allow the operations that Sawmill uses; see the long answer.
Long Answer
Security Enhanced Linux (SELinux) restricts what programs can do, to prevent them from misbehaving. The default behavior for an unrecognized program blocks certain operations that Sawmill needs to function, resulting in a blank screen when running Sawmill in CGI mode. This article describes how to lower the restrictions to allow Sawmill to work. Start by creating a file called sawmill.te, with the following contents:
module sawmill 1.0;

require {
    class appletalk_socket create;
    class dir getattr;
    class dir read;
    class dir search;
    class dir { getattr read };
    class dir { read search };
    class file getattr;
    class file read;
    class netlink_route_socket bind;
    class netlink_route_socket create;
    class netlink_route_socket getattr;
    class netlink_route_socket nlmsg_read;
    class netlink_route_socket read;
    class netlink_route_socket write;
    class socket create;
    class socket ioctl;
    class udp_socket create;
    class udp_socket ioctl;
    class unix_dgram_socket create;
    role system_r;
    type apmd_log_t;
    type autofs_t;
    type boot_t;
    type faillog_t;
    type file_t;
    type httpd_log_t;
    type httpd_sys_script_t;
    type lastlog_t;
    type mnt_t;
    type net_conf_t;
    type proc_net_t;
    type rpm_log_t;
    type samba_log_t;
    type sendmail_log_t;
    type squid_log_t;
    type sysctl_net_t;
    type sysfs_t;
    type var_log_t;
    type var_t;
    type wtmp_t;
};

allow httpd_sys_script_t apmd_log_t:file getattr;
allow httpd_sys_script_t autofs_t:dir getattr;
allow httpd_sys_script_t boot_t:dir getattr;
allow httpd_sys_script_t faillog_t:file getattr;
allow httpd_sys_script_t file_t:dir getattr;
allow httpd_sys_script_t httpd_log_t:dir getattr;
allow httpd_sys_script_t httpd_log_t:dir read;
allow httpd_sys_script_t httpd_log_t:file read;
allow httpd_sys_script_t lastlog_t:file getattr;
allow httpd_sys_script_t mnt_t:dir getattr;
allow httpd_sys_script_t net_conf_t:file getattr;
allow httpd_sys_script_t net_conf_t:file read;
allow httpd_sys_script_t proc_net_t:dir { read search };
allow httpd_sys_script_t proc_net_t:file getattr;
allow httpd_sys_script_t proc_net_t:file read;
allow httpd_sys_script_t rpm_log_t:file getattr;
allow httpd_sys_script_t samba_log_t:dir getattr;
allow httpd_sys_script_t self:appletalk_socket create;
allow httpd_sys_script_t self:netlink_route_socket bind;
allow httpd_sys_script_t self:netlink_route_socket create;
allow httpd_sys_script_t self:netlink_route_socket getattr;
allow httpd_sys_script_t self:netlink_route_socket nlmsg_read;
allow httpd_sys_script_t self:netlink_route_socket read;
allow httpd_sys_script_t self:netlink_route_socket write;
allow httpd_sys_script_t self:socket create;
allow httpd_sys_script_t self:socket ioctl;
allow httpd_sys_script_t self:udp_socket create;
allow httpd_sys_script_t self:udp_socket ioctl;
allow httpd_sys_script_t self:unix_dgram_socket create;
allow httpd_sys_script_t sendmail_log_t:dir getattr;
allow httpd_sys_script_t squid_log_t:dir getattr;
allow httpd_sys_script_t sysctl_net_t:dir search;
allow httpd_sys_script_t sysfs_t:dir getattr;
allow httpd_sys_script_t var_log_t:dir read;
allow httpd_sys_script_t var_log_t:file getattr;
allow httpd_sys_script_t var_t:dir read;
allow httpd_sys_script_t wtmp_t:file getattr;
Then run the following commands, as root:

checkmodule -M -m -o sawmill.mod sawmill.te
semodule_package -o sawmill.pp -m sawmill.mod
semodule -i sawmill.pp

These commands package up and install an SELinux module which allows Sawmill to perform all of its operations. Once you have run these commands, Sawmill should function as a CGI program.
Short Answer: Take an existing profile and change the first line to the new name.
Long Answer
You can use an existing profile, keep its features, and create a new one from it by duplicating and editing the profile file:

1. Duplicate the profile CFG file, in the profiles folder of LogAnalysisInfo, changing the name from internal_profile_name.cfg to new_internal_profile_name.cfg. Put this duplicate file somewhere other than the profiles folder for now; otherwise it will break the Profiles list in the web interface until it has been edited as in step 2.

2. Edit the new file, new_internal_profile_name.cfg, with a text editor, to change the first line from:

internal_profile_name = {

to:

new_internal_profile_name = {

3. Still editing the new file, search for the external label name, and change it to the new external label name; change:

label = "Old Profile name in the GUI"

to:

label = "New Profile name in the GUI"

4. Still editing the new file, change the profile_name option to the new internal profile name; change:

profile_name = "old_internal_profile_name"

to:

profile_name = "new_internal_profile_name"

5. The very last line of the file contains a comment which should be changed too, for consistency; change:

} # old_internal_profile_name

to:

} # new_internal_profile_name

6. Move the new profile into the profiles folder to make it live.
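The steps above can be sketched as a small script. This is an illustrative sketch only: the function name is our own, it assumes the first label line in the file is the profile label, and a real profile .cfg contains many more options than the test fixture shown here.

```python
import re
from pathlib import Path

def duplicate_profile(profiles_dir, old_name, new_name, new_label):
    """Copy old_name.cfg to new_name.cfg, rewriting the internal name
    (first line and profile_name option), the GUI label, and the
    trailing comment, as in steps 1-5 above."""
    src = Path(profiles_dir) / (old_name + ".cfg")
    text = src.read_text()
    # Step 2: the first line "old_name = {" becomes "new_name = {"
    text = text.replace(old_name + " = {", new_name + " = {", 1)
    # Step 3: the external label shown in the GUI (assumes the first
    # label line in the file is the profile label)
    text = re.sub(r'label = "[^"]*"', 'label = "' + new_label + '"',
                  text, count=1)
    # Step 4: the profile_name option
    text = text.replace('profile_name = "' + old_name + '"',
                        'profile_name = "' + new_name + '"')
    # Step 5: the comment on the last line
    text = text.replace("} # " + old_name, "} # " + new_name)
    # Step 6: write the result into the profiles folder
    (Path(profiles_dir) / (new_name + ".cfg")).write_text(text)
```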
Short Answer: Either recreate it with the new name, or edit the profile .cfg with a text editor, and change the label.
Long Answer
It is not possible to rename a profile through the web interface. You can create a new profile with the desired name and delete the old one, but if the original profile was customized, all customizations will have to be redone in the new profile.

To change the name of a profile without recreating it, you can edit the profile .cfg file using a text editor. The file is in LogAnalysisInfo/profiles. Search for "create_profile_wizard_info" in the file; on the line above it, you will see the label of the profile. The label is how the profile appears in the web interface, so changing this label line will change the name in the web interface. It does not change the "internal" name, however, which is used from the command line to refer to the profile.

If you also need to change the internal name, you will need to rename the profile .cfg file; e.g., oldprofile.cfg becomes newprofile.cfg. It is CRITICAL that you also change the first line of the .cfg file (using a text editor) to match the filename, without the .cfg extension; so the first line would change from:

oldprofile = {

to:

newprofile = {

If you do not do this, the Sawmill profile list will give an error rather than listing the profiles, and other parts of Sawmill may be broken as well. The first line of any .cfg file must always match the filename. Once the filename and first line have been changed, the internal name of the profile will be the new name, and you will be able to use the new name from the command line. You may also need to manually edit LogAnalysisInfo/schedule.cfg, if you have scheduled tasks which refer to the old name.
Long Answer
One way to do this is to use a global filter in the statistics, using "!(hostname within '123.124.125.126')", and this is often the first thing people try, but it's not the best choice. The speed of a statistics filter depends on the number of items checked, so if there are 100,000 IP addresses in your log file, and you check all 100,000, then Sawmill can take up to 100,000 times longer to generate each page. That is probably not what you had in mind.

A much better option is to use the Log Filters. Log filters are used to filter out or modify log data as it is being read (rather than filtering database data as it is being browsed, like the statistics filters). You can get to the Log Filters by clicking Show Config in the profiles list, and clicking the Log Filters category. You want to create a filter that rejects any log entry whose hostname field is your IP address. If your IP address is 123.124.125.126, the filter you want is this:

if (hostname eq "123.124.125.126") then "reject"

The name of the field ("hostname" here) depends on your log data -- use the name that your log data uses. For instance, IIS W3C format calls the field c_ip, so for IIS you would use this:

if (c_ip eq "123.124.125.126") then "reject"

You can get a list of the fields in your profile by running Sawmill from the command line with "-p profilename -a llf". The next time you rebuild the database, hits from your IP address will be rejected, and will not appear in the statistics.
Rejecting all hits from a particular domain is very similar; if your domain is mydomain.com, and your server is set to look up IP addresses, then you can use this filter:

if (ends_with(hostname, ".mydomain.com")) then "reject"

If your server logs hostnames as IP addresses (and does not resolve them to hostnames with DNS), you can use the subnet for your domain instead; for instance, if all hits from mydomain.com come from the subnet 128.128.128, then you can use this filter:

if (starts_with(hostname, "128.128.128.")) then "reject"
Short Answer: Use a Log Filter to reject all hits from spiders (and worms).
Long Answer
Create a new log filter to reject all hits from spiders. The easiest way to create log filters is the Log Filter Editor, in the Log Filters section of the Config. To get to the Log Filters editor, click Show Config in the Profiles list (or click Config in the reports), then click Log Data in the left menu, then click Log Filters. To create the filter:
- Select the Filter tab
- Select the filter type: "If a condition is met, then perform an action"
- Click the New Condition link to create the condition we will test for
- Select Spider from the drop-down list of available fields
- Select "is NOT equal" from the Operator list
- Type "(not a spider)", without the quotes, into the value field
- Click OK
- Click the New Action link
- Select "Reject log entry" from the drop-down list of actions
- Click OK
- Click the Sort Filters tab, and set this filter as the top filter (so that it runs first)
- Optionally, give this filter a name (like "Reject Spiders Filter") and a comment (on the Comment tab)
You can also use the Advanced Expression Syntax option from the Filter Type drop-down list (on the Filter tab), and type this filter expression into the value field:

if (spider ne "(not a spider)") then "reject";

Then rebuild your database, and all hits from spiders will be discarded. For more details on filters, see Using Log Filters.
Short Answer: Yes. Add a log filter that rejects hits from all other domains.
Long Answer
Yes. This can be done easily using a log filter. To do this, click Show Config in the profiles list, click Log Filters, and create a new log filter with this value:

if (server_domain ne "mydomain.com") then "reject"

Replace mydomain.com with the actual domain, and replace server_domain with the name of the log field which reports the server domain in your log data. Sometimes this field is called cs_host. If there is no such field in your log data, then you'll need to use a different log format in order to filter by domain. The next time you rebuild the database, all log entries from domains other than the one you entered will be rejected, leaving only statistics from the one domain.
Short Answer: Use a Log Filter to reject all hits on that file or directory.
Long Answer
Create a new Log Filter to reject all hits on that file or directory. To do this, click Show Config in the profiles list, click Log Filters, and create a new log filter with this value:

if (page eq "/robots.txt") then "reject";

The filter above rejects hits on the /robots.txt file. Or use this:

if (starts_with(page, "/somedir/")) then "reject";

The filter above rejects all hits on the /somedir/ directory. The next time you rebuild the database, all hits on that page or directory will be rejected, so they will not appear in the statistics. By the way, the same technique can be used to filter hits based on any field, for instance all hits from a particular host or domain, from a particular referrer, or from a particular authenticated user.
Short Answer: Create a new log field, database field, report, and report menu item to track and show the category or
custom value, and then use a log filter to set the log field appropriately for each entry.
Long Answer
It is often useful to report information which is not in the logs, but which can be derived from the information in the logs. For instance, it is useful to see events in categories other than those which naturally fall out of the data. Natural categories for web logs include page directories (the page field), months (the date/time field), or visitor domains (the hostname field). Similarly, it is useful to derive related values from the log field values, and report them as though they were in the log data; for instance, if you have a username, you may want to report the full name, organization, and other information associated with it.

Sawmill treats every value of every field as a category, so you can categorize by any field in your log data. You can take advantage of this feature to make your own categories, even if those categories are not immediately clear in the log data. Categories like these are called "custom fields." One common use of custom fields is to separate internal hits (hits from you) from external hits (hits from other people). Another use is to separate monitoring hits (hits from programs you use to monitor your own site) from actual hits (hits by browsing people). A similar categorization is spider hits (hits from search engine robots and other robots) vs. human hits (hits by browsing people). Custom fields are also used to show metadata associated with a particular item, for instance whois information derived from an IP address, or a full name derived from a username. Sawmill computes some common custom fields for you (geographic location derived from IP, hostname derived from IP, web browser derived from user-agent, and many more), but if you need to derive your own custom field, Sawmill also provides you with the "hooks" you need to do it. There are five steps to this (described in detail below):

1. Create a log field
2. Create a database field based on that log field
3. Create a report based on that database field
4. Create a report menu item for that report
5. Create a log filter to populate the log field
    data_type = "int"
    display_format_type = "integer"
  } # 1
  2 = {
    header_label = "{=capitalize(database.fields.page_views.label)=}"
    type = "number"
    show_number_column = "true"
    show_percent_column = "false"
    show_bar_column = "false"
    visible = "true"
    field_name = "page_views"
    data_type = "int"
    display_format_type = "integer"
  } # 2
  3 = {
    header_label = "{=capitalize(database.fields.visitors.label)=}"
    type = "number"
    show_number_column = "true"
    show_percent_column = "false"
    show_bar_column = "false"
    visible = "true"
    field_name = "visitors"
    data_type = "unique"
    display_format_type = "integer"
  } # 3
  4 = {
    header_label = "{=capitalize(database.fields.size.label)=}"
    type = "number"
    show_number_column = "true"
    show_percent_column = "false"
    show_bar_column = "false"
    visible = "true"
    field_name = "size"
    data_type = "float"
    display_format_type = "bandwidth"
  } # 4
} # columns
} # my_category
} # report_elements
label = "My Category"
} # my_category
Short Answer: Delete the database.fields entry from the profile .cfg file, and delete any xref groups and reports that use it.
Long Answer
Deleting database fields reduces the size of the database, and reduces the time required to build the database. Here's how you can delete a database field:

1. Using a text editor, edit the .cfg file for your profile, in LogAnalysisInfo/profiles.

2. Search for "database = {" and then search forward from there for "fields = {" to find the database fields section. Comment out the field you don't want (or delete it). For instance, to remove the screen_dimensions field, change this section:

screen_dimensions = {
  label = "$lang_stats.field_labels.screen_dimensions"
  type = "string"
  log_field = "screen_dimensions"
  suppress_top = "0"
  suppress_bottom = "2"
  always_include_leaves = "false"
} # screen_dimensions

to this:

# screen_dimensions = {
#   label = "$lang_stats.field_labels.screen_dimensions"
#   type = "string"
#   log_field = "screen_dimensions"
#   suppress_top = "0"
#   suppress_bottom = "2"
#   always_include_leaves = "false"
# } # screen_dimensions
3. Now that the database field is gone, you will still need to remove any references to the field from other places in the profile. Typically, there is an xref group for this field, so this needs to be removed as well. Search from the top for cross_reference_groups, and comment out the group associated with the field (or delete it). For instance, for the screen_dimensions field, change this section:

screen_dimensions = {
  date_time = ""
  screen_dimensions = ""
  hits = ""
  page_views = ""
} # screen_dimensions

to this:

# screen_dimensions = {
#   date_time = ""
#   screen_dimensions = ""
#   hits = ""
#   page_views = ""
# } # screen_dimensions
4. By default, there will also be a report for the field, which has to be removed. Search for "reports = {", then search forward for the appropriate report name, which is the same as the database field name. Comment it out or delete it. For instance, search for "screen_dimensions = {", and then comment it out, replacing this:

screen_dimensions = {
  report_elements = {
    screen_dimensions = {
      label = "{=capitalize(pluralize(print(database.fields.screen_dimensions.label)))=}"
      type = "table"
      database_field_name = "screen_dimensions"
      sort_by = "hits"
      sort_direction = "descending"
      show_omitted_items_row = "true"
      omit_parenthesized_items = "true"
      show_totals_row = "true"
      starting_row = "1"
      ending_row = "10"
      only_bottom_level_items = "false"
      columns = {
        0 = {
          type = "string"
          visible = "true"
          field_name = "screen_dimensions"
          data_type = "string"
          header_label = "{=capitalize(database.fields.screen_dimensions.label)=}"
          display_format_type = "string"
          main_column = "true"
        } # 0
        1 = {
          header_label = "{=capitalize(database.fields.hits.label)=}"
          type = "number"
          show_number_column = "true"
          show_percent_column = "true"
          show_bar_column = "true"
          visible = "true"
          field_name = "hits"
          data_type = "int"
          display_format_type = "integer"
        } # 1
        2 = {
          header_label = "{=capitalize(database.fields.page_views.label)=}"
          type = "number"
          show_number_column = "true"
          show_percent_column = "false"
          show_bar_column = "false"
          visible = "true"
          field_name = "page_views"
          data_type = "int"
          display_format_type = "integer"
        } # 2
      } # columns
    } # screen_dimensions
  } # report_elements
  label = "{=capitalize(pluralize(print(database.fields.screen_dimensions.label)))=}"
} # screen_dimensions

to this:

# screen_dimensions = {
#   report_elements = {
#     screen_dimensions = {
#       label = "{=capitalize(pluralize(print(database.fields.screen_dimensions.label)))=}"
#       type = "table"
#       database_field_name = "screen_dimensions"
#       sort_by = "hits"
#       sort_direction = "descending"
#       show_omitted_items_row = "true"
#       omit_parenthesized_items = "true"
#       show_totals_row = "true"
#       starting_row = "1"
#       ending_row = "10"
#       only_bottom_level_items = "false"
#       columns = {
#         0 = {
#           type = "string"
#           visible = "true"
#           field_name = "screen_dimensions"
#           data_type = "string"
#           header_label = "{=capitalize(database.fields.screen_dimensions.label)=}"
#           display_format_type = "string"
#           main_column = "true"
#         } # 0
#         1 = {
#           header_label = "{=capitalize(database.fields.hits.label)=}"
#           type = "number"
#           show_number_column = "true"
#           show_percent_column = "true"
#           show_bar_column = "true"
#           visible = "true"
#           field_name = "hits"
#           data_type = "int"
#           display_format_type = "integer"
#         } # 1
#         2 = {
#           header_label = "{=capitalize(database.fields.page_views.label)=}"
#           type = "number"
#           show_number_column = "true"
#           show_percent_column = "false"
#           show_bar_column = "false"
#           visible = "true"
#           field_name = "page_views"
#           data_type = "int"
#           display_format_type = "integer"
#         } # 2
#       } # columns
#     } # screen_dimensions
#   } # report_elements
#   label = "{=capitalize(pluralize(print(database.fields.screen_dimensions.label)))=}"
# } # screen_dimensions
5. Now you need to remove the report element from the single_page_summary report. Search for single_page_summary, then search forward for the field name (e.g., search for "screen_dimensions = {"). Again, comment out the whole report element or delete it, replacing this:

screen_dimensions = {
  label = "{=capitalize(pluralize(print(database.fields.screen_dimensions.label)))=}"
  type = "table"
  database_field_name = "screen_dimensions"
  sort_by = "hits"
  sort_direction = "descending"
  show_omitted_items_row = "true"
  omit_parenthesized_items = "true"
  show_totals_row = "true"
  starting_row = "1"
  ending_row = "10"
  only_bottom_level_items = "false"
  columns = {
    0 = {
      type = "string"
      visible = "true"
      field_name = "screen_dimensions"
      data_type = "string"
      header_label = "{=capitalize(database.fields.screen_dimensions.label)=}"
      display_format_type = "string"
      main_column = "true"
    } # 0
    1 = {
      header_label = "{=capitalize(database.fields.hits.label)=}"
      type = "number"
      show_number_column = "true"
      show_percent_column = "true"
      show_bar_column = "true"
      visible = "true"
      field_name = "hits"
      data_type = "int"
      display_format_type = "integer"
    } # 1
    2 = {
      header_label = "{=capitalize(database.fields.page_views.label)=}"
      type = "number"
      show_number_column = "true"
      show_percent_column = "false"
      show_bar_column = "false"
      visible = "true"
      field_name = "page_views"
      data_type = "int"
      display_format_type = "integer"
    } # 2
  } # columns
} # screen_dimensions

with this:

# screen_dimensions = {
#   label = "{=capitalize(pluralize(print(database.fields.screen_dimensions.label)))=}"
#   type = "table"
#   database_field_name = "screen_dimensions"
#   sort_by = "hits"
#   sort_direction = "descending"
#   show_omitted_items_row = "true"
#   omit_parenthesized_items = "true"
#   show_totals_row = "true"
#   starting_row = "1"
#   ending_row = "10"
#   only_bottom_level_items = "false"
#   columns = {
#     0 = {
#       type = "string"
#       visible = "true"
#       field_name = "screen_dimensions"
#       data_type = "string"
#       header_label = "{=capitalize(database.fields.screen_dimensions.label)=}"
#       display_format_type = "string"
#       main_column = "true"
#     } # 0
#     1 = {
#       header_label = "{=capitalize(database.fields.hits.label)=}"
#       type = "number"
#       show_number_column = "true"
#       show_percent_column = "true"
#       show_bar_column = "true"
#       visible = "true"
#       field_name = "hits"
#       data_type = "int"
#       display_format_type = "integer"
#     } # 1
#     2 = {
#       header_label = "{=capitalize(database.fields.page_views.label)=}"
#       type = "number"
#       show_number_column = "true"
#       show_percent_column = "false"
#       show_bar_column = "false"
#       visible = "true"
#       field_name = "page_views"
#       data_type = "int"
#       display_format_type = "integer"
#     } # 2
#   } # columns
# } # screen_dimensions
} # report_elements
Short Answer: These are "internal referrers"; they represent visitors going from one page of your site to another page of
your site. You can eliminate them by modifying the default "(internal referrer)" log filter, changing http://www.mydomain.com/ in that filter to your web site URL.
Long Answer
Referrers show which page a hit came from -- i.e., they show what page a visitor was on when they clicked the link that took them to your page. For most web sites, visitors arrive and then click through several pages before leaving, so most web log data has a lot of referrers that are pages on the site being analyzed. For instance, if someone visits http://www.yoursite.com/index.html, and then clicks a link pointing to http://www.yoursite.com/page2.html, it will show up in the log data (and in the statistics) as a referrer of http://www.yoursite.com/index.html. These referrers are called "internal referrers," and under normal circumstances you don't really care about them -- what you really want to know is which referrers brought traffic to your site, not what the referrers were once visitors got there.

Sawmill can't distinguish internal referrers from external referrers because it doesn't know your site's URL. It doesn't know whether a referral from http://www.yoursite.com/index.html is internal (which it is if your site is yoursite.com) or external (which it is if your site is anything else). To help Sawmill identify and hide internal referrers, you need to modify a log filter that Sawmill creates for you. Here's how:

1. Go to the Config section of your profile.
2. Click Log Filters.
3. Edit the log filter which sets referrers from "yoursite.com" to "(internal referrer)".
4. Replace "yoursite.com" with your actual site name in that log filter.
5. Rebuild the database.
Once you've done that, the internal referrers will be suppressed in the "Top referrers" view (or they will appear as "(internal referrer)" if you've turned on parenthesized items).
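The logic of that default filter can be sketched as follows (a Python illustration, not Sawmill's actual filter code; the site URL is the example value from above):

```python
def mark_internal(referrer, site_url="http://www.yoursite.com/"):
    """Collapse referrers that point back at your own site into the
    "(internal referrer)" placeholder, leaving external referrers alone."""
    if referrer.startswith(site_url):
        return "(internal referrer)"
    return referrer

# A click from one of your own pages is an internal referrer...
assert mark_internal("http://www.yoursite.com/index.html") == "(internal referrer)"
# ...while a search-engine referrer is left unchanged.
assert mark_internal("http://www.google.com/search?q=x") == "http://www.google.com/search?q=x"
```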
Short Answer: Delete the Log Filter that converts the parameters to "(parameters)."
Long Answer
By default, Sawmill creates a log filter to convert everything after the ? in the page field to "(parameters)". In most cases that's best, because it reduces the size of the database significantly. But if you need the parameter information, it's easy to get it back -- just delete that filter:

1. Go to the Config section of your profile.
2. Click Log Filters.
3. If your log format is Apache or similar, find the log filter which replaces everything after "?" with "(parameters)", and delete or disable that log filter.
4. If your log format is IIS or similar, find the log filter which appends the cs_uri_query field to the cs_uri_stem field, and enable that log filter.
5. Rebuild the database.

When you view the reports, you'll see that "(parameters)" has been replaced by the actual parameters.
Short Answer: Use the Calendar, or the Filters, or use a recentdays filter on the command line.
Long Answer
In the reports, you can go to the Calendar view and click a recent day, week, or month to see the statistics for that time period. You can also edit the global filters to zoom in on any collection of months or days, including the most recent ones. However, filters made in that manner will not move forward as the date changes. If you want a statistics filter that always shows the most recent seven days, automatically, then you will need to use the command line, or edit the profile file manually.

Sawmill's command-line filtering options are slightly more powerful than the filtering options available from the web interface. Though it's not possible in the web interface to create a filter which always shows the last seven days, it is possible from the command line, using a recentdays:N filter on the date/time field. For instance, to send email showing the past seven days of data, use a command line like this:

Sawmill.exe -rfcf -cm svbe -f "recentdays:7"
It is also possible to use this kind of filter in a profile file, by editing the file manually. For instance, if you want a particular report or report element to always show the most recent seven days of data, you can edit the profile file (in the profiles folder of LogAnalysisInfo) to change the "filters" value within the report or report_element node to recentdays:7 (create a node called "filters" if one does not already exist).
Short Answer: Create a log filter converting all the hostnames to the same hostname.
Long Answer
You can do this by converting all of the hostnames to a single hostname, so that, for instance, they all appear as http://yahoo.com referrers. To do this, you need to convert all occurrences of /search.yahoo.com/, /dir.yahoo.com/, or /www.yahoo.com/ into /yahoo.com/ in the referrer field. The easiest way is to make three log filters, in the Log Filters section of the Config part of your profile:

referrer = replace_all(referrer, "/search.yahoo.com/", "/yahoo.com/")
referrer = replace_all(referrer, "/dir.yahoo.com/", "/yahoo.com/")
referrer = replace_all(referrer, "/www.yahoo.com/", "/yahoo.com/")

Then rebuild the database; the resulting statistics will combine all three referrers into a single /yahoo.com/ referrer.

A more sophisticated filter is necessary if you need to preserve some parts of the URL while converting others. In that case, you can use a regular expression filter:

if (matches_regexp(referrer, "^http://us\.f[0-9]*\.mail\.yahoo\.com/ym/(.*)")) then referrer = "http://us.f*.mail.yahoo.com/$1"

The way this works is that it matches any referrer starting with http://us.fN.mail.yahoo.com/ym/ (where N is any integer), and while matching, it extracts everything after the /ym/ into the variable $1. The leading ^ ensures that the referrer starts with http://, the trailing (.*) ensures that the parenthesized section captures all of the remainder after /ym/, [0-9]* matches any integer, and \. matches a single period (see Regular Expressions for more information about regular expressions). If it matches, it sets the referrer field to http://us.f*.mail.yahoo.com/$1, where $1 is the value extracted from the original URL. This allows you to collapse all http://us.fN.mail.yahoo.com/ URLs into a single one without losing the extra data beyond /ym/.
If you don't care about the data beyond /ym/, you can use a somewhat simpler (or at least easier-to-understand) filter:

if (matches_wildcard(referrer, "http://us.f*.mail.yahoo.com/ym/*")) then referrer = "http://us.f*.mail.yahoo.com/ym/"

This one uses a wildcard comparison (matches_wildcard) rather than a regular expression, which allows the use of * in the expression in its more generally used meaning of "match anything". Note also that in the first line, * appears twice and each time matches anything, but in the second line it appears only once, and is a literal *, not a "match-anything" character.
Short Answer: Turn on reverse DNS lookup in the Network options (or in your web server), or use Sawmill's "look up IP numbers using DNS" feature.
Long Answer
Your web server is tracking the IP numbers of visitors, but not their hostnames or domains. If you need hostname or domain information, you need to tell Sawmill (or your web server) to look up the IP addresses using DNS (domain name service). One way to do this is to turn on DNS lookup in your web server; that will slow down your server, but then Sawmill will report hostnames and domains without any performance penalty during log data processing. If you're not willing to take the performance hit on your server, or if you want to analyze log data that has already been generated with IP addresses, you can turn on Sawmill's reverse DNS feature like this:

1. Log in to Sawmill as an administrator.
2. Click "View Profile in Config" once you're in the profile you want to modify.
3. Select "More Options" on the top right menu.
4. Scroll down to "Miscellaneous".
5. Check the box labeled "Look up IP numbers using domain nameserver (DNS)".
6. Enter the hostnames or IP addresses of one or two DNS servers in the DNS server fields. You can get this information from your network administrator, or your ISP.
7. Click "Save Changes".
8. Rebuild the database (e.g. click "Database Info" and then "Rebuild Database" at the top).

Processing log data will be slower with reverse DNS turned on, but you will get full hostname and domain information. If you have problems getting the DNS feature to resolve IP addresses, see Problems With DNS Lookup. A third option is to use a separate DNS resolving program to resolve your log files after the server is done writing them, and before Sawmill analyzes them. Examples include logresolve, which is included with the popular Apache web server, and DNSTran, which runs on several platforms including Macintosh, Linux, Solaris, and IRIX.
If you're using UNIX or MacOS X, another good option is adns, an asynchronous DNS lookup library that includes some command-line tools for looking up IP addresses, including adnslogres (for Common Access format and Apache Combined format files) and adnsresfilter (for other types of log files). For instance, you can use the command "adnsresfilter < /path/to/my/log.file" as your log source command to use adns. adns is faster than logresolve, but more difficult to configure initially.
You can plug any command-line DNS resolver directly into Sawmill by entering, as your log source, a UNIX command surrounded by backtick characters (`) that resolves the IPs in the log file and dumps the resolved log data to the standard output stream; in this case: `logresolve < /path/to/my/log.file`. Once you've done that, Sawmill will automatically run logresolve when you process your log data, and it will resolve the data before feeding it to Sawmill.
Short Answer: Yes -- just edit the search_engines.cfg file in the LogAnalysisInfo folder with a text editor.
Long Answer
Yes; Sawmill's search engine recognition mechanism is easily extensible. All the search engines Sawmill knows are described in a text file called search_engines.cfg, which is found in the LogAnalysisInfo folder of your Sawmill installation. Sawmill puts several dozen search engines in there to begin with (the big, well-known ones), but you can add as many more as you like, by editing the file with a text editor. Just add a new entry for each new search engine, and the next time Sawmill processes log data, it will recognize those search engines, and it will include them in the database. The "name" value for a search engine is the name of the search engine; put whatever you want the search engine to be called there. That's what will appear in the statistics. The "substring" value is a "quick check" that Sawmill uses to see if a URL might be a URL from that search engine. If the URL contains the "quick check" string, Sawmill then does a slower check using the "regexp" value, which is a regular expression. If the regular expression matches, Sawmill uses the parenthesized section of the regular expression as the search terms (it should be a series of search terms, separated by plusses (+)). The parenthesized section is used to compute the search terms and search phrases statistics. You might notice that the "substring" value is redundant -- Sawmill doesn't really need it at all, since it could just check every URL with the regular expression. The reason it is there is that regular expressions are relatively slow -- Sawmill can process log data much faster if it doesn't have to check every URL in the log data against dozens of regular expressions. This way, it only has to use the regular expressions on a tiny proportion of the URLs that it sees.
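As a sketch (the node name, search engine, substring, and regexp here are illustrative examples, not entries from the shipped file), an added entry in search_engines.cfg might look something like this:

```
examplesearch = {
  name = "ExampleSearch"
  substring = "examplesearch.com"
  regexp = "examplesearch\.com/search\?.*q=([^&]+)"
} # examplesearch
```

The substring is checked first as a fast test; only URLs containing it are run through the regexp, whose parenthesized group becomes the search terms.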
Long Answer
Sawmill reports times exactly as they appear in the log data -- if the time shows up as 8:00 AM in the log data, that hit will appear as 8:00 AM in the statistics. Since servers sometimes log in GMT, or in some other time zone than the one where Sawmill is running, you may want to offset the times in your statistics to match your own time zone, rather than the server's time zone or GMT. This is easily done using the date_offset option in the profile file (the profile file is in the profiles folder of LogAnalysisInfo). The number of hours specified in that option is added to the date/time, so if it's a negative number, it moves times backwards, and if it's positive, it moves them forwards. For instance, if you're 8 hours behind GMT (GMT-0800), and your server logs in GMT, you can set this value to -8 to get statistics in your own time zone. This option affects log entries as they are processed, so you'll need to rebuild the database after setting this option to see the changes in the statistics.
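Continuing the GMT-0800 example, the line in the profile .cfg would read as follows (the option's exact location within the profile file may vary by version):

```
date_offset = -8
```

A positive value would shift times the other way; for instance, a server logging in GMT viewed from GMT+0100 would use date_offset = 1.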
Short Answer: Hits are accesses to the server; page views are accesses to HTML pages; visitors are unique visitors to the site, and sessions are visits to the site.
Long Answer
Sawmill can count web log traffic in several ways. Each way is counted independently of the others, and each has its own advantages in analyzing your traffic. The different types are:
Hits. Hits are accepted log entries. So if there are 5000 entries in your log file, and there are no log filters, and all the entries are valid (i.e. none of them have corrupt dates), then Sawmill will report 5000 hits for the file. If there are log filters that reject certain log entries, then those will not appear as hits. Log entries that are accepted by the log filters will count toward the hits totals. Because there are no default filters that reject, you will generally have nearly as many reported hits as you have log entries. You can view and edit the log filters by opening your profile from the Administrative Menu, clicking Profile Options, and then clicking the Log Filters tab. See also Using Log Filters.

Page views. Page views correspond to hits on pages. For instance, if you're analyzing a web log, and a hit on /index.html is followed by 100 hits on 100 images, style sheets, and JavaScript files that appear in that page, then it will count as a single page view -- the secondary files do not add to the total. This is implemented in the log filters -- page views are defined as log entries that are accepted by the log filters, and that have a page_view value set to 1 by the log filters. Log entries that are accepted by the filters, but have page_view set to 0 by the filters, do not contribute to the page views total. Therefore, you have complete control over which files are "real" page views and which are not -- if Sawmill's default filters do not capture your preferred definition of page views, you can edit them until they do. By default, page views are all hits that are not GIF, JPEG, PNG, CSS, JS, and a few others. See Hits, above, for more information on log filters.

Visitors. Visitors correspond roughly to the total number of people who visited the site. If a single person visits the site and looks at 100 pages, that will count as 100 page views, but only one visitor. By default, Sawmill defines visitors to be "unique hosts" -- a hit is assumed to come from a different visitor if it comes from a different hostname. This can be inaccurate due to the effects of web caches and proxies. Some servers can track visitors using cookies, and if your web logs contain this information, Sawmill can use it instead of hostnames -- just change the log_field value for the visitors database field to point to the cookie field, rather than the hostname field.

Bandwidth. Bandwidth is the total number of bytes transferred. It is available only in log formats that track bytes transferred. Bandwidth is tracked for every log entry that is accepted, whether it is accepted "as a hit" or "as a page view". For log formats which track both inbound and outbound bandwidth, Sawmill can report both simultaneously.

Sessions. Several of Sawmill's reports deal with "session" information, including the "sessions overview" and the "paths (clickstreams)" report. Sessions are similar to visitors, except that they can "time out." When a visitor visits the site, then leaves, and comes back later, it will count as two sessions, even though it's only one visitor. To reduce the effect of caches, which can look like very long sessions, Sawmill also discards sessions longer than a specified time. The timeout interval is also customizable.

Session events. A page view which occurs during a session is a session event. For web server logs, this number is similar to page views, but may be smaller, because it does not include page views which are not in any session. That can occur if the page view is a reload (two consecutive hits on the same page), or if the page view is part of a session which has been discarded because it is too long.
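As a sketch of how the page_view mechanism works (the file extensions shown are examples only; the default filters cover more types), log filters along these lines mark image and style-sheet hits as non-page-views:

```
if (matches_wildcard(page, "*.gif")) then page_view = 0
if (matches_wildcard(page, "*.jpg")) then page_view = 0
if (matches_wildcard(page, "*.css")) then page_view = 0
```

Any accepted entry still counts as a hit; only entries left with page_view set to 1 count toward the page views total.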
Short Answer: Yes, but you need to delete the "(parameters)" log filter first.
Long Answer
Sawmill can handle URLs/pages in any format, but by default it strips off the parameters (the part after the question mark) to save space in the database. Most people don't need the parameters, but if you have a dynamic web site, you do. To see the parameters, do this:

1. Go to the Config section of your profile.
2. Click Log Filters.
3. Find the log filter which replaces everything after "?" with "(parameters)".
4. Delete that log filter.
5. Rebuild the database.
Now, when you look at the "Pages" or "Pages/directories" view, you should see your complete URLs, along with the parameters. If you want to take it a step further, you can also set up log filters to extract certain sections of your URLs, and put them in custom fields, to make your statistics more readable. For instance, if you have a store with several items in it, you can create an "items" field, with an associated "Top items" view, and you can set up a log filter to extract the item number (or name) from the URL and put it in the "items" field. Or you can even set up a filter to extract the item numbers from your URLs, convert them to the actual name of the item, stick them in the "item" field, and report them in the "top items" view. This is an example of a "custom field" -- see Creating Custom Fields for information on how to create one.
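As a sketch of such an extraction filter (the URL pattern, the item parameter, and the "items" field are all hypothetical, and the custom field must already be defined in the profile), a log filter along these lines could pull the item number out of the URL, using a captured subexpression as in the regular-expression referrer filter earlier in this document:

```
if (matches_regexp(page, "^/store/buy\.cgi\?item=([0-9]+)")) then items = $1
```

A real filter must match your own site's URL layout, and could go further, e.g. by mapping the captured number to a human-readable item name before assigning it.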
Short Answer: It means that some items (probably useless ones) have been omitted from the table to make the information more useful -- you can show them by choosing "show parenthesized items" from the Options menu.
Long Answer
Sawmill omits parenthesized items (i.e. any item that starts with "(" and ends with ")") from some tables to make the information more useful. For instance, most hits on a web site do not come directly from a search engine (some come from links in other pages on the site, and others come from links on web sites that are not search engines), so usually the largest item in the search engines table would be the item called "(no search engine)". Because hits from non-search-engines are not important in the search engines table, and because they dominate the numbers, making it difficult to compare "real" search engines, this item is omitted by default from the table. The way Sawmill omits it is by omitting all parenthesized items. Other examples of parenthesized items include the "(no search terms)" item in the search terms table, and the "(internal referrer)" item in the referrers table. If you want to see all the hits in these tables, you can turn on parenthesized items in the Table Options page.
Short Answer: /somedir/ is the total hits on a directory and all its contents; /somedir is an attempt to hit that directory which was redirected because it did not have the trailing slash; and the "default page" ones both indicate the number of hits on the directory itself (e.g., on the default page of the directory).
Long Answer
To understand why there are hits shown on both /somedir/ and /somedir, where "somedir" is the name of a directory (folder) in the web site, it is necessary to understand what happens when a browser tries to access http://hostname/somedir . That URL is incorrect (or at best, inefficient), because it lacks the trailing slash, which implies that somedir is a file. Here's what happens in this case:

1. The web browser asks for a file named /somedir .
2. The server checks, and finds that there is no file by that name (because it's a directory). It responds with a 302 redirect to /somedir/, which basically means, "no such file, but there is a directory; maybe that's what you meant?"
3. The browser accepts the redirect, so now it requests a directory named /somedir/ .
4. The server notes that there is a directory by that name, and that it contains an index or default file. It responds with a 200 event, and the contents of the index file.

This looks like this in the web logs:

server_response=302, page=/somedir
server_response=200, page=/somedir/

Sawmill reports this as two hits, because it is two hits (two lines of log data). Sawmill differentiates the aggregate traffic within a directory from traffic which directly hits a directory, by using /somedir/ to represent aggregation of traffic in the directory, and using /somedir/{default} to represent hits on the directory itself (i.e., hits which resulted in the display of the default page, e.g., index.html or default.asp). So in reports, the second hit above appears as a hit on /somedir/{default}, which appears in HTML reports as a hit on "/somedir/ (default page)". A good solution to this is to make sure that all links refer to directories with the trailing slash; otherwise the server and browser have to do the elaborate dance above, which slows everything down and doubles the stats.
Another option is to reject all hits where the server response starts with 3, using a log filter like this one:

if (starts_with(server_response, '3')) then 'reject'

This discards the first hit of the two, leaving only the "real" (corrected) one. In summary: hits on /somedir/ in reports represent the total number of hits on a directory, including hits on the index page of the directory, any other files in that directory, and any other files in any subdirectory of that directory, etc. Hits on /somedir in reports represent the 302 redirects caused by URLs which lack the final /. Hits on /somedir/{default} or /somedir/ (default page) represent hits on the default page of the directory.
Short Answer: Select PDF from the 'File Types' table, use the Zoom menu to zoom to the URLs report, then select the PDF you need to get an overview of that file.
Long Answer
Click on the 'Content' report group in the left hand menu, then click on the 'File Types' report. When the File Types report loads, click on 'PDF' in the table; the table will re-load with just a PDF entry, and a menu listing all tables will appear above the table. From that drop-down (the Zoom To menu), select the 'Pages' or 'URLs' option (it could be either), and you should then see a page load that contains only the pages/URLs whose file type is PDF. You can then select the PDF from that list, and you will next see an Overview for that file only. This type of filtering uses Zoom Filters; they are temporary filters that are applied to the report(s) as you click about (zoom about) the report. Clicking any item in the left hand menu cancels them and returns you to that report's default view, where no filters are set (unless the default has a filter set via the Report Editor, in which case that filter set will be applied). If you want to filter items in the report, have the filter apply to the whole report, and be able to turn it on whenever you need it, it is better to use the Global Filters, available from the Filter icon in the toolbar (just above the report). These can be created, enabled, and disabled as you need them; you only need to create them once, and they are stored under your username and the profile you are using, for use the next time you need them. Zoom filters are not stored anywhere, and need reapplying each time you need that filter set.
Short Answer: Increase the "suppress below" level for this database field in the profile options.
Long Answer
Sawmill limits the number of levels you see by default, to save memory, disk space, and time. You can increase the levels on any database field like this:

1. Using a text editor, open the .cfg file for your profile, in the LogAnalysisInfo/profiles folder.
2. Find the node for that database field (search for "database = {", then "fields = {", then the name of the field).
3. Increase its "suppress below" (suppress_bottom) value.
4. Save the file, and rebuild the database.
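For reference (the field name and values here are illustrative, and this assumes the "suppress below" level from the Short Answer corresponds to the suppress_bottom value in the field node), a database field node in the profile .cfg looks something like this:

```
page = {
  label = "page"
  type = "string"
  log_field = "page"
  suppress_top = "0"
  suppress_bottom = "2"
  always_include_leaves = "false"
} # page
```

The week-of-year example later in this document shows the same node structure in context.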
Short Answer: Yes, by using the Calendar, and/or creating a database field and a report tracking "weeks of the year."
Long Answer
The date/time field in Sawmill tracks years, months, days, hours, minutes, and seconds. Each of these units fits evenly into the larger unit (24 hours in a day, 12 months in a year, etc.). Because weeks do not fit evenly into months, Sawmill cannot easily fit weeks into the date/time hierarchy. Still, there are several ways to see weekly statistics. One way is to use the Calendar. In the Calendar, each week is represented as a link called "week" -- clicking the link applies a filter to the date/time field that shows the hits on those seven days. This lets you zoom in on a particular week, so you can see the statistics for that week, or you can switch to other views to learn more about the activity for that week. However, if you do it that way you can't see a list or graph of weeks, with the hits for each week, the way you can for days in the "Days" report. If you need a weekly graph or table, you need to track the "week of the year" log field in your database. The week of the year is a number between 1 and 52 that represents the week of the year (e.g. 1 means January 1 through January 7, etc.). You can track the week of the year field like this:

1. Open the profile file (Sawmill/LogAnalysisInfo/profiles/profilename.cfg) you want to add week_of_year reports to, in your favorite text editor (e.g. Notepad).
2. Search for "database = {", then search for "fields = {", and scroll down until you see "day_of_week = {".
3. Copy that line and all lines through the line "} # day_of_week", and paste them all just underneath.
4. Where you see day_of_week in the new section, change it to week_of_year (except use "string" where you see "display_format_type"), so it becomes:

day_of_week = {
  label = "day of week"
  type = "string"
  log_field = "day_of_week"
  display_format_type = "day_of_week"
  suppress_top = "0"
  suppress_bottom = "2"
  always_include_leaves = "false"
} # day_of_week
week_of_year = {
  label = "week of year"
  type = "string"
  log_field = "week_of_year"
  display_format_type = "string"
  suppress_top = "0"
  suppress_bottom = "2"
  always_include_leaves = "false"
} # week_of_year
5. Then search for "reports = {" and duplicate (by copy/paste, as above) an existing report (the Day of week report is a good choice), and again, where you see day_of_week in the new section, change it to week_of_year (except use "string" where you see "display_format_type").
6. Then search for "reports_menu = {", then "date_time_group = {", and duplicate (by copy/paste, as above) an existing report menu item (the Day of week report is a good choice), and again, where you see day_of_week in the new section, change it to week_of_year.
7. Save the changes you have made.
8. Rebuild the database.

The new report will show you traffic for each week of the year.
Long Answer
Yes; Sawmill can tell you the number of unique visitors for any item in the database, including the number of visitors for a particular day, the number of visitors from a particular domain, the number of visitors who hit any particular page or directory, or any other type of data Sawmill can display. By default, Sawmill uses the hostname field of your log data to compute visitors based on unique hosts. That works for all log files, but it's a somewhat inaccurate count due to the effect of proxies and caches. If your log data tracks visitors using cookies, you can easily configure Sawmill to use the cookie information instead, by changing the "visitors" database field so it is based on the cookie log field instead (in the Log Filters section of the profile Config). See also Counting Visitors With Cookies.
Short Answer: Yes -- it includes a built-in log format to do this for Apache, and other servers can be set up manually.
Long Answer
Yes. The reason you'd want to do this is that using unique browsing hostnames (or IPs) to count visitors is an imprecise method, since the same actual visitor may appear to come from several hostnames -- the same person may dial up and receive random IP addresses, or in some extreme cases, their ISP may be set up so that they have a different IP address for each hit; conversely, several actual visitors may appear as one hostname if they're all using the same proxy. The solution to this problem is to set your web server to use cookies to keep track of visitors. Apache and IIS can be configured to do this, and in both cases, Sawmill can be configured to use the cookie log field, instead of the hostname, as the basis for its "visitors" field. To do this, edit your profile (in LogAnalysisInfo/profiles) with a text editor, find the "visitors" database field (look for "database = {", then "fields = {", then "visitors = {"), and change the log_field value to your cookie field; for instance, if your cookie field is cs_cookie, change it to log_field = "cs_cookie". Note that this will only work if your entire cookie field tracks the visitor cookie, and does not track any other cookies; if you have multiple cookies, you can't use the whole cookie field as your visitor ID -- in that case, you need to create a visitor_id field, use a regular expression log filter to extract your visitor cookie into it, and then change log_field to visitor_id.
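Assuming a cookie field named cs_cookie (an illustrative name; use whatever your log format calls it), the edited visitors field in the profile .cfg would contain:

```
visitors = {
  # ... other values left unchanged ...
  log_field = "cs_cookie"
} # visitors
```

Only the log_field value changes; the rest of the node stays as Sawmill created it.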
Short Answer: Edit the profile .cfg, and change sessions_visitor_id_field to the username field.
Long Answer
Sawmill calls this the "session user" field, or the "session visitor ID" field. This is the field which differentiates users: if the value in this field is different for two events, Sawmill assumes that those events are from two different users, and therefore are not part of the same session. By default, Sawmill uses the "client IP" field (or "hostname", or "source IP", or others, depending on the log format) to differentiate users. But if you have username information in your logs, it is sometimes better to use the username to differentiate sessions, because it better identifies an individual, especially in environments where individuals may use multiple IP addresses. To do this, edit the profile .cfg file, which is in the LogAnalysisInfo/profiles folder, using a text editor. Search for this line (its full location is log.field_options.sessions_visitor_id_field):

sessions_visitor_id_field = "hostname"

and change "hostname" to "username" (or "cs_username", or "x_username", or "user", or whatever the field is called in your log data; you can see a list of field names by running Sawmill from the command line with "sawmill -p {profilename} -a ldf"). For example, change it to this, if your username field is called "username":

sessions_visitor_id_field = "username"

Then rebuild the database (or delete the LogAnalysisInfo/ReportCache folder), and view a session report; Sawmill will recompute your session reports using the username field.
Short Answer: Yes; its "session paths (clickstreams)" report is very powerful.
Long Answer
Yes, very well. Most statistics packages will only show you the "top paths" or maybe the entry and exit pages; Sawmill shows you all the paths visitors took through the site, in an easily navigated hierarchical report. You get complete data about every path that every visitor took through your site, click-by-click. For even more detail, you can zoom in on a particular session in the "individual sessions" report, to see the full log data of each click in the session.
Short Answer: Yes -- encode source information in your URLs, and use global filters to show the top entry pages for your "success" page.
Long Answer
If you advertise your web site, one of the most useful pieces of information you can get from Sawmill is information on "conversions"; i.e. how effective your ads are at actually generating sales, sign-ups, or whatever it is that makes your site a success. Sawmill can provide highly detailed conversion information with a little effort. Here's how you do it:

1. Make sure that every URL leading to your site is tagged with information that tells you where it came from. E.g. for an Overture keyword "red umbrellas", use http://www.mysite.com/?source=overture&keyword=red+umbrellas . Do the same for all ads. This is a good idea anyway (and Overture recommends it), but it's essential if you want to track conversions in Sawmill. Do this for every link leading to your site. Obviously, you can't do this for URLs you have no control over (like Google searches), but you can do it for all your ads, which are the important ones from a conversion perspective.
2. Wait for some traffic to arrive with the parameterized URLs.
3. Remove the "page parameters" log filter, in the Log Filters section of the profile Config, so Sawmill will track page parameters (see http://sawmill.net/cgi-bin/sawmilldocs?ho+faq-pageparameters).
4. View the statistics.
5. Go to the "Entry pages" view in your statistics. You should see all your full URLs there, with percentages if you want, which will tell you how much traffic each ad brought to your site. For instance, if you see that you got 1000 entries to the http://www.mysite.com/?source=overture&keyword=red+umbrellas page, then you know that your Overture ad for "red umbrellas" brought 1000 hits. That's useful information, but not conversion information -- that comes next.
6. Edit the global filters in the reports, and set the filters to show only sessions that went through your "success" page. This is the page that people see after they've done whatever you wanted them to do. For instance, if success for you means a sale, then this would be the "thank you for your order" page. If success means that they sign up, this is the "you have signed up" page. If success means that they submitted a feedback form, this is the "thanks for your feedback" page.
7. Now you're looking at the "Entry pages" view, but it's been filtered to show only those sessions which eventually "converted". This is exactly what you want to know -- if you see 100 entrances at http://www.mysite.com/?source=overture&keyword=red+umbrellas , then you know that 100 visitors found your site from your "red umbrellas" ad on Overture, and eventually hit your success page later in the same session.

This is pure marketing gold -- by comparing the total cost of the ad (e.g. if each click is $0.10, and there were 1000 total clicks, then you spent $100.00 on that keyword) with the total payout of the ad (e.g. if each "success" is worth $5.00, then you know you made $500.00 from the 100 successful "red umbrellas" clicks), you can tell whether the ad is worth it. In this example, you paid $100.00 for the ad and got $500.00 in sales from it -- keep that one running!
Short Answer: Click on the item you're interested in, and choose the other field from "Default report on zoom".
Long Answer
Sawmill can answer this sort of question for any combination of fields. All you need to do is use the zoom filters (or global filters) to zoom in on the item you want specific information for, and then use "default report on zoom" to switch to the report that shows the data you want. For instance, if you want to know the top search engines for a particular search phrase, click Search phrases, then click a particular search phrase, and then choose "Search engines" from the "Default report on zoom" menu. The resulting table will show you a breakdown by search engine of the hits for the search phrase you selected.
Short Answer: Use the global filters to show only sessions containing that page; reports will then show only sessions including that page.
Long Answer
In the global filters, add a filter to show only sessions containing that page. Then return to the reports; until you remove that global filter, all reports will show only traffic for sessions containing a particular page.
Short Answer: Direct that search engine to a particular entry page, and then use global filters to show only sessions for that page.
Long Answer
Some information of this type is available in the "Search engines" view -- you can zoom in on a particular search engine by clicking its name there, and then switch to the top visitor hostnames view to see which hosts came from that search engine, and other information about the traffic from that search engine. But that only works for the first click, because after that, the log data no longer lists the originating search engine (the referrers are internal from that point on). So you can see how much traffic search engines brought, but what if you want to see what the visitors from a particular search engine did after they came to the site? You can do that by using custom entrance pages and Global Filters. Start by pointing each search engine to its own URL, where possible. For instance, instead of pointing Overture to http://www.mysite.com/index.html, you can point it to http://www.mysite.com/index.html?source=overture . Once you've done that, all traffic from Overture will initially arrive at the /index.html?source=overture page. By showing only sessions containing that page (see Sessions For A Particular Page), you can show the session activity of Overture visitors, including what paths they took, how long they stayed, and more. You can do the same thing for any search engine, advertising campaign, or link exchange that allows you to choose your URL. It won't work quite as easily for broad search engines like Google, which let people enter your site at any point, but it's still possible to "tag" the URL similarly using a log filter -- see Tracking Conversions.
Short Answer: Session information only shows users contributing page views, and other views show all visitors. Also,
long sessions are discarded from the session information.
Long Answer
The database is split into two major sections: the main statistics, and the session information. The main statistics contain information on all hits; the session information shows the "sessions" -- i.e. it tracks the sequence of page views of each person who visits the site. Most views show the main statistics; only the session-related views (Sessions (summary), Sessions, Paths (clickstreams), Paths through a page, Entry pages, Exit pages, Session pages, etc.) show the session information. Because these two types of data are computed differently, the numbers may vary between the two. There are two major factors that affect the session users, but do not affect the visitors. First, session information is based on page views only, while visitor information is computed based on all hits in the database. So for instance, if the web site is accessed by a browser that fetches only a single image file, and never hits a page, that hit (and that host) will appear in the main statistics, but not in the session statistics. To put it another way, the visitors are the number of unique hosts who contributed hits; the session users are the number of unique hosts contributing page views. If your database is set up to track hits or bandwidth, these numbers may be significantly different. If your database tracks only page views, then visitor information will also be based on page views, and visitors and session users will be closer. The second factor is that long sessions are discarded from the session information. By default, sessions longer than 2 hours are assumed to be "fake" sessions, resulting from the use of a large cache server by multiple users, or from spiders. Sawmill discards these sessions from the statistics, because these sessions are not accurate representations of the way any single user moves through the site -- they are semi-random juxtapositions of multiple true sessions, and are not very useful.
The default maximum session duration is 2 hours; this can be customized in the Stats Misc tab of the Configuration Options. Setting this to 0 will cause all sessions to be included, eliminating this difference between visitors and session users. Incidentally, using "visitor cookies" or "session cookies" in your log data, and configuring Sawmill to use those as visitor ids (see [faq-visitorcookies]), will eliminate the need for this 2-hour maximum.
Short Answer: Yes; click the "export" link in the toolbar above reports to export the data from that report's table in CSV
format. Many programs, including Excel, can import CSV format files.
Long Answer
Sawmill supports CSV export of any table. Just view the statistics, find the table you want, and click the "export" link in the toolbar. Save the resulting file from your browser, and import it into Excel or any other program that supports CSV. You can also generate CSV from the command line, like this:

Sawmill.exe -p profilename -a ect -rn "view-name"

for instance:

Sawmill.exe -p MyConfig -a ect -rn "Pages"

You can also use the -f option (Using Log Filters) on the command line to use filters on the table data.
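Once exported, the file is plain comma-separated text, so any scripting language can post-process it as well. A minimal sketch in Python (the column names here are hypothetical; a real export uses the labels of your report's columns):

```python
import csv
import io

# A sample in the shape of a CSV table export: a header row,
# then one row per table item (field names are hypothetical).
sample_export = """page,hits
/index.html,120
/about.html,45
"""

def read_table(text):
    """Parse an exported CSV table into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_table(sample_export)
total_hits = sum(int(row["hits"]) for row in rows)
```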
Short Answer: Sawmill accurately reports the data as it appears in the log file. However, many factors skew the data in
the log file. The statistics are still useful, and the skew can be minimized through server configuration.
Long Answer
Sawmill (and all other log analysis tools) reports statistics based on the contents of the log files. With many types of servers, the log files accurately describe the traffic on the server (i.e. each file or page viewed by a visitor is shown in the log data), but web log files are trickier, due to the effects of caches, proxies, and dynamic IP addresses. Caches are locations outside of the web server where previously-viewed pages or files are stored, to be accessed quickly in the future. Most web browsers have caches, so if you view a page and then return in the future, your browser will display the page without contacting the web server, so you'll see the page but the server will not log your access. Other types of caches save data for entire organizations or networks. These caches make it difficult to track traffic, because many views of pages are not logged and cannot be reported by log analysis tools. Caches interfere with all statistics, so unless you've defeated the cache in some way (see below), your web server statistics will not represent the actual viewings of the site. The logs are, however, the best information available in this case, and the statistics are far from useless. Caching means that none of the numbers you see are accurate representations of the number of pages actually viewed, bytes transferred, etc. However, you can be reasonably sure that if your traffic doubles, your web stats will double too. Put another way, web log analysis is a very good way of determining the relative performance of your web site, both compared to other web sites and to itself over time. This is usually the most important thing, anyway -- since nobody can really measure true "hits," when you're comparing your hits to someone else's hits, both are affected by the caching issues, so in general you can compare them successfully. If you really need completely accurate statistics, there are ways of defeating caches.
There are headers you can send which tell the cache not to cache your pages; these usually work, but are ignored by some caches. A better solution is to add a random tag to every page, so instead of loading /index.html, they load /index.html?XASFKHAFIAJHDFS. That will prevent the page from getting cached anywhere down the line, which will give you completely accurate page counts (and paths through the site). For instance, if someone goes back to a page earlier in their path, it will have a different tag the second time, and will be reloaded from the server, relogged, and your path statistics will be accurate. However, by disabling caching, you're also defeating the point of caching, which is performance optimization -- so your web site will be slower if you do this. Many choose to do it anyway, at least for brief intervals, in order to get "true" statistics. The other half of the problem is dynamic IP addresses and proxies. This affects the "visitor" counts, in those cases where visitors are computed based on the unique hosts. Normally, Sawmill assumes that each unique originating hostname or IP is a unique visitor, but this is not generally true. A single visitor can show up as multiple IP addresses if they are routed through several proxy servers, or if they disconnect and dial back in, and are assigned a new IP address. Multiple visitors can also show up as a single IP address if they all use the same proxy server. Because of these factors, the visitor numbers (and the session numbers, which depend on them) are not particularly accurate unless visitor cookies are used (see below). Again, however, it's a reasonable number to throw around as the "best available approximation" of the visitors, and these numbers tend to go up when your traffic goes up, so they can be used as effective comparative numbers. As with caching, the unique hosts issue can be solved through web server configuration.
Many people use visitor cookies (a browser cookie assigned to each unique visitor, and unique to them forever) to track visitors and sessions accurately. Sawmill can be configured to use these visitor cookies as the visitor ID, by extracting the cookie using a log filter, and putting it in the "visitor id"
field. This isn't as foolproof as the cache-fooling method above, because some people have cookies disabled, but most have them enabled, so visitor cookies usually provide a very good approximation of the true visitors. If you get really tricky, you can configure Sawmill and/or your server to use the cookie when it's available, and the IP address when it's not (or even the true originating IP address, if the proxy passes it). Better yet, you can use the concatenation of the IP address and the user-agent field to get even closer to a unique visitor id even in cases where cookies are not available. So you can get pretty close to accurate visitor information if you really want to. To summarize, with a default setup (caching allowed, no visitor cookies), Sawmill will report hits and page views based on the log data, which will not precisely represent the actual traffic to the site -- and neither will any other log analysis tool. Sawmill goes further into the speculative realm than some tools by reporting visitors, sessions, and paths through the site. With some effort, your server can be configured to make these numbers fairly accurate. Even if you don't, however, you can still use these as valuable comparative statistics, to compare the growth of your site over time, or to compare one of your sites to another.
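The IP-plus-user-agent approach can be illustrated with a short sketch. This is not Sawmill's internal code; it is a hypothetical helper showing how a cookie, when present, takes precedence, with a hash of IP and user agent as the fallback id:

```python
import hashlib

def visitor_id(cookie, ip, user_agent):
    """Return a visitor id: the cookie if available, else a hash of IP + user agent."""
    if cookie:
        return cookie
    # No cookie: combine IP and user agent, so two different browsers
    # behind the same proxy IP still get distinct ids.
    digest = hashlib.md5((ip + "|" + user_agent).encode("utf-8")).hexdigest()
    return "host-" + digest[:12]

a = visitor_id(None, "10.0.0.1", "Mozilla/5.0 (Windows)")
b = visitor_id(None, "10.0.0.1", "Mozilla/5.0 (Macintosh)")
```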
Short Answer: Sawmill uses the visitor id field to identify unique visitors. It decides that a new session has begun if a
visitor has been idle for 30 minutes. It rejects sessions longer than 2 hours.
Long Answer
Sawmill computes session information by tracking the page, date/time, and visitor id (which is usually the originating hostname) for each page view in the log data. When a session view is requested, it processes all of these page views at the time of the request, ignoring those that are filtered out by filters on the page or date/time fields. All other hits are included -- filters on other fields are ignored in session information. Sawmill groups the hits into initial sessions based on the visitor id -- it starts by assuming that each visitor contributed one session. It sorts the hits by date so it has a click-by-click record of the movement of each visitor. Then it splits the sessions, using the customizable session timeout interval (30 minutes by default). Since there is no real "log out" operation in HTTP, there is no way for Sawmill to know the real time that a user leaves the site; it can only guess by assuming that if they didn't click anything for 30 minutes, they must have left and come back. The split step, then, increases the number of sessions, resulting in possibly more than one session per visitor. Next, Sawmill discards sessions over 2 hours long (this is configurable). The idea behind this is that most web sessions are considerably shorter than that, so there's a good chance that any really long session is actually caused by multiple visitors using the same proxy server to visit the site. That looks like one long session because all of the hits seem to come from the proxy server. Sawmill rejects these because there is no way to tell which hits were from a particular visitor. If you're using visitor cookies to track unique visitors, this will not be a problem, so you can set this option to a high value to see all your sessions, even those over 2 hours. Finally, Sawmill discards sessions based on the Session Filters (which you can set in the Session Filters bar at the top of the statistics).
The session filters can be set to discard all sessions except those from a particular visitor, or they can be set to discard all sessions except those which go through a particular page. After that, Sawmill is ready to generate the statistics reports. The "Sessions Overview" report is generated by examining the sessions in various ways (for instance, the repeat visitors number is the number of visitors which have more than one session; i.e. those whose sessions were "split" by the timeout interval). The "entry pages" and "exit pages" reports are generated by tabulating the first and last pages of every session. The "session pages" report is generated by finding every occurrence of each page in any session, computing how long it was from then until the next page in that session (exit pages are considered to have zero time spent per page), and tabulating the results for all pages to compute time per page and other statistics. The "paths (clickstreams)" report shows all the sessions in a single expandable view.
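The steps above (group page views by visitor id, sort by time, split on the 30-minute timeout, discard sessions over 2 hours) can be sketched in a few lines of Python. This is an illustration of the algorithm as described, not Sawmill's actual implementation:

```python
from collections import defaultdict

TIMEOUT = 30 * 60        # split a session after 30 idle minutes
MAX_DURATION = 2 * 3600  # discard sessions longer than 2 hours

def compute_sessions(page_views):
    """page_views: list of (visitor_id, timestamp, page) tuples.
    Returns a list of sessions; each session is a list of (timestamp, page)."""
    by_visitor = defaultdict(list)
    for visitor, t, page in page_views:
        by_visitor[visitor].append((t, page))

    sessions = []
    for hits in by_visitor.values():
        hits.sort()                       # click-by-click order
        current = [hits[0]]
        for hit in hits[1:]:
            if hit[0] - current[-1][0] > TIMEOUT:
                sessions.append(current)  # idle too long: start a new session
                current = [hit]
            else:
                current.append(hit)
        sessions.append(current)

    # Reject suspiciously long sessions (probably a shared proxy or a spider).
    return [s for s in sessions if s[-1][0] - s[0][0] <= MAX_DURATION]
```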
Short Answer: Edit the profile .cfg file, and change the field name in the numerical_fields section of that report element.
Long Answer
If you want to change the field which is graphed, in the graph above a particular report table, do this:

1. Open the profile .cfg file (in the profiles folder of the LogAnalysisInfo folder) in a text editor.

2. Find the Reports section (search for "reports = {").

3. Scroll down until you see the report you want to change, for example "Days" (look for "days = {").

4. A few lines below that, find the line that says "graph = {". You should see this:

   numerical_fields = {
     hits = "true"
   } # numerical_fields

5. Change this so that it reads:

   numerical_fields = {
     visitors = "true"
   } # numerical_fields

   You can substitute any numerical field name here, e.g. page_views, hits, visitors, or bytes (you must use the internal name for the field, not the "display" label).

6. Refresh the browser to see the new graph.

NOTE: In some cases, just refreshing the browser may not show the new graph. Once these changes have been made, Sawmill will be producing the new graph; it is the browser's job to show it to you. You may need to empty your browser's cache for the change to be visible.
Long Answer
Yes. Sawmill lets you break your statistics down by any of a large number of criteria, and by more than one at a time. Among these criteria are "day of week" and "hour of day," so you can see weekday or hour information just by adding the appropriate field to your database.
Long Answer
Yes, Sawmill can pinpoint your hits to the second. By default, it also breaks down hits by hour, so you can detect peak usage and other hourly information. The Log Detail report shows complete information about each event, down to the second, so you can zoom in on any part of your statistics, and then zoom down to the level of the log data to see event-by-event, second-by-second, what occurred.
Short Answer: Normally, you can't. However, you can set up "reflector" pages if you need this information.
Long Answer
Sawmill can show you the last page visitors hit before they exited the site, but it cannot usually show you where they went. The reason is that when they click a link on your site leading to another site, their web browser contacts the other site (not your site) for the new page -- your web server is not contacted at all when someone clicks a link to leave your site. So the hit appears in the remote site's log files, not yours, and Sawmill cannot report on it because it's not in your log files. Nevertheless, you can track exits from your site if you're willing to set up "reflector" pages. A reflector page is a page whose sole purpose is to reflect a visitor to another page. This can be done with a trivial HTML page containing only a META REFRESH tag in the HEAD section. For instance, the following simple HTML page will cause a visitor to be immediately redirected to http://www.flowerfire.com:

<html>
<head>
<meta http-equiv="Refresh" content="0; URL=http://www.flowerfire.com/">
</head>
</html>

By creating a page like this for every exit link on your site, and changing your links to point to the reflector page rather than the actual destination page, you can track exit link usage. When a visitor clicks the exit link, they will be taken to the reflector page, and then immediately reflected to the actual destination. This will happen quickly enough that they will not notice the reflection happening -- it will seem to them that they went straight to the destination page. But your log data will include a hit on the reflector page, so you will be able to see which exit links are being taken. In the "exit pages" view, the reflector links will show which links were taken when leaving the site. A more sophisticated way of doing this is to create a CGI script (or other type of script) which generates the reflector page on the fly, given a URL parameter.
If you do it that way, you won't need to create a separate reflector page for each link; you can just use the same script for all your external links.
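As a sketch of that script-based approach, the following Python function (hypothetical; any CGI-capable language works) builds the same meta-refresh page for an arbitrary target URL:

```python
def reflector_page(url):
    """Return an HTML page that immediately redirects the browser to url."""
    template = (
        "<html>\n"
        "<head>\n"
        '<meta http-equiv="Refresh" content="0; URL={url}">\n'
        "</head>\n"
        "</html>\n"
    )
    return template.format(url=url)

# As a CGI program it would read the target from the query string, e.g.
#   /cgi-bin/exit.cgi?url=http://www.flowerfire.com/
# and print a "Content-type: text/html" header, a blank line,
# then reflector_page(url).
```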
Short Answer: Delete or disable the 'Strip non-page-views' log filter, and rebuild the database.
Long Answer
By default, Sawmill does not track the hits on individual image files and other non-page files when analyzing web log data, to save space in the database and reduce clutter in the "Pages" report. It does this by replacing the filename portion of the page with the value '(nonpage)', so all non-page hits will appear as values ending with '(nonpage)'. If you need this information, you need to tell Sawmill to track filenames for all hits. To do this, go to the Log Filters section of the Config section of your profile, and delete or disable the log filter called 'Strip non-page-views', which replaces the filename for non-page-view hits with '(nonpage)'. Then rebuild the database and view the reports, and all files (not just pages) will appear in the "Pages" and "Pages/directories" reports.
FAQ: robots.txt
Question: Why do I see hits on a file called "robots.txt" in my statistics?
Short Answer: robots.txt is a file that tells search engine spiders and robots what they can do, so a hit on robots.txt
means that a spider visited your site.
Long Answer
robots.txt is a "standard" file that appears at the root level of many web sites to tell search engine robots what to do on the site. Robots, also known as spiders, are computer programs that attempt to systematically visit and catalog all the pages on the Web. robots.txt tells the robots what they can or can't do on the site (whether they can index the site, which pages they may not index, etc.). Any correctly written robot will hit that page first, and follow the instructions it finds there. So the hits you're seeing are from robots. If you don't have a robots.txt file on your site, the robots don't actually get any information--they get a "404 File Not Found" error instead, which they generally interpret as "index whatever you want."
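For reference, a typical robots.txt looks something like this (a hypothetical example; the actual directives depend on your site):

```
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
```

The first line says the rules apply to all robots; each Disallow line names a path the robots should not index. An empty Disallow line would permit everything.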
FAQ: favicon.ico
Question: Why do I see hits on a file called "favicon.ico" in my statistics?
Short Answer: favicon.ico is a special icon file that Internet Explorer looks for when it first visits the site.
Long Answer
Recent versions of Microsoft Internet Explorer, Safari, and other web browsers have a feature that lets web site owners define an icon for their site, which will appear in the address bar, the Favorites menu, and other places. If you create an icon file called favicon.ico in a directory of your web site, then any page in that directory that is bookmarked will appear in the Favorites menu with your custom icon. The browser checks for this file whenever a bookmark is created, so if you don't have the file, it will show up as a 404 (file not found) link. As a side note, this is a good way to see who is bookmarking your site.
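As a hypothetical example, a page can also point browsers at its icon explicitly with a link tag in the HEAD section, which lets you use a different name or location than favicon.ico:

```
<link rel="shortcut icon" href="/images/myicon.ico" type="image/x-icon">
```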
Short Answer: Edit the report in the profile .cfg file to add a new item to the columns group.
Long Answer
Edit the profile .cfg file, which is in the profiles folder of the LogAnalysisInfo folder. Look for "reports = {" to find the reports list. Look down until you find a report which shows a table for one of the fields you want, e.g. in the source_ip/destination_ip/source_port/destination_port example, you would look for the destination_port report (the actual name of this report, and of field values, will vary depending on your log format). The report will look something like this:

destination_port = {
  report_elements = {
    destination_port = {
      label = "$lang_stats.destination_port.label"
      type = "table"
      database_field_name = "destination_port"
      sort_by = "events"
      sort_direction = "descending"
      show_omitted_items_row = "true"
      omit_parenthesized_items = "true"
      show_totals_row = "true"
      starting_row = "1"
      ending_row = "10"
      only_bottom_level_items = "false"
      show_graph = "false"
      columns = {
        0 = {
          type = "string"
          visible = "true"
          field_name = "destination_port"
          data_type = "string"
          header_label = "{=capitalize(database.fields.destination_port.label)=}"
          display_format_type = "string"
          main_column = "true"
        } # 0
        1 = {
          header_label = "{=capitalize(database.fields.events.label)=}"
          type = "events"
          show_number_column = "true"
          show_percent_column = "false"
          show_bar_column = "false"
          visible = "true"
          field_name = "events"
          data_type = "int"
          display_format_type = "integer"
        } # 1
      } # columns
    } # destination_port
  } # report_elements
  label = "Destination report"
} # destination_port

There may be other columns, but the two shown here are a minimum -- one for the destination port field, and one for the "events" field (which might be called "packets" or something else). This describes a report which has two columns: destination port and number of events. To add a four-column source_ip/destination_ip/source_port/destination_port report, copy the entire thing and change the name to custom_report. Then duplicate the destination_port column three times, and edit the copies so they're source_ip, destination_ip, and source_port. The result:

custom_report = {
  report_elements = {
    custom_report = {
      label = "Custom Report"
      type = "table"
      database_field_name = "destination_port"
      sort_by = "events"
      sort_direction = "descending"
      show_omitted_items_row = "true"
      omit_parenthesized_items = "true"
      show_totals_row = "true"
      starting_row = "1"
      ending_row = "10"
      only_bottom_level_items = "false"
      show_graph = "false"
      columns = {
        source_ip = {
          type = "string"
          visible = "true"
          field_name = "source_ip"
          data_type = "string"
          header_label = "{=capitalize(database.fields.source_ip.label)=}"
          display_format_type = "string"
          main_column = "true"
        } # source_ip
        destination_ip = {
          type = "string"
          visible = "true"
          field_name = "destination_ip"
          data_type = "string"
          header_label = "{=capitalize(database.fields.destination_ip.label)=}"
          display_format_type = "string"
          main_column = "true"
        } # destination_ip
        source_port = {
          type = "string"
          visible = "true"
          field_name = "source_port"
          data_type = "string"
          header_label = "{=capitalize(database.fields.source_port.label)=}"
          display_format_type = "string"
          main_column = "true"
        } # source_port
        destination_port = {
          type = "string"
          visible = "true"
          field_name = "destination_port"
          data_type = "string"
          header_label = "{=capitalize(database.fields.destination_port.label)=}"
          display_format_type = "string"
          main_column = "true"
        } # destination_port
        1 = {
          header_label = "{=capitalize(database.fields.events.label)=}"
          type = "events"
          show_number_column = "true"
          show_percent_column = "false"
          show_bar_column = "false"
          visible = "true"
          field_name = "events"
          data_type = "int"
          display_format_type = "integer"
        } # 1
      } # columns
    } # custom_report
  } # report_elements
  label = "Custom report"
} # custom_report

Finally, add it to the reports_menu list (again, this is easiest to do by duplicating the existing reports_menu item for destination port), like this:

custom_report = {
  type = "view"
  label = "Custom Report"
  view_name = "custom_report"
  visible = "true"
  visible_if_files = "true"
} # custom_report

And you should have a Custom Report item in your reports menu, which links to the multi-column report. If you're creating a two-column report, you can get an indented layout with subtables (rather than a "spreadsheet" layout) by adding the following section to the report group (e.g. right above the "} # custom_report" line, above):

sub_table = {
  ending_row = "10"
  omit_parenthesized_items = "true"
  show_omitted_items_row = "true"
  show_averages_row = "false"
  show_totals_row = "true"
} # sub_table

This sub_table node will work only for reports which have exactly two non-numerical columns (e.g. source_ip/destination_ip).
Long Answer
Sawmill produces reports that track network usage and network security, and give a comprehensive view of who is accessing your website at any given date or time. The Single-Page Summary report provides the network detection and audit history reporting needed to be compliant with both HIPAA and SOX.
FAQ: GeoIP database in Sawmill is not as accurate as the one on the Maxmind site
Question: Some of the IP addresses in my data are not resolved properly to country/region/city by Sawmill. I know that Sawmill uses the MaxMind GeoIP database, and when I go to the MaxMind site, their demo resolves these IPs properly. Why isn't Sawmill doing the same as the online GeoIP demo?
Short Answer: Sawmill uses the GeoLite City database, a less accurate (and less expensive) version of the GeoIP City
database. To get full accuracy, buy GeoIP City from MaxMind.
Long Answer
MaxMind provides two tiers for their City database: GeoIP City and GeoLite City. They do not provide GeoIP City for bundling with products like Sawmill, so Sawmill includes the GeoLite City database. GeoLite City is less accurate than GeoIP City, so the results you get from Sawmill using its default GeoLite City database will be less accurate than using GeoIP City. Since the web demo of GeoIP on the MaxMind site uses GeoIP City, there will be some cases where Sawmill cannot place an IP, but the web demo can. The solution is to upgrade to the full GeoIP City database, which you can do directly through MaxMind. That database is a drop-in replacement for GeoLite City, so once you have purchased it, you can drop it in on top of the GeoIP-532.dat file in the LogAnalysisInfo folder in your Sawmill installation, and rebuild your databases, and you will get a more accurate geographical location.
Short Answer: Add an extra column to the spreadsheet to convert them to fractional days; or use a custom database field
in the report element.
Long Answer
Excel represents durations in days, so "1" is one day, and "1/24" is one hour. But Sawmill represents them as seconds for some log formats, milliseconds for others, and microseconds for a few. To format them as durations in Excel, they must be converted. This can be done either after the export, in Excel, or before the export, in Sawmill.
time_taken_excel_format = {
  header_label = "time taken (Excel format)"
  type = "string"
  show_number_column = "true"
  show_percent_column = "false"
  show_bar_column = "false"
  visible = "true"
  field_name = "time_taken_excel_format"
  data_type = "string"
  display_format_type = "duration_milliseconds"
} # time_taken_excel_format

3. Rebuild the database; then when you export this report, it will include a new "time taken (Excel format)" column, with standard Sawmill duration formatting ("Y years, D days, HH:MM:SS.MMM").
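If you convert after the export instead, the arithmetic is just a division by the number of time units in a day, after which you can apply a time format such as [h]:mm:ss in Excel. A sketch of the conversion (assuming the field was logged in milliseconds; use 86400 for seconds, or 86400000000 for microseconds):

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # Excel stores durations as fractions of a day

def ms_to_excel_days(ms):
    """Convert a duration in milliseconds to Excel's fractional-day representation."""
    return ms / MS_PER_DAY

# One hour (3,600,000 ms) becomes 1/24 of a day.
one_hour = ms_to_excel_days(3600000)
```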
Short Answer: If you have no filters active, then they will not be saved with your report.
Long Answer
When you save a newly created report by selecting "Save as New Report" under the "Miscellaneous" button, any date or general filters that are not active will not be saved with the report; those selections will be dimmed in the dialog box. If you want filters saved with the report, turn them on in the "Filters" menu and then save your report.
Short Answer: You may be using a proxy server which prevents you from accessing a server running on your own
machine. Try reconfiguring the proxy to allow it, or try running Sawmill on IP 127.0.0.1 (the loopback interface).
Long Answer
If you're running Windows 2003 and using Internet Explorer, look at Can't access server with Windows 2003 and IE first, and return here if that doesn't help. When you first start Sawmill in web server mode, it tries to start a web server, running on the local machine, using port 8988. If this fails, it should give you an error message; if it succeeds, it should give you a URL. If you're seeing a URL when you start Sawmill, it generally means that the Sawmill server started successfully, and is ready to answer web browser requests. Sometimes, though, when you actually try to access that URL, you may find that the server doesn't answer. Your browser may tell you that there's a DNS error, or that it couldn't contact the server, or that there's some other kind of error. If Sawmill displayed a URL, the server itself is probably working fine -- the problem is not with the server, but with the network connection to the server. This can happen, for instance, if you're using a web server proxy or cache server, and it doesn't know about the IP address of your own machine. When you contact the cache and ask to connect to your own machine, it gets confused, because normal web requests come from inside machines contacting outside machines, and this one is an inside machine contacting another inside machine (itself). A well-configured proxy server can handle this, but one that is not configured to handle internal requests may attempt to get the URL from the outside, and may give an error when it doesn't find it there. Some proxies/caches/firewalls will also refuse to let through traffic on port 8988 (Sawmill's default port), regardless of other settings. There are several solutions. One choice is to reconfigure the proxy or cache server to allow HTTP connections from internal machines to other internal machines, on port 8988. Then Sawmill will be able to operate in its preferred mode, on port 8988 of the machine's first IP address.
If that's not an option, you may be able to get Sawmill to work by running it on the loopback interface (IP 127.0.0.1), or on port 80 (the standard web server port). The easiest way to find a working solution is to use the command-line interface to Sawmill, at least until you have it working; you can go back to using the graphical version later. From the command line, run Sawmill like this:

Sawmill.exe -ws t -sh 127.0.0.1 -wsp 80

This will attempt to start Sawmill's web server on IP 127.0.0.1 (the loopback interface), using port 80. This will only work if there is not a web server already running on the system -- only one server can use port 80 at a time. If you already have a web server running, use port 8988 instead. Try the command above with different IP addresses (127.0.0.1, and any IP addresses you know belong to your computer), and different ports (try 8988 first, then 80). With a little luck one of the choices will start a server that you can connect to. Once you've got the Sawmill interface working in your web browser, you can set it to use that IP and port permanently in the Preferences, from the Administrative Menu. Once you've set the IP and port in the Preferences, you can quit the command-line Sawmill, and start using the graphical version, if you prefer. If that still doesn't work, check if there is a firewall on your system or on your network, which is blocking traffic from your machine to itself, on port 8988. If there is, try disabling the firewall temporarily (or reconfigure it to allow the traffic), and see if
it works then. If it works with the firewall disabled, and doesn't work with the firewall enabled, then the firewall is probably blocking the necessary traffic, and you'll want to reconfigure it to let the traffic through on port 8988.

If none of these work, and you have a web server running on your system, there is always CGI mode. Sawmill can run under any running web server in CGI mode; if you can connect to the web server itself, you'll be able to use Sawmill by running it under your local server as a CGI program. Finally, if you can't get Sawmill to work to your satisfaction, please contact support@flowerfire.com.
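When hunting for a working IP/port combination, it can help to check first whether anything is already listening on a candidate port. This is a minimal sketch using bash's /dev/tcp feature (bash-specific; the IP and ports are just the candidates discussed above):

```shell
#!/bin/bash
# Print "busy" if something is already listening on IP:PORT, else "free".
port_busy() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo busy    # connection succeeded: a server is already there
  else
    echo free    # connection refused: the port looks available
  fi
}

# If 80 is busy (an existing web server), try Sawmill on 8988 instead.
port_busy 127.0.0.1 80
port_busy 127.0.0.1 8988
```

If a port reports busy, either Sawmill is already running there, or another server holds the port and you should pick a different one.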
Sawmill Documentation
Quickstart Manual FAQ User Guide www.sawmill.co.uk
Short Answer: The "Internet Explorer Enhanced Security Configuration" may be enabled, blocking access; uninstall it or
add 127.0.0.1:8988 to the trusted sites.
Long Answer
Windows 2003 starts up with Internet Explorer "locked down" in a highly secure mode where only certain sites are accessible. In particular, Sawmill's default URL cannot be accessed by Internet Explorer. To enable access to Sawmill from Internet Explorer, do this:

1. Go to Internet Explorer.
2. Go to the Tools menu.
3. Choose Internet Options.
4. Click the Security tab.
5. Click the Trusted Sites icon.
6. Click the Sites button.
7. Add 127.0.0.1:8988 to the list.
Now you should be able to access Sawmill with Internet Explorer. Alternatively, use a different browser which does not restrict access. Alternatively, go to the Add/Remove Programs control panel and uninstall "Internet Explorer Enhanced Security Configuration".
Short Answer: Your browser isn't storing the cookie Sawmill needs to maintain the login, or something is blocking the
browser from sending the cookie. Make sure cookies are on in the browser, firewalls aren't blocking cookies, and don't use Safari 1.2.1 or earlier as your browser.
Long Answer
Sawmill uses web browser cookies to store your login information, which keeps you logged in. If the browser isn't passing the cookie back to Sawmill properly, Sawmill won't know you're logged in, and you'll keep getting the login screen. To keep this from happening, make sure cookies are enabled in your web browser. If you want to be selective about who gets cookies, at least make sure that the hostname or IP where Sawmill is running is allowed to get cookies. If your browser differentiates "session cookies" from other cookies, all you need is session cookies. Use an approved browser -- some browsers don't handle cookies quite right. Approved browsers are Internet Explorer 6, Safari 1.2.2 or later, and Firefox. Others may work, but have not been verified. In particular, Safari 1.2.1 and earlier do not handle cookies properly -- this is fixed in 1.2.2 and later.
Short Answer: The Service must run with the same privileged user account that has the mapped drive, share, directory,
or mount point privilege.
Long Answer
The mapped drive, share, directory, or mount point is a permission issue that involves security. It is therefore necessary to have the service run using the same privileged account that the drive was originally mapped from, or an account which has permission to access the share. If the service cannot connect as the same user that has the privilege, the network resource will not be available. Here is a step-by-step walkthrough of how to change the service logon:

1. Go to the Control Panel.
2. Open Services (the location varies slightly with the particular OS version).
3. Find the Sawmill entry (or the entry for whichever service is being used to run Sawmill) and right-click it.
4. Select Properties.
5. Under the 'Log On' tab, deselect the 'Local System Account' radio button by selecting 'This account', and hit the Browse button.
6. In the 'Select User' dialog box, type the privileged user's user ID, or browse for it. Once you have selected the correct user, click the OK button; the 'This account' field will be populated by a period, then a backslash (\), then the user's ID.
7. Enter the privileged user's password twice. It will show up as asterisks; this is for security reasons and by design.
8. Back in the Services list, right-click the Sawmill entry and select the 'Restart' option.
9. When you next run Sawmill, access to the mapped drive, share, directory, or mount point will be available.
Short Answer: Windows 2003 has a strict security policy which prevents access to network drives from Sawmill. To
make it work, you need to let "Everyone" permissions apply to anonymous users, and remove the restriction on anonymous access to named pipes and shares (in Administrative Tools).
Long Answer
The Windows 2003 security policies prevent programs like Sawmill from accessing network drives (mapped or UNC). In order to enable access to these drives, you need to do this:

1. Go to the Control Panel.
2. Open Administrative Tools.
3. Click Local Security Policy.
4. Click the Local Policies folder.
5. Click the Security Options folder.
6. Under Network Access, turn on "Let Everyone permissions apply to anonymous users."
7. Under Network Access, turn off "Restrict anonymous access to named pipes and shares."
Now Windows 2003 will let Sawmill see and access network drives.
FAQ: Sawmill uses too much memory for builds/updates, and is slow to view
Question: When I build or update my database with Sawmill, it uses a huge amount of memory. Then, when I view statistics, it's very slow. What can I do about that?
Long Answer
The main portion of the database that uses memory is the "item lists". There is one list for each database field, and each list contains all the unique values for that field. If one of the fields in your database has many unique values (millions), it can require a very large amount of memory to track. Simplifying the field can save memory.

To check which database field is the main culprit, look at the sizes of the files in the "items" subfolder of the database folder (in the Databases folder of the LogAnalysisInfo folder). For instance, if the location folder is the largest, at 500 MB, then you know that the "location" database field is responsible for the largest part of the memory usage.

When you've found the culprit, you need to reduce its memory usage. This is where you'll have to make compromises and cuts. The simplest solution is to delete the database field, and stop tracking and reporting on it. If that's not an option, you'll need to simplify the field in some way. The key point here is that you are trying to reduce the number of unique field values that Sawmill sees and tracks. The pool file, which is usually the largest one, contains a back-to-back list of all field values that are used in the database; if you can reduce the number of possible field values, you will reduce the size of the file.

If the field is hierarchical (like a pathname, hostname, date/time, or URL), you can simplify it by tracking fewer levels, by adjusting the suppress_top and suppress_bottom values in the database.fields section of the profile .cfg file (in the profiles folder of the LogAnalysisInfo folder). For instance, the page field of web logs is tracked nine directories deep by default; you can simplify it by tracking only the top three directory levels. If your date/time field is set to track information to the level of minutes, you can change it back to tracking hours or days only.
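The items-folder size check described above can be scripted; this sketch (the path is an example; substitute your own database folder) prints the largest entry so you can spot the culprit field quickly:

```shell
#!/bin/sh
# Print the largest entry (size in KB, then name) under an "items" folder.
largest_item() {
  du -sk "$1"/* 2>/dev/null | sort -n | tail -1
}

# Example path; substitute your own profile's database folder.
largest_item "LogAnalysisInfo/Databases/myprofile/items"
```

The name printed on the last line is the database field responsible for the most disk (and usually memory) usage.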
Usually, you will also want to turn off the bottom-level items checkbox for the field, since it's usually the bottom level that has all the detail.

Another possibility is to use a Log Filter to simplify the field. The default filter for web logs which replaces everything after "?" with "(parameters)" is an example of this. By replacing all the various parameterized versions of a URL with a single version, this filter dramatically decreases the number of different page field values that Sawmill sees, and therefore dramatically decreases the memory usage of the "page" field. Similarly, if you have a very complex section of your directory structure, but you don't really need to know all the details, you can use a Log Filter to delete the details from the field, collapsing the entire structure into a few items.

A common source of high memory usage is a fully-tracked hostname/IP field. By default, Sawmill tracks only the first two levels of hostnames for web and proxy logs; i.e. it will tell you that a hit came from .sawmill.net, but not that it came from some.machine.sawmill.net. Because of the tremendous number of IP addresses that appear in large log files, this field can be a problem if it's set to track individual IPs (there's a checkmark that lets you do this when you create the profile). If this is happening, consider tracking only a few levels of the hostname hierarchy, instead of the full IP address.

Of course, sometimes you really need the full detail you're tracking in a very large field. If you can't reduce the detail, and you can't reduce the amount of log data, then the only solution is to get enough memory and processing power to efficiently handle the data you're asking Sawmill to track.
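As a rough sketch of what the suppress_top/suppress_bottom adjustment looks like in a profile .cfg file (the field name and values here are illustrative only; your profile's exact structure and defaults may differ):

```
database = {
  fields = {
    page = {
      suppress_top = 0
      suppress_bottom = 4  # example value: deeper directory levels are collapsed
    }
  } # fields
} # database
```

After editing the .cfg file, rebuild the database for the change to take effect.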
Short Answer: Use a MySQL database, and/or use a 64-bit computer and operating system, and/or simplify your
database.
Long Answer
This error means that Sawmill tried to allocate another chunk of memory (N additional bytes, on top of whatever it was already using), and the operating system told it that there was no more memory available for it to use. This error is usually not a bug; it almost always indicates that Sawmill really has exhausted all the memory available to it. This error typically happens when using the "internal" database with a very large dataset. The "internal" database is not nearly as efficient in its use of memory as MySQL. Or, to put it another way, the "internal" database aims for performance above all, and mostly achieves it by keeping everything in memory. This means that it does not scale as well to extremely large datasets, or extremely large reports. Typically, the internal database will work well up to about 10GB of uncompressed log data. Above that, scalability may become an issue. There are several ways in which the internal database does not scale well:
- Itemnums are kept in memory. This is the major problem when building a database. Sawmill keeps a list of all values seen for each field (e.g. a list of all IP addresses which appear in a particular field, or a list of all URLs which appear in another field) in the "itemnum" tables. These tables are kept in memory (or at least mapped to memory, so they still use available memory addressing space). In the case of an IP address field, for instance the source IP address of a web server log, each value is about ten bytes long. If there are 10 million unique IPs accessing the site, this table is 100 million bytes long, or 100MB. Similarly, for a proxy log analysis, if each unique URL is 100 bytes long and there are 10 million unique URLs in the log data, the table will be 1GB. Tables this large can easily exceed the capabilities of a 32-bit system, which typically allows only 2GB of memory to be used per process.

- Session analysis is done in memory. This can be a problem during reporting. When using an internal database, Sawmill computes session information in memory. The session calculation involves direct manipulation of a table with one line per page view, and 20 bytes per line. If the dataset is 100 million lines, this is 2GB of data, which again exceeds the capacity of a 32-bit system.

- Report tables are held in memory. This can be a problem during reporting. When using the internal database, Sawmill keeps the temporary tables used during report generation in memory, and also keeps the final table in memory. The final table in particular uses a fairly memory-inefficient representation, with about 200 bytes per table cell. This is no problem for most tables, but in the extreme examples above (10 million unique IPs), a table might contain 10 million rows; if there are 5 columns, that's 50 million cells, which would require about 10GB of RAM, exceeding the abilities of any 32-bit system, and many 64-bit ones.
There are many solutions to these memory usage problems. All three issues can be improved by using a MySQL database, which uses disk files for most operations, including the three above, instead of RAM. This is usually the best solution to extreme memory usage. However, even with a MySQL database, Sawmill keeps the final report table in memory, so even with a MySQL database, very large tables may exceed the memory of a 32-bit system. Another solution is to use a 64-bit system and operating system; with a 64-bit processor, Sawmill will be able to allocate as much RAM as it needs, provided the RAM is available on the system (and it can use virtual memory if it isn't). This is the most complete solution; with a large amount of RAM on a 64-bit system, it should be possible to build extraordinarily huge databases without running out of memory.
Combining both solutions, and running Sawmill on a 64-bit system using MySQL, provides the best scalability. If neither option is available, you'll need to simplify the dataset; see Memory, Disk, and Time Usage for suggestions. If report generation is the problem, you may also be able to avoid the problem report, or use a simpler version of it. For instance, if it's the Pages report in a web server analysis, using the Pages/directories report instead will usually greatly reduce memory usage, because it generates a much smaller table by grouping files by directory. For session reports, zooming in on a particular day can greatly reduce session memory usage, since the memory usage is proportional to the size of the filtered dataset.
FAQ: Error with MySQL: "The total number of locks exceeds the lock table size"
Question: When I try to build a database, or view reports, I get an error, "The total number of locks exceeds the lock table size". How can I fix this?
Long Answer
This occurs when MySQL runs out of locks, which for an InnoDB database happens when the buffer pool is full. You can fix this by increasing the size of the buffer pool: edit the innodb_buffer_pool_size option in my.cnf (my.ini on Windows), setting it to a number higher than the default (which is typically 8M); for instance:

innodb_buffer_pool_size = 256M

Then restart MySQL, and try the Sawmill operation again.
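For reference, the option lives in the [mysqld] section of my.cnf; a fragment might look like this (256M is just an example value; size it to your server's RAM and data):

```
[mysqld]
# Buffer pool for InnoDB; the default (often 8M) is far too small
# for large Sawmill datasets.
innodb_buffer_pool_size = 256M
```

After saving the file, restart the MySQL service so the new value takes effect.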
Short Answer: No -- your server is down. Sawmill runs on your computer, not on ours -- contact your network
administrator if you're having problems accessing it.
Long Answer
Sawmill runs as a web server on the computer where it was installed, which is a client computer, not one of our servers. So if you're having trouble accessing Sawmill through your web browser, it means that your installation of Sawmill is messed up in some way (Sawmill may not be running where you expected it to be). If you installed Sawmill yourself, you may need to restart it. If someone else installed Sawmill, please contact them (it may be your network administrator) for assistance in getting Sawmill up and running again. On a related note, Sawmill never contacts Flowerfire, or any of Flowerfire's computers. It does not transmit log data to Flowerfire, it does not transmit statistics to Flowerfire, it does not receive any information or data from Flowerfire (the sole exception being the download of the GeoIP database, if it isn't present in the installation), and in all other ways it is a complete self-contained program that does not rely on Flowerfire's servers. Because Sawmill runs as a web server, people often assume that Sawmill is actually running on the Internet, on one of our servers, but it isn't -- it runs on your computers, and does not use the Internet or the network except where you specifically ask for it (i.e. to download files by FTP when you've requested that it do so, or to send mail when you've asked it to, or to look up IP numbers using DNS when you've asked it to).
Short Answer: Sawmill has probably crashed, so this could be a bug in Sawmill. See the long answer for suggestions.
Long Answer
This error message means that Sawmill tried to do a long task, like a report generation or a database build, and while it was trying to display progress for the task, it noticed that the task was no longer running, but had not properly computed and stored its result. A task always returns a result, so this means that something has gone wrong internally in Sawmill. The most likely cause is a crash: the background task crashed, so it will never be able to complete and return the result. A crash is often due to a bug in Sawmill, but it's also possible if Sawmill runs out of memory.

Make sure there is enough memory available; if you watch the memory usage while you repeat the task, does it seem to reach a high level, near the maximum memory of the system, before failing? If so, you may need more memory in your system in order to perform that task.

If it's not memory, try running the task from the command line. If it's a database build, you can run it from the command line using this: Building a Database from the Command Line. If it's a crash during report generation, you can run it from the command line similarly to a database build, but using "-a grf -rn reportname -ghtd report" instead of "-a bd", where reportname is the internal name of the report. Run Sawmill from the command line with "-p profilename -a lr" to get a list of reports. For instance:

sawmill -p myprofile -a grf -rn single_page_summary -ghtd report

will generate the single-page summary to a folder called "report". If this report fails, it may give a better error message about what happened. Whether it fails or succeeds, email support@flowerfire.com with the outcome of your test. If possible, include the profile, and enough log data to reproduce the error (up to 10 MB, compressed).
Report that you are seeing a crash on report generation (or database build, or whatever), and we will attempt to reproduce it on our own systems, determine the cause, and fix it, or help you resolve it, if it's not a bug.
FAQ: Winsock 2
Question: When I run Sawmill on Windows, I get an error: "A required DLL is missing: WS2_32.DLL." What's going on?
Long Answer
To run on Windows 95, and some early versions of Windows 98, Sawmill requires Winsock2, a networking component available for free from Microsoft's web site. Winsock2 is already part of Windows 98 (newer versions), Windows NT 4.0, and Windows 2000, so you do not need to download this component unless you are using Windows 95 or an old version of Windows 98.
Short Answer: You need to download and install the latest Service Pack for Windows 98.
Long Answer
Sawmill requires a DLL called OLEACC.DLL. This DLL is part of recent versions of Windows 98, but it is not part of older versions of Windows 98. If you're running an older Windows 98, you'll need to install the latest Service Pack before you can run Sawmill. The service pack is a free download from Microsoft.
Short Answer: Install the latest Internet Explorer, and the problem should go away.
Long Answer
This DLL is part of Microsoft Internet Explorer. It is also included in many recent versions of Windows. If you see this error, download and install the latest Internet Explorer, and the problem should go away.
Short Answer: Sawmill requires the libstdc++ library. This is available by default on many platforms, and is included in
the Sawmill distribution on others (including Solaris).
Long Answer
Sawmill requires the libstdc++ library. This is available by default on many platforms, but it is not available on some older platforms, and it is often not available on Solaris. There are several ways of making this available:
- Install the g++ compiler. This is available for all platforms from GNU. g++ is also available as a package (e.g. a Red Hat RPM) for most platforms, and is available as an installation option on most platforms. libstdc++ is part of the g++ compiler, so installing g++ will install libstdc++.

- Use the libstdc++ included with Sawmill. On Solaris, the standard download of Sawmill includes the libstdc++ file (whose name starts with libstdc++). If you have root access, the easiest way to install this is to copy it to /usr/lib. If you don't, you can set the environment variable LD_LIBRARY_PATH to point to your Sawmill installation. For instance, if your Sawmill installation is at /usr/sawmill, you can run this:

setenv LD_LIBRARY_PATH "$LD_LIBRARY_PATH:/usr/sawmill"

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/sawmill"

to add /usr/sawmill to the end of the LD_LIBRARY_PATH variable. You'll only need one of these two commands (the first if you're using csh as your shell, the second if you're using bash), but it won't hurt to run them both if you're not sure which to use; you'll just get a harmless error message from the wrong one. These commands last for one command-line session. If you need to make the change permanent, you can add the pathname on a separate line in the /etc/ld.conf file, or add the command above to one of your login scripts (e.g. .login, .cshrc, .bashrc). After setting LD_LIBRARY_PATH, you should be able to run Sawmill.

- Find an existing libstdc++ on your system. It is possible that you do have libstdc++ installed on your system, but that it's not in your LD_LIBRARY_PATH. If that's the case, you can add the location of libstdc++ to the LD_LIBRARY_PATH using the instructions above. For instance, if it is in /usr/local/lib, you can add that to LD_LIBRARY_PATH to use it.
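The LD_LIBRARY_PATH setup can be made safe to re-run with a small sh/bash snippet that appends a directory only if it is not already present (the /usr/sawmill path is an example installation directory):

```shell
#!/bin/sh
# Append a directory to LD_LIBRARY_PATH unless it is already listed.
add_lib_path() {
  case ":${LD_LIBRARY_PATH}:" in
    *":$1:"*) ;;  # already present, do nothing
    *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1" ;;
  esac
  export LD_LIBRARY_PATH
}

add_lib_path /usr/sawmill   # example installation directory
echo "$LD_LIBRARY_PATH"
```

Putting the add_lib_path call in a login script makes the change permanent; unlike a plain export, re-running it will not add duplicate entries.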
Short Answer: This is a GNU library incompatibility; build Sawmill from source instead of using the binary distribution.
Long Answer
This occurs on UNIX systems, and is due to Sawmill being built against a different version of the GNU libraries (libstdc++) than the one on your system. In other words, this is an operating system incompatibility -- we're building on a different version than you're running on. The best solution is to use the "encrypted source" version of Sawmill, rather than the binary distribution for your platform; i.e., choose "encrypted source" as the "operating system" when you're downloading Sawmill. This version requires that you have a C/C++ compiler installed on your system. Follow the instructions to build Sawmill from source -- it's easy. The resulting binary will run properly on your system. If you don't have a compiler installed, please contact support@flowerfire.com.
Short Answer: Try deleting the IPNumbersCache file in LogAnalysisInfo -- see the long answer for other solutions.
Long Answer
(See Resolving IP Numbers for information about reverse DNS lookup.) Usually, this occurs because the DNS server can't resolve the IPs. The DNS server you're using needs to know about the IPs you're resolving. For instance, you can't use an external DNS server to resolve internal IP addresses, unless the external DNS server knows about them. Try using an internal DNS server, or another DNS server, if the first DNS server you try can't seem to resolve the IPs. It's useful to manually query the DNS server to see if it can resolve a particular IP; on most operating systems, this can be done with the "nslookup" command.
Short Answer: You may have set the "temporary folder" incorrectly during installation. Try deleting the preferences.cfg
file in LogAnalysisInfo, and access Sawmill to try again.
Long Answer
When Sawmill runs as a CGI program, it includes images in its pages by creating them in a temporary folder in the web server folder, and then embedding links in the HTML so that the images it created are served by the web server. This is done by selecting a "temporary folder" and "temporary folder URL" which point to a folder inside the web server's root folder. They both point at the same folder, but one of them is the pathname of the folder, and one of them is the URL of the folder. These two must point at the same folder for images to appear in the pages generated by Sawmill in CGI mode. If images are not appearing, it is usually because this is set incorrectly. To correct the temporary folder, delete the preferences.cfg file in the LogAnalysisInfo folder, and access Sawmill. You will be prompted to enter the pathname and URL of the temporary folder. Make sure you see the logo on the page after you enter the temporary folder -- if the logo does not appear, click your browser's Back button and try again until you see the logo. If the logo does not appear, no other images in the Sawmill interface will appear either.
Short Answer: Your log format does not include year information, so Sawmill has to guess the year. Use a different log
format if possible (one which includes year information). See the long answer for a way of manually setting the year for blocks of log data.
Long Answer
Most log formats include the year as part of the date on every line, but a few (in particular, Unix Syslog format) include only month and day. In this situation, Sawmill has no way of knowing which year a particular event occurred in, so it has to guess. Recent versions of Sawmill will always guess that the event occurred in the current year; previous versions may have a particular year hard-coded in the default_log_date_year option in the profile, and will put all events in that year. The best solution, if possible, is to use a different log format--use a log format that has year information. Then Sawmill will always categorize events in the correct year. If that's not an option, then you will need to help Sawmill to know which data belongs in which year. There are several options, but the easiest one, if you are using Unix Syslog format, is to rename your log files so they end in yyyy.log, where yyyy is the year the log data is from. If some logs span multiple years, you will need to split those logs into files which do not cross year boundaries. For instance, if you have mail.log which contains data from 2004, 2005, and 2006, you can split it into three files, mail_2004.log, mail_2005.log, and mail_2006.log. The Unix Syslog plug-in automatically recognizes filenames which end with yyyy.log, and uses that value as the year when no year is available in the log data. Another option, also for logs written by Unix Syslog, is available if the message part of each log line contains a full date, including year. For instance, some logging devices include "date=2006-02-01" in the log data, indicating the date of the event. In this case, even though the syslog format may not have the year, the device plug-in can extract the year from the message. This is usually a simple modification of the plug-in, but not all plug-ins have been modified to support this yet. 
If your log data contains year information in the message, but the reports show the data in the wrong year, please contact support@flowerfire.com and we will add extraction of years from the message of your format (include a small sample of log data, as a compressed attachment).

Another option is to put the data in folders by year; e.g. put all your 2005 data in a folder called /logs/2005, and all your 2006 log data in a folder called /logs/2006, and then process the data in stages using the following command lines:

sawmill -p profilename -a bd log.source.0.pathname /logs/2005 log.processing.default_log_date_year 2005

sawmill -p profilename -a ud log.source.0.pathname /logs/2006 log.processing.default_log_date_year 2006

The first command creates a database using all the data from 2005, using 2005 as the year. The second command processes all the data from 2006, adding it to the existing database, using 2006 as the year. The final result is a database which has 2005 data in 2005 and 2006 data in 2006. From then on, you can update your database normally, and the new log data (from the most recent day) will be correctly categorized in the current year. If new data continues to be added in the wrong year, make sure that the default_log_date_year option is set to thisyear in your profile .cfg file (in LogAnalysisInfo/profiles), and in LogAnalysisInfo/default_profile.cfg.
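Splitting a combined syslog file into per-year files can be automated when the messages contain a full date token. This sketch assumes a "date=YYYY-MM-DD" token in each line (an assumption; adjust the pattern to whatever your devices actually log):

```shell
#!/bin/sh
# Split a syslog file into per-year files named <prefix>_<year>.log,
# keyed on a "date=YYYY-MM-DD" token in each message (an assumed
# format -- adjust the regex to your log data). Lines without the
# token go to <prefix>_unknown.log.
split_by_year() {  # usage: split_by_year input.log output_prefix
  awk -v prefix="$2" '
    {
      year = "unknown"
      if (match($0, /date=[0-9][0-9][0-9][0-9]/))
        year = substr($0, RSTART + 5, 4)
      print > (prefix "_" year ".log")
    }' "$1"
}
```

For example, split_by_year mail.log mail produces mail_2004.log, mail_2005.log, and so on, which end in yyyy.log as the Unix Syslog plug-in expects.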
Short Answer: Set the IIS CGI timeout to a high value, like 999999.
Long Answer
Microsoft Internet Information Server (IIS) automatically terminates CGI programs that run for more than five minutes. Unfortunately, Sawmill can easily take that long when building a database, and if IIS terminates it, it may leave the database partly built and unusable. The solution is to reconfigure the IIS server to increase the CGI timeout to a much larger value. Here's how (instructions are for Windows 2000 Server; other Windows variants may be slightly different):

1. In the Start Menu, go to the Settings menu, and choose Control Panels.
2. Open the Administrative Tools control panel.
3. Open the Internet Services Manager item.
4. Right-click on the computer icon in the left panel and choose Properties from the menu that appears.
5. Click "Edit..." next to "WWW Services".
6. Click the "Home Directory" tab.
7. Click the "Profile..." button.
8. Click the "Process Options" tab.
9. Enter a large value in the CGI script timeout field, perhaps 999999.
Long Answer
For security reasons, Sawmill requires an administrative username and password whenever you use it (otherwise, anyone could use it to access your computer, since Sawmill is normally accessible by anyone on your network). You choose this username and password when you first run Sawmill, and it asks for them whenever you run it again.

In version 7, the procedure was simply to delete users.cfg and be prompted for a new root admin username and password. This is very insecure in a multi-user environment, though: if the Root Admin deletes users.cfg but delays entering a new username and password for hours or days, every other user who tries to access Sawmill in that window will be prompted to enter a new root admin username and password, and will gain root admin access by doing so.

In version 8, as of 8.0.2, there is a custom action, reset_root_admin. This is run from the command line like this:

sawmill -a rra -u username -pw password

This command changes the root username and password. This is even more secure than using a default/default users.cfg, because there is no longer even the possibility of an attacker repeatedly trying default/default in the hope of catching Sawmill between steps 2 and 4 of the original approach. The custom action approach also solves the problem of losing other users (and the root admin language), because nothing is changed in users.cfg other than the root admin username and password. This action exists only in 8.0.2 or later.

If you are running 8.0.0 and have forgotten the username or password you originally chose, you can still reset your password, but you must contact Sawmill support, and we will give you a file to be placed in the lang_stats directory. This will delete all users from Sawmill. Once you have the new users.cfg, access Sawmill again through a web browser, and you will be prompted to choose a new administrative username and password.
Short Answer: Loosen the permissions in the Preferences, or run your CGI programs as a different user, or run your command-line programs as the CGI user.
Long Answer
For security reasons, UNIX web servers often run CGI programs as a special user, often user nobody, or user web, or user cgi, or user apache. When you run Sawmill in CGI mode, it runs as this user, and any files it creates are owned by that user. This can cause problems if you later need to run Sawmill as a different user, for instance to run a command-line database update: the files which were created as the CGI user will not be accessible to the non-CGI user, and you will get errors about Sawmill not being able to read or write certain files. There are several possible solutions to this problem:

1. You can run your command lines as the CGI user. This is often the easiest solution. If your CGI user is user nobody, then use "su nobody" to change to user nobody, and then run your commands as that user. Since both the CGI version and the command-line version will be running as the same user, there will be no permissions issues. You may need to configure a password, shell, and home directory for user nobody before you can log in as that user, which will require root access. This option is slightly insecure, because giving user "nobody" a home directory and a shell makes it a slightly more powerful user; if the purpose of using "nobody" as the CGI user was to run CGI programs as a powerless user, this circumvents that security somewhat.

2. You can run your CGI program as the command-line user. If your username is "myself", then you can reconfigure your web server to run CGI programs as that user, rather than the user it's using now. You may even be able to configure the server to run only Sawmill as that user, while continuing to run other programs as the usual CGI user. Because both the CGI version of Sawmill and the command-line version will be running as user "myself", there will be no permissions issues. This may be difficult to configure, however; see your web server documentation for instructions on how to configure your server to run CGI programs as a different user. On some servers, this may not be possible.

3. You can change the permissions of the files that Sawmill creates, by editing the permissions options in the Preferences. This is usually an insecure solution, however, since you'll need to loosen many of the permissions to 777 (everyone can read, write, and execute/search), which makes your files vulnerable to modification by unauthorized users on the machine. This option may be acceptable, however, if access to the machine is limited to authorized users; i.e., if the only ones who can log in by telnet, SSH, FTP, etc. are trusted Sawmill administrators.

Any one of these solutions will work; you do not need to do more than one of them.
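To see concretely what option 3's loosened permissions mean, here is a small shell sketch. The directory below is a stand-in for Sawmill's real data folder, whose location depends on your installation:

```shell
# Stand-in directory; substitute Sawmill's actual data folder in practice.
mkdir -p /tmp/LogAnalysisInfo
# Loosen permissions to 777: everyone can read, write, and search.
chmod 777 /tmp/LogAnalysisInfo
# Show the resulting octal mode (GNU stat syntax).
stat -c '%a' /tmp/LogAnalysisInfo
```

The same chmod applied to Sawmill's files is what makes them writable by both the CGI user and the command-line user, and also by everyone else, which is the security trade-off described above.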
Short Answer: It depends on how much detail you ask for in the database. It uses very little if you use the default detail levels.
Long Answer
Memory usage depends mostly on the complexity of your data set (not the size). If your database has fields with millions of unique values, it will use many megabytes of memory for each of those fields. It's uncommon for any particular field to require more than 100M, but in extreme cases, fields can use over 1G.

Disk usage is roughly proportional to the size of your uncompressed log data. As a general rule of thumb, Sawmill will use about as much disk space for its database as the uncompressed log data uses on disk. So if you're processing 500G of log data, you'll need about 500G of disk space to hold the database.

The time to process a dataset is roughly proportional to the size of the dataset. As of 2004, on a moderately fast single-CPU system, Sawmill typically processes between 5,000 and 10,000 lines of log data per second.
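Using the throughput figures above, a rough back-of-the-envelope build-time estimate can be sketched like this. The 7,500 lines/second midpoint is an assumption; real throughput varies widely with hardware, log format, and profile complexity:

```python
def estimated_build_hours(log_lines, lines_per_second=7500):
    """Rough build-time estimate from the ~5,000-10,000 lines/sec figure."""
    return log_lines / lines_per_second / 3600.0

# 100 million lines at the midpoint rate: roughly 3.7 hours.
print(round(estimated_build_hours(100_000_000), 1))
```

Treat the result as an order-of-magnitude guide only; measure a sample of your own log data for a real estimate.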
Short Answer: Because "visitors" is the number of unique visitors, a visitor who visits every day will show up as a single visitor in each day's visitors count, but also as a single visitor for the whole month -- not 30 visitors! Therefore, simple summation of visitor numbers gives meaningless results.
Long Answer
We get this a lot as a bug report, but Sawmill is not counting visitors wrong. "Visitors" in Sawmill's terminology refers to unique visitors (see Hits, Visitors, etc.). So:
- The total hits in a month is equal to the sum of the hits on the days of the month, and
- the total bandwidth for a month is equal to the sum of the bandwidth on the days of the month, and
- the total page views for a month is equal to the sum of the page views for each day of the month, BUT
- the total number of visitors in a month is not usually equal to the sum of the visitors on the days of the month.
Here's why. Suppose you have a web site where only one person ever visits it, but that person visits it every day. For every day of the month, you will have a single visitor. For the entire month, too, you will have a single visitor, because visitors are unique visitors, and there was only one visitor in the entire month, even though that visitor came back again and again. But in a 30-day month, the sum of the visitors per day will be 30, or one visitor per day. So though Sawmill will correctly report one visitor that month, it will also correctly report one visitor per day.

If what you're really looking for is "visits" rather than "visitors" (so each visit will count once, even if it's the same visitor coming back over and over), then that's what Sawmill calls "sessions," and you can get information about them in the Sessions Summary and other session-related views (paths through the site, entry pages, exit pages, time spent per page).

In table reports, the total row is calculated by summing all other rows. Because visitors cannot be summed in this way, the visitors column in the total row will always be a dash (-).
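The arithmetic can be sketched with a toy dataset (the IP address and the 30-day month are hypothetical):

```python
# One visitor (one IP address) visiting every day of a 30-day month.
daily_visitors = {day: {"192.0.2.1"} for day in range(1, 31)}

# Summing each day's unique-visitor count gives 30...
sum_of_daily = sum(len(ips) for ips in daily_visitors.values())

# ...but the month as a whole has only one unique visitor.
monthly_unique = len(set().union(*daily_visitors.values()))

print(sum_of_daily, monthly_unique)  # prints: 30 1
```

This is exactly why the monthly visitors figure is computed from the whole month's data rather than by summing the daily rows.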
Short Answer: Your ISP may be regularly deleting or rotating your log data. Ask them to leave all your log data, or rotate it over a longer interval. It's also possible that your log data does not contain those days for another reason.
Long Answer
To save disk space, many ISPs delete, or "rotate" (rename and/or compress), the server log data regularly. For instance, instead of letting the log file grow forever, they may rename it every day, start a new one, and compress the old one; then, every week, they may delete the logs older than seven days. In other, more dramatic cases, they may simply delete the log file every month or week, and start a new one. Though this does save disk space on the server, it presents serious problems for log analysis.

When you rebuild the database with Sawmill, it processes all the existing log data, and creates a new database from it. If some of the old log data has been deleted, that data will no longer be available in the statistics. So if the ISP deletes the logs every month, and you rebuild your database, your statistics will go back one month at the most. Similarly, when you update the database, Sawmill adds any new data in the existing log data to the database. So if the ISP deletes log files every month, and you only update your database every month on the 15th, then all the data from the 15th to the end of each month will be missing, because it was not added through an update, and it was deleted on the 1st of the month.

The best solution is to convince your ISP to keep all of your log data, and never delete any of it. If you can do that, then there will be no problem -- you'll always be able to rebuild or update your database and get all of the statistics. Since this will require more of your ISP's disk space, however, they may not be willing to do this, especially if you have a very large site, or they may charge extra for the service. Of course, if you own and manage your own server, you can do this yourself.

The second best solution, if you can't convince the ISP to keep all log data, is to store your back log files on your own system. If your ISP rotates the data through several logs before deleting the oldest one, this is easy -- just download the logs you don't have regularly (you may be able to automate this using an FTP client). If they only keep one copy, and delete and restart it regularly, then you'll need to download that file as close to the reset time as possible, to get as much data as possible before it is deleted. This is not a reasonable way for ISPs to rotate logs, and you should try to convince them to rotate through several files before deleting the oldest one, but some of them do it this way anyway. You'll never get all of your log data if they use this technique -- the very last entries before deletion will always be lost -- but if you time it right you can get pretty close. Once you have the logs on your system, you can analyze them at your leisure, without worrying about them being deleted. In this situation, you'll probably want to run Sawmill on the system where you keep the back logs.

If log rotation is not the issue, then it may be that your log data does not contain the data for another reason. Maybe the server was down for a period, or the log data was lost in a disk outage, or it was corrupted. Look at the log data yourself, using a text editor, to make sure that it really does contain the days that you expected it to contain. If the data isn't in your logs, Sawmill cannot report statistics on it.
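If your ISP exposes the rotated logs over FTP, the regular download can be automated. The following crontab fragment is only a sketch; the schedule, host, credentials, and paths are all assumptions you would adjust to your ISP's setup:

```
# Hypothetical crontab entry: mirror the ISP's log directory nightly at 02:30,
# shortly before the ISP's rotation, into a local backup folder.
30 2 * * * wget --mirror --no-parent -P /backup/logs ftp://user:PASSWORD@ftp.example.com/logs/
```

Point the profile's Log Source at the local mirror, and rebuilds will always have the full history regardless of what the ISP deletes.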
Short Answer: Sawmill includes referrer reports if the beginning of the log data includes referrers. If your log data starts without referrers, and adds them later, you won't see referrer reports. Create a new profile from the latest log file (with referrers), and change the log source to include all log data.
Long Answer
When a profile is created, Sawmill looks at the first few lines of the log data when determining which fields are present, and which reports to generate. If it sees a referrer field there, it will create a Referrer report, Search Engines and Search Phrases reports, and other referrer-related reports. This can be a problem if the log data does not contain referrer data at the beginning of the dataset. For instance, IIS often defaults to minimal logging (without referrers), and Apache often defaults to logging in Common Access Log Format (without referrers). If you later reconfigure the server to log referrers, Sawmill still won't know that, because the beginning of the log data does not contain referrers, and that's where it looks. So a profile created from the whole dataset will not report referrers, even though the later data contains referrer information.

The solution is to recreate the profile, and when it asks you where the log data is, point it to the most recent file. That file will certainly have referrer information at the beginning, so the referrer reports will be set up properly. After creating the profile, and before viewing reports or rebuilding the database, go to the Config for the profile and change the Log Source to include all your log data. Then view reports, and referrer reports will be included.
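The detection step can be illustrated with a small sketch. This is my own simplification, not Sawmill's actual detection logic; the sample lines follow the standard Apache Combined and Common formats mentioned above:

```python
def line_has_referrer(line):
    """Combined-format lines carry two extra quoted fields (referrer and
    user agent), so they contain at least four double-quote characters;
    Common-format lines quote only the request, giving exactly two."""
    return line.count('"') >= 4

combined = '1.2.3.4 - - [01/Jan/2004:00:00:00 +0000] "GET / HTTP/1.0" 200 123 "http://example.com/" "Mozilla/4.0"'
common = '1.2.3.4 - - [01/Jan/2004:00:00:00 +0000] "GET / HTTP/1.0" 200 123'

print(line_has_referrer(combined), line_has_referrer(common))  # prints: True False
```

A checker like this only sees what is on the lines it inspects, which is why a dataset that starts in Common format and switches to Combined later is classified as referrer-less.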
Short Answer: Yes, it should do that, and it's not usually a problem. Any CPU-intensive program will do the same. However, you can throttle it back if you need to, using operating system priorities.
Long Answer
Sawmill is a "CPU-bound" program while it's processing logs, which means that the microprocessor (a.k.a. CPU) is the bottleneck; the disk feeds data to Sawmill as fast as the processor can handle it. Most programs you use daily (web browsers, mail programs, word processors, etc.) are probably not CPU-bound, but any number-crunching or data-crunching program is. Other examples of programs that are typically CPU-bound include compression/decompression programs like ZIP, 3D rendering programs, and encryption programs (or encryption breakers).

Any well-behaved operating system will give a CPU-bound process as much CPU as it has available, provided that the processing needs of all other processes are met as well. Because most systems use only a small fraction of their processing power, there is usually more than 90% free CPU available at any time. This CPU is wasted unless it is used, so if there's a program like Sawmill that's continually asking for more CPU, the operating system should and will give it as much CPU as possible. If nothing else is running on the system, Sawmill will use 100% of the CPU. Since nothing else needs the CPU, that's as it should be -- if the operating system only gave Sawmill 50% of the CPU, it would take twice as long to process the log data, and during the other 50% of the time, the CPU would be sitting idle, wasting time. So don't worry if Sawmill is using nearly 100% of your CPU -- that's the way it's supposed to be, and it will generally have no negative effects.

The one time you may see negative effects of Sawmill's CPU usage is if there are other CPU-bound or CPU-intensive programs running on the system. In this case, because all the programs want as much CPU as possible, the operating system will split the CPU evenly between them. For instance, if there are three CPU-intensive processes running, each of them will get 33% of the CPU, and each will run 1/3 as fast as it would on a lightly loaded machine. If you have an important CPU-intensive process running on your server (for instance, a very busy web server), you may want to give Sawmill a lower priority than the other processes. You can do this on UNIX systems using the "nice" command, and on Windows systems using the Task Manager. When you set Sawmill's priority lower than the rest, it will get less than its share of CPU time, and the other processes will run faster. Sawmill, of course, will run slower. Similarly, if other processes are interfering with Sawmill's performance and you don't care about the performance of the other processes, you can increase Sawmill's priority to make it run faster, at the expense of the other processes.

Even programs that are not normally CPU-bound will have moments when they become briefly CPU-bound. For instance, a web browser sits idle most of the time, using almost no CPU, but when you load a complex page, it briefly uses as much CPU as it can get to compute and display the page. During that period, if Sawmill is running, each program will get 50% of the CPU. So the layout will take twice as long as it does when Sawmill is not running, which will make the web browser feel more sluggish than usual. Other programs, and the operating system itself, will similarly feel more sluggish while Sawmill is processing the log data. This is a side effect of having a CPU-bound program running on the system -- everything else will slow down. Setting Sawmill to a lower priority will help in this situation, because the web browser will get nearly 100% of the CPU (while Sawmill is temporarily held back) while it's rendering.
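On UNIX, lowering Sawmill's priority is a one-word prefix. The profile name and install path below are hypothetical; the final line simply demonstrates that "nice" really applies the requested niceness:

```shell
# Hypothetical: start a database build at the lowest scheduling priority.
#   nice -n 19 /usr/local/sawmill/sawmill -p myprofile -a bd
# Demonstration: "nice" with no command operand prints the current niceness.
nice -n 10 nice  # prints: 10
```

Positive niceness values (lower priority) require no special privileges, so any user who can run Sawmill from the command line can also throttle it this way.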
Short Answer: Run "executable -p profilename -a bd" from the command line window of your operating system.
Long Answer
It is not necessary to use the web interface to build a database; you can use the command line. This is useful for debugging problems with profiles, or for building when the web interface is not available, e.g. from scripts. The exact method, and the exact command, depends on the platform; see below. See also Additional Notes For All Platforms.
Windows
To build a database from the command line, first open a command prompt window. One way to open a command prompt window (sometimes called a DOS window) is to click "Start" in the Windows task bar, click "Run", enter "cmd" in the text box, and hit return. You will get a new window that will display something like this:

  Microsoft Windows XP [Version 5.1.2600]
  (C) Copyright 1985-2001 Microsoft Corp.

  C:\Documents and Settings\username>

In the command prompt window, you will need to move to the Sawmill installation directory using the "cd" command. Sawmill is installed by default to "C:\Program Files\Sawmill 8"; to move to this directory, type "cd C:\Program Files\Sawmill 8" (or whatever path you specified during installation):

  C:\Documents and Settings\username>cd C:\Program Files\Sawmill 8

  C:\Program Files\Sawmill 8>

To get a list of internal profile names, type the command "Sawmill.exe -a lp" at the command prompt. This will display a list of the internal profile names, from which you can select the profile you want to build:

  C:\Program Files\Sawmill 8>Sawmill.exe -a lp
  Sawmill 8.0.0; Copyright (c) 2008 Flowerfire
  myprofile

To build, run Sawmill with the "-p profilename -a bd" options, replacing profilename with the internal name of your profile from the list above. (To update your database instead, run Sawmill with the "-p profilename -a ud" options.) The build command and related output are shown below:

  C:\Program Files\Sawmill 8>Sawmill.exe -p myprofile -a bd
  Sawmill 8.0.0; Copyright (c) 2008 Flowerfire
  Reading log file: C:\Apache [ ] 0.00%  00:00
  Reading log file: C:\Apache [] 3.16%  00:01
  Reading log file: C:\Apache [######] 33.33% 5000e  00:02
  Building cross-reference table 4 (worm) [############# ] 66.67%  00:03
  Building cross-reference table 12 (search_engine) [##############= ] 73.68%  00:04
  Building cross-reference table 18 (server_response) [####################] 100.00%  00:05
Mac OS X
To build a database from the command line, first open a terminal window. On a Mac, you do this by selecting the Finder, navigating to the Applications folder, then Utilities, and double-clicking the Terminal application. You will get a new window that will display something like this:

  Last login: Mon Sep 1 10:46:44 on ttyp1
  Welcome to Darwin!
  [host:~] user%

In the terminal window, you will need to move to the Sawmill installation directory using the "cd" command. Typically Sawmill is located in "/Applications/Sawmill"; if you installed Sawmill somewhere else, change the directory name in the command to match. To move to this directory, type "cd /Applications/Sawmill":

  [host:~] user% cd /Applications/Sawmill
  [host:/Applications/Sawmill] user%

To get a list of internal profile names, type the command "./sawmill -a lp" at the command prompt. This will display a list of the internal profile names, from which you can select the profile you want to build:

  [host:/Applications/Sawmill] user% ./sawmill -a lp
  Sawmill 8.0.0; Copyright (c) 2008 Flowerfire
  myprofile

To build, run Sawmill with the "-p profilename -a bd" options, replacing profilename with the internal name of your profile from the list above. (To update your database instead, run Sawmill with the "-p profilename -a ud" options.) The build command and related output are shown below:

  [host:/Applications/Sawmill] user% ./sawmill -p myprofile -a bd
  Sawmill 8.0.0; Copyright (c) 2008 Flowerfire
  Reading log file: /logs/Apache [ ] 0.00%  00:00
  Reading log file: /logs/Apache [] 3.16%  00:01
  Reading log file: /logs/Apache [######] 33.33% 5000e  00:02
  Building cross-reference table 4 (worm) [############# ] 66.67%  00:03
  Building cross-reference table 12 (search_engine) [##############= ] 73.68%  00:04
  Building cross-reference table 18 (server_response) [####################] 100.00%  00:05
Linux/UNIX
Follow the Mac OS X instructions, which are basically UNIX instructions (since Mac OS X is basically UNIX); change the directories to match the location where you installed Sawmill. The executable file usually ends with the version number on Linux/UNIX platforms, so you'll need to change references from "./sawmill" to "./sawmill-8.0.0" (or whatever the version is).
You can also use the -v option to get "verbose" output from the build. There are many -v options available, documented in the "Command-line output types" page of the technical manual ( http://www.sawmill.net/cgi-bin/sawmill8/docs/sawmill.cgi?dp +docs.option+webvars.option_name+command_line.verbose ). For very high detail (too slow for any significant build), add "-v egblpfdD" to the command line. If you add much debugging output, you may also want to add "| more" to the end of the command line to pipe the output to a pager, or to add "> out.txt" to the end of the command line to redirect the output to a file. For more examples of command-line usage, run Sawmill from the command line with the --help option.
Long Answer
The Symantec Security Gateways plug-in is based on a text export of a binary data file on the SGS/SEF device.

To use "remotelogfile8.exe" to extract the text log from the binary data:

1. Browse to "http://www.symantec.com/search/".
2. Search for document "2004021815290054".

To use the "flatten8" utility to extract the text log from the binary data, review page 102 of "Symantec Security Gateways - Reference Guide", Version 8. This is an excerpt:

Flatten utility

The flatten8 utility is shipped on the included CD and lets you perform simple log file management from the command line. The flatten8 utility reads in the log message information from the system's XML files, and then parses in real time the binary log file, substituting the actual error text message for its binary counterpart. Most often, this utility is used to convert the binary log file to a more usable format for a third-party utility, such as an ASCII text editor. This utility can also be used to review the most recent messages, or directed to show just statistics messages.
usage: flatten8 [-h] [-r|-s|-D] [-f] [-u seconds] [-t n] [-x xmlpath] log file ...
Where:
  -h  Print this message and exit.
  -r  Only has an effect when -s is used. Do reverse lookups on IP addresses.
  -s  Output stats only.
  -D  Do not print out error information.
  -f  Follow output. (Binary files, default interval 2 seconds.)
  -u  Follow update interval in seconds. (Implies -f.)
  -t  Tail the last 'n' log messages.
  -x  Next argument specifies path to XML dictionary files. This argument should not need to be used, as the XML files are placed in the default location during installation.
Short Answer: You can't track full URLs or HTTP domains, because PIX doesn't log them; but you can turn on DNS lookup in the PIX or in Sawmill to report resolved hostnames.
Long Answer
The Cisco PIX log format can be configured to log hostnames as well as IPs; if it does, the PIX plug-in will report the hostnames. This is the preferred way to get hostname information from PIX. If that's not an option, Sawmill can be configured to look up IP addresses using the DNS Lookup section of the Config page. In this case, the IP address field value will be replaced by the resolved hostname, so the resolved hostname will appear in the IPs reports.

PIX does not log URLs, however, so it is not possible for Sawmill to report domains accessed. PIX reports lines like this:

  Accessed URL 12.34.56.78:/some/file/test.html

This shows the source IP, which we already have from another line, and the URL stem, which is slightly useful, but it does not show the domain; and resolving the IP just gives the resolved hostname, not the domain from the URL. Still, it's better than nothing; resolving the hostname might give something like server156.microsoft.com, which at least tells you it's microsoft.com traffic, even if you can't tell whether it was msdn.microsoft.com or www.microsoft.com. PIX can also be configured to log hostnames in the Accessed URL lines, which looks something like this:

  Accessed URL 12.34.56.78 (server156.microsoft.com):/some/file/test.html

But this has the same problem: it shows the hostname, not the HTTP domain. It seems that the HTTP domain is not available from PIX log data.

The reasons we recommend doing DNS lookup in PIX rather than in Sawmill are twofold:

1. DNS lookup after the fact may give a different hostname than it would have given at the time, and the one at the time is more accurate.
2. DNS lookup in Sawmill replaces the IP address with the hostname, so the IP is not available in the reports. DNS lookup in PIX *adds* the hostname as a separate field, so both are available in the reports.
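For reference, the two "Accessed URL" line shapes above can be pulled apart with a short sketch. The regular expression and field names are mine, not part of the PIX plug-in:

```python
import re

# Matches both line shapes: with and without the optional "(hostname)" part.
PATTERN = re.compile(
    r"Accessed URL (?P<ip>\d+\.\d+\.\d+\.\d+)"
    r"(?: \((?P<hostname>[^)]+)\))?"
    r":(?P<stem>/\S*)"
)

m = PATTERN.match("Accessed URL 12.34.56.78 (server156.microsoft.com):/some/file/test.html")
print(m.group("ip"), m.group("hostname"), m.group("stem"))
```

Note that even with the hostname captured, the HTTP domain from the original URL is simply not present in the line, which is the limitation described above.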
Long Answer
Because there is no current version of MySQL available for x64 MacOS, it is not possible to build or use MySQL databases on x64 MacOS with Sawmill. When an x64 MacOS version of MySQL becomes available (from the makers of MySQL), we will add support in Sawmill. For now, use the x86 version of Sawmill, which will run on x64 MacOS, and can use MySQL.
Short Answer: Back up and restore the LogAnalysisInfo folder when no update or build is running, either for the whole installation or for one profile. For MySQL, also back up and restore the MySQL database.
Long Answer
If you're using the internal database, you can back up the LogAnalysisInfo folder in your Sawmill installation folder to back up the entire installation, and you can restore it to restore the entire installation. This will back up profiles, databases, users, preferences, scheduled tasks, and more. The backup and restore must occur when there is no database update or rebuild in progress; it is fine if there is a report generation in progress.

If you're using a MySQL database, you can do the backup/restore as described above, and you will also need to back up the MySQL database for each profile. By default, the MySQL database's name is the same as the internal name of the profile, but it can be overridden in Database Options, in the Config section of the profile. Consult the MySQL documentation for information on backing up and restoring a database.

To back up or restore a particular profile, back up or restore the profile file from LogAnalysisInfo/profiles, and the database folder from LogAnalysisInfo/Databases; and if you're using MySQL, back up and restore the MySQL database for the profile.
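A full-installation backup on UNIX-like systems can be as simple as archiving the folder. The paths here are stand-ins (the real LogAnalysisInfo lives in your Sawmill installation folder), and the archive should only be taken while no build or update is running:

```shell
# Stand-in for the real installation folder; substitute your actual path.
mkdir -p /tmp/sawmill-demo/LogAnalysisInfo
# Archive the whole folder...
tar -czf /tmp/LogAnalysisInfo-backup.tar.gz -C /tmp/sawmill-demo LogAnalysisInfo
# ...and list the archive to verify what was captured.
tar -tzf /tmp/LogAnalysisInfo-backup.tar.gz
```

Restoring is the reverse: extract the archive over the installation folder (again with Sawmill idle), and for MySQL profiles additionally restore the corresponding MySQL database.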
Short Answer: Anti-virus or anti-malware software that actively scans your Sawmill installation folder can cause this. Disable scanning of Sawmill's data folders in the anti-virus product.
Long Answer
Some anti-virus and anti-malware software actively scans the entire disk, looking for viruses or other malware. This sort of scanning can interfere with Sawmill's operation if the software scans Sawmill's data files or database. The anti-malware software interferes in two ways: (1) it opens Sawmill's data files, and holds them open while Sawmill is trying to write to them during database builds, which causes Windows to refuse Sawmill write access to its own internal database files; and (2) if the software detects a virus signature in one of Sawmill's database files, it may delete or modify that file, corrupting the database. The second scenario can occur even if there is no actual virus present, because Sawmill's database files are binary files, which can potentially contain any possible virus signature due to random permutations of the data; worse, Sawmill is often used to scan web logs, mail logs, and even anti-virus logs, which naturally contain the signatures of viruses encountered by the logging devices or servers.

Even when anti-virus scanning does not cause errors in Sawmill, it can greatly reduce Sawmill's performance, as both programs fight for access to the same files. The performance impact can be a factor of 20 or more -- a database which might normally take 1 hour to build might take 20 hours or more.

The solution is to disable scanning of Sawmill's directories. Anti-malware should not be completely turned off -- it is important to the security of your system -- but most products can be selectively disabled, so they will not scan particular folders. In a default installation, Sawmill is found in the Program Files folder of the C: drive, so disabling scanning of the Sawmill folder there will greatly improve the performance and reliability of Sawmill.
Long Answer
There is a known issue with final_step being used in a plug-in: a plug-in whose final_step relies on specific names or layout in the profile may break across versions. Because final_step can contain arbitrary code, and can access or modify anything, it bypasses the structured layout of the rest of the profile; it is therefore potentially version-specific, and cannot be automatically converted. While any standard v7 plug-in will work with v8, plug-ins that use final_step may not work, if they access structures which have been removed or renamed in version 8. This sort of plug-in will have to be manually modified to make it v8 compatible.
Long Answer
Follow these directions from the main menu in Outlook 2003. Go to the Tools menu, select Options, and then Security. In the Download Settings, click "Change Automatic Download Settings", then uncheck the setting for not downloading pictures or content automatically in HTML e-mail. Click OK to close the download images dialog box and the Options dialog box, and then view your email message. The report should look fine, with all of the headers and graphs lining up.
Long Answer
A sawmill is a tool that processes logs (the kind made from trees), and so is Sawmill (it processes web server logs).
Short Answer: We ship new versions to provide our customers with the latest minor features and bug fixes quickly. Sawmill is no buggier than any other software, and you don't need to download a new release unless you're having problems with the current one.
Long Answer
We've had a few people ask us why we ship new versions of Sawmill so often. The reason is that we want to provide our customers with access to the latest minor features (e.g. new log formats) and bug fixes. Our shipping process is highly automated, so it is relatively easy for us to ship a new version, and we do it frequently.

There are bugs in Sawmill, just as there are bugs in all computer programs. Of course, we strive to keep the bugs to a minimum, but Sawmill is very complex software, and we get reports of a few new bugs every week. We roll these into new releases every couple of weeks, and ship them so that new downloaders won't be troubled by these bugs, and people who are experiencing them can get a fixed version. Other computer programs have similar numbers of bugs, but they package more bug fixes into each release, and release versions less frequently.

Unless you're having problems with the version of Sawmill you're currently running, or you need a new feature we've added (like support for a new log format), there is no need to upgrade. You can upgrade at whatever pace you like, and skip any upgrades in the middle; each new release of Sawmill is a full release, so you don't have to have any previous version installed to use it.
An Overview of Sawmill
User Guide Index

Overview

Sawmill reports on your web site activity. As a user of Sawmill, you will have access to the following areas:
- An Overview of Sawmill
- Reports
- Filters
Sawmill creates reports for you on demand, so when you want to look at a report, you are seeing the most up-to-date information Sawmill knows about. Sawmill can be configured to update the reports whenever you want (you need to contact your Sawmill Administrator to change this), but typically they are updated daily. Sawmill's reports are created on demand when you click on the "Show Reports" link. This allows you to "drill down" into the reports by selecting only those parts of the data you are interested in. The process of selecting only parts of the data is called "filtering" in Sawmill, and you have a number of filter options to choose from. To find out how to use the Sawmill filters, see Filters.

Sawmill filters can be used to break down the reports into manageable chunks, for example, to see all traffic for a given visitor, or all traffic on only one of your domains. By breaking down the reports into different types of data, you can use Sawmill to answer some of your questions. Examples of the types of questions you might have are:

Who is coming to your site? - the domains they're coming from (or just the IP addresses), and, if your visitors log in, their login usernames as well. What Sawmill can show about your visitors is limited to the IP address and domain, the username, and the geographic location where the IP address or domain is registered. If your users log in to your web site and have provided further information about themselves to you, it might be possible to incorporate that information into the Sawmill reports, but this will need to be done by your Sawmill Administrator.

Where are they coming from? - Sawmill can show you the geographic location of your visitors (by country and city).

What are they looking at? - the sites, folders, and pages they're visiting, the images they're viewing, the scripts they're accessing, and more. Sawmill can show you all the pages that have been viewed on your site, not just the top ten; use the Row Numbers section of the report to see more rows. You can also view the page they looked at by clicking on the icon next to each URL.

When are they visiting? - which days got the most traffic, how your traffic is growing over time, which weekdays and hours are peak, and more.

What are they doing? - where they enter the site (the Entry Pages report), which paths they take through the site (the Session Paths report), and where they leave (the Exit Pages report).

How long do they stay? - which pages keep people the longest, and how long they stay on the site before leaving. Sawmill can show you how long visitors look at your web site, and how long they look at individual pages on your site; use the Session Overview and the Session Views for this.

Who brought them? - where they found the link that brought them to your site: whether it was a search engine (and if it was, what they searched for in the search engine), an advertisement, a link on some other web page, or whether they came directly to your site. For more information, see the Referrers report. You can also visit the site that brought them by clicking on the icon next to each one.

What are they using to browse? - which operating systems and web browsers they're using. It is also possible to find out the screen resolution and screen depth of your visitors' screens; this can be set up by your web developer and your Sawmill Administrator.

Which links are broken? - which pages were requested on your site that are no longer available, or were never available.

For more detail about how to interpret the reports, see Understanding the Reports.
Sawmill Documentation
Quickstart Manual FAQ User Guide www.sawmill.co.uk
Reports
The Reports present log file statistics in an attractive and easily navigable graphical format, and comprise the following components (for a clickable screenshot, see the Report Interface):
The Report Header The header of the Reports page is a bar containing the following:
Profiles: a link to the Admin Screen, which lists all your profiles
Logout: a link to log out of Sawmill
The Report Toolbar Below the header is a bar which contains the following:
The Calendar: Click this to open the Calendar window, where you can select a single day, month, or year to use as the date/time filter. When you have selected an item in the Calendar, all reports will show only information from that time period, until the date/time filter is removed (by clicking "Show All" in the Calendar).

The Date Range Selector: Click this to open the Date Range window, where you can select a range of days to use as the date/time filter. When you have selected a range, all reports will show only information from that time period, until the date/time filter is removed (by clicking "Show All" in the Calendar).

Filters: Click this to open the Global Filter window, where you can create filters on any fields, in any combination. Filters created here dynamically affect all reports; once you have set a Global Filter, all reports will show only information for that section of the data. Global Filters remain in effect until you remove them in the Global Filter window, or by unchecking the filter checkbox in the upper right-hand corner and refreshing the report.
The Report Menu
Below the Report Toolbar and to the left of the main window is the Report Menu, which lets you select the report to view. Clicking a Report Group will expand or collapse that group; clicking a report view will change the report display to show that one. Clicking a report view will remove any Zoom Filters, but will not remove Global Filters or Date/Time Filters.
The Report
The main portion of the window (lower right) is occupied by the report itself. This is a view of the data selected by the filters (Global Filters, Date/Time Filters, and Zoom Filters). It provides one breakdown of the data specified by the filters; you can select another report in the Report Menu (see above) to break down the same data in a different way. There are several parts of the report:

The Report Bar
At the top of the report is a bar containing the report label and the current global, date/time, or zoom filters, if any.

The Report Graph
For some reports, there is a graph above the table. The existence of this graph, its size, its type (e.g. pie chart, bar chart, or line chart), and other characteristics vary from report to report, and can be changed by the Sawmill Administrator. The graph displays the same information as the table below it.

The Report Table
The Report Table contains the main information of the report. It displays one row per item, with the aggregated numerical values (e.g. the sum of hits or page views) in columns next to it. It may also include columns showing a graph or a percentage (of the total traffic).
Filters
There are many levels of filters available to you when viewing reports:
Global Filters: These remain in effect until they are removed in the Filters page.
Date/Time Filters: These remain in effect until they are removed in the Calendar page.
Zoom Filters: These remain in effect until they are removed by clicking another option in the navigation menu.
All these filters are combined when used together; i.e. an item is included only if it is selected by the Global Filters AND by the Date/Time Filters AND by the Zoom Filters. For instance, if the Filters show events during 1am-2am, and the Zoom Filters show events on January 1, then the table will show events from January 1, during 1am-2am.

The filters that are applied to the report determine which statistics you see. The Filters let you "zoom in" on one part of your data (in a similar way to selecting a new view from the Report Menu). You can use the Filters to get information about a particular day, a particular directory, a particular domain, and more. If there are no filters in place, you are looking at your complete data; all available data is represented by the graphs and tables shown. If the Report Bar shows that there are filters active, then you are not seeing your entire data set, only a portion of it; which portion depends on the Filters.

If, for example, the only filter is a /dir1/ filter on the page field, then the data displayed shows only those hits which were on /dir1/ or on pages contained in /dir1/ (or in other directories contained in /dir1/, and so on). For instance, if you have 1000 hits on your site, and 500 of them were inside /dir1/, then with no filters active you will see 1000 hits in the tables and graphs (all the hits on your site), but with a /dir1/ filter on the page field you will see 500 hits (only those hits in /dir1/).

The Filters are an extremely powerful way of getting detailed information about your site. If you want to know which day you got the most hits on /dir1/, you can add /dir1/ as a filter and then change to the "Years/months/days" view (see the Report Menu). With /dir1/ as a filter, you will see only those hits on /dir1/ (500 of them, in the example above), and how those 500 hits break down by date and time. You can add an additional filter on the date/time field if you want to examine just the hits on /dir1/ on a particular day. This gives you almost infinite flexibility in how you examine your data.
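The AND-combination of filter levels described above can be sketched as follows. This is an illustrative Python sketch of the idea, not Sawmill's actual implementation; the record layout and field names (`page`, `date`) are hypothetical.

```python
from datetime import datetime

# Hypothetical hit records, illustrating how the filter levels combine.
hits = [
    {"page": "/dir1/index.html", "date": datetime(2023, 1, 1, 1, 30)},
    {"page": "/dir1/about.html", "date": datetime(2023, 1, 2, 1, 15)},
    {"page": "/dir2/index.html", "date": datetime(2023, 1, 1, 1, 45)},
]

def global_filter(hit):
    # Global Filter: only pages in /dir1/
    return hit["page"].startswith("/dir1/")

def date_time_filter(hit):
    # Date/Time Filter: only January 1
    return hit["date"].month == 1 and hit["date"].day == 1

def zoom_filter(hit):
    # Zoom Filter: only the 1am-2am hour
    return hit["date"].hour == 1

# A hit appears in the report only if ALL active filters select it (AND).
selected = [h for h in hits
            if global_filter(h) and date_time_filter(h) and zoom_filter(h)]
print(len(selected))  # only /dir1/index.html passes all three filters
```

Only the first hit satisfies all three conditions; the second fails the date filter and the third fails the page filter.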
What does Sawmill measure?
Where did my visitors come from?
How Sawmill counts Visitors
How Sawmill calculates Sessions
How Sawmill calculates Durations
Applying what you have learned to your web site
Glossary
A
Aggregate: A record combining information from different sources, creating a sum of those sources.
Authentication: Identifying a user as authentic, with a username and password. HTTPS authentication, as a feature of servers, is used for sending security-sensitive documents, such as payment transactions.
Authenticated users: Users whose identity has been verified, i.e. who are who they say they are.
Authorized users: Users who have been verified as having the authority or permission to access a system or perform a certain operation.
B
Bandwidth: The measurement of the rate of data transferred, in kilobytes, over a device, network, or website.
Binary mode: A method of downloading a file that is encoded in a sequence the operating system can execute.
Bool or boolean value: A datatype with the values one or zero, sometimes called true or false. It restricts the range of values allowed for an operation.
C
Comma-separated value (CSV): A data format in which fields are separated by commas. Fields which themselves contain commas must be enclosed in double quotes.
Concatenation: The joining of two character strings, end to end. An operator joins the two strings together.
Configuration group (.cfg): Grouped text files, called configuration files, where all of Sawmill's options are stored.
D
DNS: A domain name server stores information associated with domain names, such as the IP addresses or mail server addresses for each domain.
E
e: The mathematical constant e is a unique real number, and is used in exponential functions.
F
Filters: Selectively include or eliminate portions of data. Filters can be applied to log data to affect how the log data is processed or in reports to select what statistics you see.
G
Gigabyte (GB): One billion bytes.
H
Hits: The number of events on a website; events can be requests for elements such as images, PDFs, or downloadable pages.
HTTP: Hypertext Transfer Protocol, a request/response protocol between clients (web browsers) and servers.
HTTPS: HTTP with an additional layer of encryption and authentication between HTTP and TCP.
I
Iterate: To apply a procedure repeatedly, each time to the result of the previous application.
IP: Internet Protocol, used for computer networking on the Internet.
J
Java: Programming language deriving its syntax from C and C++.
K
K: In Unicode, the capital K is code for U+004B and the lower case k is U+006B.
L
Log fields: Hold log data that can be processed by filters and then copied to database fields.
Log files: Text files generated by servers, recording each event or hit that happens.
M
Mapped drives: Computers or drives on a network, recognized as part of a closed network or one that has network privileges.
Megabyte (MB): One million bytes.
Metrics: The measurement of web log traffic on a website.
N
Network shares: Resources shared by users across a network, similar to mapped drives.
Node: A device that is connected to and part of a computer network.
O
O: Big O notation describes the limiting behavior of a function for very small or very large arguments.
P
Page views: The request to load a single page of an Internet site.
Process ID (PID): A unique number used by operating system kernels, such as UNIX or Windows, to identify a process.
Profile: A file that contains configuration settings for an individual or group of users.
Q
Query: A request for information from a database, commonly associated with SQL.
R
Recursive uploads: The uploading of files using an FTP client that uploads all of the directories, subdirectories, and files, in a sequential manner.
Referrers: The URL of the previous web page, from which a link was followed.
S
Salang: A programming language specific to Sawmill.
Secure Shell (SSH): A network protocol designed for logging into and executing commands on a networked computer.
Sessions: A segment of time during which a visitor visits a website.
Subdomain: A domain that is part of a larger or main domain. The parts are separated by dots (".") and read from right to left, most general first. An example is mail.domain.com, where mail is a subdomain of domain.com.
T
Target: The file that you are selecting or focusing an action on, as in the "target" file.
U
User agent: Abbreviated UA; a string used to identify the browser being used.
V
Visitors: The number of people who visited a website, identified by their unique IP addresses.
W
Worms: Self-propagating programs that send copies of themselves through a network, taking up bandwidth and potentially harming the networks they attack.
X,Y,Z
Admin Screen
Report Interface
Profiles
The profile list on the Admin Screen is a list of your profiles. The Sawmill Root Administrator will give you access to your reports.
Logout link
Clicking the Logout link in the upper right-hand corner of the browser window (right-hand end of the Report Header) will completely log you out of Sawmill, and you will need to log in again to see your profile(s). You do not need to log out to change the active profile; just go to the Admin Screen by clicking the 'Profiles' link (next to the Logout link).
The Date Picker window is where you can select a range of days to use as the date filter. You can select any range by clicking dates in the From and To calendars and selecting from the drop-down menus, then clicking Apply; or use the "Set Max" button to select all available dates.

When you have selected a range in the Date Picker window, all reports will show only information from that time period, until the date filter is changed (by going into the Date Picker again) or removed (by clicking 'Show All' in the Calendar).
Macros
The Macros icon is where you can create a new macro or manage existing macros. To create a new macro, select that option from the drop-down menu; you will then be able to name it and add actions.

Once you name the macro, you can have it open the current report, apply the current date, and apply filters. You can set all or only some of these three choices. When your settings are saved, your macro becomes part of the drop-down menu; simply select it to run the macro. To make changes, select Manage Macros, where you can view your current macros or delete them.
The Date Range window is where you can select a range of days to use as the date filter. You can select any range by clicking dates in the From and To calendars and selecting from the drop-down menus, then clicking Apply; or use the "Set Max" button to select all available dates.

When you have selected a range in the Date Range window, all reports will show only information from that time period, until the date filter is changed (by going into the Date Range Selector again) or removed (by clicking 'Show All' in the Calendar).
Printer Friendly
The Printer Friendly icon allows you to view your current report in a separate browser window, where it appears as it will look once printed. There are no menus or extra information, just the report that you are currently viewing. Once you print the report, you can close that window.
Miscellaneous
The Miscellaneous icon drops down to let you email your report, save changes, save as a new report, and get database information. If you want to email your report, fill in the fields for your email address and the recipients', and add a subject line.
If you select Save Report Changes, this dialogue will pop up:

Select Save as New Report if you want to rename the report, save it within a group, and save it with certain filters.

When you name the report, you also have the option of showing it in the menu; your report will appear after the Log Detail selection. If you want to remove this report, select Customize Report in Config, where you will be able to select Delete for your particular report. Once you save your changes, your report will no longer be an item in the Report Menu bar. When you select Database Info, you will see a summary of the database, including log entry information, with options to update or rebuild the database:
Sawmill has many views; follow these links for more information (if you need help with other views, contact your Sawmill Administrator):
The Calendar
The Calendar shows you the years, months, weeks and days during which there was traffic by displaying a clickable link for each day, week, month or year. All links that are not clickable have no data in the database. Clicking any day, week, month, or year adds Date/Time Filters to the reports for the selected period, thereby "zooming in" on that data.
Select a date in the calendar; the "Zoom Active" window will pop up, and you can then open a report to see data for that specific date.
The Calendar controls the date and time filtering in the report, and once filtered, the Report Bar shows the time period that the report is displaying. The screenshot below shows an Overview report filtered by one week, 12th to 18th April 1998.

The Date/Time Filters can be removed by clicking the 'Clear Date' link in the top right-hand corner; the data for the entire period will then be shown once again in the report. Each day, week, month, or year in the calendar that has data is highlighted; dates with no data are greyed out.
The Overview
This view shows an overview of the traffic for the period indicated in the Report Bar. The traffic is broken down into hits, page views, spiders, worms, visitors, and various session events. For details of what each of these terms means, see Understanding the Reports.
The Date and Time menu shows you individual reports for Years, Months, Days, Days of Week, and Hours of Day. Select the Date and Time menu and the list will appear for you to choose from. Here is a sample of a Years report:
If you select "Customize" then you can select the number of columns you wish to display for that particular date or time.
The Hit Types report lists hit types in order, with information about page views, spiders, and errors, to name a few. Here is what a Hit Types report looks like:

You can also customize the Hit Types report: select "Customize" and then choose the types of hits you want to report.
The Content reports let you view the data by pages/directories, pages alone, file types, or broken links. Here is a sample of a view by pages/directories:

Select "Customize" to choose which table columns you want to view. Here is a sample of what a File Types table might look like, so you can see which files are getting the most hits:
If you select "Countries", you will see the list of country statistics showing the number of hits per country:
Screen dimensions and depths can be reported if you create a filter for them. The most common use of this area is to find out which browsers and operating systems are being used.
If you select the drop-down arrow, you will see a list of reports relating to referrers. One of the most useful is the Search Phrases report, where you can see which words or word combinations people used to get to your site.
My site is the highest referrer; why? Sawmill reports every referrer for every hit on your site, so the Referrers table typically shows your own web site as the highest referrer. For example, if someone arrived at your site after clicking on search results in Google, a Google page is the referrer (the last page you were on is the referrer for the current page). So if your current page is a page on your site and the previous page is also a page on your site, then your site is the referrer. Since your visitors will usually visit more than one page on your site, your site will appear as the referrer far more often than any other site. Sawmill can be set up to remove your site as a referrer by categorizing it as an "internal referrer"; contact your Sawmill Administrator to do this.

How many people come directly to my site, rather than being referred? When visitors arrive at your site without a referring page, Sawmill categorizes them as having "no referrer" and removes them from the reports, but this is quite a useful metric. If someone arrives without any referrer, it means they either clicked on a bookmark or email link, or typed your site address directly into the address bar of their browser; it suggests how many visitors "know" about your site. It is not always possible to get referrer information from the browser (some browsers can be configured to block this information), but the majority of browsers do allow it.

NOTE: Each referrer can be a link to an active site; click the link to visit the site.
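The internal-referrer and no-referrer categorization described above can be sketched as follows. This is an illustrative Python sketch of the idea, not Sawmill's actual code, and the site domain used is hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical site domain; in Sawmill this would be configured by the Administrator.
INTERNAL_DOMAIN = "www.example.com"

def classify_referrer(referrer):
    """Classify a hit's referrer as internal, external, or no referrer."""
    if not referrer:
        # No referring page: a bookmark, an email link, or a typed-in address.
        return "no referrer"
    host = urlparse(referrer).hostname or ""
    if host == INTERNAL_DOMAIN:
        # A page on our own site linked here; can be excluded from the Referrers report.
        return "internal referrer"
    return "external referrer"

print(classify_referrer("http://www.google.com/search?q=sawmill"))  # external referrer
print(classify_referrer("http://www.example.com/index.html"))       # internal referrer
print(classify_referrer(""))                                        # no referrer
```

With this split, the Referrers table can show only external referrers, while the no-referrer count remains available as its own metric.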
Other Reports
The Other section contains reports that are useful for tracking worms, spiders, server domains, and server responses. If you have filters set for worms and spiders, you will see statistics for those hits in these two reports:
Session Paths: Shows all the sessions, and the pages they hit, in a single expandable view. Click the plus box and the list will show where the session started, and where the visitors went from there.
Paths through a Page: Lets you enter a single page on your site and see which pages were visited before that page, and which were visited after. You can also look up pages from your site. This gives you all routes to and from a given point in your site.
Entry and Exit pages: The first and last pages of every session.
Session Pages: Every occurrence of each page in any session. Time spent is calculated by tracking how long it was from the hit on one page to the hit on the next page in that session, and tabulating the results for all pages, giving the time spent per page. Exit pages are considered to have zero time spent.
Session Users: Lists all the users on your site and the number and duration of their visits.
Individual Sessions: Lists all the sessions on your site, by session ID, session user, events, and beginning and ending dates and times.
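The time-per-page calculation described for Session Pages can be sketched as follows. This is an illustrative Python sketch of the idea, not Sawmill's implementation, and the session data is made up.

```python
from datetime import datetime

# One session's hits, in order: (time, page). Made-up example data.
session = [
    (datetime(2005, 4, 11, 16, 12), "/index.html"),
    (datetime(2005, 4, 11, 16, 13), "/news/item1.html"),
    (datetime(2005, 4, 11, 16, 20), "/news/item2.html"),
]

def time_per_page(session):
    """Time on each page = gap until the next hit; the exit page gets zero."""
    durations = []
    for (t, page), (t_next, _) in zip(session, session[1:]):
        durations.append((page, (t_next - t).total_seconds()))
    # The last page of the session is the exit page: zero time spent.
    durations.append((session[-1][1], 0.0))
    return durations

for page, secs in time_per_page(session):
    print(page, secs)
```

Tabulating these per-page durations across all sessions gives the time-spent-per-page figures shown in the Session Pages report.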
In some tables there are URLs (see the Referrers table for an example); if these URLs are full (i.e. they contain the whole address, like 'http://www.somedomain.com'), each line can be clicked to view the site. Clicking the link will launch a new browser window, or a new tab if you are using tabbed browsing, with that URL. Just above the table is a bar that contains several controls:
Row Numbers: This control (selected by default) can be used to change the starting row and the number of rows displayed in the table. There are also "paging" buttons that allow you to page through tables with many entries.
Export: This link can be used to export data from the table.
Customize: This link can be used to change which columns are visible, the sort order, and other aspects of the report. (NOTE: The sort order of any table can also be changed by clicking the arrow next to a column name; click once to sort by that column, or again to sort in reverse.)
Row Numbers
The row numbers section above the table shows you, from left to right:
The current row numbers displayed
The paging function: page through the table by selecting the next row set (<1-10 or 11-20>), or skip to the start or end by clicking [...]
Export
You can export data from Sawmill by clicking the Export Table link. A window will open, and you can download the report from there.
Customize
The Customize selection controls which elements of the statistics are visible and how to graph them, along with table options including the number of rows, pivot tables, and other graph options. Within each report table, you can customize how you see the information. Most elements are optional, and you can turn them on or off from here.

Examples include the columns of tables (e.g. the hits and bandwidth columns), the special rows of tables (e.g. the total and average rows), the pie charts, and more. The list of available elements changes depending on the view, the database structure, and other configuration options.
Global Filters
The Filters Editor lets you add, change, and remove filters on any field available in the database. To get to the Filters Editor, click the icon in the Report Toolbar.
To filter one of the report views, click the "New Item" link. First, choose the filter type from the drop-down menu. Then enter the Value that the field should contain. The other option in the Filter menu is to create a new group.

Name your group; in this case, we have typed "general group". Click "OK" once you have named your group. You will see your new group as a folder icon, with the ability to add items to it, edit it, or delete it.

Within this group you can add items as needed, and they will be saved within the group. Once you have added to your group, save and apply your filters. When you click save, you will be brought back to the report page you were on before you started working with filters. The Filters Editor saves your filters even when they are not enabled, so that you can come back the next time you look at this profile and enable them again, without having to re-enter them. You can add multiple filters and multiple groups, and enable some, none, or all of them for any view. You can edit the current filters by clicking the 'edit' link, and delete them with the 'delete' link.
Date/Time Filters
Date/Time Filters are controlled by the Calendar and the Date Range Selector. They remain active on all reports until they are removed by clicking 'Show All' in the Calendar.
Zoom Filters
Zoom Filters are activated by opening a report and selecting the magnifying glass next to the type list. A dialogue box will appear in which you select the fields you want to zoom in on. Once you select the items to be zoomed, select the report again, and you will see only the zoomed list.
The total hits in a month is equal to the sum of the hits on the days of the month; the total bandwidth for a month is equal to the sum of the bandwidth on the days of the month; the total page views for a month is equal to the sum of the page views for each day of the month. BUT:

The total number of visitors in a month is not usually equal to the sum of the visitors on the days of the month.

Here's why. Suppose you have a web site that only one person ever visits, but that person visits it every day. For every day of the month you will have a single visitor, but for the month you will have a single visitor as well, because visitors are unique visitors, and there was only one visitor in the entire month. If what you're really looking for is "visits" rather than "visitors" (so each visit counts once, even if it's the same visitor coming back over and over), then that's what Sawmill calls "sessions"; you can get information about them in The Session Overview and How Sawmill calculates Sessions.
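The non-additivity of unique visitor counts can be sketched as follows. This is an illustrative Python sketch with made-up IP addresses, not Sawmill's implementation.

```python
# Hits per day, each hit identified by visitor IP address (made-up data).
hits_by_day = {
    "2023-01-01": ["10.0.0.1", "10.0.0.1", "10.0.0.2"],
    "2023-01-02": ["10.0.0.1"],
    "2023-01-03": ["10.0.0.1", "10.0.0.2"],
}

daily_hits = {day: len(ips) for day, ips in hits_by_day.items()}
daily_visitors = {day: len(set(ips)) for day, ips in hits_by_day.items()}

# Hits ARE additive: the month's total is the sum of the days.
total_hits = sum(daily_hits.values())  # 3 + 1 + 2 = 6

# Unique visitors are NOT additive: summing daily uniques double-counts
# the same visitor returning on different days.
sum_of_daily_visitors = sum(daily_visitors.values())  # 2 + 1 + 2 = 5
monthly_visitors = len({ip for ips in hits_by_day.values() for ip in ips})  # 2

print(total_hits, sum_of_daily_visitors, monthly_visitors)
```

Here the naive sum over days gives 5 "visitors", while the month actually had only 2 unique visitors; this is exactly why Sawmill cannot sum daily visitor counts into a monthly total.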
11/04/2005 16:12 - arrive at homepage and navigate to a news item
11/04/2005 16:13 - read news item 1
11/04/2005 16:20 - navigate to next news item
11/04/2005 16:22 - read news item 2
11/04/2005 16:30 - exit (close browser; no further activity for 30 minutes)
With a session timeout interval of 30 minutes (this can be changed by the Sawmill Administrator), Sawmill will count one session with a 10-minute duration, with the two news item pages having durations of 7 and 8 minutes respectively. This calculation illustrates the problem of duration counting in all log analysis software: since the exact time the user left the site is not recorded (there is no log entry for the 16:30 'exit', as closing the browser leaves no record in the server logs), we have no way of knowing when the user exited, so we count from the last known time, which in this case is the start of reading the second news item, 16:22.
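The last-known-hit duration calculation described above can be sketched as follows. This is an illustrative Python sketch under the stated assumptions (30-minute timeout, duration measured to the last recorded hit), not Sawmill's actual algorithm.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # configurable in Sawmill

# Hit times for one visitor (the example above); note the 16:30 exit
# never appears in the log, so it cannot enter the calculation.
hits = [
    datetime(2005, 4, 11, 16, 12),
    datetime(2005, 4, 11, 16, 13),
    datetime(2005, 4, 11, 16, 20),
    datetime(2005, 4, 11, 16, 22),
]

def session_durations(hits):
    """Split a sorted hit list into sessions and return each session's duration.

    A gap longer than SESSION_TIMEOUT starts a new session. A session's
    duration runs from its first hit to its last KNOWN hit: the time after
    the final hit is unrecorded, so it is not counted.
    """
    sessions = []
    start = prev = hits[0]
    for t in hits[1:]:
        if t - prev > SESSION_TIMEOUT:
            sessions.append(prev - start)  # close the session at its last hit
            start = t
        prev = t
    sessions.append(prev - start)
    return sessions

print(session_durations(hits))  # one session of 10 minutes (16:12 to 16:22)
```

All gaps in the example are under 30 minutes, so the four hits form a single session whose measured duration is 16:12 to 16:22, i.e. 10 minutes.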
most important? Maybe you should reconsider your priorities, and work on the content people want.

Do we have enough bandwidth, and a fast enough server? Sawmill can show you the total hits and bandwidth usage of your sites, so you can decide when it's time to upgrade your connection or buy a new server. By upgrading your connection or server before it becomes a problem, you can keep your visitors happy.
Sawmill Can Help You Solve Important Problems. Here are some scenarios where Sawmill can help:
You see in the statistics that numerous hits came from Spain yesterday. Want to know what they were looking at? Just click on the Geographic Locations view and then click on Spain in that table; this 'zooms' you in on hits from Spain. From The Zoom To Menu, switch to the "Pages" report, and you see all the pages hit by people from Spain, in two clicks.

You're trying to see if a particular page got a hit yesterday. The statistics show the top twenty pages, but you want to see all the pages; the page you're looking for only got one hit, or none, so it'll be way down in the ranking. You can do one of the following:
- Choose the Row Numbers feature to 'page' to the last row in the table, and 'page' back through the table until you find it.
- Click on 'All Rows' from the 'number of rows' drop-down menu, and the table will expand to show all the pages that ever got any hits, even if there are 500,000 of them.
You want to see which paths visitors took through the site, click-by-click. At each page along the way you want to see which visitors left, and where the other ones went next. Sawmill's "session paths" report lets you see the complete click-by-click details of every visit. Starting with the entry pages, you can click to expand any part of the data, to see whatever you need to see about your visitors and the paths they took.

You want to see a graph of the web site traffic for last November. You want to see the bandwidth for November. Actually you just want to see November 16th. Wow, look at that spike around 3pm; where did all those hits come from? Sawmill can do all this and much more. This is how you'll think when you use Sawmill. Look at the November hits graph, wonder about the bandwidth; choose "bandwidth bar graph" from the menu. You have another graph showing bandwidth where there used to be just a hits graph. See a lot of hits on the 16th? Click the November 16 bar in the graph and the graph changes to show only traffic on November 16, hour-by-hour. Wondering where that spike came from on some particular hour? Three clicks and you can see the traffic for that hour only, broken down by domain, so you can see who it was. Or broken down by page, so you can see what they looked at. Or broken down by referring URL or domain, so you can see which ad, search engine, or web page brought you all that traffic!
Long description
This is the pathname of an internal database on disk. If this option is blank, Sawmill will store the database in a folder with the same name as the profile, in the Databases folder in the LogAnalysisInfo folder. Information from log files is stored in this database, and when reports are generated, they are generated using the information in the database. See Databases.
Long description
This is a regular expression which specifies the log format. You only need to change this option if you are creating your own log file format. It is a regular expression (see Regular Expressions) which matches an entire log entry (one line of the log), and contains a parenthesized substring for each log field. For each line that matches the regular expression, the part of the line which matches the first substring will become log field 1, the part of the line matching the second substring will be log field 2, etc. Log lines which do not match this expression will be rejected.

For example, consider the following regular expression:

^([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*)$

Each parenthesized part matches any sequence not containing a space, and the four parts must be separated by spaces. The leading ^ and trailing $ ensure that the whole line must be matched. This expression matches a line which contains four fields, separated by spaces, where the fields do not contain spaces. The first field will be put into the first log field, the second into the second log field, the third into the third log field, and the fourth into the fourth log field.

Some log formats cannot be processed using a single regular expression, either because they have a peculiar field layout, or because their fields span several lines, or for some other reason. Usually it is still possible to process the log files with Sawmill, but the log format description file will need to include log parsing filters with more complicated parsing logic. For one example of that, see the Raptor file in the log_formats subfolder of the LogAnalysisInfo folder.
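The matching behavior of the four-field expression above can be demonstrated in Python (the log line here is a hypothetical example, not a real Sawmill format):

```python
import re

# The four-field log format expression from the description above.
pattern = re.compile(r'^([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*)$')

# A hypothetical four-field log line.
line = '2005-04-11 16:12:05 10.0.0.1 /index.html'
fields = pattern.match(line).groups()
print(fields)  # ('2005-04-11', '16:12:05', '10.0.0.1', '/index.html')

# Lines with the wrong number of fields do not match, and would be rejected.
assert pattern.match('a b c') is None        # too few fields
assert pattern.match('a b c d e') is None    # too many fields
```

Each parenthesized group becomes one log field, and the anchors guarantee that a line with more or fewer fields fails to match entirely.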
Long description
This specifies the number of hours to add to the dates in the log file. A positive value causes that many hours to be added to every date in the log as it is read, and a negative value causes hours to be subtracted. For instance, if your log data is in GMT (as some formats are, including some W3C-based formats) but your time zone is GMT-8 (i.e. Pacific Standard Time), then you should enter -8 here. A value of zero leaves the dates unchanged. Fractional hours are allowed; e.g., 9.5.
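The arithmetic is a simple timestamp shift, sketched here in Python (illustrative only; the timestamp is hypothetical and the function name is not part of Sawmill):

```python
from datetime import datetime, timedelta

def apply_date_offset(timestamp, offset_hours):
    """Shift a log timestamp by a positive, negative, or fractional number of hours."""
    return timestamp + timedelta(hours=offset_hours)

# A GMT timestamp from a log, shifted to GMT-8 (Pacific Standard Time).
gmt = datetime(2005, 4, 11, 16, 12)
print(apply_date_offset(gmt, -8))   # 2005-04-11 08:12:00
print(apply_date_offset(gmt, 9.5))  # 2005-04-12 01:42:00 (fractional hours allowed)
```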
Long description
This specifies the filters to use when generating a report; i.e., it filters out all data not matching this expression, so only part of the data is reported. The value of this option is an expression using a subset of the syntax described in Salang: The Sawmill Language. See Using Report Filters.
Long description
This specifies a profile which is to be used for the current command-line command. This is typically the first option on any command line that deals with a particular profile; e.g., you might use '-p myconfig -a bd' to rebuild the database for profile myconfig. More generally, this can be used in conjunction with Action and other options to build, update or expire databases from the command line, or to generate HTML files. In CGI or web server mode, this is used internally to manage profiles, and should generally not be changed.

In addition to listing a single profile name explicitly, you can also use wildcards for this option, to perform the action for multiple profiles, one after another. To do this, use pattern: followed by a wildcard expression as the value of this (-p) option. For instance, this:

sawmill -p "pattern:xyz*" -a bd

will rebuild the databases of all profiles whose internal names (not labels) start with xyz. You can get a list of all internal names using the "-a lp" command line option.

If this option is a full pathname of an existing file, that file is read as a profile file; otherwise, Sawmill treats it as the name of a profile in the profiles subfolder of the LogAnalysisInfo folder. If that doesn't exist either, Sawmill scans all profiles in that directory to see if the label of any profile matches the specified value, and uses that profile if it matches. See Configuration Files.
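The pattern: selection behaves like ordinary shell-style wildcard matching against the internal profile names. The following Python sketch illustrates the idea (the profile names are hypothetical, and this is not Sawmill's implementation):

```python
import fnmatch

# Hypothetical internal profile names, as "-a lp" might list them.
profiles = ['xyz_web', 'xyz_mail', 'intranet']

value = 'pattern:xyz*'  # the value given to the -p option
prefix = 'pattern:'
if value.startswith(prefix):
    # Wildcard form: select every profile whose internal name matches.
    selected = fnmatch.filter(profiles, value[len(prefix):])
else:
    # Plain form: a single explicit profile name.
    selected = [value]

print(selected)  # ['xyz_web', 'xyz_mail']
```

With the wildcard form, the requested action would then be performed once per selected profile, one after another.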
Long description
This specifies the command-line action to perform with the specified profile. The HTML interface takes care of setting this option for you as necessary, but you will need to set it manually when using the command line interface. Possible modes are:
- build_database (or bd): This builds or rebuilds the database from the log data, erasing any data already in the database.
- update_database (or ud): This adds the log data to the database, while leaving any existing data in the database.
- process_logs (or pl): This processes all the log data in the log source, and generates comma-separated output for each line accepted by the filters. It does not modify or create a database. This is useful for converting log data to CSV.
- remove_database_data (or rdd): This expires all data from the database which is in the filter set specified by Report Filters. (Note: the Date filter option will not work with this action; use Report Filters instead.)
- rebuild_cross_reference_tables (or rcrt): This rebuilds the cross-reference tables of the database from the main table (without processing any log data). It is much faster than rebuilding the database. It can be useful if you have modified the cross-reference table settings and want to update the cross-reference tables to reflect the new settings, but don't want to rebuild the database.
- rebuild_database_indices (or rdi): This rebuilds the indices of the main table.
- rebuild_database_hierarchies (or rdh): This rebuilds the hierarchy tables of the database.
- update_database_hierarchy_tables (or udht): This updates the hierarchy table for the database field specified by Report Filters.
- merge_database (or md): This merges the contents of a database (specified with Merge database directory) into the current database. After it completes, the current profile's database will contain all the information it contained prior to the merge, plus the information in the added database.
- export_database (or ed): This exports the contents of a database to the folder specified with Directory.
- import_database (or id): This imports the contents of a database from the directory specified with Directory. The folder must have been created with an export_database action.
- generate_all_report_files (or garf): This generates HTML statistics pages for all reports, and the associated images, into the folder specified by Generate HTML report files to folder. The files and images are linked properly, so the HTML can be browsed directly from the resulting folder. This allows statistics to be browsed "off-line," without having to run Sawmill to generate each page.
- generate_report_files (or grf): This generates HTML statistics pages for a particular report (specified by Report name), and the associated images, into the folder specified by Generate HTML report files to folder (for HTML export) or the pathname specified by Output filename (for PDF export). The format of the export is specified by Output format type. The files and images are linked properly, so the HTML can be browsed directly from the resulting folder. This allows one report to be browsed "off-line," without having to run Sawmill to generate each page.
- send_report_by_email (or srbe): This sends a statistical report using HTML email. The report is sent to Recipient address, with return address Return address, using SMTP server. The report to send is specified by Report name.
- export_csv_table (or ect): This exports a report table as CSV text. The report to export is specified by Report name, and is written to the standard output stream, so this is useful only in command-line mode. See also Output filename, Ending row, Export total row, Export maximum row, Export minimum row, Export average row, End of line, Sort by, and Sort direction.
- dump_main_table (or dmt): This dumps a tab-separated version of the "main" database table, to the standard output stream. It is affected by the Report Filters option: if no filter is specified, it will dump the entire main table; otherwise, it will dump only those rows matching the filter. This is much faster and more memory-efficient than exporting the Log Detail report.
- print_values (or pv): This displays (to the command line console) the numerical field values for a particular filter set.
- print_subitems (or ps): This displays (to the command line console) the subitem hierarchy for the database field specified with the -fn option.
- print_items (or pi): This displays (to the command line console) all item values for the database field specified with the -fn option.
- list_profiles (or lp): This displays (to the command line console) a list of the internal names of all profiles. These names can be used for command-line options that call for profile names.
- list_reports (or lr): This displays (to the command line console) a list of the reports in the specified profile (specified with -p profilename). These names can be used for command-line options that call for report names (like -rn).
- list_log_fields (or llf): This displays (to the command line console) a list of the internal names of the log fields in the specified profile (specified with -p profilename). These names can be used for log filters.
- list_database_fields (or ldf): This displays (to the command line console) a list of the internal names of the database fields in the specified profile (specified with -p profilename). These names can be used for report filters.
- print_database_statistics (or pds): This displays statistics on the database for a profile (specified with -p profilename). It is useful for tuning and debugging memory and disk usage.
- execute_sql_query (or esq): This executes a SQL query against the database of the current profile (specified with -p profilename). The query is specified with SQL query. The resulting table, if any, is displayed to the console.
- convert_70_database (or c70d): This updates an existing MySQL database created by Sawmill 7.0 to use the new layout used by version 7.1. This is required if you want to continue to use your existing MySQL database after upgrading to Sawmill 7.1 or later. It applies only to MySQL databases; no conversion is required for internal databases.
- convert_database_7_to_8 (or cd7t8): This converts the current database from version 7 format to version 8.
- recreate_profile (or rp): This recreates a profile (specified with -p profilename). It deletes the profile, and recreates it using the options originally used to create it. This destroys any changes to the profile which have been made since it was created; e.g., if new log filters were added manually, they will be deleted. This rewinds the profile to the state it was in just after it was created. This also incorporates any change to the log format plug-in into the profile, so it is very useful during log format plug-in authoring, to repeatedly create the profile until the plug-in is working, without having to go through the web interface each time.
- update_to_version (or utv): This updates the Sawmill installation to a newer version (the version number is specified with Update to version). This is new and highly experimental. For maximum safety, use the existing downloads page to download new versions instead of using this feature. If you do use this, back up your LogAnalysisInfo folder first!
- start_parsing_server (or sps): This starts a parsing server, to handle distributed parsing requests, listening on IP Parsing server hostname and port Parsing server port. See Distributed Parsing.
Plug-in Actions
The following actions are also available, and are defined through plug-ins in the "actions" folder of LogAnalysisInfo.
- clear_report_cache (or crc) (Clear Report Cache)
- convert_version_7_profile (or cv7p) (Convert v7 Profile). This converts the profile in the pathname specified by pp from a v7 profile to a v8 profile, putting the result in the current v8 profiles directory. Parameters:
    - profile pathname (profile_pathname) (-pp): the pathname of the v7 profile .cfg file
    - import database (import_database) (-id): true to import the v7 database; false to not import the v7 database
    - database directory (database_directory) (-dd): the directory of the database (for importing, if id is true)
    - destination database directory (destination_database_directory) (-ddd): the directory of the version 8 database to create (omit for default location)
    - destination profile name (destination_profile_name) (-dpn): the name of the version 8 profile (omit to use the name of the v7 profile)
    - database directory (sql_database_name_convert) (-sdnc): the directory of the database (for importing, if id is true)
    - exp (-e):
- print_database_info (or pdi) (Print Database Info)
- print_task_info (or pti) (Print Task Info). Parameters:
    - task_node (-tn):
Long description
This specifies a local folder where Sawmill can store profiles, databases, preferences, and other information. This folder must exist and be writable by Sawmill, or must be in a folder which is writable by Sawmill (so Sawmill can create it). If this option is empty, Sawmill assumes that the folder is named LogAnalysisInfo, and is found in the same folder as Sawmill. If a file named LogAnalysisInfoDirLoc exists in the same folder as Sawmill, the contents of that file are used as the pathname of this folder, and this option is ignored. If the environment variable LOGANALYSISINFODIR is set, its value is used instead, and this option is ignored.
Long description
This is an internal option used to track sessions in the graphical interface.
Long description
This option is used when Action is generate_report_files or generate_all_report_files (in command line usage); it determines the folder into which Sawmill generates the statistics pages.
Long description
This specifies the name of a database field. It is used in a variety of command-line contexts where a database field name is required, including print_values, print_subitems, and print_items (see Action).
Long description
This specifies a cross-reference table number, or "all" for all cross-reference tables. It is used in a variety of command-line contexts where a cross-reference table is specified, including update_cross_reference_table (see Action).
Long description
This option is used on the command line in conjunction with the update_cross_reference_table action (see Action). When the value of this option is true, the table is rebuilt from scratch; when it is false, it is updated incrementally from whatever is new in the database since the last update.
Long description
This controls whether Sawmill starts its built-in web server. When this option is checked (true), Sawmill starts a web server on the IP address specified by Web server IP address, and port specified by Web server port, unless it detects that it is running as a CGI program under another web server, in which case it responds as a CGI program instead, and does not start the server. When this option is unchecked (false), Sawmill never starts the web server, unless it is run from the command line with no parameters, or it is run as a GUI program under MacOS or Windows.
Long description
This generates a report, given the report ID. This is used internally to handle the generation of HTML reports.
Long description
This specifies the language to use to generate the report, e.g., english. The available languages are the names of the subdirectories of the languages folder of the LogAnalysisInfo folder.
Long description
This option lets you use a direct URL to your statistics even if they are password-protected. Just include this option in the URL (clp+password) and Sawmill will treat it as if you had entered the password in the prompt field, and will bypass the prompt field and take you to the statistics.
Long description
This controls the types of debugging output generated during a command-line action. This option is a sequence of letters, each representing a particular type of command-line output. If the letter corresponding to a type is present in the sequence, that type of output will be generated; if it is not present, that type of output will not be generated. The types, and their corresponding letters, are:
e: Error message output.
g: Generate Sawmill logo (banner) output.
b: Built-in web server basic output.
P: Progress indicator (command line and web).
w: Built-in web server debugging output.
f: Filter debugging output.
p: Log parsing debugging output.
i: Database I/O debugging output.
d: Database access debugging output.
D: Detailed database access debugging output.
s: Statistics generation debugging output.
l: Log reading debugging output.
a: Administrative debugging output.
m: Language module debugging output.
n: DNS debugging output.
N: Detailed DNS debugging output.
v: Warning when a line is processed and no entries are accepted.
t: Network debugging output.
T: Detailed network debugging output.
q: SQL query debugging.
Q: Detailed SQL query debugging.
o: Add a timestamp to every output line.
c: Log information about inter-process communications.
u: Add the process ID (PID) to each line, in curly brackets.
For instance, a value of ew will show only error messages and basic web server output. A value of egbPwfpidDslamnNvtqQocu will show all possible output. In CGI mode or web server mode, the output will be sent to a file in the Output folder of the LogAnalysisInfo folder; the file will be named Output-profilename, where profilename is the name of the profile. In command line mode (on UNIX and Windows), the output will be sent to the standard output stream.
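The "letter present means output enabled" scheme can be sketched in Python. This is an illustrative subset only, not Sawmill's implementation:

```python
# A few of the verbosity letters from the table above (illustrative subset).
OUTPUT_TYPES = {
    'e': 'error messages',
    'w': 'web server debugging',
    'f': 'filter debugging',
    'p': 'log parsing debugging',
}

def enabled_outputs(verbosity):
    """Return the output types enabled by a verbosity string such as 'ew'."""
    return [name for letter, name in OUTPUT_TYPES.items() if letter in verbosity]

print(enabled_outputs('ew'))  # ['error messages', 'web server debugging']
```

An output type is produced if and only if its letter appears somewhere in the verbosity string; order and repetition of the letters do not matter.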
Long description
This is the process ID (pid) of the main web server process. This is used internally to recognize when the main process exits, so the subordinate process knows to exit too.
Long description
This tells Sawmill which page to display in its HTML interface. This is generally used internally to deliver pages of the user interface to the browser.
Long description
This specifies the name of a report to generate. This is used as a parameter in various command-line actions, including export_csv_table, send_report_by_email, and generate_report_files (see Action).
Long description
This specifies the recipient address for an email message. It is used when sending a report by email with send_report_by_email (see Action).
Long description
This specifies the email subject to use when sending a report in an email message. It is used when sending a report by email with send_report_by_email (see Action).
Long description
This specifies the database directory for a database to merge into the current database. This is used together with "-a md" to add the contents of a second database to the current database. The second database must have the exact same structure as the first; the easiest way to ensure that is to use the same profile file to build both.
Long description
This specifies the directory which holds a database export. It is used when exporting to specify the destination of the export, and when importing to specify the source of the import.
Long description
This specifies the destination directory of a database conversion (version 7 to 8).
Long description
This specifies the date filter to use when generating a report; it filters the report for the given date. See Using Date Filters for more details.
Long description
This option is used in combination with the "Date filter" option. If this option is specified, the generated report is zoomed to the date specified in "Date filter".
Long description
This specifies the version number to update to, using the web update feature. This feature is still experimental. When used in conjunction with the update_to_version action (-a utv), this provides a command-line method of updating an existing Sawmill installation to a newer version.
Long description
This option is used when Action is generate_report_files or generate_all_report_files (in command line usage). Sawmill generates statistics pages in a PDF-friendly format by omitting the frameset, adding a table of contents page, and modifying specific style parameters.
Long description
This option controls the ending row of a generated or exported report. Ending row is a global setting and overrides existing ending row values in every report and report element.
Long description
This option specifies the field to sort by, when it is used with Action export_csv_table.
Long description
This option specifies the direction to sort in (ascending or descending), when it is used with Action export_csv_table.
Long description
This option adds an average row to the exported data set when it is used with Action export_csv_table.
Long description
This option adds a minimum row to the exported data set when it is used with Action export_csv_table.
Long description
This option adds a maximum row to the exported data set when it is used with Action export_csv_table.
Long description
This option adds a total row to the exported data set when it is used with Action export_csv_table.
Long description
This option exports a report to the specified filename when it is used with Action export_csv_table.
Long description
This option specifies the format of an exported report. A report generated with Action generate_report_files (grf) will be exported in HTML format if this option is empty or "html", or in PDF format if this option is "pdf".
Long description
This option specifies the end-of-line marker in a CSV file when it is used with Action export_csv_table.
Long description
This option specifies the SQL query to run. It is used together with "-a execute_sql_query" (see Action) to run a SQL query.
Long description
This option specifies whether to resolve itemnums when displaying a command-line SQL query result. It is used together with "-a execute_sql_query" (see Action). If this option is true, itemnums are resolved, and the string values are displayed on the console. If this option is false, itemnums are not resolved, and the raw itemnums are displayed on the console.
Long description
This option specifies the hostname or IP address to bind to, when running as a parsing server. This is used together with "-a start_parsing_server" (see Action); when Sawmill is started with that -a option, it will bind to this IP and the port specified by Parsing server port, to listen for parsing server connections. See Distributed Parsing.
Long description
This option specifies the port to bind to, when running as a parsing server. This is used together with "-a start_parsing_server" (see Action); when Sawmill is started with that -a option, it will bind to this port and the IP specified by Parsing server hostname, to listen for parsing server connections. See Distributed Parsing.
Long description
When this is true (checked), Sawmill will never attempt to look up hostnames from IP numbers; it will use IP numbers for everything. When this is false (unchecked), it will attempt to look up the local hostname when it starts a web server, and it will attempt to look up the hostname of any host which accesses it by HTTP, and it will look up the hostname of any host it encounters in the logs (if Look up IP numbers using domain nameserver (DNS) is true). This option is useful if there is no local Domain Name Server (for instance, if the computer running Sawmill is not connected to a network and is not itself running a DNS).
Long description
When this is true (checked), Sawmill will look up the hostnames of IP numbers using DNS only when they appear in a log file and Look up IP numbers using domain nameserver (DNS) is on. When this is false (unchecked), Sawmill will still look up numbers in log files, but will also look up the hostname of the computer Sawmill is running on, and the hostnames of computers using Sawmill through web browsers. This option is useful because when it is true, Sawmill will never do any network access, so it can be run on a computer with a dial-up connection without having to be dialed in. When this option is false, Sawmill will perform a DNS lookup when it first starts and when other computers access it, so it will have to be permanently connected to the Internet (or using a DNS server on your local network).
Long description
This specifies the URL that Sawmill sends you to when you log out of Sawmill. If this option is blank, it will send you to the Sawmill login screen.
Long description
This option controls the amount of time, in seconds, Sawmill keeps temporary files before deleting them. Temporary files include temporary profiles (used to browse statistics) and temporary images (used to embed images in statistics pages). Setting this to a high number will ensure that temporary images are around as long as they are needed, but will use more disk space.
This option specifies the language module to use. Language modules contain rules for translating from Sawmill's internal text variables to what actually appears in generated HTML pages and other output. Language modules are contained in the languages subfolder of the LogAnalysisInfo folder, in a folder named after the language (for instance, English modules are in a folder called "english"). Each module is split into several pieces:
1. lang_stats.cfg: the text of statistics pages.
2. lang_options.cfg: the text of the option names and descriptions.
3. lang_admin.cfg: the text of the administrative pages.
4. lang_messages.cfg: the text of error messages and other messages.
The module is split into pieces to allow for partial implementation. For instance, by implementing only the small lang_stats module, you can provide support for a particular language for statistics browsing, without having to spend a considerable amount of time to fully translate the entire Sawmill interface.
This option specifies the HTML charset, e.g. UTF-8, to use when displaying pages in the web interface.
This option controls whether the Professional/Enterprise switch is shown during the trial period. This is useful for hiding the Professional/Enterprise switch if only the Professional or Enterprise features are available during the trial period.
This option controls whether Sawmill sends information to Flowerfire (maker of Sawmill) about the log formats being used. When this option is enabled, Sawmill sends information to Flowerfire (via a port 80 connection to www.sawmill.net). The information sent is only the name of the log format plug-in used in the profile; it does not send your log data, or any other information other than the name of the log format. We will use this information to help us focus our development on the most commonly-used log formats. If this option is off, Sawmill will never send any information to Flowerfire.
This specifies the SMTP server to use to send an email message. It is used when sending a report by email with send_report_by_email (see Action).
This specifies the SMTP username to use when authenticating with the SMTP server to send an email message. It is used when sending a report by email with send_report_by_email (see Action).
This specifies the SMTP password to use when authenticating with the SMTP server to send an email message. It is used when sending a report by email with send_report_by_email (see Action).
This option specifies the email address where bug reports should be sent when a Sawmill user clicks the "Report It" button on an error message. The global support email address applies to all profiles. It is also possible to define a separate support email address per profile in Config/Miscellaneous; a per-profile support email address overrides this one. If both this option and the per-profile support email address are blank, reports go to the software vendor's support address (support@flowerfire.com). That is fine for some situations, especially if the reporting user is the Sawmill administrator, but for ISPs and other multi-client and multi-user installations, most of the errors will be configuration issues that the software vendor cannot do anything about, and that the reporting user cannot fix (because they do not have administrative access). For multi-client licensing setups, this should be set to the email address of the Sawmill administrator, who can fix the problems as they occur.
This specifies the address or addresses Sawmill should send email to whenever an action occurs, for instance when the database finishes rebuilding, updating, or expiring, or when HTML files are done being generated. The global actions email address applies to actions of all profiles. It is also possible to define separate actions email addresses per profile in Config/Miscellaneous; an actions email address defined per profile overrides this one. If this option or the per-profile actions email address is non-empty, Sawmill will send a brief description of what it just finished doing, using the SMTP server specified by SMTP server. If both this option and the per-profile actions email address are empty, Sawmill will not send email. Multiple recipients may be specified with commas, e.g., "user1@mydomain.com,user2@mydomain.com,user3@mydomain.com".
This is the return address used when email is sent by Global actions email address or Actions email address(es). The global actions return address applies to all profiles. It is also possible to define an actions return address per profile in Config/Miscellaneous. If no actions return address is defined, Sawmill uses the actions email address as the return address.
This option specifies the number of seconds a Sawmill web access session (i.e., a session by someone accessing Sawmill, e.g., viewing reports or administering Sawmill) can be inactive before it times out. When a page is loaded in the Sawmill web interface, Sawmill checks when the last access occurred; if it was more than this amount of time ago, it considers the session to have timed out, deletes all information about the session, and requires a new login. This is a security feature, intended to prevent sessions from remaining available for arbitrary amounts of time on shared or otherwise insecure computers. Setting this to 0 disables the timeout, and sessions will never expire.
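The timeout check amounts to a simple time comparison. A minimal sketch (hypothetical illustration, not Sawmill's actual code; the function and parameter names are invented):

```python
def session_expired(last_access, now, timeout_seconds):
    """Return True if the session has been inactive longer than allowed.

    Times are in seconds (e.g. from time.time()). A timeout of 0 means
    sessions never expire, matching the behavior described above.
    """
    if timeout_seconds == 0:
        return False
    return (now - last_access) > timeout_seconds
```

For example, with a 600-second timeout, a session last touched 700 seconds ago is expired, while one touched 100 seconds ago is not.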
Sawmill provides a number of security features to prevent unauthorized access to your profiles or to your system. The security mode is one of these; see also Security. The security mode cannot be set from the web GUI; it can only be set by modifying the Sawmill preferences.cfg file (in the LogAnalysisInfo folder) with a text editor. The security mode may be one of the following:
q browse_and_modify. This is the default. This mode allows web users to create new profiles and modify existing profiles. It provides the full power of the Sawmill web interface from any web browser. It relies on the Sawmill password for security; users who have the password can create profiles and modify existing profiles. Users who do not have the password can make temporary modifications, during browsing, to existing profiles, but they cannot modify the secure options. Secure options are those which cause files on the server to be read or written in any way; examples include Header file.
q browse_only. This mode adds an additional layer of security beyond what is provided by the password, by preventing users from creating or modifying profiles, even if they know the password. It allows the user to browse existing profiles, and nothing more. In this mode, profile options can be modified by directly modifying the Configuration Files, or by running another installation of Sawmill in Browse and Modify mode and copying the profile.
Either mode is secure enough to protect your system from malicious users, because both require the password before any profiles may be created or modified, and before any secure options may be changed (changes to the non-secure options cannot harm your system). If you are highly concerned about security, you may want to set the security mode to Browse Only, to prevent even password-equipped users from doing any damage.
This is a list of the hostnames of computers which are trusted. Hostnames should be separated from each other by spaces. Any browsing host whose hostname contains any of the listed hostnames will be trusted, so entire subdomains can be trusted by entering the domain. Example: trusted.host.com 206.221.233.20 .trusteddomain.edu. Browsers from these hosts will not be required to enter any passwords; they will be automatically validated. Use this option with caution: it simplifies the use of Sawmill by eliminating all password screens for the administrative host, but it can become a security hole if someone uses or spoofs the administrative machine without permission. If you are connecting from a trusted host, it may be difficult to remove that trusted host using the web interface, because once your host is no longer trusted, Sawmill will refuse you the administrative access needed to change the setting. One solution is to modify the preferences.cfg file (in the LogAnalysisInfo folder) manually, with a text editor, to remove the trusted host. Another is to connect from another system, log in normally, and remove the trusted host that way.
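The matching rule described above is a substring test against each entry in the space-separated list. A hypothetical sketch (not Sawmill's actual code):

```python
def is_trusted(browsing_host, trusted_hosts):
    """trusted_hosts is the space-separated value of this option.

    A host is trusted if any listed entry appears anywhere in its
    hostname, so an entry like ".trusteddomain.edu" trusts every
    host in that domain.
    """
    return any(entry in browsing_host for entry in trusted_hosts.split())
```

Note that because this is a substring match, short entries can match more broadly than intended; prefer entries anchored with a leading dot for domains.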
This controls whether Sawmill displays full operating system version details in error messages. It is useful for Sawmill to do this because it helps to debug problems when they are reported. However, full operating system details could be of use to someone attempting to gain unauthorized access to your server, since they would reveal whether you are running a vulnerable version of the operating system. This should not be an issue if you keep your operating system up to date, but if you would rather this information not be public, turn this option off.
This specifies a command line that Sawmill will run when it authenticates users. The command line program must accept two parameters: the username and the entered password. The command line must print the names of the profiles that the user is permitted to access, one name per line. A printed value of *ADMIN* means that the user is an administrator, and may access any profile, as well as accessing the administrative interface (any other response, and the administrative interface will not be available). A printed value of *FAILED* means that the username/password authentication failed. If this option is blank, Sawmill will use the users.cfg file (in LogAnalysisInfo) to authenticate users.
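A minimal sketch of such an external authentication program, following the protocol described above (the user table and filename are hypothetical; a real program might query LDAP or a database instead):

```python
#!/usr/bin/env python3
# Hypothetical authenticator; Sawmill would invoke it as:
#   authenticate.py <username> <password>
import sys

# Illustrative hard-coded table: password and permitted profiles per user.
USERS = {
    "admin": ("secret", ["*ADMIN*"]),
    "tom": ("pw123", ["tom-access", "tom-referrer"]),
}

def authenticate(username, password):
    entry = USERS.get(username)
    if entry is None or entry[0] != password:
        return ["*FAILED*"]       # authentication failed
    return entry[1]               # profile names, or ["*ADMIN*"]

if __name__ == "__main__" and len(sys.argv) == 3:
    # Print one profile name per line, as Sawmill expects.
    for line in authenticate(sys.argv[1], sys.argv[2]):
        print(line)
```

The program prints *ADMIN* for administrators, *FAILED* on bad credentials, and otherwise one permitted profile name per line.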
This specifies the file permissions to use when creating files and folders which do not fall into any other permissions category. This is a UNIX-style chmod value, a 3- or 4-digit octal number (see File/Folder Permissions).
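The octal chmod values used by this and the following permissions options can be interpreted as shown below (an illustrative sketch; the function name is invented):

```python
import stat

def parse_chmod(value):
    """Interpret a 3- or 4-digit octal chmod string as a mode integer.

    Each octal digit combines read (4), write (2), and execute (1)
    bits, for owner, group, and other respectively: "644" is
    rw-r--r--, and "755" is rwxr-xr-x.
    """
    return int(value, 8)
```

For example, `stat.filemode(parse_chmod("644") | stat.S_IFREG)` renders as "-rw-r--r--" for a regular file.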
This specifies the file permissions to use when creating files as part of a database. This is a UNIX-style chmod value, a 3- or 4-digit octal number (see File/Folder Permissions).
This specifies the file permissions to use when creating a folder as part of a database. This is a UNIX-style chmod value, a 3- or 4-digit octal number (see File/Folder Permissions).
This specifies the file permissions to use when creating a profile file. This is a UNIX-style chmod value, a 3- or 4-digit octal number (see File/Folder Permissions).
This specifies the file permissions to use when creating a folder containing profile files. This is a UNIX-style chmod value, a 3- or 4-digit octal number (see File/Folder Permissions).
This specifies the file permissions to use when creating a temporary profile file. This is a UNIX-style chmod value, a 3- or 4-digit octal number (see File/Folder Permissions).
This specifies the file permissions to use when creating the default profile file. This is a UNIX-style chmod value, a 3- or 4-digit octal number (see File/Folder Permissions).
This specifies the file permissions to use when creating the password file. This is a UNIX-style chmod value, a 3- or 4-digit octal number (see File/Folder Permissions).
This specifies the file permissions to use when creating an image file. This is a UNIX-style chmod value, a 3- or 4-digit octal number (see File/Folder Permissions).
This specifies the file permissions to use when creating a folder containing image files. This is a UNIX-style chmod value, a 3- or 4-digit octal number (see File/Folder Permissions).
This specifies the file permissions to use when creating the server folder. This is a UNIX-style chmod value, a 3- or 4-digit octal number (see File/Folder Permissions).
This specifies the file permissions to use when creating the LogAnalysisInfo folder. This is a UNIX-style chmod value, a 3- or 4-digit octal number (see File/Folder Permissions).
When this is true (checked), the main profile list will show only those profiles, if any, whose names start with the value of REMOTE_USER followed by a dash (-). For instance, if REMOTE_USER is "tom", the Main Menu will show profiles named "tom-access", "tom-referrer", or "tom-stats", but will not show "bob-access", "tom access", or "tom". When this is false (unchecked), or if REMOTE_USER is empty (undefined), or if REMOTE_USER is equal to the value of Administrative REMOTE_USER, all profiles will appear in the Main Menu. REMOTE_USER is a web server CGI variable which contains the username of the user who logged in through an authentication screen, e.g., htaccess or realms authentication. This option provides a simple mechanism for hiding users' profiles from each other, provided Sawmill is run in a section of the site protected by username/password authentication. For instance, you can run Sawmill in CGI mode, protect the Sawmill directory using authentication, turn on this option, and send the CGI URL to your users; they will be able to log in to Sawmill with web server authentication and will only see their own profiles. This option is only useful in CGI mode, and should not be turned on in web server mode (where it would make all profiles invisible), unless you are also running in CGI mode.
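The filtering rule can be sketched as follows (hypothetical illustration; the function and parameter names are invented):

```python
def visible_profiles(profiles, remote_user, admin_remote_user):
    """Return the profiles to show in the Main Menu.

    All profiles are shown when the option is effectively off:
    REMOTE_USER empty, or equal to the Administrative REMOTE_USER.
    Otherwise only profiles named "<user>-..." are shown.
    """
    if not remote_user or remote_user == admin_remote_user:
        return profiles
    prefix = remote_user + "-"
    return [p for p in profiles if p.startswith(prefix)]
```

With REMOTE_USER "tom", this keeps "tom-access" and "tom-referrer" but hides "bob-access", "tom access", and "tom", matching the examples above.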
This option specifies the username which, when present in the REMOTE_USER environment variable in CGI mode, marks that user as the administrator. This can be used to easily integrate web server authentication with the authentication used by Sawmill, so Sawmill uses information passed in the REMOTE_USER environment variable to determine which user is logged in. See Show only the profiles matching REMOTE_USER.
This specifies whether passwords expire after a time. When this option is true, users will be prompted to change their password a fixed number of days (Days until password expiration) after their previous password change. When this option is false, users will never be required to change their password.
This specifies the number of days before passwords expire. When Password expires is true, users will be prompted to change their password this many days after their previous password change.
This specifies the minimum length, in characters, of a password for it to be accepted as a valid new password.
This specifies whether passwords can be re-used during a password change. If this option is true, the user's previous passwords will be checked against the new password, and the password change will only be allowed if none of the past few passwords (Number of previous passwords to check) matches the new one. When this option is false, new passwords are not checked against past passwords, so passwords may be re-used (even the immediately previous password).
This specifies the number of previous passwords to check when looking for re-used passwords during a password change. This option is used when Prevent use of previous passwords is true, to determine how many historical passwords to examine.
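Together, the two options above amount to a bounded history check; a hypothetical sketch (names invented, not Sawmill's actual code):

```python
def reuse_allowed(new_password, password_history, check_count):
    """password_history lists previous passwords, most recent first.

    Only the first check_count entries are examined, mirroring the
    "Number of previous passwords to check" option; the change is
    allowed only if none of them matches the new password.
    """
    return new_password not in password_history[:check_count]
```

A real implementation would compare password hashes rather than plaintext; plaintext is used here only to keep the sketch short.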
This specifies whether new passwords must include a letter (alphabetic character A-Z or a-z). When this option is true, new passwords must contain a letter, and will be rejected during a password change if they do not.
This specifies whether new passwords must include both an uppercase letter (A-Z) and a lowercase letter (a-z). When this option is true, new passwords must contain both cases, and will be rejected during a password change if they do not.
This specifies whether new passwords must include a digit (0-9). When this option is true, new passwords must contain a digit, and will be rejected during a password change if they do not.
This specifies whether new passwords must include a symbol (any non-alphanumerical character, i.e., any character not in the range A-Z, a-z, or 0-9). When this option is true, new passwords must contain a symbol, and will be rejected during a password change if they do not.
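The four complexity rules above, together with the minimum length, can be sketched as a single validation routine (hypothetical illustration; the parameter names are invented, not Sawmill's option names):

```python
import re

def password_acceptable(pw, min_length=8, require_letter=True,
                        require_mixed_case=True, require_digit=True,
                        require_symbol=True):
    """Apply the length and character-class rules described above."""
    if len(pw) < min_length:
        return False
    if require_letter and not re.search(r"[A-Za-z]", pw):
        return False
    if require_mixed_case and not (re.search(r"[A-Z]", pw)
                                   and re.search(r"[a-z]", pw)):
        return False
    if require_digit and not re.search(r"[0-9]", pw):
        return False
    # A symbol is any character outside A-Z, a-z, and 0-9.
    if require_symbol and not re.search(r"[^A-Za-z0-9]", pw):
        return False
    return True
```

For example, "Secret#42x" passes all five checks, while "alllowercase1!" fails the mixed-case rule.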
This specifies a folder on the web server which is running Sawmill as a CGI program. This folder is used to serve the images embedded in Sawmill's HTML pages. It must be accessible to Sawmill as a local folder on the machine running Sawmill, and must also be accessible through a web browser by connecting to the web server running Sawmill; in other words, it must be inside the root folder of the web server running Sawmill. The folder specified by this option must match the URL specified by Temporary folder URL: the value specified here, which is a pathname local to the machine running Sawmill (see Pathnames), must refer to the same folder as the URL in Temporary folder URL. It may be specified either as a full pathname, or as a pathname relative to the folder containing Sawmill (e.g. ../html/sawmill, if your server is UNIX and your site's root directory is called html and is next to cgi-bin). See CGI-mode and The Temporary Folder.
This specifies the URL of a folder on the web server which is running Sawmill as a CGI program. This folder will be used to serve the images which will be embedded in Sawmill's HTML pages. This folder must be accessible to Sawmill as a local folder on the machine running Sawmill, and must also be accessible through a web browser by connecting to the web server running Sawmill. Therefore, it must be inside the root folder of the web server running Sawmill. The URL specified by this option must match the folder specified by Temporary folder. In other words, the value specified here, which is specified by URL, must refer to the same folder which is specified by local pathname in Temporary folder. See CGI-mode and The Temporary Folder.
This specifies the port Sawmill should listen on when it runs as a web server. See Start web server.
This specifies the maximum number of simultaneous tasks (threads of execution) that Sawmill will perform at a time, in web server mode. When a user attempts to use the built-in web server, Sawmill will check if there are already this many threads or connections actively in use. If there are, Sawmill will respond with a "too busy" page. Otherwise, the connection will be allowed. This prevents Sawmill from becoming overloaded if too many people try to use it at the same time, or if one user works it too hard (for instance, by rapidly and repeatedly clicking on a view button in the statistics).
This is the folder containing the Sawmill CGI program, relative to the root of the web server. This should be as it appears in a URL; forward slashes (/) should separate subfolders. It should begin and end with a forward slash (/), unless it is empty (i.e., Sawmill is in the root folder). For instance, if the Sawmill CGI program is inside the "sawmill" folder, which is inside the "scripts" folder of your web server, then this should be /scripts/sawmill/. This is used in the rare cases when Sawmill needs to build a full (non-relative) URL for itself, e.g., http://myhost.com/cgi-bin/sawmill. Sawmill can automatically compute all parts of the URL except the CGI folder part ("/cgi-bin/" above); this option specifies that part.
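The way the CGI folder slots into a full URL can be illustrated as follows (a hypothetical sketch; the function name and "sawmill" program name are assumptions for illustration):

```python
def full_cgi_url(scheme, host, cgi_folder, program="sawmill"):
    """Assemble a full URL from its parts.

    cgi_folder should begin and end with "/" (e.g. "/cgi-bin/"),
    or be empty when the program sits at the server root.
    """
    return "%s://%s%s%s" % (scheme, host, cgi_folder or "/", program)
```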
This specifies the IP address Sawmill should run its web server on. Sawmill binds to all available IP addresses by default, but if you want Sawmill's web server to bind only to a specific IP address, set this option.
This controls whether Sawmill automatically updates the database when the statistics are viewed. When this option is disabled, Sawmill never updates or creates the database unless you manually tell it to. When this option is enabled, it specifies the number of seconds old a database can be before it is automatically updated when the statistics are viewed. For instance, if the value is 3600, Sawmill will automatically update the database when the statistics are viewed if it has not updated the database in the past hour (3600 seconds = 1 hour). If this value is 86400, Sawmill will only update if the database has not been updated in the past day (86400 seconds = 1 day). Regardless of the setting of this option, Sawmill will build the database from scratch when the statistics are viewed if the database has never been built before.
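The decision rule is a threshold on the database's age; a hypothetical sketch (names invented, not Sawmill's actual code):

```python
def should_update(seconds_since_update, threshold, database_exists=True):
    """Decide whether to update the database when statistics are viewed.

    A never-built database is always built on first view, regardless
    of the threshold; a disabled option (None here) never auto-updates.
    """
    if not database_exists:
        return True
    if threshold is None:
        return False
    return seconds_since_update >= threshold
```

With a threshold of 3600, a database last updated 4000 seconds ago is refreshed, while one updated 1800 seconds ago is left alone.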
This specifies the hostname or IP of the MySQL server used as the back-end database.
This specifies the name of the database on the SQL server used as the back-end database.
This specifies a prefix to add to the beginning of every table name in the database. This can be used to share a single database between multiple profiles, by separating them into "namespaces" using a different prefix for each profile.
This specifies a suffix to add to the end of every table name in the database. This can be used to share a single database between multiple profiles, by separating them into "namespaces" using a different suffix for each profile.
This specifies the socket to use to access MySQL, if a socket is used instead of TCP/IP. If this option is blank, the default socket is used.
This specifies the method used to import data in bulk into databases, during database builds and updates.
With a Microsoft SQL Server profile, the choices are "ODBC" and "bulk_insert". ODBC works in all cases, but is slower (sometimes much slower). bulk_insert is usually faster, but requires a temporary directory (specified by Load data directory and Load data directory on database server) which must be accessible to both the SQL Server and to Sawmill. The fastest way to achieve this is to install both on the same server; if that is not possible, there must be a share mapped to both servers, and the temporary directory must be a UNC path to that share.
With an Oracle profile, the choices are "ODBC" and "sqlldr". ODBC works in all cases, but is slower; sqlldr is much faster, but requires Sawmill to run as a user with access to the sqlldr program (which must also be in the executable path) to load data into the database.
With a MySQL profile, the choices are "load_data_local_infile" and "load_data_server_infile". With the first, data loads are done with a LOAD DATA LOCAL INFILE query, which works whether MySQL is local or not, but is not permitted in some configurations of MySQL. With the second, data loads are done with LOAD DATA INFILE, putting temporary files in the location specified by Load data directory, which requires the MySQL server to run locally, or at least to have access to the disk where Sawmill is running.
With other databases, this option has no effect.
This option is used in MySQL and Microsoft SQL Server profiles when the Bulk import method option is set to load_data_server_infile or bulk_insert. It specifies the temporary directory where Sawmill should store intermediate files for upload into the database, while building or updating a database. If using MySQL, it must be a directory which is accessible, by the same pathname, to both the SQL server and Sawmill; if using MS SQL, it must be accessible to Sawmill, and the server must be able to reach it from the location specified by Load data directory on database server.
This option is used in Microsoft SQL Server profiles when the Bulk import method option is set to bulk_insert. It specifies the temporary directory where the server should read intermediate files for upload into the database, while building or updating a database. Sawmill writes these files to Load data directory, and the server reads them from this location, which must point to the same directory. For instance, this could be a physical disk on the server, mapped to the Sawmill system, referred to in this option by its physical name and referred to in Load data directory by its network share path.
This option specifies the method used to split large SSQL queries across multiple threads, to improve performance. If this option is "auto", SSQL queries are split into as many threads as there are processing cores (usually a good default); so for instance, on a 4-processor system with four cores per processor, large SSQL queries would be split into 16 subqueries, which will run them much faster than a single processing core. If this option is "custom", then the number of threads is specified by the Number of threads for splitting SSQL queries option. If this option is "none", then query splitting does not occur, and all queries are run in the main thread. Splitting queries can significantly speed the performance of long queries on large datasets. SSQL queries are used to generate table reports, compute session information, build cross-reference tables, and more, so this option affects the performance of many operations. Because of the overhead of re-joining the results of each thread, query splitting is only done for large queries; small queries are always run with one thread.
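The three settings reduce to a simple thread-count choice; a hypothetical sketch (names invented, not Sawmill's actual code):

```python
import os

def ssql_threads(method, custom_threads=1):
    """Pick the number of threads for splitting a large SSQL query."""
    if method == "auto":
        # One thread per processing core, e.g. 16 on a 4-processor
        # system with four cores per processor.
        return os.cpu_count() or 1
    if method == "custom":
        return custom_threads
    return 1  # "none": run the query in the main thread
```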
This option specifies the number of threads used to split large SSQL queries across multiple threads, to improve performance. This option is used when the Method for splitting SSQL queries option is "custom"; otherwise, it has no effect.
This option specifies whether cross-reference tables should automatically be kept up to date with the main table of the database. If this is true, Sawmill rebuilds or updates all cross-reference tables immediately after building or updating the main table of the database. This extra step takes additional time, so it slows the database build or update. However, if a cross-reference table is not up to date when a report which depends on it is generated, the cross-reference table will have to be built at that time, before the report can be displayed, which can slow down the display of that report significantly. Turn this option on for better report generation performance; turn it off for better database build/update performance.
Long description
This option affects the stage of log processing when indices are rebuilt. This option only has an effect if Build indices during log processing is false. If this option is true, Sawmill will scan through the main database table just once during the index rebuilding stage, building all indices simultaneously. If this option is false, Sawmill will build each index separately, scanning through the main table once per index. Turning this option on can greatly speed up index building by combining all the table scans into one, but will use much more memory, since all indices will need to be in memory at the same time. See also Build cross-reference tables and indices simultaneously.
Long description
This option affects the stage of log processing when cross-reference tables are rebuilt. This option only has an effect if Build cross-reference tables during log processing is false. If this option is true, Sawmill will scan through the main database table just once during the cross-reference rebuilding stage, building all cross-reference tables simultaneously. If this option is false, Sawmill will build each cross-reference table separately, scanning through the main table once per cross-reference table. Turning this option on can greatly speed up cross-reference building by combining all the table scans into one, but will use much more memory, since all cross-reference tables will need to be in memory at the same time. See also Build cross-reference tables and indices simultaneously.
Long description
This option affects the stages of log processing when indices are built. When this option is true, indices are kept in memory during log processing, and are incrementally updated on the fly as new log lines are processed. When this option is false, indices are updated in a single stage after all log data has been processed. Turning this option on can speed database building because it eliminates the need to re-read the main database table after processing log data, but can require much more memory, because all indices must be kept in memory while log data is processed.
Long description
This option affects the stages of log processing when cross-reference tables and indices are rebuilt. This option only has an effect if Build indices during log processing and Build cross-reference tables during log processing are false. If this option is true, Sawmill will combine the index-building and cross-reference table building stages of log processing into one, scanning through the main database table once and building both indices and cross-reference tables. If this option is false, Sawmill will build indices and cross-reference tables separately, scanning through the main table twice. Turning this option on can speed up index and cross-reference table building by combining the two table scans into one, but will use more memory, since both the cross-reference tables and the indices will need to be in memory at the same time.
Long description
This option affects the stages of log processing when cross-reference tables are built. When this option is true, cross-reference tables are kept in memory during log processing, and are incrementally updated on the fly as new log lines are processed. When this option is false, cross-reference tables are updated in a single stage after all log data has been processed. Turning this option on can speed database building because it eliminates the need to re-read the main database table after processing log data, but can require much more memory, because all cross-reference tables must be kept in memory while log data is processed.
Long description
This option affects multi-processor database builds. When this option is true, each thread (processor) builds the indices for its part of the database separately, and they are merged in a final stage to create the indices for the main database. When this option is false, threads do not build indices; the indices are built in the final stage from the main table (which is merged from the threads' main tables). If your system has fast disk I/O, it is generally best to leave this on, to spend as much time as possible using all processors. But if disk I/O is slow, the I/O contention between processes may slow both threads down to the point that using multiple processors is actually slower than using one.
Long description
When this option is true, database indices are held completely in memory during database builds. When this option is false, database indices are mapped to files on the disk. Keeping the indices in memory can increase the performance of the index building part of database builds, sometimes by a factor of three or more, but requires enough memory to hold the indices (which can exceed 1 GB in some cases).
Long description
When this option is true, database itemnum tables (internal tables which convert from database field values to numbers, and back) are held completely in memory. When this option is false, itemnum tables are kept on disk. Turning this on makes most operations, especially database builds, much faster, but can require huge amounts of memory if huge itemnum tables are required. When this option is turned off, some speed can be recovered by using a large value of Itemnums cache size.
Long description
This specifies the amount of memory to use for the itemnums cache. This is used, when Keep itemnums in memory is false, to speed up access to the itemnums tables on disk. When Keep itemnums in memory is true, this has no effect.
Long description
This specifies the amount of memory to use for the cross-reference tables cache. This is used to speed up access to the cross-reference tables on disk; the more memory available for the cache, the faster cross-reference table access will be (especially during database builds and updates).
Long description
This option affects multi-processor database builds. When this option is true, each thread (processor) builds the cross-reference tables for its part of the database separately, and they are merged in a final stage to create the cross-reference tables for the main database. When this option is false, threads do not build cross-reference tables; the cross-reference tables are built in the final stage from the main table (which is merged from the threads' main tables). If your system has fast disk I/O, it is generally best to leave this on, to spend as much time as possible using all processors. But if disk I/O is slow, the I/O contention between processes may slow both threads down to the point that using multiple processors is actually slower than using one.
Long description
This controls the factor by which the database hash table, an internal table used to store information in the database, expands when necessary. A factor of 2 means that the database table will double in size when it needs more space, while 10 means that the database table size will increase by a factor of 10. Setting this to a higher value will eliminate the need for some internal data shuffling, and will speed processing a bit; however, it will also use more memory and disk space.
Long description
This controls the initial size of the database hash table, an internal table used to store information in the database. Setting this to a higher value will eliminate the need for some internal data shuffling, and will speed processing a bit; however, it will also use a bit more memory.
Long description
This controls the amount of surplus space maintained in the database hash table, an internal table used to store information in the database. Setting this to a higher value will increase database access speed, but will use more memory. This value represents the proportion of space in the table that should remain free; when that space fills up, the table is expanded by Expansion factor for database table. A value of 1 means that at least 10% of the table will always be free, a value of 2 means that at least 20% is free, and so on, up to a value of 9, where at least 90% of the table is kept free at all times. With a value of 1, the same table size will hold 9 times more data than with a value of 9, so the data section of your database (which is often the largest part) will be one-ninth the size it would be with a value of 9. However, lower values slow down database building and accessing slightly.
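The arithmetic above can be made concrete with a small hypothetical helper (not part of Sawmill): a setting of v keeps at least v×10% of the table free, so the usable capacity of a table of a given size is size × (10 − v) / 10.

```python
# Usable capacity of the database hash table for a given surplus-space
# setting (1..9). Integer arithmetic keeps the result exact.
def usable_entries(table_size, surplus_setting):
    assert 1 <= surplus_setting <= 9
    return table_size * (10 - surplus_setting) // 10

print(usable_entries(1_000_000, 1))  # 900000 entries fit
print(usable_entries(1_000_000, 9))  # 100000 entries: 9x less data per table
```

This is why a value of 1 packs nine times more data into the same table size than a value of 9, at a small cost in access speed.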
Long description
This option specifies the maximum memory used by the list cache. The list cache is used when tracking unique item lists (e.g. visitors) or database indices, to improve performance when lists get very large. Normally, lists are stored in a form that uses minimal memory, but does not allow items to be added quickly to the list in some situations. When a list appears to be slow, it is moved to the list cache, and expanded into a high-memory-usage, high-performance format. At the end of the operation, it is compacted into the low-memory-usage format again. When the cache is full, the least-used cached lists are compacted. Setting this option higher will use more memory during database cross-reference group building and index building, but will allow more lists to be kept in the fast-access format -- this usually improves performance, sometimes dramatically.
Long description
This option specifies the maximum size of a main table segment that will be merged while merging databases. If a segment is smaller than this, the merge will be done by adding each entry to the existing final segment of the main database table; if a segment is larger than this, the merge will be done by copying the entire table and indices to the main database, creating a new segment. Copying is faster, but since it creates a new segment it fragments the database, slowing queries slightly. Therefore, setting this to a high value will improve the query performance of the final database, at a cost in log processing performance.
Long description
This determines the maximum size of one segment of the main database table. Segments are files stored in the database directory; Sawmill prefers to leave the entire table in a single file, but operating system limitations sometimes make that impossible. So when the table exceeds this size, it is split into multiple files, each smaller than this size. This reduces performance somewhat, but allows arbitrarily large datasets to be represented in a database. If you set this higher than the operating system allows, you will get errors when processing very large datasets (10 million lines of log data corresponds roughly to 1GB of main table, depending on the database structure and other factors).
Long description
This option specifies the maximum size of a cross-reference table segment which will be merged during a database merge operation; e.g., at the end of a multiprocessor database build. Segments larger than this will be copied to the main database, and will form their own segments; segments smaller than this will be merged into the main database. Copies can be much faster than merges, but result in a more segmented main database, making queries slower. Therefore, setting this to a high value will improve the query performance of the final database, at a cost in log processing performance.
Long description
This determines the maximum size of one segment of a cross-reference database table. Segments are files stored in the database directory; Sawmill prefers to leave the entire table in a single file, but operating system limitations sometimes make that impossible. So when the table exceeds this size, it is split into multiple files, each smaller than this size. This reduces performance significantly, but allows arbitrarily large datasets to be represented in a database. If you set this higher than the operating system allows, you will get errors when processing very large datasets. Most operating systems can handle files up to 2 GB in size; a setting of 1 GB should be safe in most cases, and should prevent segmentation for all but the largest datasets.
Long description
This specifies the maximum memory to be used by a paging cache buffer. Each active table in the database (each table currently in use by the active process) has a paging cache buffer associated with it, which keeps some parts of that table in memory for faster access. The larger this number is, the more of each table can be kept in memory, and the faster database operations will be; however, the larger this number is, the more memory will be used.
Long description
This specifies the maximum size of a file, for instance a database table, to be loaded fully into memory. When this option is set higher, more files will be loaded fully into memory, which generally improves performance; however, when this option is set higher, database operations will use more memory.
Long description
This controls whether newline characters (returns or line feeds) are permitted inside quotes in log data. When this option is true, and a log line starts a quoted section but does not close it, Sawmill will continue with the next line, looking for the closing quote there (or on a later line). The resulting "line" of log data will be two or more lines long, and some field values may have returns or linefeeds in them. When this option is false (unchecked), Sawmill will assume that unclosed quotes in a line are errors or formatting problems, and will treat the final (unclosed) quoted section as though there were a closing quote at the end of the line. This option should generally be left off, since turning it on makes it possible for a small log data corruption to render the entire rest of the file unprocessed. But if your log data does contain entries with newlines inside quotes (as some CSV data does), then you will need to turn this option on.
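Python's csv module implements the same quoted-newline behavior described above, which makes the effect easy to see with a small piece of hypothetical data: a quoted value spans two physical lines but still parses as a single logical record.

```python
# A quoted field containing a newline: two physical lines, one record.
import csv
import io

data = 'id,comment\n1,"line one\nline two"\n'
rows = list(csv.reader(io.StringIO(data)))
print(rows[1])  # ['1', 'line one\nline two']
```

With the option off, a parser would instead stop the quoted value at the end of the first physical line, and the second line would be misread as a new record.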
Long description
This option describes the log format, in Apache log format description string style. This is intended for use as a quick way of using a custom Apache format--you can copy the format string from an Apache configuration file (or another file that uses Apache style format strings), and Sawmill will set up the log fields and format regular expressions for you. This option overrides Log data format regular expression.
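For example, the standard Apache "common" log format is the string %h %l %u %t "%r" %>s %b. The sketch below shows in rough outline how such a string can be turned into a parsing regular expression; the directive table is abridged and hypothetical, and this is an illustration of the idea rather than Sawmill's converter.

```python
# Convert a (small subset of an) Apache LogFormat string into a regex.
import re

DIRECTIVES = {
    "%h": r"(?P<host>\S+)",         # remote host
    "%l": r"(?P<logname>\S+)",      # identd logname
    "%u": r"(?P<user>\S+)",         # authenticated user
    "%t": r"\[(?P<time>[^\]]+)\]",  # request time, in brackets
    "%r": r'(?P<request>[^"]*)',    # first line of the request
    "%>s": r"(?P<status>\d+)",      # final status code
    "%b": r"(?P<bytes>\S+)",        # response size (or "-")
}

def format_to_regex(fmt):
    keys = sorted(DIRECTIVES, key=len, reverse=True)  # match %>s before %h
    out, i = [], 0
    while i < len(fmt):
        for key in keys:
            if fmt.startswith(key, i):
                out.append(DIRECTIVES[key])
                i += len(key)
                break
        else:
            out.append(re.escape(fmt[i]))  # literal character
            i += 1
    return re.compile("".join(out) + "$")

clf = format_to_regex('%h %l %u %t "%r" %>s %b')
m = clf.match('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(m.group("host"), m.group("status"))  # 127.0.0.1 200
```

This is the convenience the option provides: you supply the format string, and the log fields and parsing expression are derived from it.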
Long description
This specifies the number of lines to compare against Autodetect regular expression while auto-detecting this log format (this option is used in log format plug-ins). For instance, if this is 10, only the first 10 lines will be checked against the regular expression; if it is 100, the first 100 lines will be checked. If any line matches, the format will be considered a match.
Long description
This is a regular expression which is used to auto-detect the log format. This option appears in the log format plug-in for a supported log format (plug-ins are in the log_formats folder of LogAnalysisInfo). A log file matches the format if any of the first few lines of the log file (the number of lines is specified by Autodetect lines) match this regular expression. See also Log data format regular expression, which is a similar option serving a different purpose. Log data format regular expression is used during log reading to separate out log fields, and does not affect auto-detection; this option is used only during format auto-detection, and does not affect log reading. See also Autodetect expression.
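The detection loop described above can be sketched as follows; the helper, the sample data, and the pattern are hypothetical, not Sawmill code.

```python
# A format "matches" if any of the first N lines matches its
# autodetect regular expression.
import re

def format_matches(lines, autodetect_regexp, autodetect_lines=10):
    pattern = re.compile(autodetect_regexp)
    for line in lines[:autodetect_lines]:
        if pattern.search(line):
            return True
    return False

log = [
    "# comment header",
    '1.2.3.4 - - [21/Apr/2000:08:12:45 -0700] "GET / HTTP/1.0" 200 512',
]
print(format_matches(log, r'^\S+ \S+ \S+ \[\d+/[A-Za-z]+/\d+'))  # True
```

Note that a header line failing to match does not disqualify the format; only the absence of any match within the first N lines does.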
Long description
This is an expression (written in the internal language) which is used to auto-detect the log format. This option appears in the log format plug-in for a supported log format (plug-ins are in the log_formats folder of LogAnalysisInfo). A log file matches the format if any of the first few lines of the log file (the number of lines is specified by Autodetect lines) result in a value of true for this expression (the log line is in volatile.log_data_line). See also Log data format regular expression, which is a similar option serving a different purpose. Log data format regular expression is used during log reading to separate out log fields, and does not affect auto-detection; this option is used only during format auto-detection, and does not affect log reading. See also Autodetect regular expression.
Long description
This option describes the log format, in Blue Coat custom log format description string style. This is intended for use as a quick way of using a custom Blue Coat format--you can copy the format string from the Blue Coat configuration interface, and Sawmill will set up the log fields and format regular expressions for you. This option overrides Log data format regular expression.
Long description
This option should be set when the log format is a Common Log Format (CLF), one of a collection of similar log formats which, among other attributes, have the date/time field in brackets, and the user field right before the bracketed date/time field. This option turns on a special work-around which is necessary for certain CLF files where the usernames contain spaces. Because CLF does not quote the username field, spaces in the username field can cause the rest of the fields to be offset, causing strange results. This option causes the field before the date/time field to be combined with any apparently separate fields, until a left-square-bracket ([) is found. This effectively allows the username field to contain spaces in CLF format. This option should be left off for any non-CLF log formats.
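The work-around can be sketched as follows; the helper and the log line are hypothetical, not Sawmill's parser. After whitespace-splitting a CLF line, the tokens between the second field and the first token starting with a left square bracket are re-joined, so a username containing spaces stays one field.

```python
# Re-join the username tokens of a whitespace-split CLF line, up to the
# bracketed date/time field.
def rejoin_clf_user(tokens):
    head, rest = tokens[:2], tokens[2:]        # host and logname
    user_parts = []
    while rest and not rest[0].startswith("["):
        user_parts.append(rest.pop(0))
    return head + [" ".join(user_parts)] + rest

line = ('1.2.3.4 - John Smith [21/Apr/2000:08:12:45 -0700] '
        '"GET / HTTP/1.0" 200 512')
print(rejoin_clf_user(line.split())[:4])
# ['1.2.3.4', '-', 'John Smith', '[21/Apr/2000:08:12:45']
```

Without the re-join, "John" would be parsed as the user and "Smith" as the date/time field, offsetting everything after it.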
Long description
This specifies the character or string which separates one log field from another in a log entry. For instance, if this is "," then log fields are comma-separated; if it is "==--==", then the fields are separated from each other by ==--==. This option only affects index/subindex style parsing of log data; it does not affect parsing if Parse log only with log parsing filters is true or if Log data format regular expression is specified.
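Index/subindex splitting on a separator is essentially a string split. A trivial illustration with hypothetical data:

```python
# Split a log entry into fields on an arbitrary separator string.
def split_fields(line, separator):
    return line.split(separator)

print(split_fields("2004-05-07,GET,/index.html,200", ","))
# ['2004-05-07', 'GET', '/index.html', '200']
print(split_fields("a==--==b==--==c", "==--=="))
# ['a', 'b', 'c']
```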
Long description
This controls the expected format of date fields in the log data. Possible formats are:

- auto: This handles a range of date formats automatically. Four-digit integers are treated as years, and one- or two-digit integers are treated as months or days (months are assumed to precede days). This can be used in many cases where the exact date format is not known in advance. However, it does not support day-before-month dates; currently it supports only formats where the year is four digits or is two digits and comes last, and the month is three-letter English or is numerical and comes before the day.
- mm/dd/yy; example: 04/21/00.
- mm/dd/yyyy; example: 04/21/2000.
- dd/mm/yyyy; example: 21/04/2000.
- dd/mm/yy; example: 21/04/00.
- ddmmmyy; example: 21Apr00.
- dd/mmm/yy; example: 21/Apr/00.
- dmmmyyyy; example: 21Apr2000, 4Dec1998.
- dd/mmm/yyyy; example: 21/Apr/2000.
- mmm/dd/yyyy; example: Apr/21/2000.
- mmmmm/dd/yyyy; example: April/21/2000, "December 21, 2002", "January 5 2001" (any dividers allowed).
- yyyy/mmm/dd; example: 2000/Apr/21.
- m/d/yy; same as mm/dd/yy, but leading zeros in month and day may be omitted; examples: 4/21/00, 12/4/98, 11/23/02.
- m/d/y; same as mm/dd/yy, but leading zeros in month, day, and year may be omitted; examples: 4/21/0, 12/4/98, 11/23/2.
- d/m/y; same as dd/mm/yy, but leading zeros in month, day, and year may be omitted; examples: 21/4/0, 4/12/98, 23/11/2.
- m/d/yyyy; same as mm/dd/yyyy, but leading zeros in month and day may be omitted; examples: 4/21/2000, 12/4/1998, 11/23/2002.
- d/m/yyyy; same as dd/mm/yyyy, but leading zeros in month and day may be omitted; examples: 21/4/2000, 4/12/1998, 23/11/2002.
- d/m/yy; same as dd/mm/yy, but leading zeros in month and day may be omitted; examples: 21/4/00, 4/12/98, 23/11/02.
- mmdd; example: 0421; year is assumed to be 2002.
- mm/dd; example: 04/21; year is assumed to be 2002.
- mmm dd; example: Apr 21; year is assumed to be 2002.
- dd/mmm/yyyy:hh:mm:ss; example: 21/Apr/1998:08:12:45; the colon between date and time may be a space instead.
- mm/dd/yyyy hh:mm:ss; example: 04/21/1998 08:12:45.
- mmm dd hh:mm:ss yyyy; example: Apr 21 08:12:45 1998. Optionally, a time zone can be specified before the year; i.e., Apr 03 15:57:15 PST 2002.
- yyyy-mm-dd; example: 1998-04-21.
- yyyy/mm/dd; example: 1998/04/21.
- yyyy/m/d; example: 1998/4/21.
- yyyymmdd; example: 19980421.
- yyyymmddhhmmss; example: 19980421081245.
- yymmdd-hhmmss; example: 980421-081245.
- m/d/yy h:mm; example: 4/21/98 8:12.
- seconds_since_jan1_1970; the number of seconds since January 1, 1970, possibly with a decimal point; example: 887395356.086578.
- TAI64N; TAI64N format; example: @400000003c675d4000fb2ebc.

The divider between items does not need to be a slash; it can be any single character. For instance, if your log format uses 04-21-2000 as its date, you can choose mm/dd/yyyy and it will work.
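A few of the date formats listed above have direct strptime equivalents, shown here as a rough cross-reference for readers who know C or Python date formatting. Sawmill's own parser does not use strptime; this is only an illustration.

```python
# Approximate strptime patterns for some of the date formats above.
from datetime import datetime

samples = [
    ("04/21/2000", "%m/%d/%Y"),           # mm/dd/yyyy
    ("21/Apr/2000", "%d/%b/%Y"),          # dd/mmm/yyyy
    ("1998-04-21", "%Y-%m-%d"),           # yyyy-mm-dd
    ("19980421081245", "%Y%m%d%H%M%S"),   # yyyymmddhhmmss
]
for text, fmt in samples:
    print(datetime.strptime(text, fmt).date())
# prints 2000-04-21, 2000-04-21, 1998-04-21, 1998-04-21
```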
Long description
This option is used if the log date format (Date filter) is one of the few formats which does not include year information. Sawmill will use this option's value as the year. For instance, if the date in the log is "May 7" and this option's value is 2004, then Sawmill will assume that the log entry is for May 7, 2004. The value of this option should be a four-digit integer between 1970 and 2030, or 'thisyear'; if the value is 'thisyear' (without quotes), Sawmill will use the current year, the year in which the log data is processed, as the year.
Long description
This specifies the name of the log format of the log data. When this appears in a log format description file, it defines the name of the format being described. When this appears in a profile, it has no effect other than providing the name of the log format plug-in used to create the profile. Sawmill sets this option when a new profile is created.
Long description
This option is a regular expression (see Regular Expressions) which is used to extract a "global date" from log data. A global date is a date that appears in log data, usually in the header, and specifies the date for all subsequent log entries. Usually, this is used when the log entries do not contain date information at all, but if they do, this overrides them. When this option is not empty, every line of the log file is checked against this regular expression, and if it matches, the parenthesized section is remembered as the "global date". From then on, or until another global date line is found, the date field of any accepted log entry is replaced by the global date value. If this option is empty, it is not used.
Long description
This option is a regular expression (see Regular Expressions) which is used to extract a "global date" from the name of the log file. A global date is a date that applies to all logs that appear in a file. Usually, this is used when the log entries do not contain date information at all, but if they do, this overrides them. When this option is not empty, the filename of every log processed is checked against this regular expression, and if it matches, the parenthesized section is remembered as the "global date". From then on, or until another global date filename is found, the date field of any accepted log entry is replaced by the global date value. If this option is empty, it is not used.
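For example, with a hypothetical filename pattern: the parenthesized group in the regular expression captures the date, and that captured value becomes the global date for every entry in the file.

```python
# Extract a "global date" from a log filename using a parenthesized
# regular expression. Filename and pattern are hypothetical.
import re

def global_date_from_filename(filename, regexp):
    m = re.search(regexp, filename)
    return m.group(1) if m else None

print(global_date_from_filename("access-20040507.log", r"access-(\d{8})\.log"))
# 20040507
```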
Long description
This controls whether Sawmill ignores format lines in the log data. Format lines are lines starting with #Format, format=, or !!LOG_FORMAT, which appear in the log data, usually in a header, and describe the format of the log data on the following lines. Generally, you want to leave this option off, so Sawmill will understand log format changes if they occur in the middle of the log data. However, if you have defined custom log fields, you need to turn this on, or the field changes will be lost when format lines are encountered.
Long description
This controls whether quotes (double or single) should be treated specially in log data. If this option is false (unchecked), quotes are treated the same as any other character. If this option is true (checked), quotes are treated specially; a quoted value containing a field divider will be treated as a single field value, because the quotes override the field divider and prevent it from marking the end of the field. See also Treat square brackets as quotes.
Long description
This controls whether the Log data format regular expression option and the index/subindex settings of the log fields have any effect. This option is set in the log format plug-in to determine what type of parsing is used, and it should generally not be changed. When this is false, the Log data format regular expression option will be used to parse the log, or the index/subindex options will be used if Log data format regular expression is empty. When this is true, only the log parsing filters (log.parsing_filters in the profile) will be used to parse the log data.
Long description
This controls the expected format of time fields in the log data. Possible formats are:

- auto: This handles most time formats automatically. Times are assumed to be in H:M:S format, where H is the hours, M is the minutes, and S is the seconds; any of them can be one or two digits. The dividers can be any non-numerical values; they do not have to be colons. If PM appears after the time, it is assumed to be a 12-hour PM time. This can be used in many cases where the exact time format is not known in advance.
- hh:mm:ss; example: 18:04:23.
- h:mm:ss; same as hh:mm:ss except that the leading 0 on the hour may be omitted; example: 8:12:45.
- h:m:s; same as hh:mm:ss except that the leading 0 on the hour, minute, or second may be omitted; examples: 8:12:45, 12:8:15, 1:5:9.
- dd/mmm/yyyy:hh:mm:ss; example: 21/Apr/1998:08:12:45.
- mmm dd hh:mm:ss yyyy; example: Apr 21 08:12:45 1998.
- h:mm:ss AM/PM; examples: 9:32:45 AM, 12:34:22 PM.
- h:mm:ss GMT; examples: 9:32:45 GMT, 18:34:22 GMT.
- h:mm; examples: 18:04, 9:32.
- hhmm; example: 1804.
- hhmmss; example: 180423.
- yyyymmddhhmmss; example: 19980421081245.
- yymmdd-hhmmss; example: 980421-081245.
- m/d/yy h:mm; example: 4/21/98 8:12.
- seconds_since_jan1_1970; the number of seconds since January 1, 1970, possibly with a decimal point; example: 887395356.086578.
- TAI64N; TAI64N format; example: @400000003c675d4000fb2ebc.
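As with the date formats, a few of these have rough strptime equivalents, shown purely as a cross-reference; Sawmill's parser does not use strptime.

```python
# Approximate strptime patterns for some of the time formats above.
from datetime import datetime

print(datetime.strptime("18:04:23", "%H:%M:%S").time())       # 18:04:23
print(datetime.strptime("9:32:45 AM", "%I:%M:%S %p").time())  # 09:32:45
print(datetime.strptime("180423", "%H%M%S").time())           # 18:04:23
```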
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This controls whether square brackets ([ and ]) should be treated the same as quotes (") when they are encountered in a log entry. For some log formats (e.g., Common Access Log Format), it is convenient to think of square brackets as a special kind of quote; whatever they contain is treated as a single field.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This controls whether apostrophes (') should be treated the same as quotes (") when they are encountered in a log entry. Some log formats include literal apostrophes in the data, and do not intend them to be treated as quotes; for these log formats, this option should be set to false.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This option controls whether Sawmill complains if the log source is empty when the database is built or rebuilt. If this option is false, Sawmill will generate an error if there is no data in the log source during a (re)build. If this is true, Sawmill will not complain, but will just create a database containing no data. An empty log source is often a sign of an error in the log source, so it is usually best to leave this option off. But in a multi-user environment, some sites may have no log data at all, and in that case, this can be turned on to allow for error-free rebuilds of all databases. Sawmill never generates an error if there is no (new) log data during a database update; this affects only "from scratch" (re)builds.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
Command line name: Command line shortcut: Command line syntax: Default value: All Options
- value thisyear
Long description
This controls the number of log entries Sawmill can work on at a time. Increasing this value may improve the performance of DNS lookups, but it will also use more memory.
Command line name: Command line shortcut: Command line syntax: Default value: Minimum value All Options
Long description
When this option is checked, Sawmill uses the built-in GeoIP database to look up the geographic locations of IP addresses. These will appear in the Countries, Regions, and Cities reports as the names of the countries, regions, and cities where the IP address is physically located.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
- boolean true
Long description
This controls the size in bytes of the blocks which are read from the log data. Sawmill reads the log data in chunks, processing each chunk completely before continuing to the next. Larger settings will reduce the number of disk accesses, potentially speeding processing time, but will also require the specified number of bytes of memory.
Command line name: Command line shortcut: Command line syntax: Default value: Minimum value All Options
Long description
This controls the maximum size in bytes of the blocks which are read from the log data. Sawmill initially reads data in chunks specified by Log reading block size, but if it encounters a line longer than that, it will expand its reading buffer to attempt to read the whole line into memory. It will not expand it larger than this value, however; if a line is larger than the value specified here, it will be discarded. This can be set as high as desired, but corrupt log data (e.g., a file of all NULL characters) may cause the buffers to expand to this value, so it should not be set higher than the size of physical memory.
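The growth-capped buffering described above can be sketched as follows (a simplified, hypothetical reader, not Sawmill's actual implementation):

```python
def read_lines(stream, block_size=4096, max_block_size=1 << 20):
    """Read fixed-size blocks, splitting into lines; the carry-over buffer
    for an unfinished line may grow only up to max_block_size, and a line
    that exceeds the cap (e.g. corrupt data) is discarded."""
    buf = b""
    skipping = False  # currently inside a line that exceeded the cap
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            break
        buf += chunk
        *complete, buf = buf.split(b"\n")
        for line in complete:
            if skipping:
                skipping = False  # tail of the oversized line; drop it
            elif len(line) <= max_block_size:
                yield line
        if len(buf) > max_block_size:
            buf = b""
            skipping = True  # discard until the next newline
    if buf and not skipping:
        yield buf  # final line without a trailing newline
```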
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
When this option is checked, it is possible to view or generate reports during database builds or updates. When a report is requested, log processing will pause, the database will be brought into an internally consistent state (xrefs and indices will be updated with the latest data), the report will be generated, and log processing will resume. When this option is not checked, attempts to view a report during log processing will fail with an error that the database is building, or will show progress of the database build.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
- boolean false
Long description
This specifies the method Sawmill uses to spawn or connect to parsing servers. Parsing servers provide log parsing services to the main Sawmill log parsing process; using multiple parsing servers allows log processing to go faster than it would with a single process.

One processor: If this option is "one_processor", Sawmill does not use parsing servers, but performs all parsing in the main process, using a single processor.

Some processors: If this option is "some_processors", Sawmill automatically creates the number of local parsing servers specified by Number of local parsing servers, and terminates them when log processing is complete.

All processors: If this option is "all_processors", Sawmill automatically creates a local parsing server for each processing core, connects to them during log processing, and terminates them when log processing is complete.

Listed servers: If this option is "listed_servers", Sawmill uses the parsing server information specified in the log.parsing.distributed.servers node of the profile to spawn (if specified) the parsing servers, and to connect to them; "listed_servers" allows Sawmill to use parsing servers running on other systems, for higher performance. When using "listed_servers", the log.parsing.distributed.servers node in the profile contains one subnode per parsing server; each server subnode contains three subnodes: hostname, port, and spawn. hostname and port specify the host that Sawmill should contact with the SPS protocol (the protocol used to communicate with parsing servers), to farm out parsing; spawn is true if Sawmill should spawn the parsing server locally, or false if the server is already running (probably on another system), so Sawmill should just connect to it.
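As a hypothetical sketch of a "listed_servers" setup (the hostnames, ports, and exact node syntax here are invented for illustration; only the node layout -- one subnode per server, each with hostname, port, and spawn -- comes from the description above):

```
log = {
  parsing = {
    distributed = {
      method = "listed_servers"
      servers = {
        server1 = {
          hostname = "parse1.example.com"  # remote, already running
          port = "9987"
          spawn = "false"
        } # server1
        server2 = {
          hostname = "localhost"           # spawned locally by Sawmill
          port = "9988"
          spawn = "true"
        } # server2
      } # servers
    } # distributed
  } # parsing
} # log
```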
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This specifies the starting port for parsing servers to bind to, when using the "auto" parsing server distribution method (Parsing server distribution method). The parsing server will attempt to bind to this port, and if it fails, will continue upward in port numbers until it finds one it can bind to.
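The bind-and-retry behavior can be sketched like this (a hypothetical illustration; the host, starting port, and retry limit are invented):

```python
import socket

def bind_starting_at(start_port, host="127.0.0.1", max_tries=100):
    """Try ports upward from start_port until one binds; return the
    bound socket and the port it landed on."""
    for port in range(start_port, start_port + max_tries):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
            return s, port
        except OSError:
            s.close()  # port in use; try the next one
    raise RuntimeError("no free port found in range")
```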
Command line name: Command line shortcut: Command line syntax: Default value: Minimum value All Options
Long description
This specifies the number of local parsing servers to use when processing log data. This option is used when the value of Parsing server distribution method is "some_processors".
Command line name: Command line shortcut: Command line syntax: Default value: Minimum value All Options
Long description
This specifies whether log data is to be distributed to parsing servers in whole files, or in chunks of data. When this option is false (unchecked), log data is split into lines by the main process, and chunks of lines are distributed to parsing servers, without regard for which file the lines come from; each parsing server is responsible for parsing the lines it receives. When this option is true (checked), each parsing server is sent the pathname of a file, and is responsible for decompressing the file, splitting it into lines, and parsing the lines. If the log data is compressed, turning this option on can improve parsing performance, because the decompression is handled in parallel, by each parsing server process; this is particularly useful for expensive decompression algorithms like bzip2. However, it will only work if the parsing servers have local access to the same files, at the same pathnames, as the main process (if the parsing servers cannot access the files, it will cause an error during database build); and it will only be efficient if there are many files to split between processes. The safest choice is to leave this option off (unchecked).
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This option specifies whether Sawmill skips the most recent file in the log source, or processes it. When this option is true (checked), Sawmill will ignore the most recent file (based on the modification date of the file) in the local file log source, during a database build or update; it will act as though the file were not present, and will not process any lines of the file. When this option is false (unchecked), Sawmill will process all files, including the most recent one. This option is useful in environments with log rotation; the most recent file is the one which is being written, so skipping it ensures that all files processed are complete files, which will have no more data added later. This then allows Skip processed files on update (by pathname) to be used to do faster skipping during updates, even when the most recent file is growing. With this option off, Skip processed files on update (by pathname) will cause the latest (growing) file to be added partially to the database, and the rest will never be added; with this option on, Skip processed files on update (by pathname) will skip the growing file, and will wait until the file has stopped growing (when it has been rotated, because then a new file will be growing, and will be the most recent file), and then add the entire file. This option works only with a local file log source--it will not work with an FTP log source.
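A minimal sketch of the skip-most-recent behavior, assuming a list of local file pathnames:

```python
import os

def files_to_process(paths, skip_most_recent=True):
    """Drop the file with the newest modification time, since it may
    still be growing (hypothetical helper, not Sawmill's code)."""
    if not skip_most_recent or not paths:
        return list(paths)
    newest = max(paths, key=os.path.getmtime)
    return [p for p in paths if p != newest]
```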
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This controls whether Sawmill uses the pathname of log files to determine if the files have already been added to the database. If this option is checked (true), then Sawmill will skip over any log files in the log source if it has already added a file with that name to the database. This can speed processing, especially when using FTP, because Sawmill does not have to download or process the file data and use its more sophisticated checking mechanism to see if the data has been processed. However, it will not work properly if you have log files in your log source which are growing from update to update, or if you have log files with the same name which contain different data. If this option is off, Sawmill will handle those situations correctly, but it will have to download and examine the log data of all files to do it.
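Sketching the pathname-based skip (the set of already-seen pathnames stands in for what Sawmill records in the database):

```python
def new_files(paths, seen_pathnames):
    """Return only files whose pathnames have not already been added;
    skipped files need not be downloaded or examined at all."""
    fresh = [p for p in paths if p not in seen_pathnames]
    seen_pathnames.update(fresh)
    return fresh
```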
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
If this regular expression is specified (not empty), Sawmill will ignore anything matching it, when computing checksums for skipping. This is useful in cases where the server modifies a line of the log data (e.g., an "end time" line) after the data has been processed by Sawmill. By skipping that line for checksumming purposes, Sawmill will still recognize the data as previously-seen, and will not reprocess it.
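The effect can be sketched like this (MD5 is an assumption for the illustration; the source does not say which checksum algorithm Sawmill uses):

```python
import hashlib
import re

def content_checksum(lines, ignore_regexp=None):
    """Checksum the lines of a file, leaving out any line matching the
    ignore pattern, so a server rewriting such a line (e.g. an
    "end time" line) does not make previously-seen data look new."""
    pat = re.compile(ignore_regexp) if ignore_regexp else None
    h = hashlib.md5()
    for line in lines:
        if pat is not None and pat.search(line):
            continue  # skip this line for checksumming purposes
        h.update(line.encode("utf-8"))
    return h.hexdigest()
```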
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This specifies the amount of data in each chunk sent to each thread during multiprocessor log processing (database build or update). Larger values reduce the overhead of transmitting large numbers of packages between the main thread and the child threads, and may speed up processing once it gets going; however, larger values also increase the chance that a thread will have to wait until a chunk has been assembled, possibly slowing the processing. The default is a reasonable value.
Command line name: Command line shortcut: Command line syntax: Default value: Minimum value All Options
Long description
THIS OPTION IS DEPRECATED, AND NO LONGER HAS ANY EFFECT. This specifies the number of threads of execution to use to process log data. The threads will execute simultaneously, each processing a portion of the log data, and at the end of processing, their results will be merged into the main database. On systems with multiple processors, using one thread per processor can result in a significant speedup over using a single thread.
Command line name: Command line shortcut: Command line syntax: Default value: Minimum value All Options
Long description
When this option is checked, Sawmill will convert the log data from the charset specified by Convert log data from charset to the one specified by Convert log data to charset. This is useful if the log data is in a different charset than the one desired for display or report generation.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
If this option is not empty, it will be used together with Convert log data to charset to convert the log data from the log source to a different charset from the one it is currently in. This option specifies the charset the log data is in to begin with; Convert log data to charset specifies the charset that the log data will be in after conversion, i.e., the charset that will be seen by log filters and in the database.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
If this option is not empty, it will be used together with Convert log data from charset to convert the log data from the log source to a different charset from the one it is currently in. Convert log data from charset specifies the charset the log data is in to begin with; this option specifies the charset that the log data will be in after conversion, i.e., the charset that will be seen by log filters and in the database. Usually, this should be set to UTF-8.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This specifies the delimiter used between fields when generating output with the "process logs" action.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
If this option is false (unchecked), a header line will be generated as the first line of output when using the "process logs" action. The line will contain a list of all internal database field names, separated by the delimiter specified by Output field delimiter. If this option is true (checked), the header line will not be generated.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This specifies the format of date/time (timestamp) values in the output of a "process logs" operation. The format is specified as a strftime()-style format string; for instance, '%d/%b/%Y %H:%M:%S' generates a timestamp like '01/Dec/2000 12:34:56', and '%Y-%m-%d %H:%M:%S' generates a timestamp like '2008-12-01 12:34:56'.
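For illustration, the same two format strings behave identically with Python's strftime, which uses the same directives:

```python
from datetime import datetime

# A hypothetical timestamp, rendered with the two format strings above.
ts = datetime(2000, 12, 1, 12, 34, 56)
print(ts.strftime('%d/%b/%Y %H:%M:%S'))  # 01/Dec/2000 12:34:56
print(ts.strftime('%Y-%m-%d %H:%M:%S'))  # 2000-12-01 12:34:56
```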
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This specifies the value to output for an empty field value in the output of a "process logs" action. For instance, if this is "(empty)", the string "(empty)" (without the quotes) will be literally included in the output of "process logs"; if this option is empty, then nothing will be included for the field value in the output (there will be nothing between the preceding and following delimiter).
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This specifies the DNS server to use when looking up IP addresses in the log data (when Look up IP numbers using domain nameserver (DNS) is true). This can be either a hostname or an IP address of the DNS server. If this option is empty, and Sawmill is running on a UNIX-type operating system, it will use the system's default primary DNS server. On all other platforms (including Windows), this option must be set when Look up IP numbers using domain nameserver (DNS) is true.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This option controls the amount of time Sawmill waits for a response from a DNS (domain nameserver) when attempting to look up an IP number (see Look up IP numbers using domain nameserver (DNS)) during log processing. The value is in seconds; so a value of 30 means that Sawmill will give up after waiting 30 seconds for a response. Setting this to a low value may speed up your log processing, but fewer of your IP numbers will be resolved successfully.
Command line name: Command line shortcut: Command line syntax: Default value: Minimum value All Options
Long description
This specifies a file where Sawmill should store a database of IP numbers it has looked up in the past (see Look up IP numbers using domain nameserver (DNS)). When Sawmill looks up an IP number, it will look in this cache first, to see if it has already found the hostname for that IP number (or if it has already determined that the hostname cannot be found). If it finds the IP number in the cache stored in this file, it will use that hostname, rather than performing the reverse DNS lookup again. This can greatly improve the speed of converting IP numbers to hostnames, especially when the same log is analyzed again. This option can be either a full pathname of a file, in which case that file will be used, or a single filename, in which case the file will be created inside the LogAnalysisInfo folder.
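A minimal sketch of such a lookup cache (the resolver is caller-supplied here; failed lookups are remembered, as described above, so they are not retried):

```python
def lookup_with_cache(ip, cache, resolve):
    """Consult the cache first; only call the resolver on a miss.
    Failed lookups are cached as None. `resolve` is a hypothetical
    stand-in for the real reverse DNS lookup."""
    if ip in cache:
        return cache[ip]
    try:
        hostname = resolve(ip)
    except OSError:
        hostname = None  # remember that this IP could not be resolved
    cache[ip] = hostname
    return hostname
```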
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
When this is true or checked, Sawmill attempts to look up the full domain name of IPs which appear in the log as IP numbers ("reverse DNS lookup"), using the DNS server specified by the DNS server and Secondary DNS server options. The lookup is performed as the log data is read, so if you change this option, you will need to rebuild the database to see the effects. Looking up the IP numbers provides a more human-readable format for the IP hosts, but requires a network access as frequently as once per line, so it can take much longer than leaving them as IP numbers. There are several ways to improve the performance of DNS lookup. The most important is to make sure Sawmill has a fast network connection to your DNS server; you can usually do this by running Sawmill on your web server (as a CGI program, if necessary), rather than on your desktop system. It may also be faster to configure the logging server to perform the domain name lookups, rather than having Sawmill do it. See also Never look up IP numbers using domain name server and Look up IP numbers before filtering.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
When this option is true, Sawmill does the DNS lookup (if required by Look up IP numbers using domain nameserver (DNS)) before running the log filters on a log entry. When this option is false, Sawmill does the DNS lookup after running the log filters on a log entry. It is faster to look up DNS after filtering (because filters sometimes reject entries, eliminating the need to look that one up), but if your log filters need to examine the resolved IP address, this option must be on.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This specifies the maximum number of IP addresses that will be looked up simultaneously. Setting this to a high value may increase DNS lookup performance, but if you set it too high, you may exceed operating system limitations, and the log processing may fail.
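One way to sketch a cap on simultaneous lookups is a fixed-size worker pool (a simplified illustration, not Sawmill's implementation; `resolve` is a caller-supplied function):

```python
from concurrent.futures import ThreadPoolExecutor

def resolve_all(ips, resolve, max_simultaneous=10):
    """Resolve a batch of IPs with at most max_simultaneous
    lookups in flight at any one time."""
    with ThreadPoolExecutor(max_workers=max_simultaneous) as pool:
        return dict(zip(ips, pool.map(resolve, ips)))
```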
Command line name: Command line shortcut: Command line syntax: Default value: Maximum value Minimum value All Options
Long description
This specifies the URL of a running copy of Sawmill. The URL may be something like http://www.flowerfire.com:8988/ if Sawmill is running in web server mode, or it may be http://www.domainname.com/cgi-bin/sawmill if Sawmill is running in CGI mode. The URL is used to embed "live" links in HTML email; for instance, it allows your HTML email to include tables of items which, when clicked, open a web browser and display more information on that item (as they would if the table were in a normal live Sawmill report). If this option is empty, links will not appear in HTML email.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This specifies a secondary DNS server to use when looking up IP addresses in the log data (when Look up IP numbers using domain nameserver (DNS) is true). This can be either a hostname or an IP address of the DNS server. If this option is empty, and Sawmill is running on a UNIX-type operating system, it will use the system's default secondary DNS server. On all other platforms (including Windows), this option must be set when Look up IP numbers using domain nameserver (DNS) is true. This is used only if the primary DNS server (DNS server) does not respond.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This specifies whether Sawmill should use the TCP protocol when communicating with DNS servers. DNS servers more commonly communicate using UDP, and UDP is generally faster, but in some cases it may be preferable to use TCP instead; for instance, if your DNS server is accessible only by TCP due to its configuration or network location.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This option specifies the value used to separate thousands in a displayed number. For instance, if this option is empty, a number may be displayed as 123456789. If the value of this option is a comma (,), the number will be 123,456,789. If it's a period (.), the number will be 123.456.789. If it's a space, the number will be 123 456 789. This can be used to localize number divisions. If this option is empty or "none", the value of lang_stats.number.thousands_divider will be used; i.e., leave this value empty to use the current language's default divider.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This option specifies the value used to separate the integer part from the decimal (fraction) part in a displayed number, i.e., the "decimal point". For instance, if this option is "." (and the thousands divider is a comma), a number may be displayed as 1,234,567.89. If the value of this option is a comma (,) (and the thousands divider is a dot), the number will be 1.234.567,89. This can be used to localize numbers. If this option is empty, the value of lang_stats.number.decimal_divider will be used; i.e., leave this value empty to use the current language's default divider. See also Number thousands divider.
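Both divider options can be illustrated with a small formatting sketch (a hypothetical helper, not part of Sawmill):

```python
def format_number(n, thousands_divider=",", decimal_divider=".", decimals=0):
    """Format n with configurable thousands and decimal dividers."""
    s = f"{n:,.{decimals}f}"  # e.g. '1,234,567.89'
    # Swap in the configured dividers; a placeholder avoids collisions
    # when the thousands divider is itself a period.
    s = s.replace(",", "\x00").replace(".", decimal_divider)
    return s.replace("\x00", thousands_divider)
```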
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This controls the number of seconds which elapse between the progress pages or command-line progress indicators, which appear when the progress display is enabled (see Command-line output types). The "progress" (p) option (see Command-line output types) controls whether a progress indicator will appear during long operations (like reading a large log file). When Sawmill is used from the command line, this option causes it to show a single-line text progress indicator:

G:[@@@@@@@@@@ ]47% 643779e E00:20:42 R00:20:01 25M/1976k

There isn't enough room on a single 80-character line to show all the information that's shown on a graphical progress page, but Sawmill shows the most important parts:

The first character (G in this case) is the first letter of the full description of the current operation, as it would appear in the graphical view; for instance, in this case the G stands for "Getting data by FTP." Other common operations are "(R)eading data" (from a local file or command) and "(E)rasing database."

The section in brackets is a progress meter, which gradually fills as the task progresses, and is completely full at the end. The percentage after it is the percentage of the task that is now complete. If Sawmill cannot determine the length of the task (for instance, if it's processing gzipped log data, or bzipped log data, or log data from a command), it will not show anything in the bar area, and it will show ??% as the percentage.

The next section (643779e above) is the number of log entries that Sawmill has processed. The next section (E00:20:42 above) is the time elapsed since processing began, in hours:minutes:seconds format. That is followed by the estimated time remaining (R00:20:01 above), in the same format. If Sawmill cannot determine the length of the task, the time remaining will be R??:??:??.

The last two numbers (25M/1976k above) are the memory used by the database (25 MB in this case), and the disk space used by the database (1976 KB in this case). Note that this is just the memory used by this database; Sawmill itself will be using additional memory for other purposes, so the total Sawmill memory usage will be higher than this number.
Command line shortcut: Command line syntax: Default value: Minimum value All Options
Long description
This controls whether to use base 10 (multiples of 1000) or base 2 (multiples of 1024) when displaying byte values. When this option is on, Sawmill will display all byte values using base-10 calculations (e.g., megabytes); when it is off, it will display all byte values using base-2 calculations (e.g., mebibytes). For instance, 2000 bytes would be displayed as "2.000 k" (two kilobytes) if this option is on, or as "1.953 k" (1.953 kibibytes) if this option is off.
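The 2000-byte example works out as follows (a hypothetical helper showing only the "k" case):

```python
def format_kilo(n_bytes, base_10=True):
    """Render a byte count with the 'k' suffix, dividing by 1000 in
    base-10 mode or by 1024 in base-2 mode."""
    divisor = 1000.0 if base_10 else 1024.0
    return f"{n_bytes / divisor:.3f} k"
```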
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This specifies whether charset conversion should occur, when exporting reports to CSV format. If this option is unchecked, charset conversion will not occur, and the output data will be in the UTF-8 charset. If this option is checked, then Convert export from charset and Convert export to charset options can be used to convert from UTF-8 to whatever charset is required in the output file.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
If this option is not empty, it will be used together with Convert export to charset to convert the result of a CSV export to a different charset. This option specifies the charset the export is in to begin with; Convert export to charset specifies the charset that the export text will be in when it is displayed.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
If this option is not empty, it will be used together with Convert export from charset to convert the result of a CSV export to a different charset. Convert export from charset specifies the charset the export is in to begin with; this option specifies the charset that the export text will be in when it is displayed.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
When this option is checked (true), anyone viewing the statistics for the profile can rebuild or update the database, using the rebuild/update links in the reports. When this option is unchecked (false), only administrators will be able to use those links; the links will not be visible to non-administrative viewers.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This controls whether reports are cached on disk. When this option is true, reports are saved on the disk, so if the exact same report is requested again later, it can be quickly generated without requiring database access or report generation. When this option is false, reports are regenerated every time they are viewed. Caching uses additional disk space, so it may be useful to turn this off if disk space is at a premium.
Command line name: Command line shortcut: Command line syntax: Default value: All Options
Long description
This option specifies the word used to refer to a single log entry. For web logs, for instance, this may be "hit"; for email logs it may be "message". This option is set in the log format plug-in, and generally does not need to be changed unless you are creating a new plug-in. This word will appear in various places in statistics pages.
Long description
This controls the weekday that is considered the first day of the week. The first weekday will be the first column in calendar months and it will be the first row in weekday tables. Use 1 for Sunday, 2 for Monday, 3 for Tuesday, 4 for Wednesday, 5 for Thursday, 6 for Friday, and 7 for Saturday.
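As an illustration of the 1-7 numbering (not Sawmill code), a sketch that reorders weekday labels for display, so the configured first weekday becomes the first column:

```python
WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]

def ordered_weekdays(first_weekday):
    """Return weekday labels starting from `first_weekday` (1 = Sunday ... 7 = Saturday)."""
    start = first_weekday - 1  # convert the 1-based option value to a list index
    return WEEKDAYS[start:] + WEEKDAYS[:start]
```

With the option set to 2, for example, Monday becomes the first column and Sunday the last.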
Long description
This controls the page that view buttons link to when the associated view is hidden. If this option is empty, the view button itself will also be hidden. Otherwise, this view button will be dimmed, and clicking the button will take you to the URL specified by this option.
Long description
This controls the weekday which appears in a different color in calendar month displays. The marked weekday will be displayed in a different color than the other weekdays; e.g., a value of 1 will display the "S" for Sunday in red. Use 1 for Sunday, 2 for Monday, 3 for Tuesday, 4 for Wednesday, 5 for Thursday, 6 for Friday, and 7 for Saturday.
Long description
This controls the maximum length of a session in the session information. This affects the display of session-based statistics reports like the "sessions overview", and the entry/exit page views. Sessions longer than the value specified will be ignored, and will not appear in the session information.

This option is useful because some large ISPs (e.g. AOL) and other large companies use web caches that effectively make all hits from their customers appear to be coming from one or just a few computers. When many people are using these caches at the same time, this can result in the intermixing of several true sessions in a single apparent session, resulting in incorrect session information. By discarding long sessions, which are probably the result of these caches, this problem is reduced. Also, long visits are often the result of spider visits, which are usually not useful in session reporting. The problem with caches can be eliminated entirely by configuring your web server to track "true" sessions using cookies, and then configuring Sawmill to use the cookie value (rather than the hostname field) as the visitor id.

Setting this option to 0 removes any limit on session duration, so all sessions will be included.
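The discard rule can be sketched as follows (illustrative names, not Sawmill's implementation; sessions are modeled as dicts with start/end timestamps in seconds):

```python
def filter_sessions(sessions, maximum_duration):
    """Drop sessions longer than `maximum_duration` seconds; 0 means no limit."""
    if maximum_duration == 0:
        return list(sessions)
    return [s for s in sessions
            if s["end"] - s["start"] <= maximum_duration]
```

With a one-day maximum (86400), a multi-day "session" produced by a shared cache or a spider is silently dropped from session reports, while normal visits pass through.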
Long description
This specifies the HTML text to appear at the bottom of statistics pages. If both this and Footer file are specified, both will appear, and this will appear second. See also Footer file, Header text, Header file, and Page frame command.
Long description
This specifies a file containing HTML text to appear at the bottom of statistics pages. If both this and Footer text are specified, both will appear, and this will appear first. See also Footer text, Header text, Header file, and Page frame command.
Long description
This specifies the HTML text to appear at the top of the statistics pages. If both this and Header file are specified, both will appear, and this will appear first. See also Header file, Footer text, Footer file, and Page frame command.
Long description
This specifies a file containing HTML text to appear at the top of statistics pages. If both this and Header text are specified, both will appear, and this will appear second. See also Header text, Footer text, Footer file, and Page frame command.
Long description
This option specifies a command-line program to run (UNIX and Windows only) to generate an HTML "frame" into which Sawmill's statistics page output should be inserted. This is useful for integrating Sawmill's output with the look and feel of a web site. The program should generate this HTML to its standard output stream. The frame should be a complete HTML document, starting with <HTML> and ending with </HTML>. Somewhere in the document, the text [[[[STATISTICS]]]] should appear. Sawmill will generate statistics pages by replacing that text with the statistics information, and leaving the rest of the page unchanged. See also Footer text, Header text, Footer file, and Header file.
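A minimal frame-generating program might look like this. This is a hypothetical Python sketch (any language that writes to standard output works); the page title and headings are invented for illustration:

```python
def page_frame():
    """Return a complete HTML frame; Sawmill replaces [[[[STATISTICS]]]]
    with the generated statistics content."""
    return (
        "<HTML>\n"
        "<HEAD><TITLE>Site Statistics</TITLE></HEAD>\n"
        "<BODY>\n"
        "<h1>Example Corp - Usage Statistics</h1>\n"
        "[[[[STATISTICS]]]]\n"
        "</BODY>\n"
        "</HTML>\n"
    )

# Sawmill reads the frame from the program's standard output stream.
print(page_frame())
```

The rest of the page (site navigation, branding, footers) is left unchanged, so the statistics inherit the look and feel of the surrounding site.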
Long description
This option specifies whether table items which start with "http://" should be shown as links. If this option is enabled, all table items which start with "http://" will be shown as links, and clicking one will open the page specified by the table item URL.
Long description
This specifies the root URL (e.g. http://www.myserver.com/) of the web server which generated the log data. If a server root is specified and "Show page links" is enabled, Sawmill will generate links, where possible, back to the server; these links will appear in the tables in reports. If the server root is not specified, these linked icons will not appear.
Long description
This controls the amount of time a session can be idle before it is considered complete. This affects the display of session-based statistics reports like the "sessions overview", and the entry/exit page views. Sessions are considered ended when a user has not contributed an event in the number of seconds specified here. For instance, if this interval is 3600 (one hour), then if a user does not contribute an event for an hour, the previous events are considered to be a single session, and any subsequent events are considered to be a new session.
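The session-splitting rule can be sketched as follows (an illustration, not Sawmill's code; one visitor's event timestamps, sorted and in seconds, are grouped into sessions):

```python
def split_sessions(timestamps, interval):
    """Group a visitor's sorted event timestamps into sessions.
    A gap larger than `interval` seconds ends the current session."""
    sessions = []
    current = []
    for t in timestamps:
        # A gap longer than the idle interval closes the current session.
        if current and t - current[-1] > interval:
            sessions.append(current)
            current = []
        current.append(t)
    if current:
        sessions.append(current)
    return sessions
```

With a 3600-second interval, events at t=0 and t=100 form one session, and an event at t=5000 starts a new one, matching the one-hour example above.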
Long description
This option specifies the target user agent (web browser) when sending emails. Setting the user agent allows Sawmill to optimally handle line breaking for the target web browser. The user agent can be set to "msie" for Microsoft Internet Explorer, "safari" for Safari, "netscape" for Netscape and Mozilla and "unknown" if the user agent (web browser) is not known. Setting the user agent to "unknown" will break lines by spaces and by inserting a tag; setting it to a known user agent will break lines by spaces, characters and tags as supported in the specified web browser.
Long description
This option specifies the target user agent (web browser) when generating report files. Setting the user agent allows Sawmill to optimally handle line breaking for the target web browser. The user agent can be set to "msie" for Microsoft Internet Explorer, "safari" for Safari, "netscape" for Netscape and Mozilla and "unknown" if the user agent (web browser) is not known. Setting the user agent to "unknown" will break lines by spaces and by inserting a tag; setting it to a known user agent will break lines by spaces, characters and tags as supported in the specified web browser.
Long description
This specifies the delimiter to use between fields in a CSV export. CSV is comma-separated-values, so typically this would be a comma, but this can be set to any other character to make the "CSV" output actually separated by tabs (use \t), pipes, or any other string.
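Python's standard csv module illustrates the same idea of a configurable delimiter (this is an illustration, not Sawmill's exporter):

```python
import csv
import io

def export_table(rows, delimiter=","):
    """Write rows in "CSV" form with a configurable field delimiter
    (e.g. "\t" for tab-separated, "|" for pipe-separated output)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter)
    writer.writerows(rows)
    return buf.getvalue()
```

Calling export_table(rows, "\t") produces tab-separated output from the same table data, just as setting this option to \t does in a Sawmill CSV export.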
Long description
This controls whether the numbers from the Overview should be used to compute the Totals rows of table reports. When this option is true, a specialized Overview report is generated to compute the Totals row of each table report. When this option is false, the Totals row is computed by summing each column of the table. Totals rows derived from an Overview have the advantage of showing true unique numbers for columns which compute unique values (like "visitors" in web logs). However, the calculation involved can be much slower than a simple sum, especially if the report element omits parenthesized items (which is the default for most reports). The performance cost can be as little as a 2x slowdown, but in extreme cases involving very complex fields, it can be much more, hundreds of times slower. Therefore, this option should be set to false unless the unique visitors total is really needed.
Long description
This option specifies the maximum number of characters per table item per line. Sawmill inserts a break entity if the number of characters of a table item is greater than the maximum continuous text length.
Long description
This option specifies the minimum number of characters required to break the last line of a table item. If break entities are inserted into a long table item line, Sawmill checks the number of characters in the last line of that item; if it is less than the specified offset, the line will not break. The offset thus prevents a new line containing only a few characters. The recommended offset is 4 - 12 characters.
Long description
This option specifies the maximum number of characters per table item. Characters exceeding the maximum text length will be truncated.
Long description
This option specifies the maximum number of characters per line in a session path and path through a page report. Sawmill inserts a break entity if the number of characters in a line is greater than the maximum continuous text length.
Long description
This option specifies the minimum number of characters required to break the last line in a session path and path through a page report. If break entities are inserted into a long line, Sawmill checks the number of characters in the last line; if it is less than the specified offset, the line will not break. The offset thus prevents a new line containing only a few characters. The recommended offset is 4 - 12 characters.
Long description
This option specifies the maximum number of characters of page names in the session path and path through a page report. Characters exceeding the maximum session path text length will be truncated.
Newsletters
Sawmill Newsletter
December 15, 2006
You're receiving this newsletter because during the downloading of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email with the subject line "UNSUBSCRIBE" to newsletter@sawmill.net.

News

We're reviving the long-dormant Sawmill Newsletter! Each issue will have a "tips and techniques" article, describing how to do something fancy with Sawmill. This issue describes the use of CFG files to include external metadata in your Sawmill reports, which lets you pull in data from other databases and integrate it with Sawmill. We are currently shipping Sawmill 7.2.8. You can get it from http://sawmill.net/download.html.
Tips & Techniques: Using .cfg Maps to Embed External Metadata in Sawmill Reports

Sawmill is a log analyzer, and the default reports it generates are based entirely on data contained in the log data. But in many cases, the information in the log data is a potential "key" to additional information, stored in a database or elsewhere. For instance, a log entry in a web log might contain a URL parameter which contains an order ID, which is the key to a database record (in an external database) with information about the order, including items purchased, purchase price, and more. Further, the order record might contain a customer ID, which is the key to another table, possibly in another database, with information about the customer, including name, organization, etc. This external metadata (external because it is not in the log file) can be used by Sawmill, and included in the reports Sawmill generates, if you export it to CFG format and refer to it from log filters in your profile. This article describes how, and gives a concrete example.

For the example, we will assume that the profile analyzes Apache web logs, and that the "page" field of the log contains a URL like this one for completed orders:

/store/thanks_for_buying.html?order_id=12345

We assume that there is a database with an "orders" table which contains order_id and customer_name columns.

Step 1: Create Custom Fields

Sawmill doesn't know about the external fields when you create the profile, so it doesn't create the necessary components in the profile to track them. Your first step is to create these components. There are six parts to this; the first three are essential:

1. A log field, to manipulate the value
2. A log filter, to set the value of the log field, using the external metadata source
3. A database field, to store the value in the database

The next three are optional:

4. A report, to display the value in a table
5. A report menu item, to create a link at the left of the reports to view the report
6. A cross-reference table

If you don't create a report, you can still filter on the values in this field, using the Filters window of the Reports interface. If you don't create a report menu item, you can still access the report from the Scheduler or the command line, and use filters on the database field. If you don't create a cross-reference table, the report will be much slower than if you do. For details on these steps, see:

http://www.sawmill.net/cgi-bin/sawmill7/docs/sawmill.cgi?dp+docs.faq.entry+webvars.entry+custom_fields

For this example, we create three custom fields, called customer_name, item, and cost. We will create a single log filter to set all three fields below in Step 3: Create the Log Filter. But first, we will create the CFG file, which Sawmill will use to look up order information.

Step 2: Create the .cfg File

To give Sawmill fast access to the data in the "orders" table, we need to create a file in Sawmill's CFG format. This can be done manually with a text editor, or by writing a script or program to query the database and generate the file. In this example, we will call the file "orders.cfg". The contents of the file is:
orders = {
  12345 = {
    customer_name = "John Jones"
    item = "Mouse"
    cost = "15.00"
  }
  12346 = {
    customer_name = "Sue Smith"
    item = "Monitor"
    cost = "129.00"
  }
}
For example, order number 12345 was an order by John Jones for a $15.00 mouse, and order number 12346 was an order by Sue Smith for a $129.00 monitor. In real-world data, there could be much more information here, including the exact model and manufacturer of the monitor and mouse, credit card information, tax information, etc. But for this example, we'll only be using these three fields. The first line of the file must match the name of the file (minus the .cfg extension).

Step 3: Create the Log Filter

Now that we have created orders.cfg, we have implicitly created a configuration node called "orders", which can be accessed from Salang (the Sawmill Language, the language of log filters) as "orders". Therefore, we can now create a log filter like this:
if (matches_regular_expression(page, '^/store/thanks_for_buying.html[?]order_id=([0-9]+)')) then (
  customer_name = node_value(subnode_by_name(subnode_by_name('orders', $1), 'customer_name'));
  item = node_value(subnode_by_name(subnode_by_name('orders', $1), 'item'));
);
This expression checks for a page field with the correct format (using a regular expression); if it matches, the order ID will be in the variable $1. Then, it uses subnode_by_name() to look up the subnode of 'orders' which matches $1 (the order record for the order_id from the URL), and then uses subnode_by_name() again to get the customer_name. It then repeats the process for the item. You can create this filter by clicking Config, then Log Data -> Log Filters, then New Log Filter at the upper right, then the Filter tab, then choosing "Advanced expression syntax" as the filter type, entering a name for it in the Name field, and entering the filter expression in the main field:
Step 4: Build the Database, and View the Reports

Now rebuild the database, and view the reports. There should be a "Customer Names" report, and an "Items" report, showing the number of hits (orders) for each customer, and the number of hits (orders) for each item. You can now use these reports like any other report; e.g., you can click on a particular Item, then zoom to Countries/Regions/Cities, to see which countries of the world purchased that item.

Advanced Topic: Add a Numerical Field

To take it a step further, let's add the "cost" field too. This is a numerical field, rather than a "non-aggregating" field like customer_name and item. It is most useful as a "summing" field, which can appear as a column in any table, and sums the values for each row. To create a custom numerical field, create a log field as before, but use this as the database field:
cost = {
  label = "cost"
  type = "float"
  log_field = "cost"
  display_format_type = "two_digit_fixed"
  suppress_top = "0"
  suppress_bottom = "2"
} # cost
Setting the type to "float" specifies that this is a floating-point aggregating field, capable of holding and aggregating floating point values (including fractional values, like cents). Then change the log filter to include a new "cost" line:
if (matches_regular_expression(page, '^/store/thanks_for_buying.html[?]order_id=([0-9]+)')) then (
  customer_name = node_value(subnode_by_name(subnode_by_name('orders', $1), 'customer_name'));
  item = node_value(subnode_by_name(subnode_by_name('orders', $1), 'item'));
  cost = node_value(subnode_by_name(subnode_by_name('orders', $1), 'cost'));
);
This extracts cost exactly the way the other two lines in the filter extracted customer_name and item. Now, rebuild the database, and go to Config -> Manage Reports -> Reports/Reports Menu. Edit the Items report you created, edit its only report element, and in the Columns tab, add Cost as a column. Then View Reports, and click the Items report, and you'll see the total dollar value (sum) of the sales for each item, in the Cost column.

The Cost column can be similarly added to any other table or table-with-subtable report. For instance, if you add it to the Years/Months/Days report, you'll be able to see sales by year, month, or day. For best performance of reports including a "cost" column, you can add "cost" to the cross-reference table for that report (the one containing the columns of that report), by editing cross_reference_groups in the profile .cfg file, and adding a "cost" line to each group.
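As noted in Step 2, the .cfg file can be generated "by writing a script or program to query the database". A minimal sketch of such a generator follows; it is illustrative only, and the quoting is simplified (it assumes field values contain no quote characters):

```python
def write_cfg_map(node_name, records, path):
    """Write records in Sawmill's CFG format. `node_name` must match the
    file name minus the .cfg extension; `records` maps keys (e.g. order
    IDs) to {field: value} dicts, e.g. rows fetched from an SQL query."""
    lines = ["%s = {" % node_name]  # first line must match the file name
    for key, fields in records.items():
        lines.append("  %s = {" % key)
        for name, value in fields.items():
            lines.append('    %s = "%s"' % (name, value))
        lines.append("  }")
    lines.append("}")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

Running this with the order rows from the example database would produce the orders.cfg file shown in Step 2; rerunning it on a schedule keeps the map in sync with the database.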
Newsletters
Sawmill Newsletter
January 15, 2007
You're receiving this newsletter because during the downloading of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email with the subject line "UNSUBSCRIBE" to newsletter@sawmill.net.

News

This issue of the Sawmill Newsletter discusses a method of filtering spider traffic out of web log reports. We are currently shipping Sawmill 7.2.8. You can get it from http://sawmill.net/download.html.
Tips & Techniques: Ignoring Spider Traffic

Spiders (also called robots) are computer programs on the Internet which access web sites as though they were web browsers, automatically reading the contents of each page they encounter, extracting all links from that page, and then following those links to find new pages. Each spider has a specific purpose, which determines what it does with the page: some spiders collect page contents for search engines; other spiders search the web to collect images or sound clips, etc.

Depending on your purpose for analyzing web logs, you may wish to include the spider hits in your reports, or exclude them. You would include them:

* If you want to show technical metrics of the server, like total hits or total bandwidth transferred, to determine server load, or
* If you want to determine which spiders are hitting the site, how often, and what pages they are hitting, to determine search engine coverage.

You would exclude them:

* If you want to show only human traffic, to get a better idea of the number of humans viewing the site.

By default, Sawmill includes spider traffic in all reports. But if you don't want it, you can exclude it using either a log filter, or a report filter. You would use a log filter if you never want to see spider traffic in reports; you would use a report filter if you sometimes want to see spider traffic, and sometimes not.

Rejecting Spider Traffic Using a Log Filter

Log filters affect the log data as it is processed by Sawmill. In this section, we will show how to create a filter which rejects each spider hit as it is processed, so it is not added to the Sawmill database at all, and never appears in any report.

1. Go to the Admin page of the Sawmill web interface (the front page; click Admin in the upper right if you're not already there).
2. Click View Config in the profiles list, next to the profile for which you want to reject spiders.
3. Click the Log Filters link, in the Log Data group of the left menu.
4. Click "New Log Filter" in the upper right of the Log Filters list.
5. Enter "Reject Spiders" in the Name field.
6. Click the Filter tab.
7. Click New Condition; choose Spider as the Log field and "is NOT equal" as the Operator, enter "(not a spider)" as the Value, and click OK:
8. Click New Action; choose "Reject log entry" as the Action, and click OK:
10. Now, rebuild the database and view reports, and the spiders will be gone from the reports.

If you wanted to show only spiders (ignoring human visitors), you could use "is equal" in step 7, above.

Rejecting Spider Traffic Using a Report Filter

Report filters affect the reports, by excluding some events in the database from affecting the reports. They slow down the reports somewhat, so if you're sure you'll never want to see spider traffic, a Log Filter is a better option (see above). But if you want to be able to turn spider traffic on and off without rebuilding the database, adding a Report Filter is the best choice. Here's how:

1. Click the Filters icon at the top of the report page.
2. Click "Add New Filter Item" in the Spider section of the Filters page.
3. Enter "(not a spider)" in the field:
4. Click OK.
5. Click Save And Close.

The report will immediately display again, this time without spider traffic. Note that by clicking the "is not" button in the Spider section of the Filters page, above, you can show only spider traffic, instead of showing only non-spider traffic.

Defining Your Own Spiders

The file spiders.cfg, in the LogAnalysisInfo folder, contains definitions of all spiders known to Sawmill. During log processing, Sawmill compares the User-Agent field of each hit to the substring values of the records in this file; if the User-Agent field contains the substring of a spider listed in spiders.cfg, the "label" value of that record is used as the name of the spider in the reports. You can add your own records to spiders.cfg, if you know of spiders which are not defined there. For instance, adding this to spiders.cfg:
somespider = {
  label = "Some Spider"
  substring = "SomeSpider"
}

(on the line after the "spiders = {" line) will cause any hit where the User-Agent contains "SomeSpider" to be counted as a hit from a spider called "Some Spider"; "Some Spider" will appear in the Spiders report.

For better performance, many of the records in spiders.cfg are commented out, with a leading # character. Removing this character will uncomment those spiders, to allow Sawmill to recognize them (but will also slow log processing).

Advanced Techniques

The method above will identify well-behaved spiders, but it will not work for spiders which do not announce themselves as spiders using their User-Agent header. It is difficult to identify these spiders, but there are several advanced methods which get close. One option is to look for hits on the file /robots.txt (a file hit by most spiders), and count all future hits from those IP addresses as spider hits. If the spider doesn't even hit /robots.txt, another possible approach is to look for IPs which do not hit CSS or JS files; this indicates that they are farming HTML data but not rendering it, a strong indication that they are spiders. These topics are discussed in the Using Log Filters chapter of the Sawmill documentation.
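The two heuristics can be sketched outside Sawmill as follows (an illustration, not a log filter; hits are modeled as (ip, path) pairs):

```python
def find_spider_ips(hits):
    """Flag likely spider IPs: those that fetched /robots.txt, and those
    that requested HTML pages but never any .css or .js file.
    `hits` is an iterable of (ip, path) pairs."""
    hits = list(hits)
    robots_ips = {ip for ip, path in hits if path == "/robots.txt"}
    asset_ips = {ip for ip, path in hits if path.endswith((".css", ".js"))}
    html_ips = {ip for ip, path in hits if path.endswith(".html")}
    # Spiders: robots.txt fetchers, plus HTML farmers that never render assets.
    return robots_ips | (html_ips - asset_ips)
```

As the text notes, these heuristics only "get close": a human with assets cached, or a spider that fetches CSS, will be misclassified, so the results should be treated as approximate.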
Newsletters
Sawmill Newsletter
February 15, 2007
You're receiving this newsletter because during the downloading of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email with the subject line "UNSUBSCRIBE" to newsletter@sawmill.net.

News

This issue of the Sawmill Newsletter discusses creating custom fields. We are currently shipping Sawmill 7.2.9. You can get it from http://sawmill.net/download.html.
SECURITY ALERT

Sawmill 7.2.8 and earlier includes a potential security vulnerability which can allow users to gain administrative access to Sawmill, if they have login permissions (e.g., SSH) on a system where a web browser logged into Sawmill is running, and if they have read access to the cookies of that browser. Sawmill 7.2.9 reduces this vulnerability by using sessions and session cookies, so even if the authentication information is acquired by a malicious user, it cannot be used beyond the end of the current session. This vulnerability is of "moderate" severity; it is not remotely exploitable. For best security, we recommend that all existing Sawmill users upgrade to Sawmill 7.2.9 (a free upgrade for Sawmill 7 users).
Tips & Techniques: Creating and Populating Custom Fields

A newly-created profile in Sawmill typically contains one log field for each field in the log data and one database field for each log field, plus various derived database fields (like file_type, which is derived from the filename). It is often useful to have additional fields which are not in the log data, but which can be computed from the log data. For instance, it might be useful to have a Full Name field which is computed from the username or IP, or a Department field which is computed from the IP. Sawmill lets you create any number of custom fields, populate them, filter them, or generate reports from them. Once created, custom fields work just like the other fields in the database.

Step 1: Create the Log Field

The first step is to create a log field. This essentially fools Sawmill into thinking the value is in the log data. Most log fields are populated from the log data, but if you define your own, you can populate it yourself using a Log Filter, and then use it as a Database Field in reports. To create the log field, edit the profile .cfg file. This is a text file which is in the LogAnalysisInfo/profiles folder of your Sawmill installation folder. You can edit it using a text editor, like Notepad on Windows, TextEdit on Mac, or vi or emacs on other systems. Find the "log = {" section, and within that the "fields = {" section, and find an existing field, like the spiders field (we'll use a web log for this example). Here's what the spiders field might look like:
spiders = {
  index = "0"
  subindex = "0"
  type = "flat"
  label = "$lang_stats.field_labels.spiders"
} # spiders

Now copy this section, leave the original(!), and add a new one like this:
full_name = {
  index = "0"
  subindex = "0"
  type = "flat"
  label = "Full Name"
} # full_name

This defines a new log field full_name, whose label (how it appears in the web interface) is "Full Name".

Step 2: Create the Log Filter

Next we need to give this field a value. For now, let's assume there is a field called first_name and another field called last_name, and all we need to do is concatenate them to get the full name. In Config->Log Data->Log Filters, create a new log filter, and choose Advanced Expression as the type. Then enter this for the log filter:
full_name = first_name . " " . last_name;

Now, the full_name value for each entry will be the first_name value, followed by a space, followed by the last_name value.

Step 3: Create the Database Field

In order to see reports based on this field, we need to create a database field for it. This is similar to creating a log field. Find the "database = {" section in the profile .cfg file, then find the "fields = {" section within it, and then find an existing database field, and duplicate it. Again we'll start with the existing spiders field:
spider = {
  type = "string"
  label = "$lang_stats.field_labels.spider"
  log_field = "spider"
  suppress_top = "0"
  suppress_bottom = "2"
} # spider

And modify it to make our custom full_name field:
full_name = {
  type = "string"
  label = "Full Name"
  log_field = "full_name"
  suppress_top = "0"
  suppress_bottom = "2"
} # full_name
Step 4: Create a Report

Now that we've created the log field and the database field, we can finish the work in the web interface. The next step is to create a report, and a report menu. Do that by going to Config -> Manage Reports -> Reports/Reports Menu, and clicking New Report. Enter Full Name as the report name and as the menu name. Now, click the Report Elements tab, and click New Report Element:
Enter Full Name as the report element name, and choose Standard Table as the Report Element Type:
Click the Fields tab and select Full Name as the main field:
Now click Okay, Save and Close. Step 4: Rebuild the Database Now rebuild the database (click Rebuild Database in the Config page), to reprocess all log entries to include the Full Name field. View reports, and you will see a Full Names report at the bottom of the reports list, showing all full names for the dataset. Advanced Topic: Cross-reference Groups Though it's not necessary to add a cross-reference group, it will make the Full Names report much faster. To add a crossreference group, edit the profile .cfg file, and look for "cross_reference_groups = {". Find the spiders section below it (again, we will base our modification on the spiders field):
spider = { date_time = "" spider = "" hits = "" page_views = "" spiders = "" worms = "" errors = "" broken_links = "" screen_info_hits = "" visitors = "" size = "" } # spider

Copy it (don't delete the spiders one) to create a new cross-reference group for full_name:
full_name = { date_time = "" full_name = "" hits = "" page_views = "" spiders = "" worms = "" errors = "" broken_links = "" screen_info_hits = "" visitors = "" size = "" } # full_name

The list of numerical fields will depend on the format, so make sure you base it on a group in your profile, rather than using the example above. Now rebuild the database, and the Full Names report will be generated from the cross-reference table, rather than from the main table, which is much faster.

Advanced Topic: Using CFG Maps to Populate the Log Field

In this example, we assumed that all the information needed to compute full_name was in the existing log fields (first_name and last_name). This is usually not the case; the field value usually needs to be pulled in from another source. The most common source is a CFG file. For instance, we might create a CFG file which maps usernames or IP addresses to full names, and look up the full name from that file. See the December 15 issue of the Sawmill Newsletter, "Using CFG Maps", for details on populating log fields using CFG maps.
[Article revision v1.1]
Sawmill Documentation
Quickstart Manual FAQ User Guide www.sawmill.co.uk
Newsletters
Sawmill Newsletter
March 15, 2007
You're receiving this newsletter because you purchased Sawmill, or because, during the downloading of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line "UNSUBSCRIBE", to newsletter@sawmill.net.

News

This issue of the Sawmill Newsletter discusses sending reports by email using Sawmill, when the local SMTP server requires authentication. We are currently shipping Sawmill 7.2.9. You can get it from http://sawmill.net/download.html .
Tips & Techniques: Emailing Reports From Environments with SMTP Authentication

Sawmill emails reports using unauthenticated SMTP; it does not provide a username or password when communicating with the SMTP server (i.e., it does not use SMTP AUTH). In environments where the primary SMTP server requires authentication, this can cause an error when attempting to email a report, because the SMTP server will not accept mail for delivery from an unauthenticated client. There are several possible solutions:

1. Reconfigure the SMTP server.
2. Use an SMTP proxy or forwarding script.
3. Use the MX address of the recipient as the SMTP server.

These options are discussed in detail below.

1. Reconfigure the SMTP Server

One option is to configure the SMTP server to allow Sawmill to access it without authentication. This could be as simple as allowing anyone to access it without authentication, which might be a reasonable solution if the SMTP server is on an internal network. However, a completely open SMTP server, even behind a firewall, could be used by a spammer (perhaps using a compromised system), and is not the most secure choice. A more secure choice is to configure the SMTP server to allow only Sawmill to access it without authentication, by adding a rule specifying that the IP address of the system where Sawmill is running may send email without authentication. This still opens up a small potential vulnerability, since the IP address could be spoofed, or the Sawmill system itself could be compromised, but it is more secure than opening unauthenticated SMTP access to the entire internal network.

2. Use an SMTP Proxy or Forwarding Script

Another option is to run an SMTP proxy or script which does not require authentication, but which uses SMTP authentication when forwarding the mail to the SMTP server.
For instance, you could run sendmail on a local system, and all messages sent to a particular email address on that system would automatically be forwarded to the "real" SMTP server, but with a specific username and password provided (i.e., with SMTP AUTH added). Sawmill could then be configured to send to the proxy, without authentication, by providing the proxy's address as the SMTP server in Sawmill; the proxy would add authentication when passing the message on to the main SMTP server, and the message would be delivered. This is a good option when the SMTP server cannot be reconfigured; it allows the SMTP server to remain configured securely, requiring SMTP AUTH in all cases, while still allowing Sawmill to send through it without needing to include SMTP AUTH information in its original message.

3. Use the MX Address of the Recipient as the SMTP Server

A third option, and often the easiest one, is to use the MX address of the recipient as the SMTP server, instead of the usual internal SMTP server. This works because every domain has an MX record in its DNS record, and every MX record points to an SMTP server which does not require authentication when delivering email to its own domain. So by looking at the DNS record of the recipient's domain, you can find an SMTP server which will accept unauthenticated SMTP directly from Sawmill, to deliver mail to the recipient. For example, suppose you wanted to email a report to support@sawmill.net. The domain is sawmill.net, so we can get the MX record by running dig:
% dig sawmill.net mx

; <<>> DiG 9.2.2 <<>> sawmill.net mx
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61374
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 3

;; QUESTION SECTION:
;sawmill.net.                   IN      MX

;; ANSWER SECTION:
sawmill.net.            3600    IN      MX      10 mail.sawmill.net.

;; AUTHORITY SECTION:
sawmill.net.            3600    IN      NS      dns.flowerfire.com.
sawmill.net.            3600    IN      NS      dns2.flowerfire.com.

;; ADDITIONAL SECTION:
mail.sawmill.net.               IN      A       ...
dns.flowerfire.com.             IN      A       ...
dns2.flowerfire.com.            IN      A       ...

;; Query time: 7 msec
;; SERVER: 10.0.1.1#53(10.0.1.1)
;; WHEN: Mon Mar 5 13:57:40 2007
;; MSG SIZE  rcvd: 149
The MX record in this case is mail.sawmill.net (shown in the ANSWER SECTION above). Therefore, you can use mail.sawmill.net as the SMTP server in Sawmill, without authentication; for instance, enter it in the SMTP Server field of the Scheduler when emailing a report to the recipient support@sawmill.net, and it will accept the SMTP connection from Sawmill, and deliver the report message to support@sawmill.net. MX records can also be looked up at http://www.mxtoolbox.com/ and similar web sites.
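If you script this lookup, the MX-extraction step can be sketched in Python by parsing the ANSWER SECTION of dig's output. This is a sketch that assumes dig's standard presentation format; the hostnames are taken from the example above:

```python
import re

def mx_host(dig_output):
    """Return the mail exchanger with the lowest preference value from
    `dig <domain> mx` output, or None if no MX answer line is found."""
    best = None
    for line in dig_output.splitlines():
        # Answer lines look like: sawmill.net. 3600 IN MX 10 mail.sawmill.net.
        m = re.match(r"\S+\s+\d+\s+IN\s+MX\s+(\d+)\s+(\S+?)\.?$", line)
        if m:
            pref, host = int(m.group(1)), m.group(2)
            if best is None or pref < best[0]:
                best = (pref, host)
    return best[1] if best else None

answer = "sawmill.net. 3600 IN MX 10 mail.sawmill.net."
print(mx_host(answer))  # mail.sawmill.net
```

The returned hostname is what you would enter as the SMTP server in the Scheduler.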
Sawmill Newsletter
April 15, 2007
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line "UNSUBSCRIBE", to newsletter@sawmill.net.

News

This issue of the Sawmill Newsletter explores using Sawmill to report the click-through ratio (CTR) in standard Sawmill reports, using a relatively new feature in Sawmill: calculated report columns. The CTR is used industry-wide to gauge the relative effectiveness of an advertising campaign, including pay-per-click campaigns. In the Tips & Techniques section below, you'll find a detailed description of how to add a CTR column to a Sawmill report. We are currently shipping Sawmill 7.2.9. You can get it from http://sawmill.net/download.html .

Get the Most out of Sawmill with Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Services experts are available for just this situation and many others. We will assist in the initial installation of Sawmill using best practices, and work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements, and stay focused on your business needs. We will show you areas of Sawmill you may not even be aware of, with streamlined methods to get you the information more quickly. Often you'll find that Sawmill's deep analysis can provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users only exercise a fraction of this power. That's where our experts can really make the difference.

Our Sawmill experts have many years of experience with Sawmill, and with a large cross section of devices and business sectors. Our promise is to very quickly come up with a cost-effective solution that fits your business, and greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Adding Calculated Columns to a Report

Important: Calculated Columns are available only in Sawmill 7.2.6 and later.

Suppose we have a report like this:
This report shows, for each advertisement:

1. The number of impressions for the advertisement (the number of times the ad was displayed). This can be calculated using a Log Filter, from the number of hits on an ad image, assuming you're serving the image yourself.
2. The number of clicks on the advertisement. This can be calculated using a Log Filter, by counting the number of hits on the target page of the ad.
3. The cost of the ad. This can be calculated using a Log Filter, looking up the cost per impression, or per click, in a CFG map.
4. The number of unique users seeing the ad.

All this can be done with Sawmill, using custom numerical fields and CFG maps. We previously described custom fields in the February 15, 2007 newsletter, and CFG maps in the December 15, 2006 newsletter. But now, suppose we want to know the click-through ratio (CTR) for each campaign. We could do this by exporting this table to a spreadsheet, and adding an extra column with an equation like 100 * Clicks / Impressions. But that would require a separate step, so we'll add this column, and do this calculation, in Sawmill. What we'd really like is to have this column in the Sawmill report, like this:
The click-through ratio (CTR) can be computed by dividing the number of clicks by the number of impressions, and multiplying by 100. However, this cannot be done with a custom database field. Custom database fields are useful when the value of the field can be computed from other fields in the same log entry; the click-through ratio, by contrast, must be computed from the total number of clicks and the total number of impressions, which are not available until the entire log has been analyzed and aggregated. So the CTR cannot be computed using log filters, which are executed during log processing.

For this type of calculation, Sawmill provides a Calculated Column feature, where a column can be calculated at report generation time, from other columns and rows of the table it is in. This is analogous to the way a spreadsheet program can compute a cell from other cells in the spreadsheet. In Sawmill, this is done by adding a new database field, with an "expression" parameter that specifies a Sawmill Language (Salang) expression which calculates the value of the cell; then that database field is added to the report table.

Creating a New Database Field with an Expression

First, using a text editor, edit the profile CFG file, which is in LogAnalysisInfo/profiles. Search for "database = {", then search for "fields = {", to find the database fields group. Then add this field, as the last field in the group:
ctr = { label = "CTR" type = "string" log_field = "ctr" display_format_type = "string" expression = `((1.0 * cell_by_name(row_number, 'Clicks')) / cell_by_name(row_number, 'Impressions')) * 100.0` } # ctr

This creates a field whose value is computed from the table that contains it. This is done by adding an "expression" parameter whose value is a Salang expression:

((1.0 * cell_by_name(row_number, 'Clicks')) / cell_by_name(row_number, 'Impressions')) * 100.0

This expression is somewhat complex, so let's break it down a bit:

row_number : This is a special variable which means "the current row number." So when computing a cell value in row 5, this will be 5.

cell_by_name(row_number, 'Clicks') : This gets the value of the Clicks field, in the row row_number (i.e., in the current row).

cell_by_name(row_number, 'Impressions') : This gets the value of the Impressions field, in the row row_number (i.e., in the current row).

The use of 1.0 and 100.0 forces the number to be a floating-point number, so it includes the fractional part (otherwise, it would be an integer). These functions are documented in detail in The Configuration Language, in the Technical Manual.

Adding the Column to the Report

Now that we've defined the field, we need to add it to the report. To do this, search from the top of the CFG file for "statistics = {", then for "reports = {", then find the report element you want to add the column to. Within that report element, add a new CTR column to the "columns" group:
ctr = { data_type = "unique" display_format_type = "%0.2f%%" field_name = "ctr" header_label = "CTR" show_bar_column = "false" show_graph = "true" show_number_column = "true" show_percent_column = "false" type = "number" visible = "true" } # ctr
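For illustration, the arithmetic the calculated column performs can be mirrored in Python. This is only a sketch of the computation, applied to a hypothetical report table; in Sawmill, the Salang expression above does this per row at report generation time:

```python
def add_ctr_column(rows):
    """Mirrors the Salang expression:
    ((1.0 * cell_by_name(row_number, 'Clicks')) /
     cell_by_name(row_number, 'Impressions')) * 100.0"""
    for row in rows:
        # Multiplying by 1.0 forces floating-point division, as in Salang.
        row["CTR"] = (1.0 * row["Clicks"] / row["Impressions"]) * 100.0
    return rows

table = add_ctr_column([{"Impressions": 2000, "Clicks": 50},
                        {"Impressions": 1000, "Clicks": 5}])
print(["%0.2f%%" % row["CTR"] for row in table])  # ['2.50%', '0.50%']
```

The "%0.2f%%" format string matches the display_format_type used in the report column above.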
Generating the Report

Finally, rebuild the database, and the Campaigns report will now show the CTR column.

Questions or suggestions? Contact support@sawmill.net . If you would like a Sawmill Professional Services expert to implement this, or another customization, contact consulting@sawmill.net .
Sawmill Newsletter
May 15, 2007
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line "UNSUBSCRIBE", to newsletter@sawmill.net.

News

We are currently shipping Sawmill 7.2.9. You can get it from http://sawmill.net/download.html . This issue of the Sawmill Newsletter explores using Log Filters to send alerts with Sawmill. Sawmill can be used to monitor a stream of log data, in real time, and send alert emails (or perform other actions) based on the content of the log data. This has multiple uses, including looking for security issues, like port scans. This edition of Tips & Techniques describes implementing an alert to inform you immediately when a web site visitor looks at a particular page of your site.
Tips & Techniques: Sending Email Alerts Based on Real-Time Log Data Scanning

Suppose we have a web site with a particularly important page, and we want to know immediately when someone looks at it. In this example, we'll assume that we want to inform our government sales agent any time someone logs in to the site and looks at the page /sales/government.html. We'd like to email govsales@mydomain.com every time this page is accessed, with the username and IP of the logged-in user who accessed the page.

In other words, we want to send an alert when a particular condition is met. This can be done in Sawmill using Log Filters. Almost any condition can be defined in a Log Filter, and the Log Filter can send email using the send_email() function when the condition is met. In this case, we'll assume we're analyzing Apache data on a Linux system, so the condition we want is:

page eq "/sales/government.html"

For IIS, replace page with cs_uri_stem.
The full Log Filter, which you would enter as an advanced expression in a new Log Filter (in Config -> Log Processing -> Log Filters), would then be:
if (page eq "/sales/government.html") then (
  send_email("govsales@mydomain.com",
             "govsales@mydomain.com",
             "Subject: Government Sales access detected from " . authenticated_user . "\r\n" .
             "To: govsales@mydomain.com\r\n" .
             "\r\n" .
             "Sawmill has detected an access to /sales/government.html;\r\n" .
             "the username is " . authenticated_user . ", and the hostname is " . hostname . ".\r\n",
             "smtp.mydomain.com");
);

The parameters to send_email() are:

1. govsales@mydomain.com: the sender address.
2. govsales@mydomain.com: the recipient address. Use commas between multiple addresses.
3. The message body. This is in SMTP format: it starts with SMTP headers (Subject and To should probably be present; Date will be added automatically), each followed by \r\n. Then there is another \r\n, and then the body of the message.
4. smtp.mydomain.com: the SMTP server. This server must accept unauthenticated SMTP delivery for the recipient(s).

When you rebuild or update the database, Sawmill will send an email for each occurrence of /sales/government.html in the data it processes.

Sending Alerts in Real Time

Database builds or updates are typically done periodically, which introduces a delay between the time the data is logged and the time the alert is sent. Furthermore, a database isn't needed at all for an alert; Sawmill doesn't need to build a database to parse logs and run log filters. For true real-time alerts, you should not build or update a database; instead, use the "process_logs" command-line action, with a command-line log source that streams data as it is logged. This means that you would have a profile dedicated to alerts; if you also want to do reporting, you would do it in a separate profile.

The first step is to create a command-line log source to stream the data. The best approach depends on the environment, but it needs to be a script, program, or command which, when run, immediately prints a line of log data each time a new line is logged.
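For illustration only, here is a Python sketch of what the Log Filter above does; send_email is replaced by a caller-supplied function, and the field names mirror the Salang example (the real filter runs inside Sawmill, in Salang):

```python
WATCHED_PAGE = "/sales/government.html"

def check_entry(fields, send_email):
    """Run the alert condition against one parsed log entry; call
    send_email(sender, recipient, body, smtp_server) on a match."""
    if fields.get("page") != WATCHED_PAGE:
        return False
    body = ("Subject: Government Sales access detected from %s\r\n"
            "To: govsales@mydomain.com\r\n"
            "\r\n"
            "Sawmill has detected an access to %s;\r\n"
            "the username is %s, and the hostname is %s.\r\n"
            % (fields["authenticated_user"], WATCHED_PAGE,
               fields["authenticated_user"], fields["hostname"]))
    send_email("govsales@mydomain.com", "govsales@mydomain.com",
               body, "smtp.mydomain.com")
    return True

sent = []
check_entry({"page": WATCHED_PAGE, "authenticated_user": "jdoe",
             "hostname": "10.0.0.5"}, lambda *args: sent.append(args))
print(len(sent))  # 1
```

Note how the body is assembled exactly as in the Salang example: headers, a blank line, then the message text, each line terminated with \r\n.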
In a simple Apache/Linux environment, with the log data written continually to /var/log/httpd/access-log, no script is necessary; you can just use the built-in tail command as the command-line log source:

tail -f /var/log/httpd/access-log

The tail command with the -f flag will watch the file (/var/log/httpd/access-log), and will print each new line that appears at the end of it. This command never completes; it keeps watching the file forever, which is exactly what we want for a real-time alerting log source. tail -f is available natively on all platforms except Windows, and is available on Windows with Cygwin.

Now that we have set up the log source in the profile, we can run it with this command:

nohup ./sawmill -p profilename -a pl &

The -a pl option is the process_logs action, and tells Sawmill to process all the log data in the log source, and run the log filters against it. It does not build a database, so it uses no disk space; and with a command-line log source which never completes (like the one we created above), it will never complete either. It will just run the new log data against the filters forever, processing each line as it comes in, and sending alerts as specified in the Log Filters. Thus, it is a real-time alerting system. Because this never completes, it is best to run it in the background, which is why we're using nohup in front, and & in back. On Windows, this could be run like this:
SawmillCL -p profilename -a pl

and the window would need to be kept open. Or, it could be run from the Windows Scheduler, which will cause it to run as a background process.

Reducing Buffering with "Log reading block size"

By default, Sawmill buffers incoming log data in blocks of 100KB, which means that 100KB of log data must be generated before Sawmill will start running the filters against it. For very low-volume log sources, this can substantially delay alerts; if it takes 60 seconds to generate 100KB of log data, alerts might occur as much as 60 seconds after the log data is generated. To get faster alert times in this case, you can set the "Log reading block size" option to a small value, like 1KB, in the Config -> Log Data -> Log Processing page of the Sawmill web interface:
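To see why the block size matters, here is a Python sketch of block-buffered line reading; this is a hypothetical model, not Sawmill's actual implementation, but it shows how a smaller read block delivers lines for processing sooner on low-volume sources:

```python
import io

def buffered_lines(stream, block_size):
    """Yield complete lines, reading the stream block_size bytes at a time.
    Models the effect of "Log reading block size": with a large block,
    much log data may accumulate before any line is processed."""
    buf = b""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        buf += block
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            yield line
    if buf:  # trailing partial line
        yield buf

log = io.BytesIO(b"GET /a\nGET /b\n")
print(list(buffered_lines(log, 4)))  # [b'GET /a', b'GET /b']
```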
Other Examples of Alerts

Log Filters are written in Salang, which is a general programming language, so almost any condition is possible. You can save results from previous lines (in a node, typically) to look for alert conditions involving multiple lines; for instance, you can send an alert if there are more than 50 accesses to a particular port in the past 60 seconds (DoS attack detection), or in the past 1000 lines, or if there are more than 1000 different ports accessed by a particular IP in the past 60 seconds (port scanning detection).

In addition, send_email() is only one possible action that can be taken by a filter. In particular, a Log Filter can run any command line using the exec() function; so, for instance, it could use a firewall command line to automatically (and immediately) block access from a particular IP, when it detects that the IP is performing port scanning or a DoS attack.

Questions or suggestions? Contact support@sawmill.net . If you would like a Sawmill Professional Services expert to implement this, or another customization, contact consulting@sawmill.net .
Sawmill Newsletter
June 15, 2007
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line "UNSUBSCRIBE", to newsletter@sawmill.net.

News

We are currently shipping Sawmill 7.2.9. You can get it from http://sawmill.net/download.html . This issue of the Sawmill Newsletter describes the "session contains" report filter, and how it can be used to track conversions from their source.
Tips & Techniques: Tracking Conversions with a "Session Contains" Filter

In web analytics, a "conversion" is an event which indicates a success for the web site. If the web site is selling something, it can be a sale; if it is offering content, it can be someone viewing the content; if it is offering a subscription, it can be a visitor signing up. Most web sites have a specific purpose, so tracking conversions can tell you how effectively your web site is accomplishing its purpose, and can also help you predict the conversion rate. Conversions are a more practical metric than hits or page views. Page views tell you how many pages were viewed, but if your web site is selling a product, a billion page views doesn't represent success if nobody buys. Conversions represent concrete success.

Sawmill can track conversions using a flexible feature called the Session Contains filter. A Session Contains filter is a report filter, applied after the data is imported into the database, while reports are being generated. Whereas a more traditional filter might select all events from June, or all events which hit a particular page, a Session Contains filter selects entire sessions; specifically, it selects all events in all sessions which at some point accessed a particular page. For instance, if you apply a "session contains '/thanksforbuying.asp'" filter, it will select all events from all sessions containing a hit on /thanksforbuying.asp. So if a visitor entered at /index.asp, then went to /buy.asp, and then went to /thanksforbuying.asp, all three of those events would be selected by the filter, and would appear in all reports while that filter is active, because they are all in a session containing /thanksforbuying.asp.

For example, consider a web site which provides a CGI script for image conversion. In this case, a conversion is marked by a hit on the CGI script, whose pathname is /cgi-bin/image2html.pl?(parameters). So in Sawmill's Reports interface, we click the Filter icon at the top, and in the "Session contains" section, create a new Session Contains filter to select sessions containing hits on /cgi-bin/image2html.pl?(parameters), by clicking "Add New Filter Item" and entering /cgi-bin/image2html.pl?(parameters) in the "Session contains" field of the New Session Filter Item window:
A Session Contains Filter Item

After applying this filter, the Pages report looks like this:

The Pages Report (With Session Contains Filter)

Without a filter, the Pages report would show the number of hits, page views, etc. on all pages, for the entire dataset. With the filter, it shows the number of hits, page views, etc. which occurred during sessions containing hits on /cgi-bin/image2html.pl?(parameters). In other words, it shows information from only converted sessions; and it is not showing just the conversion events themselves, but all hits in the entire session containing the conversion event. This can be useful when trying to determine what causes conversions, because it shows which pages occurred in the session along with the conversion (typically, before the conversion), which might have contributed to the decision to convert.

But a more interesting report is the Referrers report. Now that the "Session Contains" filter is applied, we can click any report to see that report for just converted sessions. So let's look at Referrers:
Remember, this report is based only on hits from converted sessions; it does not include any hits which were not part of a session that eventually converted. The first line is internal referrers: this site is hosted on www.flowerfire.com, so clicks from one page to another will show that as the referrer. The second line shows no referrer, which suggests bookmarks: a large number of conversions are coming from people who have bookmarked the utility. The third line shows that Google has brought much of the converted traffic; further down there are other search engines, and a few sites which have linked to it directly. By turning on full referrer tracking and zooming into these referrers, we could see exactly which pages brought the conversions. If there are specific pages you advertise on, this report will show you which advertisements resulted in conversions, and how often.

We can see from Referrers that much of our converted traffic is brought by search engines. So let's find out more, in the "Search phrase by search engine" report:
The Search Engines by Search Phrases Report (With Session Contains Filter)

Again, we still have the Session Contains filter applied, so this report shows the search engines and search phrases for only the converted sessions. This tells us which of our search phrases, on which search engines, are bringing conversions, and how many. If these are paid search terms, this is particularly useful information.

Advanced Topic: Adding Numerical Fields to Count the Number of Conversions

The tables above don't actually count the number of conversions; they count the number of hits and page views in the converted sessions. To count the exact number of conversions, you can look at the number of sessions, but that only appears in session reports, not in standard reports like those above. The best way to track the number of conversions is to add a custom numerical database field, and use a log filter to set it to 1 every time the conversion page is seen, e.g.:
if (page eq "/cgi-bin/image2html.pl?(parameters)") then conversions = 1

Then add the conversions field as a column to any report (and, for best performance, to the corresponding xref tables), and you will see the number of conversions as a column in that report.

Advanced Topic: Adding Additional Numerical Fields for Revenue/Expense Analysis

The reports above are useful, but you can get the ultimate report by adding additional numerical fields. See the April 2007 Newsletter, "Displaying a Click-through Ratio With Calculated Report Columns", for an example of adding additional numerical fields. The exact fields and calculations depend on the situation, but suppose we're selling something, and advertising it with pay-per-click search engine ads. The table above (search engines by search phrases) can then include additional columns, called "revenue" and "expense". The "expense" column would show the total amount of money spent on each search phrase, by summing the cost for each click, perhaps using a CFG map to look up the cost; see the December 2006 Newsletter, "Using CFG Maps". The "revenue" column would show the total amount of revenue generated by those sessions; it would be computed by summing the revenue of each conversion, extracted from the URL if available, or looked up in a CFG map of order information otherwise. This gives a clear side-by-side revenue vs. expense analysis of each paid keyword (the same approach can be used for other types of advertisements), to help determine which advertisements are making money, and which are not. See the Gatelys Case Study for a case study of this approach with an online retailer.

Advanced Topic: Using Multiple Session Contains Filters For Scenario Analysis

You can use any number of "session contains" filters at the same time.
So if you want to know how many sessions went to your FAQ page, and also converted, you can add two Session Contains filters, and all reports will show only sessions which contained both pages (Session Contains filters also support wildcards, for matching a class of pages). This can be used for scenario analysis, to see how often your site is being used in the way you intended. By using several report elements with progressively more detailed "session contains" filters, e.g., "(session contains 'A')" on the first report element, "(session contains 'A') and (session contains 'B')" on the second, and "(session contains 'A') and (session contains 'B') and (session contains 'C')" on the third, it is possible to create a "funnel" report, showing how many sessions got to the first stage, second stage, and third stage of a complex scenario.

Questions or suggestions? Contact support@sawmill.net. If you would like a Sawmill Professional Services expert to implement this, or another customization, contact consulting@sawmill.net.
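As a footnote to the funnel discussion above: the stage counts behave like progressively intersected session sets, which can be sketched with grep. The session data below is invented, with each line listing the pages of one session:

```shell
# Build a tiny, hypothetical session file: one session per line.
SESSIONS=$(mktemp)
cat > "$SESSIONS" <<'EOF'
/home /faq /buy
/home /faq
/home
/faq /buy
EOF
# Funnel stage counts: contains /home; contains /home AND /faq; contains all three.
grep -c '/home' "$SESSIONS"
grep '/home' "$SESSIONS" | grep -c '/faq'
grep '/home' "$SESSIONS" | grep '/faq' | grep -c '/buy'
rm -f "$SESSIONS"
```

This prints 3, 2, 1: three sessions reached /home, two of those also viewed /faq, and one completed the full scenario. The shrinking counts are exactly what a funnel report makes visible.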
Sawmill Newsletter
July 15, 2007
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of UNSUBSCRIBE, to newsletter@sawmill.net.

News

We are currently shipping Sawmill 7.2.9. You can get it from http://sawmill.net/download.html. This issue of the Sawmill Newsletter describes the "session contains" report filter, and how it can be used to track conversions from their source.

Get the Most out of Sawmill with Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Services experts are available for just this situation, and many others. We will assist in the initial installation of Sawmill using best practices, and work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements, and stay focused on what your business needs are. We will show you areas of Sawmill you may not even be aware of, demonstrating streamlined methods to get you the information more quickly. Often you'll find that Sawmill's deep analysis can even provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users only exercise a fraction of this power. That's where our experts really can make the difference. Our Sawmill experts have many years of experience with Sawmill, and with a large cross section of devices and business sectors. Our promise is to very quickly come up with a cost-effective solution that fits your business, and greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services.
For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Sequential Scheduling

Sawmill's built-in Scheduler provides basic task scheduling capabilities. You can configure it to run a particular task, for a particular profile, at a particular time. For instance, you can configure it to update the databases for all your profiles at midnight every night, or to email yourself a Single-page Summary for a particular profile, every day at 8 AM. The Scheduler is available in the Admin page of the web interface.

However, there are some restrictions on which tasks can be run simultaneously. Database builds and updates, and "remove data" tasks, modify the database, and can conflict with each other and with reports if they are run simultaneously on the same profile. Depending on the number of processors (or cores) in the system, and the speed of the disk, you may not be able to run more than a few simultaneous tasks--each task generally uses as much as a full processor (or core), so on a four-processor system, performance will suffer if there are more than four simultaneous processes, even if they are on different profiles. Therefore, it is often useful to run tasks sequentially rather than simultaneously. The Sawmill 7 Scheduler supports this in a few cases; you can rebuild or update databases for "all profiles," and it will rebuild or update them in sequence, starting the next task when the previous one completes (using one processor at all times). Also, some degree of sequencing is possible
by spacing the scheduled tasks so they cannot overlap; for instance, if a database update is to be followed by a report generation, and the database update takes 1 hour, then scheduling the report generation two hours after the database build will generally ensure that it runs after the update completes. But this is problematic, because the time taken for a task can never really be predicted; if the log data suddenly gets larger, or if the system slows down for some other reason, that database update might take 3 hours, and the report generation will fail. What is sometimes needed is true sequencing of arbitrary tasks, running each task when the previous one completes.

To perform sequencing of arbitrary tasks, it is easiest to use a script (a .BAT file on Windows), which executes the tasks with command line syntax, one after another. For instance, this .BAT file would do the database update, and then email the report:

"C:\Program Files\Sawmill 7\SawmillCL" -p profilename -a ud
"C:\Program Files\Sawmill 7\SawmillCL" -p profilename -a srbe -ss mail -rca me@here.com -rna you@there.com -rn overview

(The quotes are needed because the pathname contains spaces.) On non-Windows systems, the script would be very similar, but with the pathname of the "sawmill" binary instead of C:\Program Files\Sawmill 7\SawmillCL. This script runs a database update of profilename, and immediately when the update completes, it emails the Overview report. Create a text file (for instance, with Notepad), call it update_and_email.bat, and paste the two lines above into the file. On non-Windows, you might call it update_and_email.sh, and make it executable with "chmod a+x update_and_email.sh".

The Sawmill Scheduler cannot run an arbitrary script, so to schedule this script it is necessary to use an external scheduler. On Windows, the Windows Scheduler is usually the best choice. Go to Control Panels, choose Scheduled Tasks, and choose Add Scheduled Task. This will start the Scheduled Task Wizard. Then:
1. Click Next to pass the introduction page.
2. Click Browse and select the .BAT file you created above (update_and_email.bat); then click Next.
3. Give the task a name (like "Update and Email Overview"), and click Daily; then click Next.
4. Choose a Start time, for instance midnight, and choose Every Day, with tomorrow as the Start date; then click Next.
5. Enter the username and password you want to run the .BAT file as (typically, yourself); then click Next.
6. Click Finish.
Now, the .BAT file will run every day at midnight, and it will run its two tasks sequentially. Any number of tasks can be added to this script, and they will all be run sequentially, with no gap in between.

On Linux, MacOS, UNIX, or other operating systems, this type of scheduling is usually done with cron, the built-in scheduler. The cron table can be edited through the graphical interface of the operating system, if one is available, or it can be edited from the command line with the command "crontab -e", adding a line like this to the cron table:

0 0 * * * /opt/sawmill/bin/update_and_email.sh >> /opt/sawmill/log/update_and_email.log 2>&1

This runs the update_and_email.sh script every day at midnight, logging the output to a file.

Sawmill 8 Scheduler Features

The next major release of Sawmill, version 8, will include direct support for sequential scheduling in the Sawmill Scheduler, so it will be possible to do this sort of "A then B then C etc." scheduling directly from the Sawmill Scheduler.

Advanced Topic: Optimal Scheduling for Multiple Processors/Cores

If you have a multiprocessor (or multi-core) system, the approach above does not take full advantage of all your processors, because the .BAT file (or script) runs on only one processor. It is possible to configure Sawmill to use multiple processors for database builds or updates (using the Log Processing Threads option), but report generation always uses one processor, and multi-processor database builds/updates are less efficient than single-processor builds (i.e., running on two processors is faster, but not twice as fast). If you have many tasks, the optimal scheduling for multiple processors is to use single-threaded builds and updates, but to keep one task running per processor at all times. For instance, if there are four processors, you start four single-threaded tasks, and as each task completes, you start another one, always ensuring that there are four tasks running.
This can be done by running four scripts (or four .BAT files), like the one above, at the same time, as long as each script takes roughly the same amount of time as the others. That splits the work of the tasks into four equal pieces, and runs them simultaneously. It is also possible to write a script which does this sort of scheduling for you, and we have one, written in Perl. The script,
called multisawmill.pl, is available by emailing support@sawmill.net. At this point, it is limited to a single type of task; for instance, it can run 100 database builds, split over four processors, or 1000 report generations, split over 8 processors.

Questions or suggestions? Contact support@sawmill.net. If you would like a Sawmill Professional Services expert to implement this, or another customization, contact consulting@sawmill.net.
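For reference, the non-Windows update_and_email.sh described in this article would look something like the sketch below. The sawmill binary is replaced here by a stub shell function, so the sequencing can be run anywhere; a real script would use the full pathname of the sawmill binary instead:

```shell
# update_and_email.sh -- run the update, then email the report, sequentially.
# Stub standing in for the real binary (e.g. /opt/sawmill/bin/sawmill; path is an assumption):
sawmill() { echo "would run: sawmill $*"; }

sawmill -p profilename -a ud &&
sawmill -p profilename -a srbe -ss mail -rca me@here.com -rna you@there.com -rn overview
```

Chaining with && makes the email step run only if the update succeeds; to run it unconditionally, as the .BAT file does, put each command on its own line with no operator.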
Sawmill Newsletter
August 15, 2007
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of UNSUBSCRIBE, to newsletter@sawmill.net.

News

Sawmill 7.2.10 shipped on August 4, 2007. This is a minor "bug fix" release, and it is free to existing Sawmill 7 users. It is not a critical update, but it does fix a number of bugs, adds support for many new log formats, and adds a few small features. It is recommended for anyone who is experiencing problems with Sawmill 7.2.9 or earlier. You can download it from http://sawmill.net/download.html. This issue of the Sawmill Newsletter describes using database merges to improve database build performance.
Tips & Techniques: Using Database Merges

Note: Database merge is available only with the internal database; it is not available for profiles which use a MySQL database.

A default profile created in Sawmill uses a single processor (single core) to parse log data and build the database. This is a good choice for shared environments, where using all processors can bog down the system; but for best performance, set "Log processing threads" to the number of processors, in the Log Processing options in the Config page of the profile. That will split log processing across multiple processors, improving the performance of database builds and updates by using all processors on the system. This is available with Sawmill Enterprise--non-Enterprise versions of Sawmill can only use one processor.

If the dataset is too large to process in an acceptable time on a single computer, even with multiple processors, it is possible to split the processing across multiple machines. This is accomplished by building a separate database on each system, and then merging them to form a single large database. For instance, this command line adds the data from the database for
profile2 to the database for profile1:

sawmill -p profile1 -a md -mdd Databases/profile2/main

or on Windows:

SawmillCL -p profile1 -a md -mdd Databases\profile2\main

After this command completes, profile1 will show the data it showed before the command, plus the data that profile2 showed before the command (profile2 itself is unchanged). This makes it possible to build a database twice as fast, using this sequence:

sawmill -p profile1 -a bd
sawmill -p profile2 -a bd
sawmill -p profile1 -a md -mdd Databases/profile2/main

(Use SawmillCL and \ slashes on Windows, as shown above.) The critical piece is that the first two commands must run simultaneously; if you run them one after another, they will take as long as building the whole database. But on a two-processor system, they can each use a full CPU, fully using both CPUs, and running nearly twice as fast as a single build. The merge then takes some extra time, but overall this is still faster than a single-process build.

Running a series of builds simultaneously can be done by opening multiple windows and running a separate build in each window, or by "backgrounding" each command before starting the next (available on UNIX and similar systems). But for a fully automated environment, this is best done with a script. The attached Perl script, multisawmill.pl, can be used to build multiple databases simultaneously. You will need to modify the top of the script to match your environment, and set the number of threads; then when you run it, it will spawn many database builds simultaneously (the number you specified), and as each completes, it will start another one. This script is provided as-is, with no warranty, as a proof-of-concept of a multiple-simultaneous-build script.

Using the attached script, or something like it, you can apply this approach to much larger datasets, for instance to build a year of data:

1.
Create a profile for each day in the year (it is probably easiest to use Create Many Profiles to do this; see Setting Up Multiple Users in the Sawmill documentation).

2. Build all profiles, 8 at a time (or however many cores you have available). If you have multiple machines available, you can use multiple installations of Sawmill, partitioning the profiles across multiple systems. For instance, if you have two 8-core nodes in the Sawmill cluster, you could build 16 databases at a time; with four 8-core nodes in the cluster, you could build 32 databases at a time. This portion of the build can give a linear speedup, with nearly 32x faster log processing than a single process, on an 8-core, 4-node cluster.

3. Merge all the databases. The simplest way to do this, in a 365-day example, is to run 364 merges, adding each day into the final one-year database. When the merge is done, the one-year database will function as though it had been built in a single "build database" step--but it will have taken much less time to build.

Advanced Topic: Using Binary Merges

The example described above uses "sequential merges" for step 3--it runs 364 separate merge steps, one after another, to create the final database. Each of these merges uses only a single processor of a single node, so this portion of the build does not use the cluster efficiently; this can cause step 3 to take longer than step 2: the merging can be slower than the processing and building of the data. To improve this, a more sophisticated merge method can be scripted, using a "binary tree" of merges to build the final database. Roughly, each core on each node is assigned two one-day databases, which it merges, forming two-day databases. Then each core on each node is assigned two two-day databases, which it merges to form a four-day database. This continues until a final merge combines two half-year databases into a one-year database.
The number of merge stages is much smaller than the number of merges required if done sequentially. For simplicity, let's assume we're merging 16 days, on a 4-core cluster, so we can do 4 merges at a time.

Step 1, core 1: Merge day1 with day2, creating day[1,2].
Step 1, core 2: Merge day3 with day4, creating day[3,4].
Step 1, core 3: Merge day5 with day6, creating day[5,6].
Step 1, core 4: Merge day7 with day8, creating day[7,8].

When those are complete, we continue:

Step 2, core 1: Merge day9 with day10, creating day[9,10].
Step 2, core 2: Merge day11 with day12, creating day[11,12].
Step 2, core 3: Merge day13 with day14, creating day[13,14].
Step 2, core 4: Merge day15 with day16, creating day[15,16].

Now we have taken 16 databases and merged them, in two steps, into 8 databases. Next we merge them into four databases:

Step 3, core 1: Merge day[1,2] with day[3,4], creating day[1,2,3,4].
Step 3, core 2: Merge day[5,6] with day[7,8], creating day[5,6,7,8].
Step 3, core 3: Merge day[9,10] with day[11,12], creating day[9,10,11,12].
Step 3, core 4: Merge day[13,14] with day[15,16], creating day[13,14,15,16].

Now we merge into two databases:

Step 4, core 1: Merge day[1,2,3,4] with day[5,6,7,8], creating day[1,2,3,4,5,6,7,8].
Step 4, core 2: Merge day[9,10,11,12] with day[13,14,15,16], creating day[9,10,11,12,13,14,15,16].

And finally:

Step 5, core 1: Merge day[1,2,3,4,5,6,7,8] with day[9,10,11,12,13,14,15,16], creating day[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16].

So in 5 steps, we have built what would have required 15 steps using sequential merges: a 16-day database. This approach can be used to speed up much larger merges even more.

Advanced Topic: Re-using One-Day Databases

In the approach above, the one-day databases are not destroyed by the merge, which reads data from them but does not write to them. This makes it possible to keep the one-day databases for fast access to reports from a particular day. By leaving the one-day databases in place after the merge is complete, users will be able to select a particular database from the Profiles list, to see fast reports for just that day (a one-day database is much faster at generating reports than a 365-day database).
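The binary-tree merge walkthrough above can be sketched as a shell loop. In this sketch, the sawmill binary is stubbed with a shell function so the merge order can be inspected on any machine, and the database names are hypothetical; in a real cluster, the merges within each level would also be distributed across cores and run in parallel, rather than sequentially as here:

```shell
# Merge pairs of databases level by level until one remains (binary tree).
sawmill() { echo "merge $2 <- $6"; }   # stub for: sawmill -p NAME -a md -mdd PATH

set -- day1 day2 day3 day4             # the per-day databases to combine
while [ $# -gt 1 ]; do
  next=""
  while [ $# -ge 2 ]; do               # merge adjacent pairs at this level
    sawmill -p "$1" -a md -mdd "Databases/$2/main"
    next="$next $1"                    # the merge target carries the pair forward
    shift 2
  done
  if [ $# -eq 1 ]; then                # an odd database rides up to the next level
    next="$next $1"
    shift
  fi
  set -- $next                         # word splitting rebuilds the list (names have no spaces)
done
echo "final: $1"
```

With four databases this prints the two level-1 merges, then the final merge, then "final: day1". Doubling the input adds only one more level of merges, which is why a tree of merges finishes in far fewer stages than the same number of sequential merges.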
Advanced Topic: Using Different Merge Units

In the discussion above, we used one day as the unit of merge, but any unit can be used. In particular, if you are generating a database showing reports from 1000 sites, you could use a site as the unit. After building the databases from 1000 sites, you could then merge all 1000 databases to create an all-sites profile for administrative overview, leaving each of the 1000 one-site profiles to be accessed by its users.

Questions or suggestions? Contact support@sawmill.net. If you would like a Sawmill Professional Services expert to implement this, or another customization, contact consulting@sawmill.net.
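As a final sketch for this article, step 2 of the year-build recipe above ("build all profiles, 8 at a time") can be expressed with xargs -P, which caps how many commands run at once and starts a new one as each finishes. The profile names here are invented, and echo stands in for the real sawmill binary; replace it with the sawmill pathname to run actual builds:

```shell
# Run up to 4 "builds" at a time; xargs -P keeps 4 running until all are done.
printf '%s\n' day1 day2 day3 day4 day5 day6 day7 day8 |
  xargs -P4 -I{} echo sawmill -p {} -a bd
```

With -P4 the completion order is not guaranteed, which is fine here: the builds are independent, and the merges begin only after all builds finish.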
Sawmill Newsletter
September 15, 2007
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of UNSUBSCRIBE, to newsletter@sawmill.net.

News

Sawmill 7.2.10 shipped on August 4, 2007. This is a minor "bug fix" release, and it is free to existing Sawmill 7 users. It is not a critical update, but it does fix a number of bugs, adds support for many new log formats, and adds a few small features. It is recommended for anyone who is experiencing problems with Sawmill 7.2.9 or earlier. You can download it from http://sawmill.net/download.html. This issue of the Sawmill Newsletter describes techniques for improving the performance of database updates.
Tips & Techniques: Improving the Performance of Database Updates

A typical installation of Sawmill updates the database for each profile nightly. Each profile points its log source to the growing dataset it is analyzing, and each night, a scheduled database update looks for new data in the log source, and adds it to the database. This works fine for most situations, but if the dataset gets very large, or has a very large number of files in it, updates can become too slow. In a typical "nightly update" environment, the updates are too slow if they have not completed by the time reports are needed in the morning. For instance, if updates start at 8pm each night, and take 12 hours, and live reports are needed between 7am and 8pm, then the update is too slow, because it will not complete until 8am, and reports will not be available from 7am to 8am. The downtime can be completely eliminated by using separate installations of Sawmill (one for reporting, and one for updating), but even then, updates can be too slow, if they take more than 24 hours to update a day of data. There are many ways of making database updates faster. This newsletter lists common approaches, and discusses each.
1. Use a local file log source

If you're pulling your log data from an FTP site, or from an HTTP site, or using a command line log source, Sawmill does not have as much information available to efficiently skip the data in the log files; it will need to re-download and reconsider more data than if the files are on a local disk, or a locally mounted disk. Using a local file log source will speed up updates, by allowing Sawmill to skip previously seen files faster. A "local file log source" includes mounted or shared drives, such as mapped drive letters, UNC paths, NFS mounts, and AppleShare mounts; these are more efficient than FTP, HTTP, or command line log sources for skipping previously seen data.

2. Use a local file log source on a local disk

As mentioned above (1), network drives are still "local file log sources" to Sawmill, because it has full access to examine all their attributes as though they were local drives on the Sawmill server. This gives Sawmill a performance boost over FTP, HTTP, and command line log sources. But with network drives, all the information still has to be pulled over the network. For better performance, use a local drive, so the network is not involved at all. For instance, on Windows, put the logs on the C: drive, or some other drive physically inside the Sawmill server. Local disk access is much faster than network access, so using local files can significantly speed updates.

If the logs are initially generated on a system other than the Sawmill server, they need to be transferred to the local disk before processing, when using this approach. This can be done in a separate step, using a third-party program. rsync is a good choice, and works on all operating systems (on Windows, it can be installed as part of Cygwin). On Windows, DeltaCopy is also a good choice. Most high-end FTP clients also support scheduling of transfers, and incremental transfers (transferring only files which have not been transferred earlier).
The file transfers can be scheduled to run periodically during the day; unlike database updates, they can run during periods when reports must be available.

3. Turn on "Skip processed files on update by pathname"

During a database update, Sawmill must determine which log data it has already imported into the database, and import the rest. By default, Sawmill does this by comparing the first few kilobytes of each file with the first few kilobytes of files which have already been imported (by comparing checksums). When it encounters a file it has seen before, it checks whether there is new data at the end of it, by reading through the file past the previously seen data, and resuming processing when it reaches the end of the data it has seen before. This is a very robust way of detecting previously seen data, as it allows files to be renamed, compressed, or concatenated after processing; Sawmill will still recognize them as previously seen. However, the algorithm requires Sawmill to look through all the log data, briefly, to determine what it has seen. For very large datasets, especially datasets with many files, this can become the longest part of the update process.

A solution is to skip files based on their pathnames, rather than their contents. Under Config -> Log Data -> Log Processing, there is an option "Skip processed files on update by pathname." If this option is checked, Sawmill will look only at the pathname of a file when determining whether it has seen that data before. If the pathname matches the pathname of a previously processed file, Sawmill will skip the entire file; if it does not match, Sawmill will process the file. Skipping based on pathnames takes almost no time, so turning this option on can greatly speed updates, if the skipping step is taking much of the time. This will not work if any of the files in the log source are growing.
If the log source is a log file which is being continually appended, Sawmill will put that log file's data into the database, and will skip that file on the next update, even though it now has new data at the end, because the pathname matches (and with this option on, only the pathname is used to determine what is new). So this option works best for datasets which appear on the disk one complete file at a time, where files do not gradually appear during the time when Sawmill might be updating. Typically, this option can be used by processing log data on a local disk, setting up file synchronization (see 2, above), and having it synchronize only the complete files. It can also be used if logs are compressed each day, to create daily compressed logs; then the compressed logs are complete, and can be used as the log source, and the uncompressed, growing log will be ignored because it does not end with the compression extension (e.g., .zip, .gz, or .bz2).

Finally, there is another option, "Skip most recent file" (also under Config -> Log Data -> Log Processing), which looks at the modification date of each file in the log source (which works for "local file" log sources only--but remember, that includes network drives), and skips the file with the most recent modification date. This allows fast analysis of servers, like IIS, which timestamp their logs but do not compress or rotate them; only the most recent log is changing, and all previous days' logs are fixed, so by skipping the most recent one, we can safely skip based on pathnames.

4. Keep the new data in a separate directory

For fully automated Sawmill installations, there is often a scripting environment built around Sawmill, which manages log
rotation, compression, import into the database, report generation, etc. In an environment like this, it is usually simple to handle the "previously seen data" algorithm at the master script level, by managing Sawmill's log source so it contains only data which has not yet been imported into the database. This could be done by moving all processed logs to a "processed" location (a separate directory or folder) after each update; or by copying the logs to be processed into a "newlogs" folder, updating using that folder as the log source, and then deleting everything in "newlogs" until the next update. By ensuring, at the master script level, that the log source contains only new data, you can bypass Sawmill's skipping algorithm entirely, and get the best possible performance.

5. Speed up the database build, or the merge

The choices above are about speeding up the "skip previously seen data" part of a database update. But database updates have three parts: they skip the previously seen data, then build a separate database from the new data, and then merge that database into the main database. Anything that would normally speed up a database build will speed up a database update, and it will usually speed up the merge too. For instance, deleting database fields, deleting cross-reference tables, rejecting unneeded log entries, and simplifying database fields with log filters can all reduce the amount and complexity of data in the database, speeding up database builds and updates. With Enterprise licensing, on a system with multiple processors or cores, it is also possible to set "Log processing threads" to 2, 4, or more, in Config -> Log Data -> Log Processing. This tells Sawmill to use multiple processors or cores during the "build" portion of the database update (when it builds the separate database from the new data), which can significantly improve the performance of that portion.
However, it increases the amount of work to be done in the "merge" step, so using more threads does not always result in a speed increase for updates.

Active-scanning anti-virus can severely affect the performance of both builds and updates, by continually scanning Sawmill's database files as it attempts to modify them. Performance can be 10x slower in extreme cases, when active scanning is enabled. This is particularly marked on Windows systems. If you have an anti-virus product which actively scans all file system modifications, you should exclude Sawmill's installation directory, and its database directories (if separate), from the active scanning.

Use of a MySQL database has its advantages, but performance is not one of them--Sawmill's internal database is at least 2x faster than MySQL for most operations, and much faster for some. Unless you need MySQL for some other reason (like querying the imported data directly with SQL, from another program; or overcoming the address space limitations of a 32-bit server), use the internal database for best performance of both rebuilds and updates.

Finally, everything speeds up when you have faster hardware. A faster CPU will improve update times, and a faster disk may have an even bigger effect. Switching from RAID 5 to RAID 10 will typically double the speed of database builds and updates, and switching from 10Krpm to 15Krpm disks can give a 20% performance boost. Adding more memory can help too, if the system is near its memory limit.

Questions or suggestions? Contact support@sawmill.net. If you would like a Sawmill Professional Services expert to implement this, or another customization, contact consulting@sawmill.net.
[Article revision v1.0]
Newsletters
Sawmill Newsletter
October 15, 2007
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of "UNSUBSCRIBE", to newsletter@sawmill.net.

News

Sawmill 7.2.10 shipped on August 4, 2007. This is a minor "bug fix" release, and it is free to existing Sawmill 7 users. It is not a critical update, but it does fix a number of bugs, adds support for many new log formats, and adds a few small features. It is recommended for anyone who is experiencing problems with Sawmill 7.2.9 or earlier. You can download it from http://sawmill.net/download.html .

This issue of the Sawmill Newsletter takes you through an example of creating a custom report.

Get the Most out of Sawmill with Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Service Experts are available for just this situation and many others. We will assist in the initial installation of Sawmill using best practices; work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements and stay focused on what your business needs are. We will show you areas of Sawmill you may not even be aware of, demonstrating streamlined methods to get you the information more quickly. Often you'll find that Sawmill's deep analysis can even provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users only exercise a fraction of this power. That's where our experts really can make the difference.
Our Sawmill experts have many years of experience with Sawmill and with a large cross-section of devices and business sectors. Our promise is to very quickly come up with a cost-effective solution that fits your business, and greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Creating Custom Reports

Sawmill creates a collection of "standard" reports when you create a profile, including the Overview, one report per non-aggregating database field, session reports (if available), and the Log Detail report. But there is nothing "special" or "hard-coded" about these reports; they are just the default reports included in the profile, and you can create similar reports yourself, or edit or remove these default reports. This newsletter takes you through an example of creating a custom report. Custom reports are available only in Sawmill Professional and Sawmill Enterprise; reports cannot be customized through the web interface in Sawmill Lite.

To start creating a custom report, start with a profile (create one if you don't have one, using Create Profile in the Admin page). Then:

1. Go to the Config section of the profile, either by clicking Show Config next to the profile, in the profiles list, or by clicking Config in the reports interface of the profile.
2. Choose Manage Reports from the left menu.
3. Click Reports/Reports Editor.
4. Click New Report, in the upper-right corner of the reports list. The Report Editor appears:
5. Choose a name for the report. In this case, we'll call it "Image File Types," by typing that in the Report Name field. We'd also like to have this report be accessible through the Reports Menu (the menu at the left of the Reports interface), so we leave Add new report to reports menu checked, and give it a name for the Reports Menu (in this case, the same name: "Image File Types"):
6. Each report contains one or more report elements. A report element is generally a table of data, with associated graphs. We will create a report with one element (the Single-page summary is an example of a report with multiple elements). To do this, click the Report Elements tab:
7. Then, click New Report Element at the right, to create a new report element, and to begin editing it:
8. The Type is Overview by default, but we don't want an Overview--we want a table report (specifically, a table of File Types). So choose "Standard table" as the Report Element Type. Then enter a Report element name ("Image File Types" again). This name doesn't usually appear anywhere, but if you have multiple elements, or if you explicitly turn on the name, it will appear in the reports:
9. Now we have a table report element, but we have to tell Sawmill what fields will be in the table. This will be a table of file types, so click the Fields tab, select "File type" from the Main field menu, and leave all the numerical fields intact:
10. We could stop now, and we would have re-created the File Types report, with a different name. But we'd also like to show only image file types, and we'd like a pie chart of them. First, add the pie chart, by clicking the Graphs tab, clicking one of the numerical fields (Hits in this example), and making it a pie chart by clicking Graph type and selecting Pie chart:
11. We're done with the Report Element, so click OK to close the Report Element editor.
12. At this point, if we stopped, the report would be a File Types report with a pie chart. But we still want to add a filter, so click back to the Report Options tab in the Report Editor, and enter a report filter expression to select only the image file types (GIF, JPEG, JPG, and PNG). The Salang expression to do this is "(file_type within 'GIF') or (file_type within 'JPEG') or (file_type within 'JPG') or (file_type within 'PNG')", so we enter that in the Report filter field:
13. We're done! Click Save and Close to close the Report Editor, and then click Reports in the upper left to return to the Reports interface. At the bottom of the left menu, you will see our custom report, Image File Types; click that to see the custom report:
This shows all hits on GIF, JPEG, JPG, and PNG images in this dataset; this particular dataset had no JPG or PNG hits, so it shows just GIF and JPEG lines.

Advanced Topics

This is just a small slice of what the Reports Editor can do.

* In this example, we created a standard table report, but the report editor can also create "table with subtable" reports, where each row contains an indented table, breaking down the events for that row by a second field (e.g., the "Search Engines by Search Phrases" report, in a standard web log profile). The Reports Editor can also create other specialized types of reports, like Sessions Overview, Log Detail, Session Paths, and more.
* In this example, we created a single pie chart, but the Reports Editor can also create bar charts or line graphs. The charts also have many options, controlling their size, colors, legends, and more.
* In this example, we created a report with a single report element; in the Report Editor, you can create reports with many report elements, like the standard Single-page Summary.
* In this example, we just left the report at the bottom of the Reports Menu, but it can be moved to the middle or the top, put into a new or existing group, or hidden entirely, using the Reports Menu Editor (click Edit Reports Menu at the top right of the Reports/Reports Menu page).

Questions or suggestions? Contact support@sawmill.net. If you would like a Sawmill Professional Services expert to implement this, or another customization, contact consulting@sawmill.net.
Newsletters
Sawmill Newsletter
November 15, 2007
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of "UNSUBSCRIBE", to newsletter@sawmill.net.

News

Sawmill 7.2.10 shipped on August 4, 2007. This is a minor "bug fix" release, and it is free to existing Sawmill 7 users. It is not a critical update, but it does fix a number of bugs, adds support for many new log formats, and adds a few small features. It is recommended for anyone who is experiencing problems with Sawmill 7.2.9 or earlier. You can download it from http://sawmill.net/download.html .

This issue of the Sawmill Newsletter describes the Create Many Profiles feature, which can be used to greatly simplify the creation and maintenance of many similar profiles.

Get the Most out of Sawmill with Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Service Experts are available for just this situation and many others. We will assist in the initial installation of Sawmill using best practices; work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements and stay focused on what your business needs are. We will show you areas of Sawmill you may not even be aware of, demonstrating streamlined methods to get you the information more quickly. Often you'll find that Sawmill's deep analysis can even provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users only exercise a fraction of this power.
That's where our experts really can make the difference. Our Sawmill experts have many years of experience with Sawmill and with a large cross-section of devices and business sectors. Our promise is to very quickly come up with a cost-effective solution that fits your business, and greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Using "Create Many Profiles" to Create and Maintain Many Similar Profiles

In large multi-user environments, like web hosting companies, Sawmill is used to provide one profile per customer. In other situations, Sawmill is used to manage one profile per server, or one profile per device. In these and similar situations, the Sawmill installation can have hundreds or thousands of profiles, all of them very similar. Creating all these profiles manually, using the Create Profile wizard, can be very time-consuming. When a change needs to be made to all of them, it can take a very long time if it is done separately for each profile through the Config interface. The solution is to use Create Many Profiles, a feature of Sawmill which lets you create as many similar profiles as you like, in a single step. Create Many Profiles works in three stages: create a template profile, create the create_many_profiles.cfg file, and run the command to generate or regenerate the profiles.

Stage 1: Create a Template Profile
The first step is to create the template profile. For this example, we'll assume that we're creating only three profiles for the web sites site1.com, site2.com and site3.com, on a Windows server where the logs are at C:\logs\site1, C:\logs\site2 and C:\logs\site3. The template profile is usually a separate profile from any of the final profiles, so we'll call it "template". Start by creating this "template" profile, and pointing it to the log data for site1.com. This first profile is created using the Create Profile wizard (the usual way, through the web interface). Enter C:\logs\site1 as the log source for the profile. This template profile won't actually process logs or generate reports, but it needs to see some log data so it knows what the format is, so all the format-related options can be propagated to the other profiles. Finish creating the profile, and call it "template." Do not build the database or view reports; this template exists only to be a model for other profiles.

Stage 2: Set up create_many_profiles.cfg

Now, using a text editor like Notepad, edit the file LogAnalysisInfo\miscellaneous\create_many_profiles.cfg . This is the file which describes the profiles you want to create from the template. In this case, it might look like this:
create_many_profiles = {
  template_profile_name = "template"
  profiles = {
    site1 = {
      changes = {
        label = "Site 1"
        log.source.0.pathname = "c:\\logs\\site1"
      }
    } # site1
    site2 = {
      changes = {
        label = "Site 2"
        log.source.0.pathname = "c:\\logs\\site2"
      }
    } # site2
    site3 = {
      changes = {
        label = "Site 3"
        log.source.0.pathname = "c:\\logs\\site3"
      }
    } # site3
  } # profiles
} # create_many_profiles

The parts of this file are the following: First, the whole thing is enclosed in the create_many_profiles group, which just wraps the rest of the data into a CFG file.
The following line means that the profile called "template" (the internal profile name, as it appears when you look in LogAnalysisInfo\profiles, without the .cfg extension) is used as the template to create all the other profiles.
...
template_profile_name = "template"
...

Within the create_many_profiles section, there is a "profiles" section, which contains information about the profiles to create:
...
profiles = {
  ...
} # profiles
...

And within the "profiles" section, there are three sections, one for each profile, like this one:
...
site1 = {
  changes = {
    label = "Site 1"
    log.source.0.pathname = "c:\\logs\\site1"
  }
} # site1
...

The section above (site1) means that the Create Many Profiles operation should create a profile whose internal name is "site1" (e.g., it will be site1.cfg in the profiles folder), and whose label is "Site 1" (i.e., it will appear as "Site 1" in the web interface). All settings will be copied from the template profile--it will be an exact clone of the template profile, except for the settings specified here, which are the internal name, the label, and the log.source.0.pathname option. That option (log.source.0.pathname) refers to the "pathname" option in the "0" group of the "source" group of the "log" group of the profile, which, for a profile with one log source, is the pathname of the log data. (If you look in the LogAnalysisInfo\profiles\templates.cfg file, you can see this structure, with the curly-bracketed "log" group containing the "source" group, etc.) So this will create a profile based on "template", but with internal name "site1", with label "Site 1", and which reads its log data from C:\logs\site1 (the single \'s must be replaced by \\'s within CFG file string options). Similarly, the site2 and site3 sections create the Site 2 and Site 3 profiles.

Stage 3: Create the profiles

To create the profiles, you need to run the following command line. On Windows, open Command Prompt, use the "cd" command to change to the Sawmill installation directory, and run Sawmill with the "-dp templates.admin.profiles.create_many_profiles" option to create the profiles, like this:

cd C:\Program Files\Sawmill 7
SawmillCL -dp templates.admin.profiles.create_many_profiles

Or on non-Windows:

cd <sawmill directory>
./sawmill -dp templates.admin.profiles.create_many_profiles

When the command completes, all the specified profiles will have been created, just as though they had all been created separately using the Create Profile wizard.
Now you can proceed with building databases, viewing reports, etc., for each profile.

Modifying all the profiles
Now maintenance of the profiles becomes very easy. Suppose you want to add a log filter to all profiles. Just open the "template" profile in Config, edit it to add the log filter, and go to Stage 3 again, above, to recreate the profiles. Recreating the profiles does not delete or rebuild the database, so it can be safely done at any time. It can even be scheduled using an external scheduler like cron or Windows Scheduler, to recreate all profiles every night, to pick up the previous day's changes to "template" or to create_many_profiles.cfg.

Adding another profile

Adding another profile is also very easy. Just add a new section (e.g., site4) to create_many_profiles.cfg, and repeat Stage 3 to recreate all profiles. It won't affect the other profiles, and it will create the new profile.

Automation

Because the Create Many Profiles feature uses a text file and the command line, it is very easy to automate from a scripting environment. Just have your script edit, or rewrite, the create_many_profiles.cfg file to list all profiles and their modifications, and have the script run the command line to regenerate all the profiles any time something changes.

Questions or suggestions? Contact support@sawmill.net. If you would like a Sawmill Professional Services expert to implement this, or another customization, contact consulting@sawmill.net.
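As a sketch of that kind of automation, the following script regenerates a create_many_profiles.cfg equivalent to the example above, from a simple list of site names. The output path and site list are demo assumptions; a real script would write into LogAnalysisInfo\miscellaneous\ and then run the "-dp templates.admin.profiles.create_many_profiles" command line from Stage 3.

```shell
#!/bin/sh
# Sketch: regenerate create_many_profiles.cfg from a site list.
# Writes to /tmp for this demo; adjust OUT for a real installation.
OUT=/tmp/create_many_profiles.cfg
SITES="site1 site2 site3"   # hypothetical site list; could come from a database

{
  printf 'create_many_profiles = {\n'
  printf '  template_profile_name = "template"\n'
  printf '  profiles = {\n'
  for s in $SITES; do
    # One section per site: label and log source derived from the name.
    printf '    %s = {\n' "$s"
    printf '      changes = {\n'
    printf '        label = "%s"\n' "$s"
    printf '        log.source.0.pathname = "c:\\\\logs\\\\%s"\n' "$s"
    printf '      }\n'
    printf '    } # %s\n' "$s"
  done
  printf '  } # profiles\n'
  printf '} # create_many_profiles\n'
} > "$OUT"
echo "wrote $OUT"
```

After rewriting the file, the script would invoke SawmillCL (or ./sawmill) with the Stage 3 option to regenerate all profiles.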
[Article revision v1.2]
Newsletters
Sawmill Newsletter
December 15, 2007
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of "UNSUBSCRIBE", to newsletter@sawmill.net.

News

Sawmill 7.2.11 shipped on November 30, 2007. This is a minor "bug fix" release, and it is free to existing Sawmill 7 users. It is not a critical update, but it does fix a number of bugs, adds support for many new log formats, and adds a few small features. It is recommended for anyone who is experiencing problems with Sawmill 7.2.10 or earlier. You can download it from http://sawmill.net/download.html .

This issue of the Sawmill Newsletter describes ways of customizing the Sawmill web interface, from colors and logos to full interface modifications.

Get the Most out of Sawmill with Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Service Experts are available for just this situation and many others. We will assist in the initial installation of Sawmill using best practices; work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements and stay focused on what your business needs are. We will show you areas of Sawmill you may not even be aware of, demonstrating streamlined methods to get you the information more quickly. Often you'll find that Sawmill's deep analysis can even provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users only exercise a fraction of this power.
That's where our experts really can make the difference. Our Sawmill experts have many years of experience with Sawmill and with a large cross-section of devices and business sectors. Our promise is to very quickly come up with a cost-effective solution that fits your business, and greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Customizing the Web Interface

Sawmill's default web interface looks like this:
The Standard Sawmill Reporting Web Interface The standard web interface is fine for most applications, but in situations where Sawmill is embedded as part of a larger environment, Sawmill's default interface may not align well with the rest of the interface. Fortunately, Sawmill provides many ways of customizing its user interface.
I. Customizing the Colors

Sawmill's web interface is written in HTML (generated by Salang; see below), and uses Cascading Style Sheets (CSS). The CSS files are in LogAnalysisInfo\WebServerRoot\css (the css folder, in the WebServerRoot folder, in the LogAnalysisInfo folder of your installation). By editing these files, you can customize the colors, fonts, and other style attributes of the Sawmill web interface, in any way you choose. This customization is permitted for any tier of Sawmill: Enterprise, Professional, or Lite.

The simplest and most common customization of the Sawmill web interface is modification of the CSS files, to change the colors of the interface from the standard purple to some other theme. Changing the colors requires several edits to CSS files, which must be done with a text editor. The exact edits are documented in the FAQ entry Customizing the CSS in Sawmill ( http://sawmill.net/cgi-bin/sawmill7/docs/sawmill.cgi?dp+docs.faq.entry+webvars.entry+customize_css ), and may vary between versions of Sawmill. This article discusses the changes required for Sawmill 7.2.11; see the FAQ of your own installation of Sawmill for up-to-date information about customizing it.

Let's change Sawmill to a blue theme. As described in the FAQ entry referenced above, the first change is the background color of the top bar. This is in LogAnalysisInfo\WebServerRoot\css\header.css; after the change, the rule looks like this:
table.top-bar td {
  background-color: #31317A;
  white-space: nowrap;
}

Changing 6A317A to 31317A changes the top bar from purple to dark blue. 31317A is a CSS color code, composed of three hexadecimal one-byte components, for red, green, and blue; so in this case this is red=31, green=31, blue=7A, which is a dark blue. Many tools exist for selecting hexadecimal web colors graphically, including online tools. After making this change, if you reload, you will see the top bar of the Admin page turn blue. Now, we proceed with the remainder of the changes from the FAQ:
[already done above] Color of the main header bar: background-color, in top-bar, in header.css.
Color of tab in Admin: background-color, three places in admin-tab, in header.css.
Color of sidebar header in Reports: background-color, in sidebar-header, in header.css.
Color of Login page header bar: background-color, in top-bar, in setup_and_login.css.
Color of Setup page header bar: background-color, in setup-bar and table.top-bar td, in setup_and_login.css.
Color of Reports header bar: background-color, in h1.report-header-bar, in report.css.
Color of Error header bar: background-color, in table.top-bar td, in error.css.
Color of Progress header bar: background-color, in div.progress-title-bar, in progress.css.
Color of Config header bar: background-color, in table.subform-header-bar td and table.lf-subform-label, in admin_config.css.
Color of Advanced Filter form label: background-color, in advanced_filter_form_label, in report_tools.css.
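Assuming the same stock color code appears in each of those places, the edits can also be applied in one pass with a script rather than by hand. The sketch below operates on a throwaway demo directory containing a miniature header.css; for a real installation, CSSDIR would be LogAnalysisInfo/WebServerRoot/css, and you should work on a backup copy, since this edits the files in place.

```shell
#!/bin/sh
# Sketch: replace the stock purple (6A317A) with dark blue (31317A)
# across all CSS files in a directory.
CSSDIR=/tmp/demo_css   # demo path; e.g. LogAnalysisInfo/WebServerRoot/css
OLD=6A317A             # stock purple
NEW=31317A             # dark blue

# Demo setup only: create a miniature header.css to operate on.
mkdir -p "$CSSDIR"
printf 'table.top-bar td {\n  background-color: #%s;\n  white-space: nowrap;\n}\n' "$OLD" > "$CSSDIR/header.css"

# Substitute the color code in every CSS file, in place.
for f in "$CSSDIR"/*.css; do
  sed "s/$OLD/$NEW/g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
grep "background-color" "$CSSDIR/header.css"
```

After running it against the real css folder, reloading the browser shows the new color everywhere at once.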
With these changes, all aspects of the web interface will be blue when you reload:
The Sawmill Reporting Web Interface (Customized to Blue)

II. Customizing the Logo

If you're using Sawmill to deliver reports to your customers or clients, you may not want the upper left logo to say "Sawmill." If you're the owner of Bob's Web Hosting Company, you might want to have the logo show your own logo, and "Bob's Reports."
Assuming you have a license which allows modification of the logo of your installation (see below), the logo can be modified by replacing the transparent PNG file at LogAnalysisInfo\WebServerRoot\picts\sawmill_logo.png . Once you have replaced that file with another PNG file (ideally, with transparency, to let the header bar background color show through), reload the web browser page, and the new logo should appear.

WARNING: Customizing the logo is a violation of standard licensing, as described in the End User License Agreement (EULA) for Sawmill. Regardless of which tier you use (Lite, Professional, or Enterprise), you may not remove or change the Sawmill logo without written permission from Flowerfire. If you want to change this logo, you must contact Flowerfire for a license which allows it.

III. Renaming the Product

Beyond changing the logo, you can also modify Sawmill to change the name of the product from "Sawmill" to something else (like "Bob's Reports"). This is a simple one-line change to a single text file, to change the name of the product in all locations in the web interface, the command line, and the documentation. Exact instructions for changing the product name are available through Flowerfire, if you have a license which permits it.

WARNING: Renaming the product is a violation of the standard licensing, as described in the EULA. Permission to rename must be negotiated separately with Flowerfire.

WARNING: The Flowerfire copyright notice, which appears at the bottom of every page in the web interface, may never be changed or removed.

IV. Advanced Customization of the Web Interface

If you have Enterprise licensing, there is almost no limit to the customization you can do, provided you stay within the terms of the EULA (don't change the logo, the product name, or the copyright notice, unless you have written permission).
That's because Sawmill's web interface is written entirely in Salang, the Sawmill Language, and the source code of the web interface is contained in the LogAnalysisInfo\templates folder. In an Enterprise installation, the source code is open to be modified using a text editor. Salang is an interpreted language with some similarity to Perl and C, and an experienced web programmer can modify the Salang source code to change anything in the web interface. If you want to change the location of components of the GUI, add new functionality, or completely change the way existing functionality works, you can do it by developing in Salang. However, the templates cannot be changed in Professional or Lite installations of Sawmill; if templates are changed, Sawmill will cease to function, giving an error when the template is accessed, until it is restored to its original content.

If you make significant modifications to the templates, you should seriously consider using a version control system. Each release of Sawmill, including minor releases, includes some changes to the templates, and if the changes affect files you have modified, you will have to choose between getting the new changes, keeping your own changes, or re-patching your changes into the new versions of the files. The last option, which is the best one, is made trivial by a version control system, which will allow you to monitor changes to the basic Sawmill distribution, and overlay your own changes over the distribution with each release. This can also be done with a diff-and-patch approach, but version control is generally more flexible.
Questions or suggestions? Contact support@sawmill.net. If you would like a Sawmill Professional Services expert to implement this, or another customization, contact consulting@sawmill.net.
[Article revision v1.0]
Newsletters
Sawmill Newsletter
January 15, 2008
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of "UNSUBSCRIBE", to newsletter@sawmill.net.

News

Sawmill 7.2.11 shipped on November 30, 2007. This is a minor "bug fix" release, and it is free to existing Sawmill 7 users. It is not a critical update, but it does fix a number of bugs, adds support for many new log formats, and adds a few small features. It is recommended for anyone who is experiencing problems with Sawmill 7.2.10 or earlier. You can download it from http://sawmill.net/download.html .

This issue of the Sawmill Newsletter describes how to limit the disk usage of a database by creating a rolling 30-day database, using the "remove database data" feature of the Scheduler, and/or using a Log Filter.

Get the Most out of Sawmill with Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Service Experts are available for just this situation and many others. We will assist in the initial installation of Sawmill using best practices; work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements and stay focused on what your business needs are. We will show you areas of Sawmill you may not even be aware of, demonstrating streamlined methods to get you the information more quickly. Often you'll find that Sawmill's deep analysis can even provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users only exercise a fraction of this power.
That's where our experts really can make the difference. Our Sawmill experts have many years of experience with Sawmill and with a large cross-section of devices and business sectors. Our promise is to very quickly come up with a cost-effective solution that fits your business, and greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Creating A Rolling 30-day Database

Typically, a profile in Sawmill imports data from a growing log source, periodically (most often, daily). This is simple, and fine for many purposes, but as the size of the log data increases, so does the size of the database, and this will eventually consume all available disk space. For smaller datasets, the time to consume all disk space may be so long that it causes no problem (if your disk will fill up in 350 years, it's probably not a pressing issue for you), but if the dataset is very large, it may be necessary to restrict the size of Sawmill's database. There are various ways of reducing the size of the database, but one of the simplest is to restrict data to a certain age. If a database covers only the past 30 days, it will be about 1/10th the size of a database covering the past 300 days (assuming no growth in daily data size). In this article, we will discuss ways to create a database which always shows the past 30 days of data. The same techniques can be used with any other age, for instance to create a 90-day database, or a 60-day database. If database updates are scheduled to occur daily, then each day, the latest day of data will be added to the database. That's
the easy part--the hard part is getting rid of the oldest day of data (the 31st day of data). There are two ways to do this: with a "remove database data" action, or by rebuilding the database with a log filter.

Using "Remove Database Data" To Discard Data Older Than 30 Days From An Existing Database

The "Remove Database Data" action removes data matching a certain filter from an existing database. It is most often used to remove data older than a certain number of days, and the Scheduler has an easy option for using it this way. This section describes how to set up a scheduled task to remove data older than 30 days from the database, every night at midnight. In the Admin page of the web interface, click Scheduler, to look at the scheduled tasks, and then click New Action in the upper right to create a new action. Choose "Remove database data" for the action; choose the profile name for the profile you want to limit, from the "Profile" menu; and enter "30" in the "Remove database data older than" field. It will look like this:
Now click Save and Close to save the task. From now on, at midnight, all data older than 30 days will be removed from the database for the selected profile. (If you want it to occur more frequently, less frequently, or at another time of day, you can change that at the bottom of the window.)

Using A Log Filter To Reject Data Older Than 30 Days During Log Processing

If the data is not yet in the database (e.g., if you're rebuilding the database), you can also remove it as you process it, using a Log Filter. This section describes creating a log filter to reject log data older than 30 days.

Go to Config -> Log Data -> Log Filters, and click New Log Filter in the upper right. Click the Filter tab, and give it a name in the Name field, like "Remove data older than 30 days." Click New Condition, and set up the following condition, to detect log data older than 30 days:
Click OK, then click New Action, and set up the following action, to reject the log data detected by the condition above:
Click OK, then move this Log Filter to the top of the list by clicking Sort Filters, and clicking the Up button until the new filter is at the top. It will probably work fine at the bottom, but it is faster at the top, and it could give different results if there is a filter higher in the list which accepts log entries (which is rare). The final filter should look like this:
From now on, whenever you rebuild the database for this profile, this filter will reject all log entries older than 30 days. This means that you could actually do a nightly rebuild, instead of a nightly update together with a nightly "remove database data," to maintain a rolling 30-day database. This is reasonable unless the dataset is too large to rebuild nightly. Database updates are much faster than rebuilds, and "remove database data" operations are also faster than rebuilds (though not by as much, since they still have to rebuild all cross-reference tables and indices), so you'll generally get better performance with an update plus a removal every night, versus a rebuild.

Advanced Topic: Removing Data From The Command Line

It is also possible to run a "remove database data" action at any time from the command line, using the "-a rdd" option. Any report filter can be specified with the -f option, to remove all events matching that filter. The following command removes all data older than 30 days, from the database for the profile profilename:

SawmillCL -p profilename -a rdd -f "(date_time < now() - 60*60*24*30)"

(On non-Windows systems, use the name of the Sawmill binary, e.g., "sawmill", instead of SawmillCL.)
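On non-Windows systems, this command-line removal can also be scheduled outside Sawmill's own Scheduler, with cron. The following is a hypothetical crontab entry, not taken from the Sawmill documentation; it assumes the Sawmill binary is installed at /usr/local/sawmill/sawmill and that the profile's internal name is profilename (adjust both for your installation):

```
# Hypothetical crontab entry: every night at midnight, remove data older
# than 30 days (60*60*24*30 seconds) from the database for "profilename".
0 0 * * * /usr/local/sawmill/sawmill -p profilename -a rdd -f "(date_time < now() - 60*60*24*30)"
```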
Questions or suggestions? Contact support@sawmill.net. If you would like a Sawmill Professional Services expert to implement this, or another customization, contact consulting@sawmill.net.
[Article revision v1.1]
Sawmill Documentation
Quickstart Manual FAQ User Guide www.sawmill.co.uk
Newsletters
Sawmill Newsletter
February 15, 2008
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of UNSUBSCRIBE, to newsletter@sawmill.net.

News

Sawmill 7.2.12 is expected to ship today, February 15, 2008, or early in the coming week (we are working on some final issues, which may delay shipment until next week). This is a minor "bug fix" release, and it is free to existing Sawmill 7 users. It is not a critical update, but it does fix a number of bugs, adds support for many new log formats, and adds a few small features. It is recommended for anyone who is experiencing problems with Sawmill 7.2.11 or earlier. You can download it from http://sawmill.net/download.html . If it still shows 7.2.11 as the latest download, wait a few days and come back; 7.2.12 should be there by then.

Sawmill 7.2.11 had a performance issue which caused table reports to generate slowly when the "show parenthesized items" option was turned on, when using the internal database. This affected most reports generated by 7.2.11, in profiles using the internal database, and the effect was particularly pronounced for large databases. If you're using 7.2.11, and seeing slow performance while generating table reports, consider an upgrade to 7.2.12.

Sawmill 7.2.11 also had a bug in the Flash Media Server plug-in which caused incorrect results in bandwidth columns. If you're analyzing Flash Media Server logs with a profile created by 7.2.11, you should upgrade to 7.2.12 and recreate the profile, to get correct results.

This issue of the Sawmill Newsletter describes how to use the "zoom" feature in reports to dig into the details of a dataset.

Get the Most out of Sawmill with Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions?
Our Professional Services experts are available for just this situation, and many others. We will assist in the initial installation of Sawmill using best practices, and work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements, and stay focused on your business needs. We will show you areas of Sawmill you may not even be aware of, demonstrating streamlined methods to get you the information more quickly. Often you'll find that Sawmill's deep analysis can even provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports.

Sawmill is an extremely powerful tool for your business, and most users only exercise a fraction of this power. That's where our experts can really make the difference. Our Sawmill experts have many years of experience with Sawmill, and with a large cross section of devices and business sectors. Our promise is to very quickly come up with a cost-effective solution that fits your business, and greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Using Zoom To Get More Detail

When you first view reports in Sawmill, you will see the Reports Menu along the left of the page, with a list of a few dozen available reports. These are unzoomed reports, showing the "top ten" of each field in the database. For instance, in a media server analysis, you might see the top IP addresses in one report, with bandwidth and viewing time for each, or you might see the top publishing points, with bandwidth and viewing time for each. These are valuable reports, and a good starting point for any investigation, but Sawmill can give a lot more detail than you see in those reports. Because the power and flexibility of "zoom" is not always obvious, this newsletter is dedicated to describing how "zoom" works, and to exploring some of the more advanced zoom options.

For this example, we will use a web server dataset from 1998, for a web site which publishes reviews of novels. Clicking Pages/directories, we see the top-level directories and pages of the site:
The Pages/directories Report

The top one, /seized/, is the main directory of the site, so we click on it. Clicking on it zooms, which means it applies a filter to the dataset, and possibly switches to a different report (we'll discuss this report switch more below). In this case, it filters the data to show only hits on the /seized/ directory, or files in it. When zooming on a hierarchical field like the "page" field, the default behavior is to stay in the same report, so Sawmill zooms to /seized/, and redisplays the Pages/directories report with this zoom filter, shown in yellow. The result is that it feels a lot like zooming into the folder structure on a hard drive--you click the folder name to see what's in it, and here we've clicked the directory name to see the contents:
The Pages/directories Report, Zoomed On "/seized/"

In this web site, the reviews themselves are in the /seized/reviews/ directory, so we click that to zoom in another step, to see the contents of /seized/reviews/:
The Pages/directories Report, Zoomed On "/seized/reviews/"

There are 122 items in this table (only the top ten are shown), and if we looked further, we would see that most of them are review pages (the name of the novel, followed by the name of the reviewer). Rows 8, 9, and 10 here show the most popular reviews (ignoring Practical Magic for the moment), with 120, 126, and 155 page views, and if we looked further, we would see a smooth drop from there, toward the less popular reviews. But way out ahead of the pack is /seized/reviews/practical_magic_sara_lipowitz.html, the review of Practical Magic. This is an anomaly--this review has many times more page views than any other review. Why? Sawmill can help you find the answer.

Let's start by zooming on that review, by clicking the second row. Since that's a filename, we can't zoom further in the Pages/directories report, so Sawmill automatically zooms to the Overview instead:
The Overview, Zoomed On "/seized/reviews/practical_magic_sara_lipowitz.html"

This isn't very interesting by itself (the data here is mostly the same as the row of the Pages/directories table), but it is a good staging point for further investigations. The key to further zooming is the "Zoom to report" menu, which appears below the yellow zoom description. We can select any report from that menu, and it will display that report, while preserving the zoom. This is different from what happens if we click the report in the Reports Menu, because that discards the zoom and goes back to the top-level report. By using the "Zoom to report" menu, we can break down the data on any field, finding out more about this particular subset of the data. Let's start by selecting Days from the "Zoom to report" menu. This shows the Days report, subject to the current filter:
The Days Report, For Practical Magic

This report shows traffic on just that one file, day by day. The graph at the top shows that there was a large spike of traffic in mid-October, 1998. Before that, traffic on this novel review was very light; after that, it was much higher. So what happened in mid-October 1998? A little web research shows that was the release date (October 16, 1998) of the movie version of the novel Practical Magic. With the movie's release, the novel got much higher exposure than before, which sparked a sharp interest in the review.

If the goal of this web site is to bring in the maximum number of page views, then this gives a clear recommendation for which novels to review: review those which are being made into movies. Sawmill's detailed analysis can give similar information for any web site, information which can be used to make the site more effective, or more popular.

Just for the sake of demonstration, let's do a little more digging. From "Zoom to report", select the "Domain descriptions" report. This shows the domain descriptions where the traffic to the review came from (again, we're still zoomed in on just this one review page, so we're seeing a very specific report: domain descriptions for the hits on Practical Magic):
Domain Descriptions For Practical Magic

Much of the traffic was from .net, .com, and .edu addresses, and from IP addresses. But somewhat surprisingly, there are some hits from Singapore (*.sg hostnames). Let's look deeper, by clicking "Singapore (sg)", and zooming to the Hostnames report:
Hostnames from Singapore, For Practical Magic

This shows a list of all hostnames of the browsers who accessed this page from Singapore. Now, let's zoom on milkyway.singnet.com.sg, to see the specific events from that hostname. But this time, we'll save some time by clicking the Zoom Options tab, and selecting "Log detail" below it (this is usually faster, because it takes several seconds to generate the Overview, but no time at all to display Zoom Options):
Hostnames from Singapore, For Practical Magic, With Log Detail Zoom

That indicates that we don't want to zoom to the Overview (the default) and then zoom from there to "Log detail"--instead, we want to zoom directly to "Log detail." So now when we click milkyway.singnet.com.sg, we go straight to the "Log detail" report, and see full details of those four page views from milkyway.singnet.com.sg on /seized/reviews/practical_magic_sara_lipowitz.html, including the exact time of each hit, the referrer, and more (additional fields have been truncated to fit here, but all database fields are in this report):
Log Detail For Practical Magic, from milkyway.singnet.com.sg

This type of deep forensic analysis is useful for any type of log data. Any number of zooms can be applied simultaneously, and can be used in conjunction with other types of filters, including date range filters and global filters. Zooming can continue on any number of fields, to any level, including the level of the events themselves, in "Log detail."

Advanced Topic: Changing The Default Zoom For A Report

In the example above, we zoomed by clicking an item to zoom to the Overview, and then selecting a report to zoom to. Later, we saved some time by using the Zoom Options tab in the report. But if we know we'll usually be zooming from Report A to Report B, we can modify Report A so its default zoom is to Report B, rather than to the Overview. This effectively changes the Zoom Options menu selection, so we don't have to do it manually if we just want to zoom to Report B. For instance, we could change the zoom default of the "Domain descriptions" report to zoom to the "Hostnames" report, so any time we click a domain, we'll see a list of hostnames under that domain.

This is done by going to the Config page for the profile, then to Manage Reports, then Reports/Reports Menu; then clicking the report name, clicking the Report Elements tab, clicking Edit to edit the report element, and choosing the destination report from the "Default report on zoom" menu. After changing the default report on zoom to Hostnames in the "Domain descriptions" report, the report element editor page will look like this:
Changing Default Report On Zoom To "Hostnames"

Save the change, and in the future, any click in the "Domain descriptions" report will zoom to the "Hostnames" report.

Questions or suggestions? Contact support@sawmill.net. If you would like a Sawmill Professional Services expert to implement a customization, contact consulting@sawmill.net.
[Article revision v1.1]
Sawmill Newsletter
March 15, 2008
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of UNSUBSCRIBE, to newsletter@sawmill.net.

News

Sawmill 7.2.13 shipped on February 21, 2008. This is a minor "bug fix" release, and it is free to existing Sawmill 7 users. It is not a critical update, but it does fix a number of bugs, adds support for many new log formats, and adds a few small features. It is recommended for anyone who is experiencing problems with Sawmill 7.2.11 or earlier. You can download it from http://sawmill.net/download.html .

This issue of the Sawmill Newsletter describes how to use log filters to exclude your own traffic from a web site analysis.
Tips & Techniques: Excluding Your Own Traffic With Log Filters

If you have a web site, you probably visit it yourself, frequently. You might want to ensure that it is functioning properly (through an automated system, or manually), or you might be looking for information on your own site, or there might be portions of your own site that are private and for your use only. When you analyze your web site with Sawmill, your own traffic will normally appear in the reports. If the purpose of the analysis is to determine how other people use your site, your own traffic can be a distraction, or can skew the statistics. Therefore, it is often best to exclude your own traffic when doing an analysis with Sawmill, so the reports show only external traffic.

At first, it might seem that the way to do this is with a Report Filter, excluding your own IP address using the Filters icon at the top of the reports. That will work, but it is usually not the best way. Report filters do their work while the reports are being generated, after the logs have been imported into the database, and therefore slow down report generation. Reports with complex filters usually require a full table scan, which is a slower report-generating method, whereas unfiltered reports can be generated much faster, because they can be generated from cross-reference tables. So if you intend to always discard your own traffic, you should do it with a Log Filter, so you can use unfiltered reports and get better performance. Log Filters run while the log data is being imported, so they do not affect reporting speed; and a simple log filter doesn't hurt import speed much.
Note: because this technique requires Log Filters, it cannot be done with Sawmill Lite.

Using A Log Filter To Discard Your IP Address

For now, we'll assume that your traffic always comes from a particular IP address. Maybe it's the IP address of your firewall, or maybe it's the IP address of your own computer. The first step is to find out which IP address is yours. If you're behind a firewall, the IP address of your computer will not be the IP address that web servers see. If you're not sure, one way is to go to whatsmyip.com, which will display your IP address as it appears to web servers on the Internet. Let's assume that your IP address is 12.34.56.78. Now we'll create a log filter to reject all events from that IP.

First, click View Config next to the profile name, in the Admin page, to go to the Config section of your profile. There, click Log Data and then Log Filters, in the left menu, to see the current list of log filters. Click New Log Filter, in the upper right of the log filters list, to create a new log filter, and type a name for it, like "Reject my IP address":
Now, click the Filter tab, and click New Condition to set up the condition for the filter. The condition is met when the IP address is 12.34.56.78, so choose the IP address field from the Log field menu (this will vary depending on log format; it will be "Hostname" for Apache logs, and "Client IP" for IIS logs), choose "is equal" from the Operator menu, and enter 12.34.56.78 in the Value field:
Click OK, and click Add Action to add the action this filter is to take when the condition is met. The action should be to reject this entry (we're rejecting all entries from your IP address), so choose "Reject log entry" from the Action menu:
Click OK. At this point, you could click Save and Close--the log filter is done. But for performance reasons, and to ensure no earlier filter explicitly accepts the events you're trying to reject, it's best to have rejecting filters at the top of the list of log filters. So before you save this, move it to the top by clicking Sort Filters, and clicking Up [ + ] until the new filter is at the top:
Now click Save and Close, and the log filter will be saved permanently to the profile. If you view reports now, you won't see any change, because log filters only have an effect while importing log data. So it is now necessary to rebuild the database, which you can do by clicking Rebuild Database at the top of the Config page. After you rebuild the database, all reports will show only hits from IP addresses other than 12.34.56.78.

Advanced Topic: More Sophisticated Filters

The simple example above rejects only a single IP address. That is sufficient for some purposes, but if your internal traffic is not always from the same IP, you will need a more complex filter. Log Filters of any complexity can be created in the Log Filter Editor. For instance, if your internal traffic is from any of 12.34.56.76, 12.34.56.77, or 12.34.56.78, you can create a filter with three conditions (by clicking New Condition three times) to reject hits from all three IPs:
This same filter could be done using a regular expression, by choosing "Matches regular expression" from the Operator menu in the Condition page, creating a filter which uses a regular expression to determine which IP addresses to reject:
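The regular-expression approach hinges on writing a pattern that matches exactly the addresses you want to reject, and nothing else. As a quick sanity check outside Sawmill, a pattern for the three addresses above can be tested in any regex-capable language; this Python sketch is only an illustration (the pattern itself is the only part that would carry over into the Sawmill condition):

```python
import re

# Pattern matching exactly 12.34.56.76, 12.34.56.77, and 12.34.56.78.
# The dots are escaped so they match literal periods, and the character
# class [678] covers the final digit of the three addresses. The ^ and $
# anchors prevent partial matches (e.g., against 112.34.56.78).
pattern = re.compile(r"^12\.34\.56\.7[678]$")

for ip in ["12.34.56.76", "12.34.56.77", "12.34.56.78",
           "12.34.56.75", "112.34.56.78"]:
    print(ip, bool(pattern.match(ip)))
```

The first three addresses match and the last two do not, confirming the pattern rejects only the intended internal IPs.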
By customizing the regular expression, you can reject any class of IP address which can be described by regular expressions (which are very flexible). The same filter could also be implemented with even more flexibility, by choosing "Advanced expression syntax" from the Filter Type menu. This allows you to enter an arbitrary Salang expression. Salang is the "Sawmill Language," and allows fully general programming language syntax, including conditions, loops, variables, subroutines, and more. It's overkill for this example, but is useful for complex ranges, and for conditions which involve more than just the IP address. The three-IP filter could be implemented as an IP range in Salang using this expression:
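The screenshot of that expression is not reproduced here. As a rough, unverified sketch only: such an advanced-expression filter might treat the three addresses as a string range, along these lines (the field name "hostname" is an assumption and varies by log format, and the exact Salang comparison syntax should be checked against the Salang documentation before use):

```
if (hostname >= "12.34.56.76") and (hostname <= "12.34.56.78") then "reject";
```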
Questions or suggestions? Contact support@sawmill.net. If you would like a Sawmill Professional Services expert to implement a customization, contact consulting@sawmill.net.
[Article revision v1.0]
Sawmill Newsletter
April 15, 2008
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of UNSUBSCRIBE, to newsletter@sawmill.net.

News

Sawmill 7.2.14 shipped on March 26, 2008. This is a minor "bug fix" release, and it is free to existing Sawmill 7 users. It is not a critical update, but it does fix a number of bugs, adds support for several new log formats, and adds a few small features (most notably, a new "Save To Menu" button in the reports, and the "Use Overview for totals" option discussed below). It is recommended for anyone who is experiencing problems or slow reports with Sawmill 7.2.13 or earlier. You can download it from http://sawmill.net/download.html .

This issue of the Sawmill Newsletter describes the new "Use Overview for totals" option, showing its effects on reports, and discussing the performance ramifications.
Tips & Techniques: Showing Correct Totals and Percents for Unique Values

Important Note: The feature described in this article has been disabled by default in Sawmill 7.2.14, for performance reasons. It can be very useful, but it also makes the display of reports somewhat slower than when it is turned off. With some types of log data, and with some configuration options, the display of reports can be painfully slow. So, it is recommended that you use this option only when it is really needed.

Calculated report columns were introduced in Sawmill 7.2.6, and described in the Sawmill Newsletter for April 2007. This article uses the same example. Please refer to that newsletter for a more complete explanation of the example, of calculated report columns, and of how to set them up. This article will explain the differences in the setup that allow totals for calculated report columns to show in the Total row. Suppose you have a report like this one:
The calculated column is CTR, which stands for "Click Through Rate." The dashes in the Total row indicate that these totals are not available. To see them, you must turn on the report display option "Use Overview for totals." To do this, go to Config -> Manage Reports -> General Display/Output -> Edit General Display/Output for the profile, check "Use Overview for totals," and click "Save and Close."
You can also set it in the .cfg file for the profile, by searching first for "statistics = {", then "miscellaneous = {", to find the statistics miscellaneous group, and adding use_overview_for_totals = "true" anywhere in that group (or changing the value from false to true, if it is already there). The reason the feature is called "Use Overview for totals" is that the values in the Total row (which are also used as the denominators for percents) come from the Overview report, with all active filters applied. With "Use Overview for totals" turned on, the report will look like this:
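Edited by hand, the relevant portion of the profile .cfg would look roughly like this (a sketch based on the group names described above; the many other options normally present in these groups are elided):

```
statistics = {
  miscellaneous = {
    use_overview_for_totals = "true"
  } # miscellaneous
} # statistics
```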
The total in the Unique Users column is the actual total of users associated with advertisements, not the sum of the column. This will be explained more fully in the next section. The total in the CTR column does not add up to the total of the percents in the rows, because it is calculated from the values in the Total row.

Note: It also doesn't add up to 100%, as percent columns usually do, because it isn't a percent column as far as Sawmill is concerned, but a calculated column formatted as a percent. Regular percent columns add up to 100% because the denominator used for the percent is the sum of the values in the associated numeric column.

In addition to turning on "Use Overview for totals," one change needs to be made to the way the calculated column is set up in the .cfg file for the profile. The April 2007 Newsletter said to add this field to the end of the database fields section of the profile:
ctr = {
  label = "CTR"
  log_field = "ctr"
  type = "string"
  display_format_type = "string"
  expression = `((1.0 * cell_by_name(row_number, 'Clicks')) / cell_by_name(row_number, 'Impressions')) * 100.0`
} # ctr

In order for this calculated column to work correctly in Sawmill 7.2.14, whether "Use Overview for totals" is on or off, the field must look like this: the type and display_format_type must be float, and the cell names must match the database field names, which means they must be lowercase. The file to edit is the .cfg file for the profile, which is in LogAnalysisInfo/profiles.
ctr = {
  label = "CTR"
  log_field = "ctr"
  type = "float"
  display_format_type = "float"
  expression = "((1.0 * cell_by_name(row_number, 'clicks')) / cell_by_name(row_number, 'impressions')) * 100.0"
} # ctr

The column that is added to the advertisement report is the same:
ctr = {
  data_type = "unique"
  display_format_type = "%0.2f%%"
  field_name = "ctr"
  header_label = "CTR"
  show_bar_column = "false"
  show_graph = "true"
  show_number_column = "true"
  show_percent_column = "false"
  type = "number"
  visible = "true"
} # ctr

This is where the display_format_type must be specified. The only restriction is that it must be a numeric type. Examples of other possible values of display_format_type, for other calculations, might be %0d%%, %0.4f, %0.6f%%, bandwidth, duration_compact, duration_microseconds, duration_milliseconds, float, and integer. The named types are explained in the Custom Formats section of the Technical Manual.

When "Use Overview for totals" is turned on, the data_type can be unique or float. When it is turned off, if it is float, the calculated column will simply be summed, like this:
This probably is not appropriate for many calculated columns. Setting the data_type to unique simply tricks Sawmill into replacing the total with a dash, as it does with actual columns of unique values, which gives you the table in the first example screenshot.
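The reason summing is inappropriate for a ratio column like CTR is that per-row percentages cannot be meaningfully added; the true total must be recomputed from the totals of the underlying columns. This small Python sketch, using made-up click and impression counts, illustrates the difference:

```python
# Hypothetical per-advertisement data: (clicks, impressions) per report row.
rows = [(10, 200), (5, 50), (1, 100)]

# Per-row CTR, as the calculated column computes it: clicks/impressions * 100.
ctrs = [100.0 * clicks / impressions for clicks, impressions in rows]

# Summing the per-row percentages gives a meaningless "total"...
naive_total = sum(ctrs)

# ...whereas computing the ratio from the Total row's own values
# (total clicks / total impressions) gives the true overall CTR.
total_clicks = sum(c for c, i in rows)
total_impressions = sum(i for c, i in rows)
true_total = 100.0 * total_clicks / total_impressions

print(naive_total)  # 16.0 (5% + 10% + 1%)
print(true_total)   # about 4.57 (16 clicks / 350 impressions)
```

This is exactly what "Use Overview for totals" accomplishes: the Total row is computed from the (filtered) Overview values, not by summing the rows.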
Showing Correct Totals and Percents for Unique Values

If you have a report with a percent column associated with a numeric column of type unique, the only way to see meaningful percents is to turn on "Use Overview for totals." Turning it on also causes the correct total to be shown in the Total row. Consider this example:
If you used Sawmill before version 7.2.9, you are used to seeing a dash in the Total row for columns with unique values. The unique data type means that if the same value occurs more than once, such as the same IP address or user ID, the value is only counted once. Dashes were displayed because the only option available for the Total row was to show the sums of the columns, which is meaningless, because there may be overlap between the unique values represented by the counts; that is, the same user ID might be counted in more than one row.
In this example, there are percent columns associated with the unique values, and the percents are based on the sum of the values in the corresponding column, rather than on the total number of unique users subjected to each rule, which is misleading. How misleading this is depends on the amount of overlap between the groups of users. The percent was not suppressed with a dash, because percent columns are rarely used with a column of type unique. When "Use Overview for totals" is turned on, the report looks like this:
Note that though the percents in the columns don't add up to 100%, the totals are 100%. That's because 100% of the users in the "Failed unique users" column failed a rule. Here is a look at the same log data presented in a different way:
Here the columns in the subtotal are associated with Unique Users instead of with Unique Failed Users, so the percents are based on the total number of unique users. But the percents and the subtotals are based on the total in the Total row, that is, all unique users, not the subset of unique users subjected to the particular rule. A future version of Sawmill may make the denominator of the percents in the percent column configurable, with the Total and the Subtotal as two of the options. Until then, it may be best to suppress the Subtotal row if "Use Overview for totals" is used with a report like this one. With "Use Overview for totals" turned off, the same report looks like this:
Again, note that the percentages are based on the sums of the values in the Unique Users column, in this case the subtotals.
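The overlap problem behind all of this can be seen in a few lines of code. The following sketch (Python, purely for illustration; the user IDs and rule groupings are invented) shows why summing per-row unique counts overstates the true total:

```python
# Hypothetical per-rule sets of unique user IDs. "alice" appears
# under both rules, so the per-row unique counts overlap.
rule_a_users = {"alice", "bob", "carol"}
rule_b_users = {"alice", "dave"}

# Summing the per-row counts double-counts "alice"...
summed = len(rule_a_users) + len(rule_b_users)

# ...while the true total is the size of the union, which is what
# an overview query can supply.
true_total = len(rule_a_users | rule_b_users)

print(summed, true_total)  # 5 4
```

This is why the Total row shows a dash for unique columns by default: only the union (not the column sum) is a meaningful total.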
Working Around Performance Problems

The main reason the report display is slow when adding these values to the Total row is the special internal filter which eliminates rows with parenthesized values, which has to be applied again when the filtered overview is retrieved. Empty database fields are set to the special value "(empty)", and many other parenthesized values appear in reports, including the "(no search engine)" value in the Search Engines reports, and similar values in other reports. These parenthesized items are removed by default, to improve report legibility, and to make percentages and pie charts more meaningful. When "Show parenthesized items" is turned on, empty fields will show up in reports like this:
You can see in this example that the numeric field Impressions has a value for Advertisement "(empty)" and it changes the total for CTR from 10.20% to 9.62%. In some reports it may not be possible to overcome this distortion, but in this example, it is possible to change the filter that counts impressions. Suppose for the report above, impressions are counted with a log filter like this:
filters = {
  count_impressions = {
    value = "impressions = 1;"
    label = "Count Impressions"
    comment = ""
  } # count_impressions
} # filters

This filter assumes that every event in the log represents an advertising impression. You can change the filter to take into account the value of the advertisement field and the clicks field, like this:
filters = {
  count_impressions = {
    value = "if (advertisement ne '(empty)' or clicks > 0) then impressions = 1;"
    label = "Count Impressions"
    comment = "It's only an impression if there was an advertisement or a click."
  } # count_impressions
} # filters

This results in a report where the row "(empty)" is still there, but it doesn't distort the total, because both impressions and clicks are zero.
Because this is done in a log filter, the value of advertisement will have already been set to "(empty)" and the fields are accessible directly by name. In a parsing filter, the syntax may be more complicated. Again, such a manipulation may not make sense in all contexts, and you may have to choose between performance and not having to worry about the effects of including "(empty)" in the report. To turn on "Show parenthesized items" for any report through the Sawmill interface, follow these steps:

1. Go to Config -> Manage Reports -> Reports/Reports Menu.
2. Click on the name of the report you want to change.
3. Click the Report Elements tab.
4. Click Edit for the report element you need to change (usually there is only one report element).
5. Click the Table tab.
6. Check the checkbox for "Show parenthesized items."
You can also make this change by editing the .cfg file for the profile. Within the report_element for the report you want to change, find omit_parenthesized_items and change the value to false.
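For example, the edit might look like this trimmed, hypothetical profile excerpt (the group names here are invented, all other options are omitted, and the exact nesting in a real profile will differ):

```
# Hypothetical excerpt from a profile .cfg file; group names are
# invented, and all other options are omitted for brevity.
some_report = {
  report_elements = {
    some_report_element = {
      # Set to "false" so parenthesized items like "(empty)" appear
      omit_parenthesized_items = "false"
    } # some_report_element
  } # report_elements
} # some_report
```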
Sawmill Newsletter
May 15, 2008
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of "UNSUBSCRIBE", to newsletter@sawmill.net.

News

Sawmill 7.2.14 shipped on March 26, 2008. This is a minor "bug fix" release, and it is free to existing Sawmill 7 users. It is not a critical update, but it does fix a number of bugs, adds support for several new log formats, and adds a few small features (most notably, a new "Save To Menu" button in the reports, and the "Use Overview for totals" option discussed below). It is recommended for anyone who is experiencing problems or slow reports with Sawmill 7.2.13 or earlier. You can download it from http://sawmill.net/download.html .

This issue of the Sawmill Newsletter describes the new CFGA file format, which can be used to selectively override CFG files (log formats, profiles, or any other CFG files).

Get the Most out of Sawmill with Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Services experts are available for just this situation and many others. We will assist in the initial installation of Sawmill using best practices, and work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements, and stay focused on what your business needs are. We will show you areas of Sawmill you may not even be aware of; these methods will provide you with many streamlined ways to get the information more quickly. Often you'll find that Sawmill's deep analysis can even provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports.
Sawmill is an extremely powerful tool for your business, and most users exercise only a fraction of this power. That's where our experts really can make the difference. Our Sawmill experts have many years of experience with Sawmill and with a large cross-section of devices and business sectors. Our promise is to very quickly come up with a cost-effective solution that fits your business, and to greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Using CFGA Files to Incrementally Override CFG Files

Important Note: The feature described in this article was introduced in Sawmill 7.2.11, and will not work in earlier versions.

Sawmill uses its CFG format (CFG stands for "ConFiguration Group," as it is usually a group of configuration options or values) in most situations where it needs to store formatted, hierarchical, textual data on disk. These include profiles (in LogAnalysisInfo\profiles); log format plug-ins (in LogAnalysisInfo\log_formats); the files used to match spiders, search engines, and worms (spiders.cfg, worms.cfg, and search_engines.cfg, all in LogAnalysisInfo); preferences; language modules (LogAnalysisInfo\languages); and more. Simple uses of Sawmill do not require direct editing or viewing of CFG files at all, but advanced uses often require editing profiles, creating or editing log format plug-ins, and performing other CFG file edits.
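For reference, a CFG file is just nested groups of name = value pairs. A minimal, made-up example showing the notation used throughout this article:

```
# A made-up CFG file. Groups nest with braces, values are quoted
# strings, and the comment after a closing brace conventionally
# repeats the group name.
pets = {
  rover = {
    species = "dog"
    age = "3"
  } # rover
} # pets
```

Node names mirror this nesting, with a dot as the hierarchy divider: pets.rover.species refers to the innermost value.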
Editing CFG files works fine until you upgrade to a new release of Sawmill. At that point, if you've edited search_engines.cfg to add a new search engine to Sawmill's detection, you'll have a choice: keep your edited search_engines.cfg, or use the latest one. If you use the latest one, you'll get whatever is new in Sawmill; but you'll lose your own edits, and have to re-do them in the new version of search_engines.cfg. If you keep yours, you'll keep your edits, but you won't get anything that's new, including new search engines added, or bugs fixed, as part of the Sawmill upgrade.
CFGA Files Are Patches Applied To CFG Files

The solution is to use CFGA files instead of editing CFG files. CFGA stands for "ConFiguration Group Addition," so called because a CFGA file adds new information (or modifies existing information) in a CFG file, from Sawmill's perspective. CFGA files allow you to edit the contents of a CFG file as Sawmill sees it, without actually editing the original text file. This lets you add a new search engine, for instance, without actually editing search_engines.cfg. To use CFGA files, create a file next to a CFG file, with the same name except for the CFGA extension. In that file, use the same structure as the CFG file (the same internal groups), and any options you specify will automatically be patched into the CFG file whenever Sawmill uses it.
Example: Adding A Search Engine To search_engines.cfg

For example, suppose search_engines.cfg looks like this (this is a simplified version of the usual search_engines.cfg, which contains many more entries):
search_engines = {
  yahoo = {
    name = "Yahoo"
    substring = "yahoo."
    regexp = "yahoo\\.[^/]+/.*[&?]p=([^&]*)"
  } # yahoo
  lycos = {
    name = "Lycos"
    substring = "lycos.com"
    regexp = "lycos\\.[^/]+/.*[&?]query=([^&]*)"
  } # lycos
  google2 = {
    name = "Google"
    substring = "google."
    regexp = "google\\.[^/]*/.*[&?]q=([^&]*)"
  } # google2
} # search_engines
search_engines.cfg
Now, suppose your favorite search engine isn't there. You want to add MSN Search. You could do it by adding it to search_engines.cfg directly, like this:
search_engines = {
  yahoo = {
    name = "Yahoo"
    substring = "yahoo."
    regexp = "yahoo\\.[^/]+/.*[&?]p=([^&]*)"
  } # yahoo
  lycos = {
    name = "Lycos"
    substring = "lycos.com"
    regexp = "lycos\\.[^/]+/.*[&?]query=([^&]*)"
  } # lycos
  google2 = {
    name = "Google"
    substring = "google."
    regexp = "google\\.[^/]*/.*[&?]q=([^&]*)"
  } # google2
  msn_search = {
    name = "MSN Search"
    substring = "search.msn."
    regexp = "search\\.msn\\.[^/]*/.*[&?]q=([^&]*)"
  } # msn_search
} # search_engines
That would work--you'd see "MSN Search" in your Search Engines report. But the next time you updated to the newest Sawmill, your change would be overwritten by the newest search_engines.cfg. A better solution is to create a file in LogAnalysisInfo (next to search_engines.cfg) called search_engines.cfga:
search_engines = {
  msn_search = {
    name = "MSN Search"
    substring = "search.msn."
    regexp = "search\\.msn\\.[^/]*/.*[&?]q=([^&]*)"
  } # msn_search
} # search_engines
search_engines.cfga
When Sawmill goes to look at search_engines.cfg, it will automatically add the information from search_engines.cfga, as it reads it, and the effect will be the same as if you had modified search_engines.cfg. On the next update, search_engines.cfg will be overwritten, but your modifications will not be lost, because they are in search_engines.cfga, which is not overwritten (because it is not part of the standard distribution of Sawmill).
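The patching behavior can be modeled as a recursive merge of nested structures. Here is a sketch in Python (not Sawmill's actual implementation), treating CFG groups as dictionaries:

```python
def apply_cfga(cfg, cfga):
    """Recursively overlay cfga onto cfg: groups are merged,
    and scalar values in cfga replace (or add to) those in cfg."""
    for key, value in cfga.items():
        if isinstance(value, dict) and isinstance(cfg.get(key), dict):
            apply_cfga(cfg[key], value)
        else:
            cfg[key] = value
    return cfg

# The simplified search_engines.cfg from above, as a dictionary...
search_engines = {
    "yahoo":   {"name": "Yahoo",  "substring": "yahoo."},
    "google2": {"name": "Google", "substring": "google."},
}

# ...and the search_engines.cfga overlay adding MSN Search.
overlay = {
    "msn_search": {"name": "MSN Search", "substring": "search.msn."},
}

apply_cfga(search_engines, overlay)
print(sorted(search_engines))  # yahoo and google2 survive; msn_search is added
```

Because existing groups are merged rather than replaced, an overlay entry with the same name as an existing group (e.g., google2) would override only the options it names, leaving the rest intact.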
Example 2: Adding A Log Filter To A Log Format Plug-in

Suppose you always want to discard spider traffic from your Apache logs. You could do this by adding a log filter like this one, each time you create a profile, to the log_filters section (log.filters):
(... beginning of file omitted ...)
reject_spiders = `if (spider ne '(not a spider)') then 'reject';`
(... end of file omitted ...)

Modification to profile CFG (partial)

But that would affect only that profile; it would have to be re-done for any future profiles. So a better solution is to add that filter to the apache_combined.cfg plug-in, in the same place as you would add it in the profile (log.filters). That would add the filter to all future profiles created for that log format. But the plug-in change would be overwritten when you upgrade to a new release of Sawmill, so you would have to re-do the plug-in edit after each upgrade. So the best solution is to create a new file, in LogAnalysisInfo\log_formats, called apache_combined.cfga (next to the apache_combined.cfg log format plug-in file), which contains this:
apache_combined = {
  log.filters.reject_spiders = `if (spider ne '(not a spider)') then 'reject';`
} # apache_combined

apache_combined.cfga

This has the same effect as adding that line to apache_combined.cfg--it adds the filter to the bottom of the filter list of any new profile created from that plug-in--but when you upgrade to a new version, the change will not be overwritten. So this has to be done only once, and will add that filter to all future profiles for that format, through all future upgrades of Sawmill.

Other Uses Of CFGA Files

This can also be used to:
- Override or add final_step in a log format plug-in, to make any other changes desired to the profile
- Package up a set of desired changes to a profile, and turn the changes on or off by moving or renaming the CFGA file
- Add or modify the spiders, worms, server responses, or other lists
- Modify the language modules in LogAnalysisInfo\languages
- Anything else that would normally require a CFG edit
All these changes will survive upgrading, and can be moved to different installations to apply the changes there, without editing CFG files.
Sawmill Newsletter
June 15, 2008
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of "UNSUBSCRIBE", to newsletter@sawmill.net.

News

Sawmill 7.2.15 shipped on May 16, 2008. This is a minor "bug fix" release, and it is free to existing Sawmill 7 users. It is recommended for anyone who is experiencing problems or slow reports with Sawmill 7.2.14 or earlier. You can download it from http://sawmill.net/download.html .

This issue of the Sawmill Newsletter describes techniques for showing usernames in reports, rather than IP addresses.

Get the Most out of Sawmill with Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Services experts are available for just this situation and many others. We will assist in the initial installation of Sawmill using best practices, and work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements, and stay focused on what your business needs are. We will show you areas of Sawmill you may not even be aware of; these methods will provide you with many streamlined ways to get the information more quickly. Often you'll find that Sawmill's deep analysis can even provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users exercise only a fraction of this power. That's where our experts really can make the difference. Our Sawmill experts have many years of experience with Sawmill and with a large cross-section of devices and business sectors.
Our promise is to very quickly come up with a cost-effective solution that fits your business, and to greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Showing Usernames, Instead of IP Addresses, in Reports

When using Sawmill to generate reports from a forward proxy server, you will often want to know what person was responsible for a particular bit of traffic. Almost all proxy servers log the IP address of the internal computer, but some do not log the username, or actual name, of the person using that IP address at the time of the access. This article discusses ways to determine the person responsible for the traffic.

Method 1: Use a proxy server which logs the username

Ideally, the proxy server should simply log the username on each line, showing which user was responsible for the access. Technically, that can be accomplished by having the proxy server query a local LDAP or Open Directory server, to determine the username associated with the IP address at the time of the request. This is the optimal solution, because the proxy server is in the ideal position to do the query. All other solutions must be done after the fact, when the IP address may no longer correspond to the same username in the authentication server. So if your proxy server logs the username, you're done--Sawmill will report it. If your proxy server is not currently logging the username, see if it can be configured to compute and log
the username at the time of the event. If it cannot, contact the proxy server vendor to see if such a function can be enabled, or can be added.

Method 2: Use a CFG file to map IP addresses to usernames

In the real world, your proxy server may not log usernames, and you may not have any way to make it log usernames. A simple alternative, though not usually as accurate, is to give Sawmill a CFG file which includes a list of all IP addresses, and the usernames they map to (see the December 2006 newsletter for a discussion of creating and using CFG files). Once you have the CFG file, Sawmill will be able to tag every line of log data with the username (based on the IP address in that line), and you'll get a Usernames report and field, just as though the device had logged the username. This is perfect if your IP-to-username mappings never change, and can be automated using a simple script to query the LDAP server and generate the CFG file.

Method 3: Use a timestamped CFG file to map IP addresses to usernames based on time

Method 2 works if IP-to-username mappings never change, but in many environments they do. You might have a DHCP environment where multiple users share the same IP addresses; or you might have shared workstations where one person might be sitting at a particular workstation one day, and another person might be sitting there another day. A simple IP-to-username map cannot capture this complexity, and you would need to choose which single username corresponds to each IP. If the environment is largely stable, that could still be fairly effective, but it won't work well for environments where most of the IPs are dynamic. If the IPs are highly dynamic, you can use a CFG file which contains timestamp information for each IP-to-username mapping. The CFG file could include information about which time range a particular mapping was valid, and the log filter which converts IPs to usernames could use the timestamp of the log line to choose which username to use.
In the simplest form, the CFG file could just have multiple dumps of the authentication server, each one under a timestamped subnode in the CFG file, and the log filter could find the closest dump to the log line's timestamp, and use the mappings from there. If the authentication dumps are frequent enough, this can approach the accuracy of Method 1. However, the additional overhead of managing the timestamps and the much larger CFG file can make log processing much slower with this approach, versus Method 2 (with Method 1, there is no overhead at all). More sophisticated layouts of the CFG file are also possible, including methods which eliminate redundancy by using a hierarchical structure. The precise details of the possible structures of the CFG file, and of the log filter which parses it, are beyond the scope of this article, but if you need assistance implementing this approach, we can help you; please contact consulting@flowerfire.com.

Method 4: Query the authentication server directly from the log filter

This method dispenses with the CFG file entirely, instead having the log filter query the authentication server each time it needs to know the username associated with a particular IP address. Salang (the language of log filters) does not have direct support for querying authentication servers, so this must be done by running an arbitrary command (with exec()) to do the lookup. Furthermore, because it takes a significant fraction of a second to exec() a script to do the lookup, this cannot efficiently run for every line of log data, so an additional level of caching (using an in-memory node) should be added to make it fast. This method suffers from a possible accuracy problem similar to Method 2: it uses the IP-to-username mapping as of log processing time, which may not be the same as the mapping at the time of the access.
So it is effective for environments where users do not typically move between IP addresses, but in environments where users move, it will give incorrect usernames for some events. The details of the authentication query script are beyond the scope of this article (in a nutshell, it could accept the IP address on the command line, query the authentication server, and write the username to a file to be read and cached by the log filter with read_file()), but if you need assistance implementing this approach, we can help you; please contact consulting@flowerfire.com.

Summary

For perfect auditing (certain determination of which username was responsible for an access), Method 1 is the best, as it determines the username at the time of access. Method 3 can approach the accuracy of Method 1, but is still not perfectly precise, due to the coarse granularity of authentication server dumps; it can be made perfect if dumps can be arranged to occur
with every change in the authentication database. Method 2 is a simple and fast approach, suitable for environments where IPs are generally stable, or where exact IP-to-username correlations are not required. Method 4 is similar to Method 2, but does its work in real time--it eliminates the need for a CFG file, but introduces the need for a querying script.
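The caching idea behind Method 4 can be sketched as follows. This is Python, purely for illustration (in Salang it would be an in-memory node populated via exec()); the IP addresses, usernames, and the lookup function are all invented stand-ins for the real authentication query:

```python
# Sketch of Method 4's caching layer: pay the slow external lookup
# cost once per distinct IP address, not once per log line.
cache = {}
lookups = 0

def query_auth_server(ip):
    """Stand-in for the slow external lookup (exec() of a script
    which queries the authentication server)."""
    global lookups
    lookups += 1
    return "jenny" if ip == "10.0.0.5" else "(unknown)"

def username_for(ip):
    # Consult the in-memory cache first; query only on a miss.
    if ip not in cache:
        cache[ip] = query_auth_server(ip)
    return cache[ip]

# Three "log lines", but only two distinct IPs, so only two lookups.
for line_ip in ["10.0.0.5", "10.0.0.5", "10.0.0.9"]:
    username_for(line_ip)

print(lookups)  # 2
```

Since log data tends to contain many lines per client IP, this kind of cache turns a per-line cost into a per-IP cost, which is what makes Method 4 feasible at all.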
Sawmill Newsletter
July 15, 2008
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of "UNSUBSCRIBE", to newsletter@sawmill.net.

News

SAWMILL SALE! For July only, all Sawmill upgrades are half price. Upgrade your Sawmill 6 (or earlier) license to Sawmill 7, for 50% of the usual price; or upgrade your existing Sawmill 7 installation to a higher tier (e.g., Professional to Enterprise) or to a larger number of profiles, for half the usual cost. Orders in our online store will automatically show the discount; or you may claim the discount on any purchase order. Hurry; this offer ends July 31!

Sawmill 7.2.15 shipped on May 16, 2008. This is a minor "bug fix" release, and it is free to existing Sawmill 7 users. It is recommended for anyone who is experiencing problems with Sawmill 7.2.14 or earlier. You can download it from http://sawmill.net/download.html .

Sawmill 8 has entered "alpha" stage. That means that we have completed development of all features, and Sawmill 8 will ship (as 8.0.0) when all known bugs are fixed. Major features of Sawmill 8 include support for Oracle and Microsoft SQL Server databases, real-time reporting, a completely redesigned web interface, better multi-processor and multi-core support, and role-based access control. A "beta" version will be publicly available when all major bugs are fixed, probably in the next few weeks. Watch sawmill.net for the beta announcement!

This issue of the Sawmill Newsletter describes techniques for adding Sawmill users (users with access to the Sawmill web interface) automatically, using command-line scripting.

Get the Most out of Sawmill with Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions?
Our Professional Services experts are available for just this situation and many others. We will assist in the initial installation of Sawmill using best practices, and work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements, and stay focused on what your business needs are. We will show you areas of Sawmill you may not even be aware of; these methods will provide you with many streamlined ways to get the information more quickly. Often you'll find that Sawmill's deep analysis can even provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users exercise only a fraction of this power. That's where our experts really can make the difference. Our Sawmill experts have many years of experience with Sawmill and with a large cross-section of devices and business sectors. Our promise is to very quickly come up with a cost-effective solution that fits your business, and to greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Adding Users Automatically With Salang Scripting

Sawmill's "Create Many Profiles" feature makes it easy to add new profiles automatically, by embedding Sawmill in a larger environment, and calling the "create many profiles" Salang script to generate or regenerate many similar profiles from a single
template. See the November 15, 2007 Sawmill Newsletter for a discussion of Create Many Profiles. But what if you want to have a separate user for each profile, and want that user to be created automatically too? This might be useful in a web hosting environment, where you want each customer to be able to log into Sawmill to see only the reports for their own domains; so for each customer, you would create a Sawmill user, give it permission to access its own domains (profiles), and finally give the customer a direct link to the Sawmill web interface (or include Sawmill as a "tab" in a larger interface). This can be done using Salang scripting. This newsletter includes an example script, and shows how to use it.

The Add User Script

Here is a script which adds a user (or modifies an existing user):
# This script adds a non-administrative user with access to one profile.
#
# Usage: sawmill -dp miscellaneous.add_user v.username <username> v.password <password> v.profile <profile>

{=

# Create the user node for v.username (in the "users" node, i.e., the users.cfg file in LogAnalysisInfo),
# if it doesn't already exist
set_subnode_value('users', v.username, '');

# Set the username to v.username
set_subnode_value('users.' . v.username, 'username', v.username);

# Set the password. This computes the MD5 checksum of v.password, and puts it in the
# password_checksum of the user node
set_subnode_value('users.' . v.username, 'password_checksum', md5_digest(v.password));

# Make this a non-administrative user
set_subnode_value('users.' . v.username, 'administrator', false);

# Add a "profiles" subnode in the user record, where we can list the profile accessible to this user.
set_subnode_value('users.' . v.username, 'profiles', '');

# Give the user access to one profile: the one specified by v.profile
set_subnode_value('users.' . v.username . '.profiles', v.profile, true);

# Save the users node (users.cfg)
save_node('users');

# Display the new (or modified) user node
echo(node_as_string('users.' . v.username));

=}

LogAnalysisInfo/miscellaneous/add_user.cfv

To use this script, put it in the miscellaneous folder, which is in the LogAnalysisInfo folder of your Sawmill installation. Then, from the command line, run this on Windows (assuming Sawmill is installed at C:\Program Files\Sawmill 7):

C:
cd Program Files\Sawmill 7
SawmillCL -dp miscellaneous.add_user v.username username v.password password v.profile profile

Or on other operating systems, run this command line:
cd sawmill-install-dir
./sawmill -dp miscellaneous.add_user v.username username v.password password v.profile profile

(./sawmill may need to be qualified with the version number, e.g., ./sawmill7.2.15). That will create a new user whose username is username, whose password is password, who is a non-administrator, and who has access to one profile, called profile. The profile parameter must be the internal name of the profile, i.e., the name of the file as it appears in LogAnalysisInfo\profiles, without the .cfg extension. So, if you have a profile CFG file at LogAnalysisInfo\profiles\jennys_site.cfg, you would use "v.profile jennys_site" in the command line. By embedding a call to this script in a larger environment, you can automatically create a username every time you add a new customer. By calling it again with the same username, you can change the user's password, or add access to new profiles.

Advanced Topic: Understanding The Script

You don't need to understand the script to use it, but if you want to modify it, or do some Salang scripting of your own, you'll need to know how it works, and why. This section dissects the script in detail, to explain what each piece does. This section assumes familiarity with computer programming. The Salang section of the Sawmill Technical Manual (click Help in the upper right of any Sawmill installation) provides a more technical and complete description of Salang.

1. The {= and =} tags. The {= before the script and the =} after it are used in a CFV ("configuration value") file to indicate a section of Salang code. Without these tags, the entire CFV file is treated as a literal string. When {= and =} are present, the section between them is compiled and executed, and its result is inserted in the resulting string. In this case, we're not using the result string at all--we just want the script's side effects when we run it--so the entire script is embedded in {= =}.

2.
Comments start with #. All lines beginning with # are comments, and are ignored by Salang. These are for documentation purposes only, and have no effect on the code.

3. set_subnode_value('users', v.username, ''); The set_subnode_value() function in Salang sets a value within a node. A node is a general Salang data structure, which is similar to a perl hash (or, to a lesser degree, a C structure). Unlike perl hashes or C structures, however, Salang nodes can reside in memory, or on disk, or both. Referring to a node by the name 'users' indicates that it is a top-level node called 'users'; and because there is a file called "users.cfg" in LogAnalysisInfo, Sawmill automatically equates the two, and this function operates on the node whose content is described by the file users.cfg. Because that is the file which contains Sawmill user information, this line operates directly on the user information file. The contents of the file are loaded into memory automatically, when the node is referenced, and the modifications are made to the in-memory copy. The changes are not saved to disk until save_node() is called, below. So, this line operates on the "users" node, and in this case it is setting the value of a subnode. The name of the subnode is the value of v.username, which is a variable specified by the v.username command-line parameter. v.username is also a node: it is the "username" subnode of the top-level "v" node, which does not have an on-disk counterpart, so it remains in memory. The "v" node is often used for temporary variables, but has no particular significance to Salang--it could have been called x.username, as long as both the command line and the script called it that. The last parameter to set_subnode_value(), which specifies the value to assign to the subnode, is empty. So this line sets the subnode whose name is in the variable v.username to "". If the value of v.username is "jenny", then a subnode "jenny" will be created in "users" (users.cfg), and set to empty.
This creates a new user record, called "jenny". By the way, this code uses single quotes ('), but single quotes, double quotes ("), and backticks (`) are all treated identically by Salang, so the script would work the same if all single quotes were double quotes, or if they were all backticks. Assuming there were no users before this line ran, the "users" node would look like this after this line:
users = {
  jenny = ""
} # users

So, the user record has been created, but has no values in it yet. Note that the file users.cfg has not yet been modified, and won't be until save_node() is called, below. 4. set_subnode_value('users.' . v.username, 'username', v.username); This is similar to #3, above, but the first parameter uses the concatenation operator (.) to concatenate the value of the v.username variable to the literal string "users.". If the value of v.username is "jenny", the concatenation yields "users.jenny", so that is the node we are operating on. In node names (like "users.jenny"), a dot is a hierarchy divider, so "users.jenny" refers to the subnode "jenny" of the node "users". So this line sets the subnode "username" of the node "users.jenny" to the value of v.username. This adds a "username" parameter, with value "jenny", to the user record for "jenny". The "users" node would look like this after this line:
users = {
  jenny = {
    username = "jenny"
  } # jenny
} # users

5. set_subnode_value('users.' . v.username, 'password_checksum', md5_digest(v.password)); As with #4, this sets a subnode of the user node (users.jenny). In this case, it sets the password_checksum node. For security, the password is not stored as plain text in the users node; it is first encoded using the built-in function md5_digest(), before being written to the password_checksum node. The "users" node would look like this after this line:
users = {
  jenny = {
    username = "jenny"
    password_checksum = "4ed9407630eb1000c0f6b63842defa7d"
  } # jenny
} # users

6. set_subnode_value('users.' . v.username, 'administrator', false); As with #4 and #5, this sets a subnode of the user node (users.jenny). Here, it sets the "administrator" node to false, indicating that this user is not an administrator. The "users" node would look like this after this line:
users = {
  jenny = {
    username = "jenny"
    password_checksum = "4ed9407630eb1000c0f6b63842defa7d"
    administrator = false
  } # jenny
} # users

7. set_subnode_value('users.' . v.username, 'profiles', ''); As with #4, #5, and #6, this sets a subnode of the user node (users.jenny). Here, it creates a "profiles" subnode of that node. This node is empty at first, but can be filled with one or more profile names, indicating which profiles the user may access. The "users" node would look like this after this line:
users = {
  jenny = {
    username = "jenny"
    password_checksum = "4ed9407630eb1000c0f6b63842defa7d"
    administrator = false
    profiles = ""
  } # jenny
} # users

8. set_subnode_value('users.' . v.username . '.profiles', v.profile, true); This sets a subnode of the "profiles" node created in step 7. It uses the concatenation operator to concatenate "users.", the value of v.username, and ".profiles"; if v.username is "jenny" (as above), this string is "users.jenny.profiles", which points to the "profiles" subnode of the "jenny" subnode of the "users" node (which is users.cfg in LogAnalysisInfo). This operates on the subnode specified by the value of v.profile. The v.profile value is specified on the command line, so for this example, we'll assume it is "jennys_site". This subnode does not exist, so it is created, and its value is set to true (the third parameter above). The "users" node would look like this after this line:
users = {
  jenny = {
    username = "jenny"
    password_checksum = "4ed9407630eb1000c0f6b63842defa7d"
    administrator = false
    profiles = {
      jennys_site = true
    } # profiles
  } # jenny
} # users

9. save_node('users'); This saves the node 'users' to its natural position on disk, which is the users.cfg file in LogAnalysisInfo. The content of users.cfg is replaced by the "users" node shown in the box above. After this line, the user modification is complete; Sawmill will immediately allow logins by the new user, and there is no need to restart the service. 10. echo(node_as_string('users.' . v.username)); This displays to the console (standard output) the contents of the subnode specified by v.username, in the "users" node. In the example above, it would display this to the console:
jenny = {
  username = "jenny"
  password_checksum = "4ed9407630eb1000c0f6b63842defa7d"
  administrator = false
  profiles = {
    jennys_site = true
  } # profiles
} # jenny
This provides some feedback of what the script did, allowing you to verify the new or modified user record.
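For readers more comfortable with a mainstream language, the net effect of the script can be sketched in Python. This is a hypothetical analogue, not Sawmill code: the add_user function and the plain dict standing in for the "users" node are illustrative assumptions, and Salang's md5_digest() is modeled here as an MD5 hex digest.

```python
import hashlib

def add_user(users, username, password, profile):
    """Hypothetical Python analogue of the Salang script: build a
    non-administrator user record with access to a single profile."""
    users[username] = {
        "username": username,
        # md5_digest() in Salang corresponds to an MD5 hex digest
        "password_checksum": hashlib.md5(password.encode()).hexdigest(),
        "administrator": False,
        "profiles": {profile: True},
    }
    return users

users = add_user({}, "jenny", "secret", "jennys_site")
print(users["jenny"]["profiles"])  # {'jennys_site': True}
```

Calling add_user again with the same username overwrites the record, which parallels re-running the Salang script to change a password.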
Conclusion This newsletter presented a simple Salang script, which performs a useful operation. Salang is a fully general programming language, which can be used to do any type of scripting. Sawmill's entire web interface is written in Salang, as are log filters, parsing filters, and many other major components. If you are a programmer, or have a programmer available, you can use Salang to greatly extend the functionality of Sawmill, and to implement your own features in Sawmill. If you would like assistance in maintaining this script, or in creating scripts of your own, you can also use Sawmill Professional Services. Our experts have a thorough knowledge of Salang, and can create any type of script for you. Contact sales@sawmill.net for more information.
[Article revision v1.2] [ClientID: 46]
Sawmill Newsletter
August 15, 2008
You're receiving this newsletter because, during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email with the subject line UNSUBSCRIBE to newsletter@sawmill.net.

News

Come see us at Streaming Media West! We will be at Streaming Media West in San Jose, CA, September 23-25. This year's event is The Business & Technology of Online Video. See our newest version of Sawmill; we will be giving away t-shirts and have special offers during the show. Hope to see you there!

Sawmill 7.2.15 shipped on May 16, 2008. This is a minor "bug fix" release, and it is free to existing Sawmill 7 users. It is recommended for anyone who is experiencing problems with Sawmill 7.2.14 or earlier. You can download it from http://sawmill.net/download.html.

Sawmill 8 is well into "alpha" stage. That means that we have completed development of all features, and Sawmill 8 will ship (8.0.0) when all known bugs are fixed. Major features of Sawmill 8 include support for Oracle and Microsoft SQL Server databases, real-time reporting, a completely redesigned web interface, better multi-processor and multi-core support, and role-based authentication control. A "beta" version will be publicly available when all major bugs are fixed, probably in the next few weeks. Watch sawmill.net for the beta announcement!

This issue of the Sawmill Newsletter describes using Sawmill to import data into a MySQL database, and to query it directly with external SQL queries.

Get the Most out of Sawmill with Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Service Experts are available for just this situation and many others. 
We will assist in the initial installation of Sawmill using best practices, and work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements, and stay focused on your business needs. We will show you areas of Sawmill you may not even be aware of, with streamlined methods to get you the information more quickly. Often you'll find that Sawmill's deep analysis can provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users only exercise a fraction of this power. That's where our experts really can make the difference. Our Sawmill experts have many years of experience with Sawmill and with a large cross section of devices and business sectors. Our promise is to very quickly come up with a cost-effective solution that fits your business, and greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Using Sawmill To Query Log Data With SQL Sawmill Enterprise supports the use of a MySQL server as the back-end database (this is available only with Enterprise--Sawmill Professional and Sawmill Lite cannot use MySQL). [Note: Sawmill has been certified against the commercial version of MySQL server, and it is recommended that Sawmill users purchase the full MySQL Server, rather than using the "community" database server.] Sawmill's own built-in "internal" database is faster than MySQL, and is therefore generally a better choice if reports are to be generated by Sawmill. But MySQL has several major advantages over the internal database, including the ability to run SQL queries. This newsletter discusses the use of SQL to extract information from a MySQL database which was created by Sawmill. The techniques described in this newsletter can be used to run arbitrary SQL queries against log data in any of the 800+ log formats Sawmill supports; other log formats can be supported through custom log format plug-ins.
Creating A MySQL Profile And Database To create a Sawmill profile which uses MySQL as its back-end database, create a profile just as you would for the internal database, but on the Database Options page of the Create Profile Wizard, select "Use MySQL database" (again, this is available only in Sawmill Enterprise). Then enter the hostname, username, and password of your MySQL database server. If you're using a socket to communicate with MySQL, and the socket is not in the default location, you can also enter its pathname on this page. Finally, if you want the name of the database (schema) to be different from the name of the profile, you can enter a different database name on this page.
Then continue with creating the profile, and at the end, let Sawmill build the database. It is not necessary to view reports through Sawmill now (or ever), if you intend to use the database only through external SQL queries--the "build database" operation will populate all tables in the MySQL database. When the "build database" completes, Sawmill has parsed, normalized, and inserted all log data into the MySQL database, and built the associated itemnum tables (see below).
Querying The Main Table And The Itemnum Tables The main table of the database is called "logfile," and contains one row for each event in the log data. In this example, we are analyzing a small 5000-line Apache log file; each line of log data is a separate event, so the resulting logfile table contains 5000 rows. (All examples below are captures from the "mysql" command-line program. If your mail client does not render them in a mono-spaced font, they may format poorly; changing the font to a mono-spaced font like Courier or Monaco will show them better.)
mysql> select count(*) from logfile;
+----------+
| count(*) |
+----------+
|     5000 |
+----------+

Let's look at one row from logfile:
mysql> select * from logfile limit 1;
+-----------+---------------------+-----------------+-------------+-------------+----------+------+-----------+------+-------------------+--------------+----------+--------------------+----------+--------------+------+--------+----------+----------------------+---------------+---------------+-------------+------------------+--------+---------------+--------------------+-----------------+------+------------+---------+-------+--------+--------------+------------------+----------+------+
| loadorder | date_time | bottomleveldate | day_of_week | hour_of_day | hit_type | page | file_type | worm | screen_dimensions | screen_depth | hostname | domain_description | location | organization | isp | domain | referrer | referrer_description | search_engine | search_phrase | web_browser | operating_system | spider | server_domain | authenticated_user | server_response | hits | page_views | spiders | worms | errors | broken_links | screen_info_hits | visitors | size |
+-----------+---------------------+-----------------+-------------+-------------+----------+------+-----------+------+-------------------+--------------+----------+--------------------+----------+--------------+------+--------+----------+----------------------+---------------+---------------+-------------+------------------+--------+---------------+--------------------+-----------------+------+------------+---------+-------+--------+--------------+------------------+----------+------+
| 1 | 1998-04-07 16:53:06 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 734 |
+-----------+---------------------+-----------------+-------------+-------------+----------+------+-----------+------+-------------------+--------------+----------+--------------------+----------+--------------+------+--------+----------+----------------------+---------------+---------------+-------------+------------------+--------+---------------+--------------------+-----------------+------+------------+---------+-------+--------+--------------+------------------+----------+------+

This corresponds to the first line of the log data, which is this: 140.177.203.25 - - [07/Apr/1998:16:53:06 -0500] "GET / HTTP/1.0" 200 734 "-" "Mozilla/4.04 [en] (X11; I; SunOS 5.6 sun4u)" The date_time clearly matches the log data, and the 734 in the size column matches the 734 in the log data, but the rest of the columns are not so clear. That's because all non-aggregating (non-numerical) fields are normalized in logfile; instead of being directly included in logfile, their values are included in auxiliary tables (the itemnum tables), and references to those values are included in logfile. So for instance, hostname is 2 in logfile, which corresponds to 140.177.203.25 in the hostname itemnum table (the table called hostnameitemnum, which has columns called itemnum and hostname, and is used to map itemnums to hostnames, and vice versa). Querying the hostnameitemnum table with SQL, and selecting only the row where itemnum=2, shows the correlation:
mysql> select * from hostnameitemnum where itemnum = 2;
+---------+----------------+
| itemnum | hostname       |
+---------+----------------+
|       2 | 140.177.203.25 |
+---------+----------------+
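The normalization scheme itself can be modeled in a few lines of Python. This is a toy sketch: the id-allocation rule shown (start at 2, increment per new value) is an assumption chosen to match the example above, not Sawmill's actual algorithm.

```python
class ItemnumTable:
    """Toy model of a Sawmill itemnum table: maps distinct field
    values to small integer ids, and back. The numbering rule here
    is an illustrative assumption, not Sawmill's real allocation."""

    def __init__(self):
        self.value_to_num = {}
        self.num_to_value = {}

    def itemnum(self, value):
        # Assign the next id the first time a value is seen
        if value not in self.value_to_num:
            num = len(self.value_to_num) + 2  # assume ids start at 2
            self.value_to_num[value] = num
            self.num_to_value[num] = value
        return self.value_to_num[value]

hostnames = ItemnumTable()
n = hostnames.itemnum("140.177.203.25")  # first value seen
print(n, hostnames.num_to_value[n])      # 2 140.177.203.25
```

The main table then stores only the small integer, and reports join the itemnum table back in to recover the original value.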
Joining The Main Table (logfile) To The Itemnum Tables By joining one or more itemnum tables to the main table, you can get results similar to those shown in Sawmill's own reports. For instance, let's generate a "top 10 hostnames" report. We can do that by summing over logfile, grouping on hostname, joining in hostnameitemnum to get real hostnames in the result (rather than normalized hostname itemnums), ordering by hits descending, and limiting to the top 10:
mysql> select i.hostname, sum(hits) as hits, sum(page_views) as page_views, sum(size) as size from logfile l left join hostnameitemnum i on l.hostname = i.itemnum group by hostname order by hits desc limit 10;
+-----------------------------+------+------------+----------+
| hostname                    | hits | page_views | size     |
+-----------------------------+------+------------+----------+
| lipowitz.isdn.uiuc.edu      |  692 |        236 |  6348398 |
| 192.17.19.150               |  466 |         46 |  6363739 |
| 192.17.19.148               |  317 |        136 | 24279573 |
| pale.kai.com                |  308 |         70 |  5289858 |
| flowerfire.isdn.uiuc.edu    |  242 |         52 |  2337963 |
| spider.unh.edu              |  171 |         19 |   433709 |
| 195.101.37.244              |   87 |         21 |   575602 |
| 206.148.222.50              |   79 |         25 |   876475 |
| isdn-5.nii.enterconnect.net |   79 |         16 |   579010 |
| gli2302.ctea.com            |   75 |          9 |   436318 |
+-----------------------------+------+------------+----------+
Any of Sawmill's standard table reports can be generated similarly using SQL.
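The join-group-order pattern is plain SQL, so it can be tried against any SQL engine. Here is a self-contained sketch using Python's sqlite3 module with invented miniature tables (the data is made up for illustration; the query mirrors the shape of the MySQL query above):

```python
import sqlite3

# Invented miniature stand-ins for Sawmill's logfile and itemnum tables
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE hostnameitemnum (itemnum INTEGER, hostname TEXT);
    CREATE TABLE logfile (hostname INTEGER, hits INTEGER, size INTEGER);
    INSERT INTO hostnameitemnum VALUES (2, '140.177.203.25'), (3, 'pale.kai.com');
    INSERT INTO logfile VALUES (2, 1, 734), (3, 1, 500), (3, 1, 1200);
""")

# Same shape as the MySQL query: join to de-normalize, group, order, limit
rows = con.execute("""
    SELECT i.hostname, SUM(hits) AS hits, SUM(size) AS size
    FROM logfile l
    LEFT JOIN hostnameitemnum i ON l.hostname = i.itemnum
    GROUP BY i.hostname
    ORDER BY hits DESC
    LIMIT 10
""").fetchall()
print(rows)  # [('pale.kai.com', 2, 1700), ('140.177.203.25', 1, 734)]
```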
Filtering By Itemnum Now suppose we want to filter the result. Any type of filter is possible using a WHERE clause in the SQL query. In this case, we'll filter the report above to show only .com hostnames. By including "where i.hostname like '%.com'", the query now selects only those rows from logfile whose hostname values end with .com, when de-normalized. So, the resulting list contains only .com hostnames:
mysql> select i.hostname, sum(hits) as hits, sum(page_views) as page_views, sum(size) as size from logfile l left join hostnameitemnum i on l.hostname = i.itemnum where i.hostname like '%.com' group by hostname order by hits desc limit 10;
+--------------------------------+------+------------+---------+
| hostname                       | hits | page_views | size    |
+--------------------------------+------+------------+---------+
| pale.kai.com                   |  308 |         70 | 5289858 |
| gli2302.ctea.com               |   75 |          9 |  436318 |
| wat.thedj.com                  |   57 |         22 |  767059 |
| tosainu.trimark.com            |   45 |         12 |  341875 |
| gianduia.compecon.com          |   39 |          6 |  206124 |
| ip252.ts4.phx.inficad.com      |   34 |          4 |  160689 |
| h-205-217-240-156.netscape.com |   33 |          4 |  173806 |
| clarendon.weblogic.com         |   32 |          4 |  151206 |
| gw.vixel.com                   |   30 |          4 |  115755 |
| cx51617-a.dnpt1.occa.home.com  |   30 |          4 |  150578 |
+--------------------------------+------+------------+---------+
The filter doesn't have to work on only the primary field of the query (hostname); it can work on any fields in logfile, or the joined fields of any other table. MySQL supports many joins in a single query, so we can join in one table for the main column (hostname), and also join additional tables for filtering. For instance, here are the hostnames which accessed GIF images (the hits column shows how many GIF accesses each hostname had; the size column shows how many GIF bytes each hostname transferred):
mysql> select i.hostname, sum(hits) as hits, sum(page_views) as page_views, sum(size) as size from logfile l left join hostnameitemnum i on l.hostname = i.itemnum left join file_typeitemnum fi on l.file_type = fi.itemnum where fi.file_type = 'GIF' group by hostname order by hits desc limit 10;
+-----------------------------+------+------------+--------+
| hostname                    | hits | page_views | size   |
+-----------------------------+------+------------+--------+
| lipowitz.isdn.uiuc.edu      |  432 |          0 | 447331 |
| 192.17.19.150               |  420 |          0 | 209925 |
| pale.kai.com                |  235 |          0 | 144966 |
| 192.17.19.148               |  161 |          0 | 322118 |
| spider.unh.edu              |  152 |          0 | 129566 |
| flowerfire.isdn.uiuc.edu    |  151 |          0 | 433550 |
| gli2302.ctea.com            |   66 |          0 |  56660 |
| 195.101.37.244              |   65 |          0 |  87507 |
| isdn-5.nii.enterconnect.net |   63 |          0 |  24508 |
| 206.148.222.50              |   52 |          0 |  74631 |
+-----------------------------+------+------------+--------+
10 rows in set (0.03 sec)
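The same two-join pattern, one itemnum table for the report column and another for the filter, can also be sketched with Python's sqlite3 module and invented miniature tables:

```python
import sqlite3

# Invented miniature tables, mirroring logfile plus two itemnum tables
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE hostnameitemnum (itemnum INTEGER, hostname TEXT);
    CREATE TABLE file_typeitemnum (itemnum INTEGER, file_type TEXT);
    CREATE TABLE logfile (hostname INTEGER, file_type INTEGER,
                          hits INTEGER, size INTEGER);
    INSERT INTO hostnameitemnum VALUES (2, 'spider.unh.edu'), (3, 'pale.kai.com');
    INSERT INTO file_typeitemnum VALUES (2, 'GIF'), (3, 'HTML');
    INSERT INTO logfile VALUES (2, 2, 1, 900), (2, 3, 1, 400), (3, 2, 2, 300);
""")

# Join one itemnum table for the report column, another for the WHERE filter
rows = con.execute("""
    SELECT i.hostname, SUM(hits) AS hits, SUM(size) AS size
    FROM logfile l
    LEFT JOIN hostnameitemnum i ON l.hostname = i.itemnum
    LEFT JOIN file_typeitemnum fi ON l.file_type = fi.itemnum
    WHERE fi.file_type = 'GIF'
    GROUP BY i.hostname
    ORDER BY hits DESC
""").fetchall()
print(rows)  # [('pale.kai.com', 2, 300), ('spider.unh.edu', 1, 900)]
```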
Conclusion This newsletter describes the simple process for creating a MySQL profile in Sawmill, and using it to import log data into a SQL database. This process can be used to import any log data into a MySQL database. Sawmill already supports all common log formats (800+ different formats as of this writing), and if a particular format isn't on Sawmill's list, it can be added by creating a log format plug-in. This makes it possible to use Sawmill to run arbitrary SQL queries against any textual log data, by (1) creating a Sawmill profile from the log data with MySQL as the backend database, (2) building the database in Sawmill, and (3) running SQL queries against the resulting database. If you would like assistance in creating a plug-in for a log format you would like to query with SQL, or if you would like assistance creating SQL queries or scripts to extract the information you need from a Sawmill MySQL database, you can also use Sawmill Professional Services. Our experts have a thorough knowledge of Sawmill, log format plug-ins, and MySQL. Contact sales@sawmill.net for more information.
[Article revision v1.0] [ClientID: ]
Sawmill Newsletter
September 15, 2008
You're receiving this newsletter because, during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email with the subject line UNSUBSCRIBE to newsletter@sawmill.net.

News: Come See Us At Streaming Media West!

We will be at Streaming Media West in San Jose, CA, September 23-25. This year's event is The Business & Technology of Online Video. See our newest version of Sawmill; we will be giving away t-shirts and have special offers during the show. Hope to see you there!

Sawmill 7.2.15 shipped on May 16, 2008. This is a minor "bug fix" release, and it is free to existing Sawmill 7 users. It is recommended for anyone who is experiencing problems with Sawmill 7.2.14 or earlier. You can download it from http://sawmill.net/download.html.

Sawmill 8 is well into "alpha" stage. That means that we have completed development of all features, and Sawmill 8 will ship (8.0.0) when all known bugs are fixed. Major features of Sawmill 8 include support for Oracle and Microsoft SQL Server databases, real-time reporting, a completely redesigned web interface, better multi-processor and multi-core support, and role-based authentication control. A "beta" version will be publicly available when all major bugs are fixed, probably in the next few weeks. Watch sawmill.net for the beta announcement!

This issue of the Sawmill Newsletter describes using Sawmill to parse a custom log format, by creating a log format plug-in.

Get The Most Out Of Sawmill With Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Service Experts are available for just this situation and many others. We will assist in the initial installation of Sawmill using best practices, and work with you to integrate and configure Sawmill to generate reports in the shortest possible time. 
We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements, and stay focused on your business needs. We will show you areas of Sawmill you may not even be aware of, with streamlined methods to get you the information more quickly. Often you'll find that Sawmill's deep analysis can provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users only exercise a fraction of this power. That's where our experts really can make the difference. Our Sawmill experts have many years of experience with Sawmill and with a large cross section of devices and business sectors. Our promise is to very quickly come up with a cost-effective solution that fits your business, and greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Using Sawmill To Report On Custom Log Data

Sawmill is a universal log analyzer that can parse and report on any type of textual log data. As of this writing, Sawmill supports 850 different common log formats, from the full range of popular devices: web servers, media servers, mail servers, firewalls, gateways, etc. But Sawmill's log analysis capabilities don't stop at the 850 included formats; it can analyze any log file. The 850 formats are implemented using log format plug-ins: small text files which describe the layout of the log data, the fields, the filtering to apply to the data, and the reports to be generated. These log format plug-ins are user-editable and user-creatable, so to support a custom log format, all you need to do is create your own plug-in. This newsletter gives an example of creating such a plug-in. The specific log data is an internal format generated by a perl script, used to analyze disk usage, compression, and number of lines in a log dataset; but again, this same approach can be used to analyze any textual log data.

The Log Generator

The script which generates the logs is shown below. Our Professional Services team created this script on a customer site, when we needed to know how much disk space was used, and how many lines of log data there were, in a very large multi-file compressed dataset. This was on a UNIX system, so we could have done it with "du", or with "gunzip -c" combined with "find" and/or "wc", but that would have given us information for only one directory, without any ability to zoom in to see information by subdirectory, or by date; and we would not have been able to filter. Importing the data into Sawmill allows very high granularity in examining the data, filtering by date or other criteria. So, we wrote this script to compute, for each file in a directory (recursively), the file size, uncompressed file size (for gzipped logs), and number of lines in the file:
#!/usr/bin/perl
use strict;

my $usage = "compute_size_data.pl <pathname>";
my $pathname = $ARGV[0];
if ($pathname eq "") {
  print "Usage: $usage\n";
  exit(-1);
}

my $findcmd = "find $pathname -type f";
open(FIND, "$findcmd|") || die("Can't run $findcmd: $!");
while (<FIND>) {
  my $foundpathname = $_;
  chomp($foundpathname);
  my $filesize = -s $foundpathname;
  my $uncompressedsize = $filesize;
  my $lines = 0;
  if ($foundpathname =~ /[.]gz$/) {
    # gunzip -l reports "compressed uncompressed ratio name"; extract field 2
    $uncompressedsize = `gunzip -l $foundpathname | fgrep % | sed -e 's/^ *[0-9][0-9]* *\\([0-9][0-9]*\\) .*/\\1/'`;
    chomp($uncompressedsize);
    $lines = `gunzip -c $foundpathname | wc -l`;
    chomp($lines);
  }
  else {
    $lines = `wc -l < $foundpathname`;
    chomp($lines);
  }
  print "pathname=$foundpathname|size=$filesize|uncompressedsize=$uncompressedsize|lines=$lines\n";
}

The Log Generator Script: compute_size_data.pl

The details of the script are beyond the scope of this article; the output it generates looks like this:
pathname=/logs/12345/log_12345.200806292100-2200-0.log.gz|size=542192|uncompressedsize=4046692|lines=7883
pathname=/logs/12345/log_12345.200808172000-2100-0.log.gz|size=667984|uncompressedsize=5331102|lines=11740
pathname=/logs/12345/log_12345.200806131300-1400-0.log.gz|size=380606|uncompressedsize=2970825|lines=5608
pathname=/logs/12345/log_12345.200805222000-2100-0.log.gz|size=589198|uncompressedsize=4567431|lines=8284
pathname=/logs/12345/log_12345.200803252100-2200-0.log.gz|size=691357|uncompressedsize=6072894|lines=12695
pathname=/logs/12346/log_12346.200803012200-2300-0.log.gz|size=513444|uncompressedsize=3881224|lines=7514
pathname=/logs/12346/log_12346.200805101400-1500-0.log.gz|size=322774|uncompressedsize=2501874|lines=4937
pathname=/logs/12346/log_12346.200712311800-1900-0.log.gz|size=461202|uncompressedsize=3422076|lines=6165
pathname=/logs/12346/log_12346.200806270700-0800-0.log.gz|size=105324|uncompressedsize=813253|lines=1807
pathname=/logs/12346/log_12346.200803172000-2100-0.log.gz|size=751699|uncompressedsize=5731115|lines=10523

The first line, for instance, means that there is a file /logs/12345/log_12345.200806292100-2200-0.log.gz, which is 542,192 bytes in size compressed, or 4,046,692 bytes uncompressed, and is 7,883 lines in length. In this case, this is the log file for customer 12345, generated on June 29, 2008, covering the period 21:00 - 22:00.

The Log Format Plug-in

So now we have a chunk of log data, and we want to analyze it with Sawmill to answer questions like:
What is the total compressed size of the log data, in bytes?
What is the total uncompressed size of the log data, in bytes?
How many lines are there in the log data?
How many lines are there for May 22, 2008?
What is the total uncompressed size of log data for the customer 12346?
How much compressed log data has been generated between 3PM and 4PM, for all customers combined?
How much compressed log data has been generated between 3PM and 4PM, for customer 12347?

And so on--once the data is in Sawmill, any type of reporting and filtering is possible.
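Before writing the plug-in, it can be useful to sanity-check the format with a quick script. This Python sketch (not part of Sawmill) parses one line using regular expressions equivalent to those in the plug-in's parsing filter below:

```python
import re

LINE_RE = re.compile(
    r'^pathname=([^|]+)\|size=([0-9]+)\|uncompressedsize=([0-9]+)\|lines=([0-9]+)')

def parse_line(line):
    """Parse one line of compute_size_data.pl output into a dict,
    mirroring what the plug-in's parsing filter extracts."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    pathname, size, uncompressed, lines = m.groups()
    record = {"pathname": pathname, "size": int(size),
              "uncompressedsize": int(uncompressed), "lines": int(lines)}
    # Pull date and time out of the filename timestamp, as the plug-in does
    dm = re.search(r'[.](\d{4})(\d{2})(\d{2})(\d{2})(\d{2})-', pathname)
    if dm:
        record["date"] = f"{dm.group(1)}-{dm.group(2)}-{dm.group(3)}"
        record["time"] = f"{dm.group(4)}:{dm.group(5)}:00"
    return record

rec = parse_line("pathname=/logs/12345/log_12345.200806292100-2200-0.log.gz|"
                 "size=542192|uncompressedsize=4046692|lines=7883")
print(rec["date"], rec["lines"])  # 2008-06-29 7883
```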
In order to import this custom data into Sawmill, we need to create a log format plug-in. The plug-in below recognizes and parses this format of log data. Each section of the plug-in will be described separately below.
compute_size_data = {

  plugin_version = "1.0"

  # The name of the log format
  log.format.format_label = "compute_size_data.pl Log Format"
  log.miscellaneous.log_data_type = "other"
  log.miscellaneous.log_format_type = "other"

  # The log is in this format if any of the first ten lines match this regular expression
  log.format.autodetect_regular_expression = "^pathname=.*uncompressedsize="

  # Log fields
  log.fields = {
    date = ""
    time = ""
    pathname = {
      type = "page"
      hierarchy_dividers = "/"
      left_to_right = true
      leading_divider = "true"
    } # pathname
    size = ""
    uncompressed_size = ""
    lines = ""
    files = ""
  } # log.fields

  # Database fields
  database.fields = {
    date_time = ""
    day_of_week = ""
    hour_of_day = ""
    pathname = {
      suppress_bottom = 99999
    }
  } # database.fields

  database.numerical_fields = {
    files = {
      default = true
    }
    lines = {
      default = true
    }
    size = {
      type = "float"
      default = true
      display_format_type = "bandwidth"
    } # size
    uncompressed_size = {
      label = "uncompressed size"
      type = "float"
      default = true
      display_format_type = "bandwidth"
    } # uncompressed_size
  } # database.numerical_fields

  log.parsing_filters.parse = `
if (matches_regular_expression(current_log_line(), '^pathname=([^|]+)[|]size=([0-9]+)[|]uncompressedsize=([0-9]+)[|]lines=([0-9]+)')) then (

  # Add an entry which reports total usage by all files
  pathname = $1;
  size = $2;
  uncompressed_size = $3;
  lines = $4;
  files = 1;

  if (matches_regular_expression(pathname, '[.]([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])([0-9][0-9])([0-9][0-9])-')) then (
    date = $1 . '-' . $2 . '-' . $3;
    time = $4 . ':' . $5 . ':00';
  );

); # if matches line
`

  create_profile_wizard_options = {
    # The reports menu
    report_groups = {
      date_time_group = ""
    } # report_groups
  } # create_profile_wizard_options

} # compute_size_data
The Log Format plug-in: compute_size_data.cfg

If you put this plug-in in the LogAnalysisInfo/log_formats folder of your Sawmill installation, and then create a profile using the log data above, Sawmill will recognize the data as "compute_size_data.pl Log Format". Sawmill will then generate fully filterable and zoomable reports, with numerical fields size, uncompressed size, and lines, and reports for date, time, and pathnames and directories.

The Log Format Plug-in, Dissected

In this section we will take the log format plug-in one part at a time, describing what each part does.

The Log Format Plug-in: The Header

A log format plug-in starts with a header like this:
compute_size_data = {

  plugin_version = "1.0"

  # The name of the log format
  log.format.format_label = "compute_size_data.pl Log Format"
  log.miscellaneous.log_data_type = "other"
  log.miscellaneous.log_format_type = "other"

The first line has the internal name of the plug-in, in this case "compute_size_data". The name of the file must match, with a .cfg extension, so this file must be called compute_size_data.cfg. Plug-ins are "nodes" in Sawmill terminology (like most of Sawmill's configuration text files), so they use curly brackets ( { } ) for grouping. The first line shows that this plug-in is called compute_size_data, and the "= {" indicates that it is a group node, with multiple parameters below it. The entire remainder of the file is information within this node, describing the plug-in; the final line of the file contains "}" to close the "compute_size_data" node. plugin_version is an optional parameter which is useful for tracking multiple versions of a plug-in. The line beginning with # is a comment: everything after the #, until the end of the line, is ignored by Sawmill, and has no effect on the functionality of the plug-in. The next three lines give the label of the plug-in as it will appear in the Create Profile Wizard, and categories for the plug-in which determine where it is listed in the documentation. The category options do not affect the functionality of the plug-in, and can safely be left as "other."
The Log Format Plug-in: The Autodetection Regular Expression

The next line is the autodetection regular expression:
    # The log is in this format if any of the first ten lines match this regular expression
    log.format.autodetect_regular_expression = "^pathname=.*uncompressedsize="

This is a regular expression (see Regular Expressions in the Sawmill documentation, or look it up in a search engine, to learn regular expression syntax) which describes what the log data looks like, for the purposes of detecting it. When a profile is first created, Sawmill detects the format by comparing the first few lines of the log data with this expression. If any line matches, the format is listed as a matching format in the Create Profile Wizard. In this case, the regular expression means that any line starting with "pathname=", and then containing "uncompressedsize=" later in the line, is considered to match this format. The autodetection regular expression should be as tightly focused as possible, so that it detects every line of the format but is very unlikely to match a line of any other format; this ensures that the Create Profile Wizard shows this format, and no other formats, when it autodetects.

The Log Format Plug-in: The Log Fields

The log format continues with the log fields:
    # Log fields
    log.fields = {
      date = ""
      time = ""
      pathname = {
        type = "page"
        hierarchy_dividers = "/"
        left_to_right = true
        leading_divider = "true"
      } # pathname
      size = ""
      uncompressed_size = ""
      lines = ""
      files = ""
    } # log.fields

Like the plug-in itself, this section is a node, using { } syntax for grouping. The node is the "fields" node within the "log" node of the plug-in, and it describes the fields in the log data. This is the list of fields which we will extract from the log data: date, time, pathname, size, uncompressed_size, lines, and files. All fields except "pathname" are simple, default fields, so they are given an empty value, which simply defines the existence of the field and lets the Create Profile Wizard decide what the field parameters are. But the pathname field is more complicated in this case, because we're doing something fancy: we want this to be a hierarchically drillable field, so you can click "/logs/" in the Pathnames report and zoom in to see a report showing "/logs/12345/" and "/logs/12346/", and then click one of those to see another report with just the files in that subdirectory. This gives a "file browser" feel to the field, allowing you to zoom into directories by clicking them. But the default behavior of the Create Profile Wizard is to make reports list full field values, with no internal hierarchical structure, so we need to override that behavior for this field. The options mean that this is a "page" field (a pathname), with "/" between directories, with containing items to the left (parent directories appear to the left of their children in a pathname), and with a leading divider (the pathname starts with a "/"). More information about these options is available in the Log Fields chapter of the Sawmill documentation.

The Log Format Plug-in: The Database Fields (Non-Aggregating)

The next section lists the non-aggregating (non-numerical) database fields:
    # Database fields
    database.fields = {
      date_time = ""
      day_of_week = ""
      hour_of_day = ""
      pathname = {
        suppress_bottom = 99999
      }
    } # database.fields

This section describes the database fields, i.e., the fields as they will appear in Sawmill's database. There is generally one report per database field. Database fields roughly correspond to log fields, but there are often "derived" database fields, which are computed from specific log fields. The full list of derived fields is included in the Creating Log Format Plug-ins chapter of the Sawmill online documentation. In this case, we are using the date and time log fields to derive three other fields to include in the database: date_time, day_of_week, and hour_of_day. This allows us to see an integrated date/time report, as well as separate "Day of Week" and "Hour of Day" reports. The pathname field is also tracked, which will give us both a "Pathnames" report and a hierarchical "Pathnames/directories" report--the wizard automatically creates these two reports for a "page" field like "pathname" (where type="page" in the log field). Most fields are listed just as fieldname="", which lets the wizard pick the values of their parameters, but we do need to override one parameter: we set suppress_bottom to a large number in the pathname field, to allow any number of levels of hierarchy in the pathname field. Otherwise, zooming would only go two levels deep in the Pathnames/directories report (the default suppress_bottom value is 2). We didn't include the numerical fields here, because those don't become reports: they become columns in reports. They belong in the aggregating (numerical) fields section, which is next.
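Before moving on, the drill-down behavior that suppress_bottom permits can be pictured by expanding a pathname into its directory levels. Here is a minimal sketch in plain Python (not Sawmill code; the pathname is a hypothetical example in the style of the log data):

```python
# Sketch (not Sawmill code): expand a pathname into the hierarchy levels
# that a "page" field with "/" dividers drills through, one click per level.
def hierarchy_levels(pathname):
    """Return the chain of parent directories for a pathname, root first."""
    parts = pathname.strip("/").split("/")
    levels = []
    prefix = ""
    for part in parts[:-1]:                   # all but the final file component
        prefix += "/" + part
        levels.append(prefix + "/")
    levels.append("/" + "/".join(parts))      # the full pathname itself
    return levels

print(hierarchy_levels("/logs/12345/access.log"))
```

Each returned level corresponds to one zoom step in the Pathnames/directories report; with the default suppress_bottom of 2, zooming would stop after the second level.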
The Log Format Plug-in: The Database Fields (Aggregating)

The next section lists the aggregating (numerical) database fields:
    database.numerical_fields = {
      files = {
        default = true
      }
      lines = {
        default = true
      }
      size = {
        type = "float"
        default = true
        display_format_type = "bandwidth"
      } # size
      uncompressed_size = {
        label = "uncompressed size"
        type = "float"
        default = true
        display_format_type = "bandwidth"
      } # uncompressed_size
    } # database.numerical_fields
Aggregating fields automatically combine their values, usually by summing them, into reports. So for instance, the "size" field automatically sums the number of bytes; if the log data contains two lines with 100 and 200 bytes listed, the Overview will show size=300. All four fields here are summing fields (the default), so the Overview will contain four entries: files, lines, size, and uncompressed size; and all reports will contain those four columns (with a non-aggregating field as the leftmost column). The value default=true is specified for all fields, which causes them to be checked by default in the Create Profile Wizard. The two "size" fields are listed as type=float, which allows them to represent large numbers on 32-bit systems. The other two fields are left as default-type integers, which is faster and smaller; they are unlikely to exceed the maximum size of a 32-bit integer (about 2 billion). The two size fields specify display_format_type=bandwidth, which tells Sawmill to display them using bandwidth formatting. So where a value of 1024 would be displayed as "1024" if it is the number of lines, it will be displayed as "1 K" if it is the size or the uncompressed size. The uncompressed_size field specifies label="uncompressed size". Without this, it would appear with its default label, which is the field name, uncompressed_size. It looks better to have a space instead of an underscore, so we have overridden the label in this case. The Create Profile Wizard also looks in the field_labels node of the file LogAnalysisInfo/language/english/lang_stats.cfg to try to find a matching label, and uses the one there if there is one (replace "english" with the name of your language if you're using a non-English translation), so you can also modify that file to change labels.

The Log Format Plug-in: The Parsing Filters

The next section lists the parsing filters:
    log.parsing_filters.parse = `
      if (matches_regular_expression(current_log_line(),
          '^pathname=([^|]+)[|]size=([0-9]+)[|]uncompressedsize=([0-9]+)[|]lines=([0-9]+)')) then (

        # Add an entry which reports total usage by all files
        pathname = $1;
        size = $2;
        uncompressed_size = $3;
        lines = $4;
        files = 1;

        if (matches_regular_expression(pathname,
            '[.]([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])([0-9][0-9])([0-9][0-9])-')) then (
          date = $1 . '-' . $2 . '-' . $3;
          time = $4 . ':' . $5 . ':00';
        );

      ); # if matches line
    `

The parsing filters contain a description of how Sawmill parses a line of log data. There are several ways to parse the data which do not involve writing a parsing filter, including delimited parsing (index/subindex) and using a single parsing regular expression. But parsing filters provide the most functionality, and in this case they were useful because we wanted to reformat the date and time values from the log pathname (otherwise, we could have used a parsing regular expression). This parsing filter is an expression written in the Salang language (Sawmill's language; see The Configuration Language in the documentation for a reference). The entire expression is contained in backtick quotes (`) in this case, which are convenient because they allow you to use single (') or double (") quotes within the expression. The expression calls the current_log_line() function of Salang to get the value of the current line, then uses matches_regular_expression() to match that line against a regular expression whose parenthesized subexpressions extract the field values into the variables $1, $2, $3, etc. Then, it assigns those variables to pathname, size, etc. It sets the "files" field to 1, so that the sum of the "files" field will be the total number of files.
Finally, it uses matches_regular_expression() again to extract the date and time values from the pathname, and rebuilds them in YYYY-MM-DD and HH:MM:SS format, putting them into the date and time fields for Sawmill to use.
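For readers more comfortable with a general scripting language, the same logic can be sketched in Python. This is an illustration of what the filter does, not code that Sawmill runs, and the sample line is a hypothetical line in the format described above:

```python
import re

# Illustration (not Sawmill code) of the parsing filter's behavior:
# extract the fields from a line, then rebuild date and time from the pathname.
def parse_line(line):
    m = re.match(r'^pathname=([^|]+)\|size=([0-9]+)\|uncompressedsize=([0-9]+)\|lines=([0-9]+)', line)
    if not m:
        return None
    fields = {
        "pathname": m.group(1),
        "size": int(m.group(2)),
        "uncompressed_size": int(m.group(3)),
        "lines": int(m.group(4)),
        "files": 1,  # each line is one file, so summing this gives a file count
    }
    # Pull YYYYMMDDHHMM out of the pathname and reformat it, as the filter does.
    dt = re.search(r'\.(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})-', fields["pathname"])
    if dt:
        fields["date"] = "%s-%s-%s" % (dt.group(1), dt.group(2), dt.group(3))
        fields["time"] = "%s:%s:00" % (dt.group(4), dt.group(5))
    return fields

print(parse_line("pathname=/logs/x.200810150630-1.gz|size=100|uncompressedsize=250|lines=5"))
```

The structure mirrors the Salang filter: one outer match to split the delimited fields, one inner match to recover the timestamp embedded in the pathname.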
Writing a parsing filter is similar to writing a script or computer program, and requires some experience with scripting. This is usually the most difficult part of the plug-in. Fortunately, many log formats do not require this step; they can be parsed using index/subindex or a "parsing regular expression." See Creating Log Format Plug-ins in the Sawmill documentation for examples of these simpler approaches to parsing. Parsing filters are required if a single log entry spans multiple lines, or if field values need to be converted before being put into the log fields (as in this case), or in some other cases where advanced calculations are required.

The Log Format Plug-in: The Wizard Options

The next section lists the Create Profile Wizard options:
    create_profile_wizard_options = {

      # The reports menu
      report_groups = {
        date_time_group = ""
      } # report_groups

    } # create_profile_wizard_options

There are many options which can be included here (see Creating Log Format Plug-ins), which specify report groupings, report details, field associations by report, final_step cleanup/reworking, and more. But in this example, we're sticking with the basics: all we want is a date/time group in the reports menu, and all default reports (Overview, one report per database field, Single-page Summary, and Log Detail). This is done by having a date_time_group specified in the report_groups section of the create_profile_wizard_options section.
The Log Format Plug-in: The Closing Bracket

The plug-in is a single CFG node, which starts with "compute_size_data = {". Therefore, it must have a closing bracket at the end:
    } # compute_size_data

The comment is optional, but it is useful to put the node name in a comment on every closing bracket, to improve legibility.

Conclusion

This newsletter describes the process of creating a plug-in for parsing and reporting on a custom log format. A plug-in of this type can be created to parse and report on any textual log data. The process of creating a plug-in is somewhat complex and detailed, especially if it involves parsing filters. Our experts have created hundreds of plug-ins, and can do it quickly and accurately. If you would like assistance in creating a plug-in for a custom log format, you can also use Sawmill Professional Services. Contact sales@sawmill.net for more information.
[Article revision v1.0] [ClientID: 43726]
Sawmill Newsletter
October 15, 2008
You're receiving this newsletter because, during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email with the subject line UNSUBSCRIBE to newsletter@sawmill.net.

News

Sawmill 7.2.15 shipped on May 16, 2008. This is a minor "bug fix" release, and it is free to existing Sawmill 7 users. It is recommended for anyone who is experiencing problems with Sawmill 7.2.14 or earlier. You can download it from http://sawmill.net/download.html .

Sawmill 8 is near the end of its "alpha" stage. That means that we have completed development of all features and have fixed most major bugs; Sawmill 8 will ship (8.0.0) when all remaining known bugs are fixed. Major features of Sawmill 8 include support for Oracle and Microsoft SQL Server databases, real-time reporting, a completely redesigned web interface, better multi-processor and multi-core support, and role-based access control. A "beta" version will be publicly available when all major bugs are fixed, probably in the next few weeks. Watch sawmill.net for the beta announcement!

This issue of the Sawmill Newsletter describes using Sawmill to convert complex log data to simple delimited text, using the process_logs command-line action.

Get The Most Out Of Sawmill With Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Services experts are available for just this situation, and many others. We will assist in the initial installation of Sawmill using best practices, and work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements, and stay focused on your business needs.
We will show you areas of Sawmill you may not even be aware of, and demonstrate streamlined methods that get you the information more quickly. Often you'll find that Sawmill's deep analysis can provide information you've been after but never knew how to reach, or perhaps never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users only exercise a fraction of this power. That's where our experts really can make the difference. Our Sawmill experts have many years of experience with Sawmill, across a large cross-section of devices and business sectors. Our promise is to quickly come up with a cost-effective solution that fits your business, and to greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Converting Log Data With process_logs

Typically, Sawmill is used to parse log data into a database, and then to generate reports from that database with Sawmill's web reporting interface. But Sawmill can also be used as a stand-alone parser, or log converter. This is useful for parsing complicated log formats, like mail logs, which typically have many lines per event, where the lines must be parsed intelligently, in a particular order, and then re-integrated into a single event. This sort of parsing is difficult, but Sawmill already knows how to do it for all common log formats. So if you're writing a script to import log data into a database, to do alerting based on log data, or to do anything else with log data, you can use Sawmill as the first stage of your script, to do the parsing and pass the parsed data on to you.
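The downstream half of that pipeline can be any script that reads delimited text. A minimal Python sketch, assuming Sawmill's process_logs output has already been captured (the field names and sample rows here are hypothetical, simulated in memory for illustration):

```python
import csv
import io

# Sketch: treat Sawmill's process_logs output as the input to a script.
# In practice you would open the file written by a command such as
# "sawmill -p my_profile -a process_logs > out.csv"; here a tiny
# hypothetical sample is simulated in memory.
sample_output = io.StringIO(
    "date_time,page,size\n"
    "07/Apr/1998 16:53:06,/index.html,734\n"
    "07/Apr/1998 21:32:04,/sawmill/,3344\n"
)

total_size = 0
for row in csv.DictReader(sample_output):
    total_size += int(row["size"])   # any per-event processing goes here

print(total_size)  # 4078
```

Because the first output line names the fields, csv.DictReader (or its equivalent in another language) can address columns by name, so the script keeps working even if the column order changes.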
This type of conversion is typically done from a script, or from the command line. For instance, you might run the following command on a profile created from an Apache log file:
    $ sawmill -p my_profile -a pl -v 0 > out.csv

The -p option specifies the internal profile name; "-a pl" is short for "-a process_logs", which specifies the action; and "-v 0" tells Sawmill to generate no output other than the process_logs output. If the input (Apache) log data is this:
    140.177.203.25 - - [07/Apr/1998:16:53:06 -0500] "GET / HTTP/1.0" 200 734 "-" "Mozilla/4.04 [en] (X11; I; SunOS 5.6 sun4u)"
    140.177.203.25 - - [07/Apr/1998:16:53:06 -0500] "GET /sawmill/picts/TopBanner.gif HTTP/1.0" 200 4573 "http://asooo/" "Mozilla/4.04 [en] (X11; I; SunOS 5.6 sun4u)"
    192.17.19.148 - - [07/Apr/1998:21:32:04 -0500] "GET / HTTP/1.0" 200 734 "http://developer.javasoft.com/developer/earlyAccess/java3d-features.html" "Mozilla/4.04 (Macintosh; I; PPC, Nav)"
    192.17.19.148 - - [07/Apr/1998:21:32:12 -0500] "GET /sawmill/picts/TopBanner.gif HTTP/1.0" 200 4573 "http://asooo.wolfram.com/" "Mozilla/4.04 (Macintosh; I; PPC, Nav)"
    192.17.19.148 - - [07/Apr/1998:21:32:15 -0500] "GET /flowerfire/ HTTP/1.0" 404 154 "http://asooo.wolfram.com/" "Mozilla/4.04 (Macintosh; I; PPC, Nav)"
    192.17.19.148 - - [07/Apr/1998:21:32:18 -0500] "GET / HTTP/1.0" 200 734 "http://asooo.wolfram.com/flowerfire/" "Mozilla/4.04 (Macintosh; I; PPC, Nav)"
    192.17.19.148 - - [07/Apr/1998:21:32:21 -0500] "GET /sawmill/ HTTP/1.0" 200 3344 "http://asooo.wolfram.com/" "Mozilla/4.04 (Macintosh; I; PPC, Nav)"
    192.17.19.148 - - [07/Apr/1998:21:32:26 -0500] "GET /sawmill/picts/title.gif HTTP/1.0" 200 48771 "http://asooo.wolfram.com/sawmill/" "Mozilla/4.04 (Macintosh; I; PPC, Nav)"
    192.17.19.148 - - [07/Apr/1998:21:32:29 -0500] "GET /sawmill/samples.html HTTP/1.0" 200 8996 "http://asooo.wolfram.com/sawmill/" "Mozilla/4.04 (Macintosh; I; PPC, Nav)"
    192.17.19.148 - - [07/Apr/1998:21:32:30 -0500] "GET /sawmill/picts/header.gif HTTP/1.0" 200 12371 "http://asooo.wolfram.com/sawmill/samples.html" "Mozilla/4.04 (Macintosh; I; PPC, Nav)"

then the result in out.csv will be this: a traditional CSV file with a comma-separated header listing the names of the fields, and one line per event with comma-separated values:

    date_time,day_of_week,hour_of_day,hit_type,page,file_type,worm,screen_dimensions,screen_depth,hostname,domain_description,location,organization,isp,domain,referrer,referrer_description,search_engine,search_phrase,web_browser,operating_system,spider,server_domain,authenticated_user,server_response,hits,page_views,spiders,worms,errors,broken_links,screen_info_hits,visitors,size
    07/Apr/1998 16:53:06,3,16,page view,/{default},(no type),(not a worm),(-) x (-),(-),140.177.203.25,IP Address,United States/IL/Champaign,Wolfram Research,(unknown ISP),(unknown domain),(no referrer),(no referrer),(no search engine),(no search phrase),Netscape Navigator/4.04,SunOS,(not a spider),-,(not authenticated),200,1,1,(empty),(empty),(empty),(empty),(empty),140.177.203.25,734
    07/Apr/1998 16:53:06,3,16,hit,/sawmill/picts/(nonpage),GIF,(not a worm),(-) x (-),(-),140.177.203.25,IP Address,United States/IL/Champaign,Wolfram Research,(unknown ISP),(unknown domain),http://asooo/(omitted),Unknown,(no search engine),(no search phrase),Netscape Navigator/4.04,SunOS,(not a spider),-,(not authenticated),200,1,(empty),(empty),(empty),(empty),(empty),(empty),140.177.203.25,4573
    07/Apr/1998 21:32:04,3,21,page view,/{default},(no type),(not a worm),(-) x (-),(-),192.17.19.148,IP Address,United States/IL/Urbana,University of Illinois,(unknown ISP),(unknown domain),http://developer.javasoft.com/(omitted),Commercial (com),(no search engine),(no search phrase),Netscape Navigator/4.04,Macintosh,(not a spider),-,(not authenticated),200,1,1,(empty),(empty),(empty),(empty),(empty),192.17.19.148,734
    07/Apr/1998 21:32:12,3,21,hit,/sawmill/picts/(nonpage),GIF,(not a worm),(-) x (-),(-),192.17.19.148,IP Address,United States/IL/Urbana,University of Illinois,(unknown ISP),(unknown domain),http://asooo.wolfram.com/(omitted),Commercial (com),(no search engine),(no search phrase),Netscape Navigator/4.04,Macintosh,(not a spider),-,(not authenticated),200,1,(empty),(empty),(empty),(empty),(empty),(empty),192.17.19.148,4573
    07/Apr/1998 21:32:15,3,21,broken link,/flowerfire/(nonpage),(no type),(not a worm),(-) x (-),(-),192.17.19.148,IP Address,United States/IL/Urbana,University of Illinois,(unknown ISP),(unknown domain),http://asooo.wolfram.com/(omitted),Commercial (com),(no search engine),(no search phrase),Netscape Navigator/4.04,Macintosh,(not a spider),-,(not authenticated),404,1,(empty),(empty),(empty),1,1,(empty),192.17.19.148,154
    07/Apr/1998 21:32:18,3,21,page view,/{default},(no type),(not a worm),(-) x (-),(-),192.17.19.148,IP Address,United States/IL/Urbana,University of Illinois,(unknown ISP),(unknown domain),http://asooo.wolfram.com/(omitted),Commercial (com),(no search engine),(no search phrase),Netscape Navigator/4.04,Macintosh,(not a spider),-,(not authenticated),200,1,1,(empty),(empty),(empty),(empty),(empty),192.17.19.148,734
    07/Apr/1998 21:32:21,3,21,page view,/sawmill/{default},(no type),(not a worm),(-) x (-),(-),192.17.19.148,IP Address,United States/IL/Urbana,University of Illinois,(unknown ISP),(unknown domain),http://asooo.wolfram.com/(omitted),Commercial (com),(no search engine),(no search phrase),Netscape Navigator/4.04,Macintosh,(not a spider),-,(not authenticated),200,1,1,(empty),(empty),(empty),(empty),(empty),192.17.19.148,3344
    07/Apr/1998 21:32:26,3,21,hit,/sawmill/picts/(nonpage),GIF,(not a worm),(-) x (-),(-),192.17.19.148,IP Address,United States/IL/Urbana,University of Illinois,(unknown ISP),(unknown domain),http://asooo.wolfram.com/(omitted),Commercial (com),(no search engine),(no search phrase),Netscape Navigator/4.04,Macintosh,(not a spider),-,(not authenticated),200,1,(empty),(empty),(empty),(empty),(empty),(empty),192.17.19.148,48771
    07/Apr/1998 21:32:29,3,21,page view,/sawmill/samples.html,HTML,(not a worm),(-) x (-),(-),192.17.19.148,IP Address,United States/IL/Urbana,University of Illinois,(unknown ISP),(unknown domain),http://asooo.wolfram.com/(omitted),Commercial (com),(no search engine),(no search phrase),Netscape Navigator/4.04,Macintosh,(not a spider),-,(not authenticated),200,1,1,(empty),(empty),(empty),(empty),(empty),192.17.19.148,8996

This can be used as input to a script. It's easier to parse than the original log data, and also has some extra information, like geographic locations; and Sawmill's own log filtering functionality can be used to include additional columns (database fields), populated programmatically. For instance, a script could use BCP or another program to import the comma-separated data into a SQL database for later querying.

Advanced Topic: Customizing The Output of process_logs

The default output of process_logs is good for many purposes, but if you have a script which expects a particular format of input, you may need to use some of Sawmill's additional options to customize the output. Specifically:
q The "field delimiter" option (-fd) specifies the delimiter to use between fields, if a comma is not desired. For instance, use '-fd "|"' to create a pipe-delimited file, or '-fd "\t"' to create a tab-delimited file.
q The "suppress output header" option (-soh) can be used to omit the first line, which lists the database fields ('-soh true').
q The "output date/time format" option (-odtf) can be used to change the format of the output date/time, using a strftime format string. For instance, use '-odtf "%Y-%m-%d %H:%M:%S"' to generate the timestamp in a format like "2000-01-02 03:04:05", suitable for import into many SQL databases.
q The "empty output value" option (-eov) specifies the value to use when a field is empty. The default is "(empty)", but if you want the value to be completely omitted, you can use '-eov ""'.
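The -odtf option takes a standard strftime format string; as a quick illustration of the example format above, here is the same format string rendered by Python's strftime, which shares these format codes:

```python
from datetime import datetime

# The strftime format string shown for -odtf, checked against a sample timestamp.
fmt = "%Y-%m-%d %H:%M:%S"
ts = datetime(2000, 1, 2, 3, 4, 5)
print(ts.strftime(fmt))  # 2000-01-02 03:04:05
```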
Note: all four options above are (or will be) available in Sawmill 7.2.16 and later. If you need them immediately, contact support for a pre-release download.

Conclusion

This newsletter describes the process_logs command-line action, which is broadly useful for creating environments where Sawmill acts as a log parser or converter, with later stages of the process operating on the log data that Sawmill has parsed, converted, and simplified. If you would like assistance in setting up this sort of environment, you can also use Sawmill Professional Services. Contact sales@sawmill.net for more information.
Sawmill Newsletter
December 15, 2008
You're receiving this newsletter because, during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email with the subject line UNSUBSCRIBE to newsletter@sawmill.net.

News

BIG NEWS this month: Sawmill 8 is now available for download and purchase. This is a major upgrade to Sawmill. It is a free upgrade for those with Premium Support; other Sawmill customers get 50% off the license price. This issue of the Sawmill Newsletter is dedicated entirely to Sawmill 8: what's new, and why you should upgrade from Sawmill 7 (or earlier).
Sawmill 8 adds support for Microsoft SQL Server and Oracle, to the MySQL and internal databases supported by Sawmill 7. Like Sawmill 7 with MySQL, this back-end database use is transparent once it is configured--a profile which uses Microsoft SQL Server or Oracle works just the same as a profile which uses MySQL or the internal database, with all the same configuration and reporting capabilities. The only change is the location of the database storage, and the database engine used to run queries.
Use of an enterprise database engine with Sawmill can provide better performance through clustering, as well as improved reliability, redundancy, backup and recovery, and database management.

Role-Based Access Control (RBAC)
Sawmill 8 adds Role-Based Access Control, for highly granular user permission management. In Sawmill 7, there were only Administrators, who could do anything, and non-Administrators, who could only view the reports of specified profiles (and could not configure profiles). Sawmill 8 extends this by allowing the creation of any number of roles, each with specific permissions, and any number of users in one or more roles. For instance, it is possible to create a role that can manage a particular profile, but not other profiles; or that can edit the log filters of that profile, but not delete them. The permissions of each user can be controlled at a very detailed level.

Real-Time Importing/Reporting

Sawmill 7 always operated in a batch mode: log files were imported periodically, often nightly; reporting was not available during an import; and reports were then generated from that snapshot of the data, until the time of the next import. Sawmill 8 supports true real-time importing and reporting. A profile can be configured to read a continuous stream of log data, adding each line to its database as it appears on the stream. Reports can be generated at any time, and show the latest data in the database as of the moment of report generation. This allows a profile to show up-to-the-second reports at any time.

Improved Memory Management

Sawmill 7 could run out of memory on 32-bit systems, due to a number of memory management approaches which could cause high memory (or address space) usage for large datasets. Sawmill 8 improves memory management to keep memory usage low for even very large datasets, allowing enormous datasets to be processed on systems with limited memory, and on 32-bit systems.

Built-In SQL Database

Sawmill 8's internal database supports a subset of SQL, allowing information to be queried directly from the internal database using SQL statements (Sawmill 7's internal database does not support SQL queries).
Sawmill 8 also uses a unified set of SQL queries internally to perform database operations on all supported databases (the internal SQL database, MySQL, Oracle, and Microsoft SQL Server), for better reliability and performance.

Enhanced Reporting User Interface
Sawmill 8 has a completely redesigned reporting web interface. The interface will be familiar and an easy transition for Sawmill 7 users, but has many improvements to simplify and enhance report viewing. Improvements over Sawmill 7 include:
q Frames have been replaced with multi-column pages, for better integration with larger environments.
q Zooming is easier, due to a new approach that involves clicking an item and then immediately clicking a report name in the left menu to zoom to it (instead of using a "zoom to report" menu like Sawmill 7). Zooming also supports clicking multiple items, to zoom simultaneously on several rows.
q Filters can be grouped into Filter Sets, and named and managed easily.
q A new Macros feature saves the current report and/or filter set, so it can be easily called up again later.
q Report customization from the Reports page is greatly enhanced, providing most of the functionality of Sawmill 7's Report Editor without leaving the Reports page.
q Graphing improvements include 3D pie charts and smoother (anti-aliased) graphs.
q The current report can be emailed immediately.
q It is now possible to filter on numerical fields with Report Filters, for instance to select all events with bytes > 1024.
q Session fields are integrated with other fields, so any report can include a sessions column, a session duration column, or other session-related columns; and session reports like Session Pages can contain non-session columns like referrer.
q New report table filters can filter the table after it is generated, allowing dynamic suppression of table rows; e.g., show only rows with more than 15 page views.
q Pivot tables can now be created dynamically from the Reports interface; for instance, an indented audio codec column can be added to a video codec report without leaving Reports.
q An improved date filter picker allows relative date selections like "last 3 months" or "yesterday".
Sawmill 8 has a completely redesigned Admin and Config user interface. The interface will be familiar and an easy transition for Sawmill 7 users, but has many improvements to simplify administration and configuration. In addition, there are several major new components that were not present in Sawmill 7:
q The Log Fields Editor
q The Database Fields Editor
q The Session Fields Editor
q The Report Fields Editor
q The Cross-Reference Groups Editor
q The New Field Wizard. This new wizard makes it much easier to create custom fields: it creates a new log field, an associated database field to track it, a report field for use in reports, a log filter to set its value, a cross-reference table to cache it, and a new report to display the results, all in a single step.
Sawmill 8 allows you to edit all aspects of the profile from the web interface. This is a marked improvement over Sawmill 7, where many of the advanced customizations required direct editing of the CFG files. For instance, adding a custom report in Sawmill 7 required at least five separate edits of the profile CFG, using a text editor; in Sawmill 8, it can be done in a single step with the New Field Wizard, or the five steps can be done separately with the new web editors, without any text editing.

Improved Scheduler
The Sawmill 8 Scheduler has a number of improvements over Sawmill 7. Most notably, there is a Run Now button to immediately run any scheduled task, and scheduled tasks can list multiple actions that run in sequence. For instance, a scheduled task can update the database, then remove data older than 30 days, and then email a report. In Sawmill 7, that would have required three separate scheduled tasks, and they would have had to be carefully spaced so they didn't collide.

Simplified Date Filter Syntax

Sawmill 8's date filter syntax is vastly expanded over Sawmill 7's. In Sawmill 7, date filters could be 15/Jan/2008-15/Feb/2008, or similar ranges of months and years; in Sawmill 8, a wide range of intelligently interpreted options is also available. Some examples:
q last 10 months
q 3 months ago-2 months ago
q q1/2008
q q3/2008-Feb 2009
q yesterday
and much more. These can be used in the Scheduler, or on the command line; similar options are available in Reports through the Date Picker.

Database Import/Export

Sawmill 8 can export an entire database to a text format, and import a database from that text format (Sawmill 7 provided no database import or export capabilities). This makes it easy to move a database between platforms, or from one database server to another, without having to rebuild it from the log data.
Improved Report Caching And Performance

Sawmill 8 adds several additional levels of report caching, beyond the HTML report cache used by Sawmill 7. In Sawmill 7, any change to a report required it to be regenerated; in Sawmill 8, many changes can be served from the cache, including paging through the report, changing the sort, or changing the visible columns. This makes some report operations much faster in Sawmill 8.

Sawmill 8 also takes advantage of multiple processors or cores, when available, to improve the performance of reports; in Sawmill 7, report tasks always used only one processor. This can result in much faster report generation for very large datasets.

Additional Log Source Options
Sawmill 8 adds support for SFTP and SQL log sources, to the list supported by Sawmill 7 (local file, FTP, command, HTTP). SFTP provides a more secure, and more reliable, method for downloading log data from a remote system. SQL log sources can read log data directly from a SQL database. Sawmill 8 can also recursively process a hierarchy of folders on an SFTP or FTP server using a single log source; Sawmill 7 could only download the contents of a single FTP folder, not its subfolders.

Improved Log Import Performance

Sawmill 8 automatically splits log processing across multiple processors, to improve import performance. Sawmill 7 also supports splitting database builds across processors, but Sawmill 8 detects the number of processors automatically to do the split, and it also uses an efficient channel of communication between processes to reduce disk contention between threads, for better performance and scalability of the initial data import step. Sawmill 8 can also be configured to split log processing across multiple servers in a cluster, for even higher performance (Sawmill 7 supports multiple processors only on a single server).

Sawmill 8 can be configured to build cross-reference tables, indices, hierarchy tables, and session tables on demand, deferring them until they are needed. When it is configured in this way, the initial database import step is the only step of a database build or import, so reports are available as soon as the import is done (or sooner, if it's using real-time; see above). That makes database updates much faster, because they do not need to build all the support tables; those tables will be built later, when a report requests them, if at all. This provides an opportunity for configuring a report-speed vs. import-speed tradeoff that does not exist in Sawmill 7, where all support tables are always built automatically during database build and update.

Sawmill 8 uses improved SQL queries for building cross-reference tables.
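The multi-processor import split described above (detect the processor count, fan the parsing work out across worker processes) can be illustrated with a short Python sketch. This is only an illustration of the general technique, not Sawmill's actual implementation; the toy parse_line function and the sample log line are invented for the example.

```python
from multiprocessing import Pool, cpu_count

def parse_line(line):
    """Toy parser: split a space-separated log line into fields."""
    return line.split()

def parse_log(lines):
    """Fan parsing out across one worker process per detected processor."""
    with Pool(processes=cpu_count()) as pool:
        # chunksize batches lines per worker, reducing inter-process traffic
        return pool.map(parse_line, lines, chunksize=1000)

if __name__ == "__main__":
    sample = ["12.34.56.78 GET /index.html 200"] * 4
    print(parse_log(sample)[0])  # fields of the first parsed line
```

The batching via chunksize is the analogue of the efficient inter-process channel mentioned above: fewer, larger messages between processes mean less contention.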
Cross-reference table generation, which is the largest part of database build time, is much faster in Sawmill 8 than it was in Sawmill 7, especially during database updates. Sawmill 7 rebuilt all cross-reference tables from scratch after each update; Sawmill 8 performs an incremental update of the cross-reference tables, which is much faster.

Other Enhancements

Sawmill 8 includes many other enhancements over Sawmill 7. Some of these are:
- Report fields, for more flexibility and fine-tuning of report elements and table data.
- Direct URL access to reports; URLs can include the profile name, report name, date filter, filter expression, and filter comment.
- One-level slices of a hierarchical field in a report column; for instance, a months report (across all years).
- A new Calendar report.
- Independent sort field and sort direction for the drill-down field.
- An option to include row numbers and aggregation rows when exporting a CSV table within the reports interface.
- PDF report generation from the Scheduler.
- Minimum and maximum aggregation rows in tables.
- A default date filter per profile.
- Table row selection (to mark a row in yellow).
- User-created actions (-a options), with fully customizable parameters and behavior.
Sawmill Documentation
Quickstart Manual FAQ User Guide www.sawmill.co.uk
Newsletters
Sawmill Newsletter
January 15, 2009
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of UNSUBSCRIBE, to newsletter@sawmill.net.

News

Sawmill 8.0.1 shipped on December 24, 2008. This is a minor "bug fix" release, and it is free to existing Sawmill 8 users. It is recommended for anyone who is experiencing problems with Sawmill 8.0.0. You can download it from http://sawmill.net/download.html.

Sawmill 7 users can upgrade to Sawmill 8 for half of the license price; or if you have Premium Support, the upgrade is free. Major features of Sawmill 8 include support for Oracle and Microsoft SQL Server databases, real-time reporting, a completely redesigned web interface, better multi-processor and multi-core support, and role-based access control.

This issue of the Sawmill Newsletter describes the process for migrating Sawmill 7 profiles, databases, users, and schedules to Sawmill 8.

Get The Most Out Of Sawmill With Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Services experts are available for just this situation and many others. We will assist in the initial installation of Sawmill using best practices, and work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements, and stay focused on your business needs. We will show you areas of Sawmill you may not even be aware of, and demonstrate streamlined methods to get you the information more quickly.
Often you'll find that Sawmill's deep analysis can provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users only exercise a fraction of this power. That's where our experts can really make the difference. Our Sawmill experts have many years of experience with Sawmill, and with a large cross-section of devices and business sectors. Our promise is to quickly come up with a cost-effective solution that fits your business, and to greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Migrating Sawmill 7 Data to Sawmill 8

If you've upgraded from Sawmill 7 to Sawmill 8, you probably have existing profiles, databases, users, and schedules in your Sawmill 7 installation which you would like to use with Sawmill 8. Sawmill 8 provides an Import Wizard for easy transfer of these files and data. To start, first install Sawmill 8, but do not uninstall Sawmill 7. Or, if you want to uninstall Sawmill 7, make sure you back up your LogAnalysisInfo folder to a drive where Sawmill 8 can reach it. Once Sawmill 8 is installed and running, the Admin page will show no profiles:
Sawmill 8 Fresh Install

Now, click the Import link at the right of the Admin menu, and then click "Import Sawmill 7 data" to bring up the Import Wizard:
Import Wizard: Browse To LogAnalysisInfo

Click the Browse button and browse to the location of the LogAnalysisInfo folder in your Sawmill 7 installation (or enter the pathname of the LogAnalysisInfo folder in the field), and click the Next button. Sawmill will examine the Sawmill 7 LogAnalysisInfo folder, finding your profiles, schedules, and users, and display a page like this, where you can check which ones you want to import:
Import Wizard: Select Data To Import

After selecting the profiles you want to import into Sawmill 8, and checking Schedules and/or Users if you want to import the Sawmill 7 schedules or users, click Next. The Import Wizard will show a page like the following, asking which databases should be converted:
Import Wizard: Select Databases To Import

After selecting the databases you want to import, click Next. If any of them are MySQL databases, you will be prompted at this point to choose the names of the MySQL databases for Sawmill 8 (a name can't be the same as the Sawmill 7 database name, or the converted Sawmill 8 database would overwrite the Sawmill 7 database, and vice versa); if this page appears, choose the names at this point. The next page gives a list of the data that will be imported:
Import Wizard: Ready To Import

After reviewing the list, click Finish to confirm and begin the import. Sawmill will convert your profiles to Sawmill 8 format, and install them in the Sawmill 8 profiles list. It will convert the databases for those profiles to Sawmill 8 database format (if you asked it to), and install them in the Sawmill 8 installation. It will convert the schedules and users to Sawmill 8 format (if requested), and install them as Sawmill 8 schedules and users. As it imports, it will show its progress by adding "Complete" next to each item. If there are errors, it will report them; otherwise, you will get a page like this:
Import Wizard: Import Completed

The import was successful, so now click Close to close the Import Wizard, and then click Profiles in the Admin menu to return to the profiles list:
Profiles List After Successful Import

The list now contains the profiles that have been converted from Sawmill 7. If you look at the Scheduler or Users, you will see your imported schedules and users; if you view reports, you will see the data from the imported Sawmill 7 profiles.

Advanced Topic: Converting A Database To SQL Server Or Oracle

The Import Wizard converts Sawmill 7 databases using the "internal" database engine to Sawmill 8 databases using the "internal" database engine, and it converts databases using MySQL to Sawmill 8 databases using MySQL. If you want to change the database to use Microsoft SQL Server or Oracle, you then need to convert the database to the new format by exporting it to text, changing the database server, and importing it. This is done from the command line.
To export a database to text, use this command:

sawmill -p profile -a ed -d v8DatabaseExport

This exports the database for the profile profile to a text format, and puts the text files in a directory called v8DatabaseExport (or use a different pathname for the export directory). Now, go to the Database -> Server section of the profile Config. There, change the database server options to point to the database server you want to use (e.g., select MS SQL and enter the DSN information). Save the changes, and run this command:

sawmill -p profile -a id -d v8DatabaseExport

This imports the database from the text files in the folder v8DatabaseExport, putting the data into the profile's database, which is now in the new location you specified. That's all--the profile now uses a different database engine, and the original data from the database is in the new location. This method can be used with any v8 profile (not just those converted from Sawmill 7), to convert its database from one database server to another.
[Article revision v1.0] [ClientID: 43726]
Sawmill Newsletter
February 15, 2009
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of UNSUBSCRIBE, to newsletter@sawmill.net.

News

Sawmill 8.0.3 shipped on January 23, 2009. This is a minor "bug fix" release, and it is free to existing Sawmill 8 users. It is recommended for anyone who is experiencing problems with Sawmill 8.0.0. You can download it from http://sawmill.net/download.html.

Sawmill 7 users can upgrade to Sawmill 8 for half of the license price; or if you currently have Premium Support, the upgrade is free. Major features of Sawmill 8 include support for Oracle and Microsoft SQL Server databases, real-time reporting, a completely redesigned web interface, better multi-processor and multi-core support, and role-based access control.

This issue of the Sawmill Newsletter describes a method for using Sawmill's log scanning and alerting features to detect and alert on intrusion attempts.

Get The Most Out Of Sawmill With Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Services experts are available for just this situation and many others. We will assist in the initial installation of Sawmill using best practices, and work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements, and stay focused on your business needs. We will show you areas of Sawmill you may not even be aware of, and demonstrate streamlined methods to get you the information more quickly.
Often you'll find that Sawmill's deep analysis can provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users only exercise a fraction of this power. That's where our experts can really make the difference. Our Sawmill experts have many years of experience with Sawmill, and with a large cross-section of devices and business sectors. Our promise is to quickly come up with a cost-effective solution that fits your business, and to greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Detecting And Alerting On Intrusion Attempts With Sawmill

Sawmill's log filters make it possible to write very flexible rules for detecting certain conditions in the log data. This can be used for intrusion detection and alerting. Consider the following FTP log data from Microsoft Internet Information Services (IIS):
#Software: Microsoft Internet Information Services 6.0
#Version: 1.0
#Date: 2009-02-05 04:59:59
#Fields: time c-ip cs-method cs-uri-stem sc-status sc-win32-status
04:59:59 12.34.56.78 [4007]USER Administrator 331 0
04:59:59 12.34.56.78 [4007]PASS - 530 1326
04:59:59 12.34.56.78 [4007]USER Administrator 331 0
04:59:59 12.34.56.78 [4007]PASS - 530 1326
05:00:01 12.34.56.78 [4007]USER Administrator 331 0
05:00:01 12.34.56.78 [4007]PASS - 530 1326
05:00:01 12.34.56.78 [4007]USER Administrator 331 0
05:00:02 12.34.56.78 [4007]PASS - 530 1326
05:00:02 12.34.56.78 [4007]USER Administrator 331 0
05:00:02 12.34.56.78 [4007]PASS - 530 1326
05:00:03 12.34.56.78 [4007]USER Administrator 331 0
05:00:03 12.34.56.78 [4007]PASS - 530 1326
05:00:03 12.34.56.78 [4007]USER Administrator 331 0
05:00:03 12.34.56.78 [4007]PASS - 530 1326

This log data shows a password-cracking attack originating from IP address 12.34.56.78. We can guess this is an attack, rather than a series of legitimate login attempts, because the logins are happening so fast--there are several attempts each second, from the same IP address, to log in as Administrator. Sawmill can show this data in the Log Detail of its standard reporting, of course, and that can be useful for examining past intrusion attempts. But if you want to know about an intrusion attempt as it occurs, or shortly thereafter, you need more--you need alerting. To create an alert in Sawmill, first define for yourself what condition should trigger the alert. In this case, it is:
- The current line of log data is a PASS attempt.
- The previous login attempt for this user is in the same second as this one.
- We have not fired an alert for this IP previously.
The last condition is intended to prevent Sawmill from sending a million emails to you, as one IP attempts to crack passwords over a million lines--we only want one email in this case. Now that we've defined the condition, we need to implement it as a log filter (written in Salang, Sawmill's built-in language, which is used for advanced filtering). Below is a Salang log filter which implements this condition, and alerts on it. This can be copied and pasted directly into an "advanced expression" log filter created in your profile, in Config -> Log Filters:
# Only consider PASS lines as intrusions
if (cs_method eq "PASS") then (

  # Make sure the nodes we're going to use have been initialized
  v.password_attempt_times = "";
  v.intrusion_reported_for_ip = "";

  # Get the timestamp of the previous password attempt for the current user
  int last_password_attempt_for_this_user = @'v.password_attempt_times'{username};

  # If the current timestamp matches the timestamp of the previous attempt,
  # then this is an intrusion attempt
  if (date_time == last_password_attempt_for_this_user) then (

    # If we've already reported this IP, don't do it again.
    if (!'v.intrusion_reported_for_ip'?{c_ip}) then (

      # Send email to admin@yourplace.com, from admin@yourplace.com, with a simple
      # description in the subject, and a longer description in the body.
      send_email("admin@yourplace.com",
                 "admin@yourplace.com",
                 "Subject: Password scan attempt on " . username . " from " . c_ip . "\r\n" .
                 "To: admin@yourplace.com\r\n" .
                 "\r\n" .
                 "Sawmill has detected a password scan attempt on user " . username .
                 " from IP address " . c_ip . ". There were multiple attempts to log in as " .
                 username . " at " . date_time . ".",
                 "smtp.yourplace.com");

      # Remember that we have reported this IP
      @'v.intrusion_reported_for_ip'{c_ip} = true;

    ); # if intrusion not yet reported
  ); # if timestamp is the same

  # Remember the timestamp of this password attempt, for this username
  @'v.password_attempt_times'{username} = date_time;

); # if PASS

NOTE: This script uses new syntax available only in Sawmill 8. If you're using Sawmill 7, you will need to use equivalent syntax, e.g., node_exists instead of "?", and subnode_by_name() instead of "@{}". The lines beginning with # are comments, and describe the operation of the log filter in detail. Some comments:
- The log filter uses the node v.password_attempt_times to remember the timestamps of previous password attempts for each user. This is a string-to-string map which maps usernames to the date_time values of the previous login attempt.
- The log filter uses the node v.intrusion_reported_for_ip to keep track of which IPs have already been reported. IP addresses are in this map only after they have been reported.
- The log filter uses send_email() to send email to admin@yourplace.com when an intrusion needs to be reported. The final parameter must be an SMTP server which accepts unauthenticated connections for the recipient (e.g., their MX server).
Advanced Topic: Real-Time Alerting From Streaming Log Data

If the log filter we've created above is in a profile when you build a database, it will trigger all alerts for the dataset during the build. This is fine for getting after-the-fact information about intrusions, but if you want to be alerted as intrusions occur, you need to stream the log data into Sawmill as it is generated. This is best done with a command line log source, which monitors the log files and dumps new data to its standard output stream as it appears in the log files (UNIX "tail -f" is a simple example of this). This requires an external script to do the monitoring; once you have created such a script, you can do real-time alerting by doing a Real-Time database build (using the Real-Time feature of Sawmill 8 Enterprise), or by using streaming alerting with the "-a pl" option (see the May 2007 Newsletter).

Professional Services

The techniques described in this newsletter involve advanced Salang programming and scripting. This is something you can implement yourself, if you have programming or scripting experience, or if you are willing to spend time learning Salang. If you would like this sort of alerting implemented in your Sawmill installation, but are not familiar with programming or Salang, or don't have time to do it yourself, we can help. Sawmill's Professional Services experts can implement a filter like this one for you, using your own rules, your own data, and your own profile, very quickly. Contact support@sawmill.net for a quote.
Sawmill Newsletter
March 15, 2009
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of UNSUBSCRIBE, to newsletter@sawmill.net.

News

Sawmill 8.0.5 shipped on February 26, 2009. This is a minor "bug fix" release, and it is free to existing Sawmill 8 users. It is recommended for anyone who is experiencing problems with Sawmill 8.0.4 or earlier. You can download it from http://sawmill.net/download.html.

Sawmill 7 users can upgrade to Sawmill 8 for half of the license price; or if you have Premium Support, the upgrade is free. Major features of Sawmill 8 include support for Oracle and Microsoft SQL Server databases, real-time reporting, a completely redesigned web interface, better multi-processor and multi-core support, and role-based access control.

This issue of the Sawmill Newsletter describes the new date range filtering options in Sawmill 8, including relative date ranges.

Get The Most Out Of Sawmill With Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Services experts are available for just this situation and many others. We will assist in the initial installation of Sawmill using best practices, and work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements, and stay focused on your business needs. We will show you areas of Sawmill you may not even be aware of, and demonstrate streamlined methods to get you the information more quickly.
Often you'll find that Sawmill's deep analysis can provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users only exercise a fraction of this power. That's where our experts can really make the difference. Our Sawmill experts have many years of experience with Sawmill, and with a large cross-section of devices and business sectors. Our promise is to quickly come up with a cost-effective solution that fits your business, and to greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Date Range Filtering

When viewing or generating reports, it is often useful to see reports for a particular day, month, or year, or a particular range of days. This can be done with the Date Picker, the Report Date option of the Scheduler, or date filter expressions.

The Date Picker

Sawmill has a dedicated window in the Reports section of the web interface, just for date filtering:
The Date Picker

In addition to filtering on a particular day, week, month, quarter, or year (using the "Date or Start Date" tab), or on a specific range of days (using the "Date or Start Date" and "End Date" tabs together to select the starting and ending day of the range), it is also possible to filter on a relative range of days. For instance, you can filter on the "recent 5 days" to select the most recent 5 days relative to the date the report is generated, including the current day; or you can filter on the "last 2 weeks" to select the most recent 2 weeks relative to the date the report is generated, excluding the current day. When a date or range is selected in the Date Picker, the generated report will be created from only those events which occurred during the specified period.

Date Filtering In The Scheduler: The Report Date Option

The Date Picker is available only for ad hoc web-based reporting, but when sending reports from the Scheduler, you can use the Report Date option to apply some common types of date filters:
The Report Date Option (Scheduler)

The default includes the entire range of dates. The second and third options specify a number of units (e.g., 5 days, or 2 weeks), counting backward from the date of the report, or from the end of the log data. The last option lets you enter a date filter expression, described below. When a Report Date option is specified, the generated report will be created from only those events which occurred during the specified period.

Advanced Topic: Date Filter Expressions

The Date Picker is available when using the Reports section of the profile to generate ad hoc reports through the web interface, and the Report Date option is available when generating reports from the Scheduler. In other situations, date filtering is done by typing a date filter expression. These include:
- Reports generated from the Scheduler using the Custom Date Filter option of the Report Date option
- Custom reports or report elements with permanent Date Filters
- Reports generated from the command line
- Permanent per-profile date filters
Date Filter Expressions are strings of characters. The simplest options select single date units, e.g.:

01/Jan/2008    All data from January 1, 2008
Jan/2008       All data from January, 2008
Q2/2008        All data from the second quarter of 2008 (April 1, 2008 through June 30, 2008)
2008           All data from 2008
Any two simple date expressions can be combined with a hyphen to form a range, which selects data from the beginning of the first expression to the end of the second expression:

01/Jan/2008-15/Jan/2008    All data from January 1, 2008 through January 15, 2008
Jan/2008-May/2008          All data from January, 2008 through May, 2008
Q2/2008-Q3/2008            All data from the second quarter of 2008 through the third quarter of 2008 (April 1, 2008 through September 30, 2008)
2008-2009                  All data from 2008 through 2009 (two full years of data)
Date range expressions can also be relative to the time of report generation:

last 30 days         All log data from the last 30 days. "Last" excludes the current calendar day.
recent 30 days       All log data from the most recent 30 days of the log data, relative to the moment of report generation. "Recent" includes the current calendar day.
recent 2 quarters    All log data from the most recent 2 full quarters, including the quarter when the report is generated (the current quarter).
yesterday            All log data from yesterday
today                All log data from today
2 weeks ago          All log data from the week two weeks before report generation

Other options (useful only in ranges) include:

end                    The last second of log data
start                  The first second of log data
start of X             The first second of the range X, which can be any non-range date expression (e.g., "start of yesterday")
end of X               The last second of the range X, which can be any non-range date expression (e.g., "end of Q2/2008")
unit N units before X  The time unit (e.g., month) N units before the range X (e.g., "month 4 months before 2008")
Advanced results can be achieved by combining the items above into ranges, keeping in mind that a range goes from the first second of the first item until the last second of the last item, e.g.:

2 weeks ago-yesterday       The last two weeks, except today
start of 2008-1 week ago    January 1, 2008 through the day one week ago
4 weeks ago-3 weeks ago     All data collected between 3 and 4 weeks ago
Q1/2006-end of 2008         January 1, 2006 through December 31, 2008
week 2 weeks before end-week 0 weeks before end    The last two weeks of data in the database (relative to the end of the data in the database)
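The "last" versus "recent" distinction above can be made concrete with a short sketch. This is a hypothetical evaluator written only for illustration (Sawmill's own parser handles far more forms, and this function is not part of Sawmill):

```python
from datetime import date, timedelta

def relative_range(expr, today):
    """Evaluate 'last N days' / 'recent N days' into a (start, end) date pair:
    'last' excludes the current calendar day, 'recent' includes it."""
    keyword, count, unit = expr.split()
    if unit != "days":
        raise ValueError("this sketch only handles day units")
    n = int(count)
    if keyword == "last":
        end = today - timedelta(days=1)  # exclude today
    elif keyword == "recent":
        end = today                      # include today
    else:
        raise ValueError("unsupported expression: " + expr)
    return end - timedelta(days=n - 1), end

# A report generated on March 15, 2009:
print(relative_range("last 30 days", date(2009, 3, 15)))
# -> (datetime.date(2009, 2, 13), datetime.date(2009, 3, 14))
```

Both forms cover N calendar days; they differ only in whether the window ends yesterday or today, which matches the table above.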
Advanced Topic: Date Filter Expression Plug-ins

The date filter syntax is quite flexible, but if you need a date filter which is not supported, it is possible to add support for it by writing a date filter expression plug-in. The details of date filter expression plug-in authoring are beyond the scope of this document, but in brief, a plug-in is a CFG file in the date_filters folder of LogAnalysisInfo, which describes (1) what the expression looks like, (2) what parameters it has, and (3) how to compute the beginning and end of the range it represents. All of the expressions above are implemented in this way, so the examples in that folder can serve as a good starting point for writing your own. This could be used, for instance, to implement date filters based on fiscal quarters ("FQ1/2008-FQ2/2008"), holidays ("Labor Day, 2009"), or anything else ("third Tuesday of Jan, 2009-end of 2009"; "start of 2008-my birthday, 2008").

Professional Services

This newsletter describes date range filters. The most common uses are straightforward, but if you need to do something more advanced, like creating a complex date filter expression, or even a date filter expression plug-in, our Sawmill experts can help. Contact sales@sawmill.net for more information.
Sawmill Newsletter
April 15, 2009
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of UNSUBSCRIBE, to newsletter@sawmill.net.

News

Sawmill 8.0.6 shipped on March 19, 2009. This is a minor "bug fix" release, and it is free to existing Sawmill 8 users. It is recommended for anyone who is experiencing problems with Sawmill 8.0.5 or earlier. You can download it from http://sawmill.net/download.html.

Sawmill 7 users can upgrade to Sawmill 8 for half of the license price; or if you have Premium Support, the upgrade is free. Major features of Sawmill 8 include support for Oracle and Microsoft SQL Server databases, real-time reporting, a completely redesigned web interface, better multi-processor and multi-core support, and role-based access control.

This issue of the Sawmill Newsletter describes creating custom fields with the New Field Wizard.

Get The Most Out Of Sawmill With Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Services experts are available for just this situation and many others. We will assist in the initial installation of Sawmill using best practices, and work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements, and stay focused on your business needs. We will show you areas of Sawmill you may not even be aware of, and demonstrate streamlined methods to get you the information more quickly.
Often you'll find that Sawmill's deep analysis can provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users only exercise a fraction of this power. That's where our experts can really make the difference. Our Sawmill experts have many years of experience with Sawmill, and with a large cross-section of devices and business sectors. Our promise is to quickly come up with a cost-effective solution that fits your business, and to greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.
Tips & Techniques: Creating Custom Fields, Revisited

In the February 2007 Newsletter, we discussed creating custom database fields. The process involves creating:

1. A Log Field
2. A Log Filter
3. A Database Field
4. A Report, and adding it to the Reports Menu
5. Optionally, a Cross-reference Group
In Sawmill 7, steps 1, 3 and 5 involved editing the profile CFG file; step 2 involved creating a filter with the Log Filter editor; and step 4 involved creating a report with the Report Editor. This was a somewhat involved and technical process, especially the CFG editing. With the release of Sawmill 8, this has all gotten a lot easier. Not only is it possible to do all steps separately through the web interface (thanks to the new Log Field Editor, Database Field Editor, and Cross-reference Group Editor), but there is also a New Field Wizard to do all of this for you, in a single step. This newsletter describes using the New Field Wizard to create a custom field.
The New Field Wizard The New Field Wizard, which is in the Config section of the profile, creates a non-aggregating (non-numerical) field, and everything associated with it. For instance, it could be used to create a field "real name" which computes the real name of a user, from the username in the log data, and displays it in a new Real Names report. Or, it can be used to extract a URL parameter from the query string of a URL, and report on it as a separate field. We will be demonstrating the latter example. The log data we're using in this example is an Apache web server access log, from an older web site which provided, among other things, Sawmill documentation for a previous version of Sawmill. It contains pages like this: /cgi-bin/sawmilldocs?ho+faq-enginesbyphrases Sawmill's default behavior is to chop off everything after the question mark (to keep the database simple), so the reports will show this: /cgi-bin/sawmilldocs?(omitted) But if we want to know the different values of the "ho" ("help on") parameter (which help pages were requested), that won't do--we need this information in the database. We could disable the log filter that chops off the query string, but that would make the Page field much more complicated, and would also include all query strings, even if we only care about the "ho" parameter. Furthermore, the "ho" parameter really functions as a separate field, and it would be preferable to have a separate "Help Chapters" report which shows all the different values of the "ho" parameter, like a standard report. Finally, it would be nice to filter or zoom on this parameter like any other field. This is all possible if it is a custom field, so let's go to the New Field Wizard and set it up there, in Config -> More Options -> New Field Wizard:
New Field Wizard (Start Page) Now we click "Click here to start the wizard" to start creating the field. In the first page that appears, we enter the name of the field. We'll call this field "Help Chapters":
New Field Wizard (Field Name) Then we click Next, to choose the log field options:
New Field Wizard (Log Field Options) In most cases, you will want to leave these options alone; but if the field has a natural hierarchy in it (e.g., a pathname field, which has a hierarchy of directories), you can describe that hierarchy here, which will allow you to drill into it, in the final report. In this case, the field has no internal structure--it's just a single flat (non-hierarchical) value--so we'll leave the type as Flat, and the rest of the options at default values also. We could always change them later, in the Log Field Editor. Then we click Next, to choose the database field options:
New Field Wizard (Database Field Options) These options specify the hierarchy levels to be tracked in the database, and since this is a non-hierarchical field, we can leave them at their defaults (again, you can always change them later in the Database Field Editor), and click Next, to choose the report field options:
New Field Wizard (Report Field Options) Report Fields are a new concept in Sawmill 8--they correspond to columns in table reports. In earlier versions of Sawmill, database fields were used directly as columns of table reports, but having Report Fields separate allows for more flexibility; for instance, a single hierarchical database field can provide columns for Countries, Regions, and Cities (three report fields). But in most cases, the report field is based directly on the database field, with one report field per database field. That's what we'll do in this case. This is a simple "string" field, with no special formatting required, so we'll leave Display Format as "String." We'll omit the column label, to use the database label ("Help Chapters") as
the label in reports. Then we click Next, to go to the Log Filter page:
New Field Wizard (Log Filter) In order for the custom field to have a value, it must be computed from a log filter (or parsed by the parser, but normally it's done with a filter). Here we choose the name of the filter; we'll get into the details of the filter to extract the field value later. We click Next, to go to the Report Options page:
New Field Wizard (Report Options) This specifies the name of the report to be created, and whether it should be shown in dynamic reports (e.g., when you click Reports in the web browser), or static reports (e.g., when you generate reports from the Scheduler), or both.
Finally, we click Finish. This creates the log field, the database field, the report field, the log filter, and the report, and adds the report to the report menu. Now, the only thing that remains is to edit the log filter that was just created, to set the field's value. So we click Log Filters, check the new Help Chapters log filter to turn it on, choose "Expression" as the Type, and edit the Expression like this:
Log Filter This uses a regular expression to extract everything after the "?ho+" section of that page, and put it in the Help Chapters log field. Now rebuild the database (Config -> Database Info -> Rebuild Database), and view the reports, and you'll see the new field in its own report:
This report shows all values of the "ho" URL parameter -- "(empty)" indicates that the hit did not have an "ho" parameter, but was some other URL. The Help Chapters field is now a full peer of all other fields in the profile; you can use it to generate reports, to zoom, to filter reports, to create pivot tables, etc.
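For programmers, the extraction that the log filter performs can be sketched in Python. This is only an illustrative analogy, not Sawmill's filter syntax; the sample URLs and the "(empty)" default mirror the ones discussed above:

```python
import re

def help_chapter(page):
    """Return everything after '?ho+' in the page URL, or '(empty)'
    if the hit has no 'ho' parameter (mirroring the report's default)."""
    m = re.search(r"\?ho\+(.*)", page)
    return m.group(1) if m else "(empty)"

print(help_chapter("/cgi-bin/sawmilldocs?ho+faq-enginesbyphrases"))  # faq-enginesbyphrases
print(help_chapter("/index.html"))  # (empty)
```

Every hit then contributes a value to the Help Chapters field, with non-help hits grouped under "(empty)".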
Professional Services This newsletter describes creating custom fields. If you need assistance creating a custom field in your profile, our Sawmill Experts can help. Contact sales@sawmill.net for more information.
Sawmill Documentation
Quickstart Manual FAQ User Guide www.sawmill.co.uk
Newsletters
Sawmill Newsletter
May 15, 2009
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of UNSUBSCRIBE, to newsletter@sawmill.net.

News

Sawmill 8.0.7 shipped on April 17, 2009. This is a minor "bug fix" release, and it is free to existing Sawmill 8 users. It is recommended for anyone who is experiencing problems with Sawmill 8.0.6 or earlier. You can download it from http://sawmill.net/download.html. Sawmill 7 users can upgrade to Sawmill 8 for half of the license price; or if you have Premium Support, the upgrade is free.

This issue of the Sawmill Newsletter describes "custom actions," a new method in Sawmill 8 for creating custom command-line "-a" actions.
Tips & Techniques: Creating Custom Actions

Note: this applies only to Sawmill 8 and later; earlier versions of Sawmill do not support custom actions.

Sawmill supports a number of built-in "actions" which can be run from the command line with the -a option, including bd (build database), ud (update database), srbe (send report by email) and many more. Starting in Sawmill 8, this has been extended to allow you to create your own -a actions, for performing any custom task. Custom -a actions are defined using a CFG text file, which contains some structured information (parameters, types, defaults, documentation), and a chunk of Salang code which implements the action. Several custom actions are included with the default installation of Sawmill:
q clear_report_cache
These can be used as examples of the format of the custom action file. In this newsletter we will create a new action, one which creates a new user (this custom action will also be included in future versions of Sawmill). This newsletter assumes some knowledge of programming or scripting; when you create a custom action, you are basically writing a script in Salang (Salang is the Sawmill Language, the built-in scripting language used throughout Sawmill).

Creating A Custom Action To Create A New User

Users can be created from the Sawmill web interface, or by editing users.cfg directly, but there is no command-line action for creating a user. This is a useful feature to have, and a good way to demonstrate custom actions and some Salang concepts, so we'll add it now. To add a new custom action, all you do is create a new CFG file in LogAnalysisInfo\actions. Here's the file for this new action; we'll explain it bit-by-bit below:
create_user = {
  label = "Create User"
  shortcut = "cu"
  parameters = {
    username = {
      shortcut = "u"
      required = true
    }
    pass = {
      shortcut = "pw"
      required = true
    }
    roles = {
      shortcut = "r"
      required = true
    }
    profiles_allowed = {
      shortcut = "pa"
      required = false
    }
  }
  expression = `

    # Create a new subnode of users, named after the username
    node users = 'users';
    node user = users{username};

    # Set the "username" subnode to the username
    @user{'username'} = username;

    # Set the password checksum
    @user{'password_checksum'} = md5_digest(pass);

    # If the -pa option was specified, set which profiles this user can access
    if (profiles_allowed ne "(unspecified)") then (

      # Build the "access.0.profiles" node from the profiles_allowed option
      # (which is comma-separated profile names)
      split(profiles_allowed, ',', 'v.profiles_allowed_split');
      node profile;
      foreach profile 'v.profiles_allowed_split'
        @user{'access'}{'0'}{'profiles'}{@profile} = @profile;

    ); # if profiles_allowed

    # Build the "access.0.roles" node from the roles option (which is comma-separated node names)
    split(roles, ',', 'v.roles_split');
    node role;
    foreach role 'v.roles_split'
      @user{'access'}{'0'}{'roles'}{@role} = @role;

    # Save the users.cfg file
    save_node(users);

    echo("Created user " . username);

  ` # expression
} # create_user

create_user.cfg

Below, we describe each part of this file.

The CFG Node Name Wrapper

Every CFG file must start and end with the lines shown below. The name of the file is create_user.cfg, so the name of the node (nodes are a basic data type used in Sawmill and Salang; they can store any type of data, including maps/hashes) is create_user, so the file must start with "create_user = {" and end with "} # create_user":
create_user = {
  ...
} # create_user

The final "# create_user" is really just a comment, but it is strongly recommended that you include this comment, to make it easier to see which bracket you're closing. The filename must match the node name listed on the first line, or Sawmill will give an error when it attempts to access the file.

The Label And Shortcut

Within the custom action node, there must be a label and shortcut node:
label = "Create User"
shortcut = "cu"

The label node is the human-readable name of the action, as it will appear in the documentation (custom actions automatically document themselves in the -a section of the documentation), and anywhere else in the web interface where custom actions might later be included. The shortcut is a short form of the action's internal name, which can be used on the command line in place of the full action name. With an action internal name of create_user and a shortcut of cu, this action can be run with either "sawmill -a create_user ..." or "sawmill -a cu ...".

The Custom Action Parameters
parameters = {
  username = {
    shortcut = "u"
    required = true
  }
  pass = {
    shortcut = "pw"
    required = true
  }
  roles = {
    shortcut = "r"
    required = true
  }
  profiles_allowed = {
    shortcut = "pa"
    required = false
  }
}

Custom actions can take command-line parameters. These parameters are listed in the "parameters" subnode of the custom action node. In this case, we have four parameters: username, pass, roles, and profiles_allowed. They all have shortcuts, too, so you can use either "-username bernice" or "-u bernice" on the command line. The "required" subnode of a parameter specifies whether the parameter must be present on the command line--if it is true, and the action is attempted without specifying the required parameter, Sawmill will respond with an error message stating that the parameter is required. In this case:
q The username parameter specifies the name of the user to be created.
q The pass parameter specifies the password of the new user.
q The roles parameter specifies a comma-separated list of internal role names (the node names from roles_standard.cfg or roles_enterprise.cfg in the LogAnalysisInfo folder of the Sawmill installation; typically these will be either "role_1" for an Administrator, or "role_2" for a "Statistics Viewer"), but they can also be the node names of any custom role you have created. They must be the node names from the role file (the part before the "="), however, not the node labels: "role_1", not "Administrator".
q The profiles_allowed parameter specifies a comma-separated list of the internal names of profiles this user should be allowed to access. Again, these are the internal names of the profiles (as shown by "-a lp", or by looking at the filenames in LogAnalysisInfo\profiles and chopping off the final .cfg), not the profile labels you see in the web interface.
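The parameter semantics above (a long name, a shortcut, a required flag, and the "(unspecified)" default for omitted options) are roughly analogous to this Python argparse sketch. This is only an analogy for programmers--Sawmill parses these options itself, and the option names here just mirror the CFG above:

```python
import argparse

# Mirror the four parameters of the create_user action (illustrative only)
parser = argparse.ArgumentParser(prog="sawmill -a cu")
parser.add_argument("-u", "--username", required=True)
parser.add_argument("-pw", "--password", required=True)   # "pass" in the CFG
parser.add_argument("-r", "--roles", required=True)
parser.add_argument("-pa", "--profiles_allowed", default="(unspecified)")

# Omitting -pa leaves profiles_allowed at its "(unspecified)" default
args = parser.parse_args(["-u", "cindy", "-pw", "secret", "-r", "role_2"])
print(args.username, args.profiles_allowed)
```

As in Sawmill, leaving out a non-required option yields the literal default "(unspecified)", which the action's code can test for.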
There is also a "description" subnode which can be included as a subnode of any parameter; if this exists, it will be included in the documentation as a description of the parameter. These are omitted here for brevity, and are not required; see the standard convert_version_7_profile custom action for a fully self-documented example.

The Expression Wrapper

The expression subnode of the action defines a Salang expression which is executed when the action is run. The Salang expression is in a string parameter called "expression", so the following must wrap the expression:
expression = `
  <Salang code>
` # expression

The string is quoted using backtick quotes (`), so that both single and double quotes can be used without escaping, within the code itself. As earlier, the "# expression" section is a comment, and is technically optional, but is strongly recommended to make it easier to see what the quote is closing.

The Expression

The work of the action is done by the Salang code itself:
# Create a new subnode of users, named after the username
node users = 'users';
node user = users{username};

# Set the "username" subnode to the username
@user{'username'} = username;

# Set the password checksum
@user{'password_checksum'} = md5_digest(pass);

# If the -pa option was specified, set which profiles this user can access
if (profiles_allowed ne "(unspecified)") then (

  # Build the "access.0.profiles" node from the profiles_allowed option
  # (which is comma-separated profile names)
  split(profiles_allowed, ',', 'v.profiles_allowed_split');
  node profile;
  foreach profile 'v.profiles_allowed_split'
    @user{'access'}{'0'}{'profiles'}{@profile} = @profile;

); # if profiles_allowed

# Build the "access.0.roles" node from the roles option (which is comma-separated node names)
split(roles, ',', 'v.roles_split');
node role;
foreach role 'v.roles_split'
  @user{'access'}{'0'}{'roles'}{@role} = @role;

# Save the users.cfg file
save_node(users);

echo("Created user " . username);

When executed with a command like this:

sawmill -a cu -u cindy -pw "cnd#yZpAswd" -r role_2 -pa "ae,ae2"

this code adds the following node to the users.cfg file in LogAnalysisInfo, which adds user "cindy" to Sawmill:
cindy = {
  username = "cindy"
  password_checksum = "794be188061e1443d2e3def9e4de88f6"
  access = {
    0 = {
      profiles = {
        ae = "ae"
        ae2 = "ae2"
      } # profiles
      roles = {
        role_2 = "role_2"
      } # roles
    } # 0
  } # access
} # cindy

The code is documented with general comments, but for programmers unfamiliar with Salang, here is a detailed analysis of what the code is doing, and how. The code is the same as above, but with explanations interleaved below each section of code:
# Create a new subnode of users, named after the username
node users = 'users';

When used in a node context, a string is treated as a full nodename. Therefore "users", when used in node context (e.g., assigning to a node variable, as above), tells Sawmill to find the node whose full nodename is "users". The root of the node hierarchy is the LogAnalysisInfo folder, so "users" refers to the node named "users" within LogAnalysisInfo. This could be a folder called "users", whose value is its structured contents; or a file called "users.cfv", whose value is its literal contents; but since neither exists, it finds the file "users.cfg", whose value is its structured contents. So "node users = 'users'" gets a node that points to users.cfg, and automatically loads it into memory, using the structured contents of the file (i.e., parsing "name = value" pairs in the text of the file, and treating {} sections as subnodes).

node user = users{username};

users{username} gets a pointer to the subnode of users (which points to users.cfg) whose name is the value of the username parameter. So this finds the "cindy" subnode of "users". If it does not exist, it is created with an empty value. Then it is assigned to the "user" variable, which can be used through the remainder of the code to access the new user node.

# Set the "username" subnode to the username
@user{'username'} = username;

@user{'username'} refers to the value of the "username" subnode of the user node (as opposed to user{'username'}, which would be a pointer to the node, not its value). Put another way, @ dereferences the pointer to the "username" subnode. Assigning the value of the username variable to this sets the value of that subnode.

# Set the password checksum
@user{'password_checksum'} = md5_digest(pass);

md5_digest() is a built-in Salang function which returns the MD5 digest of its parameter; this line assigns that digest to the "password_checksum" subnode of the user node.
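For reference, the checksum stored in users.cfg is a 32-character hex MD5 digest; in Python, a digest of that shape can be computed with hashlib (this equivalence is an assumption based on the checksum format shown above, not on Sawmill source):

```python
import hashlib

def md5_digest(s):
    # Hex MD5 digest of a string, matching the 32-character
    # checksum format seen in users.cfg
    return hashlib.md5(s.encode("utf-8")).hexdigest()

print(len(md5_digest("cnd#yZpAswd")))  # prints 32
```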
# If the -pa option was specified, set which profiles this user can access
if (profiles_allowed ne "(unspecified)") then (

When a custom action parameter is not specified, its value is set to literally "(unspecified)". So this checks whether the -pa option was specified.

# Build the "access.0.profiles" node from the profiles_allowed option
# (which is comma-separated profile names)
split(profiles_allowed, ',', 'v.profiles_allowed_split');

split() is a built-in Salang function which is used here to split the profiles_allowed list on commas into a temporary node (v.profiles_allowed_split) for use in the loop below.

node profile;
foreach profile 'v.profiles_allowed_split'

This loop cycles through all subnodes of the temporary node created above. The variable "profile" points to each subnode in turn, so @profile is the name of each profile in turn.

  @user{'access'}{'0'}{'profiles'}{@profile} = @profile;

By stringing together a sequence of {} operators, this line creates an "access" subnode of the user record, and within that a "0" subnode, and within that a "profiles" subnode, and within that a subnode whose name is the profile name; then it assigns the name of the profile as the value of that bottom subnode.

); # if profiles_allowed

# Build the "access.0.roles" node from the roles option (which is comma-separated node names)
split(roles, ',', 'v.roles_split');
node role;
foreach role 'v.roles_split'
  @user{'access'}{'0'}{'roles'}{@role} = @role;

This loop does the same as the profiles loop above, but for roles.
# Save the users.cfg file
save_node(users);

The users.cfg node was automatically loaded from disk by the reference to "users" at the top. All modifications to this node have been made in memory so far. This line writes the node back to disk, replacing the disk version of users.cfg with the in-memory version.

echo("Created user " . username);

Finally, this echoes information to the standard output stream, to describe what it just did.
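As a footnote for programmers, the "structured contents" parsing described earlier (name = value pairs, with {} sections treated as subnodes) can be illustrated with a toy Python parser. This is not Sawmill's actual parser--it ignores backtick-quoted expressions and most other details--but it shows how the nested CFG text maps onto a nested structure:

```python
def parse_cfg_block(lines_iter):
    """Parse one { ... } block of a simplified CFG file into a dict."""
    node = {}
    for raw in lines_iter:
        line = raw.split('#', 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line == '}':
            return node
        name, _, value = line.partition('=')
        name, value = name.strip(), value.strip()
        if value == '{':
            node[name] = parse_cfg_block(lines_iter)  # recurse into subnode
        else:
            node[name] = value.strip('"')
    return node

text = '''
create_user = {
  label = "Create User"
  shortcut = "cu"
} # create_user
'''
parsed = parse_cfg_block(iter(text.splitlines()))
print(parsed["create_user"]["shortcut"])  # prints cu
```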
Professional Services

This newsletter describes creating a custom action. Custom actions are scripts, and require some programming expertise to create. If you need assistance creating a custom action for your installation, our Sawmill Experts can help. Contact sales@sawmill.net for more information.
Newsletters
Sawmill Newsletter
June 15, 2009
You're receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of UNSUBSCRIBE, to newsletter@sawmill.net.

News

Sawmill 8.0.8 shipped on May 20, 2009. This is a minor "bug fix" release, and it is free to existing Sawmill 8 users. It is recommended for anyone who is experiencing problems with Sawmill 8.0.7 or earlier. You can download it from http://sawmill.net/download.html. Sawmill 7 users can upgrade to Sawmill 8 for half of the license price; or if you have Premium Support, the upgrade is free.

This issue of the Sawmill Newsletter describes the use of cross-reference tables to increase the speed of custom reports.
Tips & Techniques: Using Cross-Reference Tables

Cross-reference tables (sometimes called cross-reference groups, or xrefs) are tables created by Sawmill in its back-end database. Cross-reference tables are generated during database builds and updates, and contain aggregated information from the main table of the database. For instance, a particular cross-reference table might contain one row for each day in a media server log dataset, with the number of accesses, play duration, bytes transferred, unique IPs, sessions, etc. for that day. When Sawmill generates an unfiltered Days report, it can generate it directly from this table, which is much smaller than the main table of the database; this allows it to generate this report, and other top-level reports, very quickly.

Cross-reference tables are created by default for each non-aggregating field in the database, which means there is roughly one cross-reference table for each report, so all top-level reports are boosted by cross-reference tables. Furthermore, every default cross-reference table also contains the date/time field, so any report can be filtered by date and still use a cross-reference table. So all default unfiltered reports, and all default date-filtered reports, are accelerated using cross-reference tables. If you add a new database field, however, or create a new report, or if you often use a particular combination of filters on a particular report, you may need to modify the cross-reference tables to ensure that this custom or filtered report is also fast. This newsletter gives an example of this.

Building The Report
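Before diving in, it helps to picture what an xref table holds. Conceptually, it is a pre-aggregated rollup of the main table; the following Python sketch (with made-up rows, not Sawmill's storage format) shows the idea for a (date, sender) rollup of messages and bytes:

```python
from collections import defaultdict

# Hypothetical main-table rows: (date, sender, recipient, messages, bytes)
main_table = [
    ("2009-06-01", "a.com", "x@b.com", 1, 1200),
    ("2009-06-01", "a.com", "y@b.com", 1, 800),
    ("2009-06-02", "c.com", "x@b.com", 1, 500),
]

# Build the xref table once, at database build/update time:
# one row per (date, sender), with the numeric fields aggregated
xref = defaultdict(lambda: {"messages": 0, "bytes": 0})
for date, sender, recipient, messages, nbytes in main_table:
    row = xref[(date, sender)]
    row["messages"] += messages
    row["bytes"] += nbytes

# An unfiltered (or date-filtered) Senders report can now be answered from
# this much smaller table, instead of scanning every row of the main table.
print(xref[("2009-06-01", "a.com")])
```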
We'll start with a spam filtering server dataset. Clicking Senders shows the top sender domains in the data:
Senders Report If we also want to know, in a single report, the top Recipients for each Sender, we can use a Pivot Table, by clicking Customize, then the Pivot Table tab, then "Show pivot table", and then selecting Recipients as the drill-down:
Customize Report Element: Add A Pivot Clicking OK begins to generate this report. But it takes several minutes, and while it's thinking, it displays this progress bar:
Progress Of Report Using Main Table The key phrase to watch for is "Querying main table." The main table is the primary table of the database, with one row for each event in the log data. It can be millions or billions of lines long, depending on the dataset. Querying the main table can take many minutes, especially for an unfiltered report (where every single row must be aggregated to create the report), so it is to be avoided for any reports you use frequently. That's what xref tables are for, so let's cancel this report and head to Config to make it faster. In Config, click More Options, then Cross Reference Groups, and duplicate the existing Recipient group, calling it "Sender x Recipient", and add the Sender field to it:
Adding A Cross-Reference Group After adding a cross-reference group, we need to rebuild the database. After the build is complete, we can return to the reports, and again do the Recipients x Senders pivot table report. This time, the report comes up quickly:
The Final Report

That's it--from now on, that report will be fast. As an added benefit, this also speeds up Recipient reports filtered on a particular Sender, and Sender reports filtered on a particular Recipient.

Which Fields Do You Need?

In order for a particular report to use an xref group, the xref table must contain all the fields in the report, plus all the fields in the filters. So if you have a report with three columns: Sender, Messages, and Bytes; and you also have two filters on that report, a Date/time (date range) filter and a Recipient filter, you will need all those fields in the xref table: Sender, Recipient, Date/time, Messages, and Bytes.

Professional Services

This newsletter describes optimizing reports with cross-reference tables. This is one example of many types of optimization which can make Sawmill build databases faster and generate reports faster. Other possibilities include selective indexing, log splitting, field simplification, horizontal shrinkage (eliminating fields), vertical shrinkage (eliminating rows), and hierarchical cross-reference optimization. For large environments where performance is important, we recommend Sawmill Professional Services to help you quickly optimize your installation. If you need assistance with optimization, or with any other Sawmill tasks, our Sawmill Experts can help. Contact sales@sawmill.net for more information.
This can be useful if you want to break down each item in a table by some other report view. For instance, here we will see what traffic looked like by day for the directory '/download/'. To do this, we choose the 'URLs/Directories' view from the Report Menu, then click on '/download/' in the table, and then 'Zoom to' the Days view.
You can change the Zoom menu view repeatedly, filtering each view by the '/download/' directory from the Zoom to menu (you could also use Global Filters to add a report-wide filter rather than using the Zoom filters).
The Report
Each section of this image clicks through to further information about that section:
q The Report Header
q The Report Toolbar
q The Report Menu
q The Report Bar
q The Report Graph
q The Report Table
Tracking Ad Campaigns
Once you have set up the URLs in your campaign (by encoding the extra parameters on the end of the URL to your home page), you can use Sawmill to track what traffic is coming in from them.

1. Login to Sawmill
2. Click "Show Reports"
3. Click on the Pages (or URLs) view from The Report Menu
4. Select the "Filter" option from The Report Toolbar
5. Scroll down the window until you see the "Page" or "URL" field and click "Add New Filter Item"
6. Click the "Wildcard Expression" radio button
7. Enter "*source=google*" into the text box (notice the ' * ' at the start and end) and click OK
8. Check the box turning this filter on (left of the field name (URL in this example))
9. Click "Save and Close" to close the Filter Editor
10. Sawmill will apply this filter and refresh the report
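The "*source=google*" pattern is a simple wildcard match: the leading and trailing '*' let it match anywhere in the URL. In Python terms (an analogy, not Sawmill's matcher, with made-up campaign URLs), it behaves like fnmatch:

```python
from fnmatch import fnmatch

# Hypothetical campaign URLs; only the first contains source=google
urls = [
    "/index.html?source=google&kw=logs",
    "/index.html?source=newsletter",
]
matches = [u for u in urls if fnmatch(u, "*source=google*")]
print(matches)
```

Without the surrounding '*', the pattern would only match a URL that is exactly "source=google", which is why the example stresses adding them at the start and end.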