cssutil(1)                          CRM114                          cssutil(1)



  NAME
      cssutil - utility to measure and manipulate CRM114 statistics files.

  SYNOPSIS
      cssutil [.css file] [OPTIONS]

  OPTIONS
      -h
         print basic help

      -b
         brief - print only a summary of the statistics of the .css file (oth-
         erwise, prints a full list of how  many  bins  are  in  each  counter
         state)

      -q
         quiet mode; no warning messages

      -r
         report  then exit (no menu). The default if -r is not specified is to
         drop into a command-menu based system.

      -s
         if no css file found, create new one with this many buckets.  Default
         is 1 million + 1 buckets

      -S
         same as -s, but round up to next 2^n + 1 boundary.

      -v
         print version and exit

      -D
         dump the css file to stdout in the architecture-independent CSV
         format, suitable for reloading with -R on another architecture.
         (Note that .css files are a hardware-architecture dependent format.)

      -R
         create and restore css from the hardware-architecture independent CSV
         format file (reads from stdin if csv-file is not supplied).
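
      The rounding performed by -S can be illustrated with a little shell
      arithmetic. The sketch below is not part of cssutil; the function name
      next_pow2_plus1 is hypothetical, and it assumes that a request which
      already sits on a 2^n + 1 boundary is left unchanged.

```shell
#!/bin/sh
# Hypothetical helper (not part of cssutil): print the smallest 2^n + 1
# that is greater than or equal to the requested bucket count.
next_pow2_plus1() {
    want=$1
    n=1
    # Double n until n + 1 covers the request.
    while [ $((n + 1)) -lt "$want" ]; do
        n=$((n * 2))
    done
    echo $((n + 1))
}

next_pow2_plus1 1000000   # prints 1048577
```

      Under that assumption, asking for one million buckets would round up
      to 2^20 + 1.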

  THE COMMAND MENU
      If -r is not supplied, a menu appears with the following  options.  Note
      that  all  of  these operations are "in place" and surgical; there is NO
      undo functionality. Wise users will make a backup copy of all .css files
      before using cssutil to alter values.
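
      One way to take such a backup is sketched here. The helper name
      backup_css and the timestamped file naming are assumptions made for
      illustration; any copy that preserves the file would serve.

```shell
#!/bin/sh
# Hypothetical helper: copy a .css file to a timestamped backup and print
# the name of the backup. Fails if the source file does not exist.
backup_css() {
    src=$1
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    dest="$src.$(date +%Y%m%d%H%M%S).bak"
    cp -p "$src" "$dest" && echo "$dest"
}

# Possible use, entering the command menu only if the copy succeeded:
#   backup_css spam.css && cssutil spam.css
```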

      -Z
         zero  all  bins  at or below a value. This is useful for deleting all
         small-count features from the .css statistics files  leaving  higher-
         count features untouched.

      -S
         subtract  a  constant  from all bins - this rolls all features back a
         constant amount.

      -D
         divide all bins by a constant - this rolls features back
         proportionally, rather than by a fixed amount.

      -R
         rescan - regenerate the statistics output that was initially printed.

      -P
         pack - re-slot features to optimize access time.

      -Q
         gracefully exit, saving changes. (Note that since these operations
         are in-place and surgical, there is no option to exit without saving
         changes.)
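
      The difference between the zero, subtract and divide operations can be
      seen on a small list of made-up bin counts. The functions below only
      mimic the arithmetic on plain numbers, one per line; they do not touch
      real .css files, their names are hypothetical, and the clamping at
      zero and the integer truncation are assumptions rather than documented
      cssutil behaviour.

```shell
#!/bin/sh
# Illustration only: arithmetic on a plain list of counts, one per line.

# Zero: set every count at or below the threshold to 0.
zero_at_or_below() {
    awk -v t="$1" '{ if ($1 <= t) print 0; else print $1 }'
}

# Subtract: lower every count by a constant (clamping at 0 is an assumption).
subtract_const() {
    awk -v c="$1" '{ v = $1 - c; if (v < 0) v = 0; print v }'
}

# Divide: scale every count down (integer truncation is an assumption).
divide_const() {
    awk -v c="$1" '{ print int($1 / c) }'
}

# Each command prints one value per line.
printf '%s\n' 1 2 5 40 | zero_at_or_below 2   # 0, 0, 5, 40
printf '%s\n' 1 2 5 40 | subtract_const 2     # 0, 0, 3, 38
printf '%s\n' 1 2 5 40 | divide_const 2       # 0, 1, 2, 20
```

      In this sketch, subtracting lowers every surviving count by the same
      amount, while dividing shrinks large counts more than small ones.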

  DESCRIPTION
      cssutil is a general utility to manipulate and measure the  .css  format
      statistics  files  used  by  CRM114's Markovian and OSB classifiers. The
      biggest uses are to check the available space remaining in a .css  file,
      to  selectively  groom  a  .css file, and to port architecture-dependent
      .css files to and from an ASCII CSV format, which is architecture  inde-
      pendent.   The  cssutil  program  can be used to create information-less
      .css files:

           cssutil -b -r spam.css
           cssutil -b -r nonspam.css

      When the files do not yet exist, this creates the full-size files
      ./spam.css and ./nonspam.css, holding no information. The cssutil
      program can also be used to check that the .css files are reasonable.
      Invoke cssutil as:

          cssutil -b -r spam.css
          cssutil -b -r nonspam.css

      You should get back a report something like this:

           Sparse spectra file spam.css statistics:

           Total available buckets          :      1048576
           Total buckets in use             :       506987
           Total hashed datums in file      :      1605968
           Average datums per bucket        :         3.17
           Maximum length of overflow chain :           39
           Average length of overflow chain :         1.84
           Average packing density          :         0.48

      Note that the packing density is 0.48; this means that this .css file is
      about half full of features. Once the packing density gets above about
      0.9, you will notice that CRM114 takes longer to process text. The
      penalty is small at packing densities below about 0.95 and only about
      a factor of 2 at 0.97. It is best to keep it below 0.7 to 0.8.
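
      In the sample report, the packing density agrees with the number of
      buckets in use divided by the number of available buckets. The sketch
      below recomputes it on that assumption; the helper name
      packing_density is hypothetical, and the parsing relies on the report
      layout shown in this manpage.

```shell
#!/bin/sh
# Hypothetical helper: read a cssutil report on stdin and print the ratio
# of buckets in use to available buckets, to two decimal places.
packing_density() {
    awk -F: '
        /Total available buckets/ { total = $2 + 0 }
        /Total buckets in use/    { used  = $2 + 0 }
        END { if (total > 0) printf "%.2f\n", used / total }
    '
}

# Figures copied from the sample report in this manpage.
packing_density <<'EOF'
Total available buckets          :      1048576
Total buckets in use             :       506987
EOF
# prints 0.48
```

      With a real file, the report could be piped in directly, for example
      cssutil -b -r spam.css | packing_density.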

  SHORTCOMINGS
      Note that cssutil as of version 20040816 is NOT capable of dealing  with
      the CRM114 Winnow classifier's floating-point .cow files. Worse, cssutil
      is unaware of its shortcomings, and will try anyway. The only  recourse
      is  to be aware of this issue and not use cssutil on a Winnow classifier
      floating point .cow format file.
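
      A wrapper script can guard against this mistake. The sketch below
      assumes that Winnow files can be recognised by a .cow file name
      extension, which is a naming convention and not something cssutil
      checks; the function name is_cow_file is hypothetical.

```shell
#!/bin/sh
# Hypothetical guard: succeed if the file name ends in .cow.
# This looks at the name only; it does not inspect the file contents.
is_cow_file() {
    case $1 in
        *.cow) return 0 ;;
        *)     return 1 ;;
    esac
}

# Possible use before calling cssutil:
#   if is_cow_file "$f"; then echo "refusing Winnow file: $f" >&2; exit 1; fi
```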

  HOMEPAGE AND REPORTING BUGS
      http://crm114.sourceforge.net/

  VERSION
      This manpage: $Id: cssutil.azm,v 1.4 2004/08/19 09:23:24 vanbaal  Exp  $
      This   manpage   describes   cssutil  as  shipped  with  crm114  version
      20040816.BlameClockworkOrange.

  AUTHOR
      William S. Yerazunis. Manpage typesetting by Joost van Baal and
      Shalendra Chhabra.

  COPYRIGHT
      Copyright  (C) 2001, 2002, 2003, 2004 William S. Yerazunis. This is free
      software, copyrighted under the FSF's GPL. There  is  NO  warranty;  not
      even  for  MERCHANTABILITY  or FITNESS FOR A PARTICULAR PURPOSE. See the
      file COPYING for more details.

  SEE ALSO
      cssmerge(1), cssdiff(1), crm(1), crm114(1)



  cssutil 20040816.BlameClockworkOrange   19 August 2004          cssutil(1)
