Dunno what this is about - has been in here for ages...

* detect a bad return code from calldisinfector and skip the rescan in
	disinfect.pl. Set NewReports equal to Reports.

* Now I get what's going on (mostly) with the eval/fork/die stuff.

Written while I was suffering Kaspersky etc., intended to turn into a
doc. for anyone writing extra scanner support - "if you don't follow
these guidelines we're not likely to be able to use your code":

* Tips for writing scanner support:
  * "print STDERR $line" is your friend.
  * Always parse *every* line of output from the scanner, and
    die if you don't understand it.
  * Be *extremely* anal when writing regexps, especially with
    quantities of whitespace.
  * Only use wildcards to match the filename part of the output,
    *never* to match whitespace or boilerplate text (think about
    what might happen if the filename has a trailing <space> character).
  * At least one scanner prints "<cr><space>...<space><cr>"
    before outputting its results -- be *sure* what the scanner's
    output format really is.
  * Be sure that you know how your scanner reports infections
    within archives; they can easily be mis-parsed.
  * Use comments to document any oddities that could confuse
    your parser; that way we might be able to ensure that they
    don't happen in future.
  * Use comments to document the output format you are expecting
    from the scanner so that when it changes, debugging is quicker.
  * Watch out for scanners reporting different categories of Bad
    Thing - e.g. "Joke Program", "Trojan", "Virus", "Worm"... it
    is a good idea to run "strings" over a core dump from the scanner
    to get clues as to what may be reported if you're not sure.
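
A minimal sketch of the "parse every line, die on anything you don't
understand" advice above. The output format is made up for illustration (it is
not any real scanner's), and parse_scanner_output, its return structure and
the SCANNER_DEBUG variable are hypothetical names, not part of MailScanner.
The wildcard only ever matches the filename; whitespace and boilerplate are
matched literally.

```perl
use strict;
use warnings;

# Expected (hypothetical) output format, documented next to the parser:
#   Scanning <filename>
#   <filename> Infection: <name>
#   <filename> Joke Program: <name>
#   Scan complete
# Oddity: a file literally named "Scanning x" would make its report line
# look like a progress line, so report lines are tested first.
# Returns a hash ref: { filename => [ "category: name", ... ] }.
sub parse_scanner_output {
    my @lines = @_;
    my %report;
    for my $line (@lines) {
        $line =~ s/\n\z//;    # strip the newline only, keep other whitespace
        print STDERR "scanner: '$line'\n" if $ENV{SCANNER_DEBUG};
        if ($line =~ /\A(.+) (Infection|Joke Program): (\S+)\z/s) {
            push @{ $report{$1} }, "$2: $3";
            next;
        }
        next if $line =~ /\AScanning .+\z/s;    # single literal space
        next if $line eq 'Scan complete';       # exact boilerplate
        die "Unrecognised scanner output: '$line'\n";
    }
    return \%report;
}
```

A filename with a trailing space survives intact because only the (.+) part
is allowed to stretch; a tab where a space was expected makes the parser die
rather than guess.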


TODO:

* deal with options to TNEF explicitly? There's a comment in the code
  that "people started adding options and the old code broke". Worth
  a quick look?

* portable locking. Also check *why* we failed to lock.

* optional specification of MIME handlers (e.g. for arj, lha, zoo... in case
  your scanner doesn't support them)

* rearrange dirs
  last thing before major release...

* anti-spam stuff is probably now big enough to break out into its own module.
  this will reduce size of sendmail.pl nicely.

* merge RAV support - DONE

* merge panda support - DONE

* merge AntiVir support - DONE

* CLEAR environment near the top of main mailscanner script (except for
  known needed items)

* Fix problems with spamassassin and dropped privs. - DONE

* Investigate potential locale problems with scanners.

* Check out TNEF setup - what to do if "internal" specified and module not
  present, or external TNEF expander not present?

* We aren't checking that the SA prefs file exists where the comment in
  readconfig.pl says it must -- move the default in with the others and
  check it the same way.

* ReadFileNameRules etc. should take a ref to put the result in rather than
  assuming a 'global' variable.

* autoconfiscation
  * change scanner_type to scanner - DONE
  * change mta_confspec_[in|out] to mtaoutconf and mtainconf - DONE
  * deal with check_mailscanner
  * test!
  * write scripts to call configure with correct options for Red Hat, Solaris etc.
  * make /usr/local config simple - DONE
  * don't set owner/group unless specified - TEST
  * switch creation of subdirs for perlmoddir,docdir,includedir,exec_prefix,libdir,libexecdir - DONE
  * create working area directories, piddir etc. -- in fact every directory that is used - DONE
  * set tnef_opts and expand_tnef with configure - DONE
  * control RunAsUser/Group from configure, and set group perms on pidfile(s) +
    lockfiles + working dirs appropriately... *sigh* - TEST
  * differentiate between *system* /etc and *mailscanner* /etc (exim stuff in sys, ours in ms)
    ?? - we already have mtainconf and mtaoutconf separately. What else is needed? ??
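
For the "CLEAR environment" item above, a rough sketch of what the top of the
main script might do. clean_environment and the default allow-list are
assumptions for illustration; which variables are really "known needed" still
has to be worked out.

```perl
use strict;
use warnings;

# Remove every environment variable that is not on the allow-list.
# The default list (PATH HOME TZ) is only a guess at what is needed.
sub clean_environment {
    my @keep = @_ ? @_ : qw(PATH HOME TZ);
    my %wanted = map { $_ => 1 } @keep;
    for my $name (keys %ENV) {
        delete $ENV{$name} unless $wanted{$name};
    }
    return;
}
```

Calling clean_environment(qw(PATH TZ)) before anything is forked means the
scanners and the MTA inherit only those two variables.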



check_mailscanner:

OpenBSD 2.7:
	ps axww (ps -axww)
	NOT POSIX-COMPLIANT
	ps -ef returns false and outputs only to STDERR
	`uname` = OpenBSD
	`uname -a` = OpenBSD <hostname> V.v GENERIC#25 i386

FreeBSD:
	ps -ef supported according to manpage, but requires root for -f, and -e means
	something else (print environment).
	uses ww for v. wide.
	use ps -axww
	has grep -F and fgrep
<BlindMan> COLUMNS doesn't have a effect
<nwp> thought not :(
<BlindMan> ps -ef
<BlindMan>   PID  TT  STAT      TIME COMMAND
<BlindMan> 22891  p2  Ss     0:00.13 PATH=/root/bin:/bin:/usr/bin:/usr/local/bin:/usr/X11R
<BlindMan> 22893  p2  R+     0:00.00 PWD=/root PAGER=less FTP_PASSIVE_MODE=YES HOSTNAME=ta
<BlindMan>   227  v0  Is+    0:00.02 TERM=cons25 /usr/libexec/getty Pc ttyv0
<BlindMan>   228  v1  Is+    0:00.02 TERM=cons25 /usr/libexec/getty Pc ttyv1
<BlindMan>   229  v2  Is+    0:00.02 TERM=cons25 /usr/libexec/getty Pc ttyv2
<BlindMan>   230  v3  Is+    0:00.02 TERM=cons25 /usr/libexec/getty Pc ttyv3
<BlindMan>   231  v4  Is+    0:00.02 TERM=cons25 /usr/libexec/getty Pc ttyv4
<BlindMan>   232  v5  Is+    0:00.02 TERM=cons25 /usr/libexec/getty Pc ttyv5
<BlindMan>   233  v6  Is+    0:00.02 TERM=cons25 /usr/libexec/getty Pc ttyv6
<BlindMan>   234  v7  Is+    0:00.02 TERM=cons25 /usr/libexec/getty Pc ttyv7
<BlindMan> COLUMNS=500 ps -ef
<BlindMan>   PID  TT  STAT      TIME COMMAND
<BlindMan> 22891  p2  Ss     0:00.14 PATH=/root/bin:/bin:/usr/bin:/usr/local/bin:/usr/X11R
<BlindMan> 22894  p2  R+     0:00.00 PWD=/root PAGER=less FTP_PASSIVE_MODE=YES HOSTNAME=ta
<BlindMan>   227  v0  Is+    0:00.02 TERM=cons25 /usr/libexec/getty Pc ttyv0
<BlindMan>   228  v1  Is+    0:00.02 TERM=cons25 /usr/libexec/getty Pc ttyv1
<BlindMan>   229  v2  Is+    0:00.02 TERM=cons25 /usr/libexec/getty Pc ttyv2
<BlindMan>   230  v3  Is+    0:00.02 TERM=cons25 /usr/libexec/getty Pc ttyv3
<BlindMan>   231  v4  Is+    0:00.02 TERM=cons25 /usr/libexec/getty Pc ttyv4
<BlindMan>   232  v5  Is+    0:00.02 TERM=cons25 /usr/libexec/getty Pc ttyv5
<BlindMan>   233  v6  Is+    0:00.02 TERM=cons25 /usr/libexec/getty Pc ttyv6
<BlindMan>   234  v7  Is+    0:00.02 TERM=cons25 /usr/libexec/getty Pc ttyv7
<BlindMan> ps -axww
<BlindMan>   PID  TT  STAT      TIME COMMAND
<BlindMan>     0  ??  DLs    0:00.59  (swapper)
<BlindMan>     1  ??  ILs    0:00.05 /sbin/init --
<BlindMan>     2  ??  DL     0:01.57  (pagedaemon)
<BlindMan>     3  ??  DL     0:00.00  (vmdaemon)
<BlindMan>     4  ??  DL     0:07.65  (bufdaemon)
<BlindMan>     5  ??  DL     2:48.93  (syncer)
<BlindMan>    29  ??  Is     0:00.00 adjkerntz -i
<BlindMan>   100  ??  Is     0:04.39 /sbin/dhclient ed0
<BlindMan>   133  ??  Ss    16:46.57 /sbin/natd -n ed0
<BlindMan>   149  ??  Ss     0:08.75 syslogd
<BlindMan>   169  ??  Is     0:00.00 inetd -wW
<BlindMan>   171  ??  Is     0:12.01 cron
<BlindMan>   174  ??  Is     0:52.79 /usr/sbin/sshd
<BlindMan>   223  ??  Is     0:02.13 /usr/local/sbin/sendfiled -Q
<BlindMan> 22794  p0  Is     0:00.11 -bash (bash)
<BlindMan> 22795  p0  S+     0:02.89 ssh davinci
<BlindMan> 22829  p1  Ss     0:00.21 -bash (bash)
<BlindMan> 22892  p1  DN+    0:01.20 find -f /
<BlindMan> 22891  p2  Rs     0:00.15 -bash (bash)
<BlindMan> 22895  p2  R+     0:00.00 ps -axww
<BlindMan>   227  v0  Is+    0:00.02 /usr/libexec/getty Pc ttyv0
<BlindMan>   228  v1  Is+    0:00.02 /usr/libexec/getty Pc ttyv1
<BlindMan>   229  v2  Is+    0:00.02 /usr/libexec/getty Pc ttyv2
<BlindMan>   230  v3  Is+    0:00.02 /usr/libexec/getty Pc ttyv3
<BlindMan>   231  v4  Is+    0:00.02 /usr/libexec/getty Pc ttyv4
<BlindMan>   232  v5  Is+    0:00.02 /usr/libexec/getty Pc ttyv5
<BlindMan>   233  v6  Is+    0:00.02 /usr/libexec/getty Pc ttyv6
<BlindMan>   234  v7  Is+    0:00.02 /usr/libexec/getty Pc ttyv7
<BlindMan> so
<BlindMan> anything else?
<nwp> `uname` and `uname -a`??
<BlindMan> why the latter?
<nwp> just in case it changes behaviour at some point.
<BlindMan> it's an older fbsd one
<BlindMan> 4.2-STABLE
<BlindMan> as of feb 2001
<nwp> uh-huh
<BlindMan> maybe i find a newer one
<BlindMan> let's see
<nwp> no prob if it's old, just so that I know where the ps output came from
<BlindMan> FreeBSD 4.2-STABLE
<BlindMan> from about feb 2001
<nwp> is that what uname -a gives?
<nwp> and uname just gives 'FreeBSD'?
<BlindMan> yes
<BlindMan> it's freebsd
<nwp> oh, no caps?
<BlindMan> believe me, i know what's running on my computers ;)
<BlindMan> FreeBSD
<nwp> right. That's great. Thanks very much.
<BlindMan> FreeBSD taffarel 4.2-STABLE FreeBSD 4.2-STABLE
<BlindMan> np
<BlindMan> what are you doing with it?
<BlindMan> just curoius :)
<nwp> oh, hang on what was 'FreeBSD taffarel 4.2-STABLE FreeBSD 4.2-STABLE' - looks like two concatenated?
<BlindMan> bash-2.03$ uname -a
<BlindMan> FreeBSD taffarel 4.2-STABLE FreeBSD 4.2-STABLE #1: Thu Feb 22 21:06:42 CET 2001     root@taffarel:/usr/obj/usr/src/sys/TAFFAREL  i386
<BlindMan> now oyu see the whole :P
<nwp> still looks odd - where'd the "root@tafferel" come from?
<nwp> but that whole line is really what it says?
<BlindMan> that's just saying who/where/what kernel config files the kernel was built with
<nwp> cool.
<BlindMan> s/files/file/
<nwp> I'm just trying to tidy up a script that goes with mailscanner, and autoconf-ing it. uses ps to try to find the mailscanner process... previously there were 3 different versions.
<nwp> most things do POSIX, but OpenBSD and FreeBSD (and I guess NetBSD) don't.
<nwp> Unfortunately FreeBSD accepts the POSIX options, so I will have to grep `uname` for BSD and work off that II guess.
<BlindMan> i c

	
POSIX (Solaris, HPUX, Debian...):
	uses COLUMNS, truncates at 80 cols otherwise.
	ps -ef
	-e == every process  -f == full listing (adds path + args)
	(no output to STDERR)
	grep -F is POSIX; fgrep is not.
	ww for ps is NOT POSIX

Tru64:
	output from mjb

lorien# ps -ef | head -1
UID         PID   PPID    C STIME    TTY             TIME CMD
lorien# ps -ef | tail -4
root       8484  25049  0.0   Apr 18 ttyp1        0:00.98 -ksh (ksh)
root      14696   8484  0.0 15:19:57 ttyp1        0:00.01 tail -4
obelix    25049  25548  0.0   Apr 18 ttyp1        0:00.15 -ksh (ksh)
root      25235   8484  0.0 15:19:57 ttyp1        0:00.09 ps -ef

	ps -ef appears not to provide full path. *shrug*... now it does...

<||> $ ps -ef | head -4
<||> UID         PID   PPID    C STIME    TTY             TIME CMD
<||> root          0      0  0.8   Nov 24 ??        1-15:50:14 [kernel idle]
<||> root          1      0  0.0   Nov 24 ??          30:26.36 /sbin/init -a
<||> root          3      1  0.0   Nov 24 ??           0:47.17 /sbin/kloadsrv
<||> $ COLUMNS=500 ps -ef | head -4
<||> UID         PID   PPID    C STIME    TTY             TIME CMD
<||> root          0      0  0.9   Nov 24 ??        1-15:50:14 [kernel idle]
<||> root          1      0  0.0   Nov 24 ??          30:26.36 /sbin/init -a
<||> root          3      1  0.0   Nov 24 ??           0:47.17 /sbin/kloadsrv
<||> $ ps axww | head -4
<||>    PID TTY      S           TIME CMD
<||>      0 ??       R <   1-15:50:14 [kernel idle]
<||>      1 ??       S       30:26.36 /sbin/init -a
<||>      3 ??       I        0:47.17 /sbin/kloadsrv


	  has fgrep
	  has grep -F




sunos4.1.1 -> "sunos", AIX 4.3.3 -> "aix", IRIX 6.5.13 -> "irix"
nwp: sunos4 is bsd (ps aux), AIX is both (ps aux and -ef), IRIX is sysv (-ef)
<nwp> I'll trust AIX to be POSIX and obey COLUMNS...
<nwp> what does 'uname' give on SunOS4/AIX/IRIX?
<Stric> SunOS/AIX/IRIX

<Stric> nwp: At 80 chars of command line argument.. (not 80 screenwidth)
<nwp> Stric: IRIX yuk.
<Stric>      -w   Use a wide output format (132 columns rather than  80);
<Stric>           if repeated, that is, -ww, use arbitrarily wide output.
<Stric> SunOS4

<Stric> ps on AIX gives full width if no tty (when using sysv arguments)


IRIX ps man page has no indication of how to get wide output.
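
Pulling the findings above together, a sketch of how check_mailscanner might
choose its ps invocation from `uname`. ps_command_for is a hypothetical name.
It only covers the "grep uname for BSD" idea from the chat; SunOS 4 (BSD-style
ps, but uname says SunOS, the same as Solaris) and IRIX (no known wide option)
are not handled.

```perl
use strict;
use warnings;

# Anything with "BSD" in its uname gets BSD-style options; everything
# else is assumed to be POSIX and to honour COLUMNS.
sub ps_command_for {
    my ($uname) = @_;
    return 'ps axww' if $uname =~ /BSD/;
    return 'COLUMNS=500 ps -ef';
}
```

The value 500 for COLUMNS is just the one used in the tests above. The
returned string is meant to be run through the shell, e.g. with a piped open,
after chomp-ing the output of `uname`.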




CPAN:

File::Lock *does* do fcntl
File::lockf is no good (not guaranteed to interact)
Mail::Box::Locker is no good on BSD from the look of it (assumes can pack "s @256", locktype)


Trying File::Lock...

File::Lock doesn't build, appears to be unmaintained.



*******************

Read through:

Following message locking in queue...

sendmail.pl

FindMessagesToProcess

sendmail.pl@134,135:

  while($file = readdir QUEUE) {
    # Optimised by binning the 50% that aren't H files first
    next unless $file =~ /$MTA::HFileRegexp/;
    $tmpdate = (stat("$InQueueDir/$file"))[9]; # 9 = mtime
    #next unless -f "$InQueueDir/$file";
    next unless -f _;
    $ModDate{$file} = $tmpdate;
  }
  @SortedFiles = sort { $ModDate{$a} <=> $ModDate{$b} } keys %ModDate;

What's with the commented #next... and the replacement?
Looks dodgy to me.
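
For what it's worth, "-f _" is standard Perl: the special "_" filehandle
reuses the stat buffer filled by the most recent stat or file test, so the
replacement avoids a second system call on the same file. A small sketch,
using a temporary file rather than a real queue entry:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);

# stat() fills Perl's internal stat buffer...
my $mtime = (stat($path))[9];    # 9 = mtime

# ...and "_" makes -f test that buffer instead of calling stat again.
my $is_plain_file = -f _ ? 1 : 0;

print "mtime=$mtime plain=$is_plain_file\n";
```

The catch is that "_" refers to whatever was stat-ed last, so it only works
while nothing else does a stat or file test between the two lines, as is the
case in the loop quoted above.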


$HitLimit is not initialised. Not necessary at the moment, but initialising it would be clearer.

may at some point need to replace getting the id from $1 (from MTA::HFileRegexp)
with a function such as 'GetIDFromHFile' for some MTAs?

repeated matching against that regexp currently; could be avoided.

location of 'cp' can easily be autoconfed and put in config file.
Archives message if necessary.

LOCKS -H file first with SHARED lock. Hmmm... that means we can all get it :(
You need to open it RW to be able to get exclusive lock.
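
A sketch of taking the exclusive lock instead, and of keeping hold of *why*
it failed. This uses Perl's built-in flock rather than any of the CPAN
modules mentioned above, and lock_exclusive is a hypothetical name. perlfunc
notes that where flock is emulated with fcntl, LOCK_EX needs the handle open
with write intent, hence the '+<'.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Open read-write and try for an exclusive, non-blocking lock.
# Returns the locked handle, or undef with the reason left in $!.
sub lock_exclusive {
    my ($path) = @_;
    open(my $fh, '+<', $path) or return;    # $! says why the open failed
    unless (flock($fh, LOCK_EX | LOCK_NB)) {
        my $errno = $! + 0;    # save it: close() may overwrite $!
        close $fh;
        $! = $errno;
        return;
    }
    return $fh;    # the lock is held until this handle is closed
}
```

A caller can then log "$!" on failure and tell "somebody else has it" apart
from "no such file" or "permission denied".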

ReadQf -> locking??

UNLOCKS before moving on to NEXT MESSAGE.

So, either we need to make sure that we don't do anything irreversible in this
stage, or we need to leave the file locked in such a way that no one else will
do anything with it.



ReadQf:

Spec does not mention that headers must not contain newlines. Is
this *really* desirable? Or should it be added to spec?

Do we really need to handle exclusions in Exim -H file? How do we
build -H file for modified messages?

No locking.



Main Loop:

MoveToOutGoingQueue second parameter is confusing.

Looks like FindMessagesToProcess should do nothing but that, and should
lock all other processes out.

Main loop should really be broken up a little:
 find messages,
 deal with spam: { spam test, deliver safe },
 av process: { av scan, deliver safe, disinfect, deliver safe }
 cleanup: { make sure nothing left, log + unlock if so }
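
One way the broken-up loop above could look, as a sketch only: each stage is
a code ref run in a fixed order over the batch. run_pass and all the stage
names are placeholders, not existing MailScanner routines, and the stage
bodies are empty.

```perl
use strict;
use warnings;

# Run one pass of the main loop as an ordered list of name => code pairs.
# Each stage receives the batch (an array ref of messages).
sub run_pass {
    my ($batch, @stages) = @_;
    my @done;
    while (my ($name, $code) = splice(@stages, 0, 2)) {
        $code->($batch);
        push @done, $name;
    }
    return @done;
}

my @order = run_pass(
    [],    # in real use: whatever the "find messages" step returned
    spam_test    => sub { },
    deliver_safe => sub { },
    av_scan      => sub { },
    deliver_safe => sub { },
    disinfect    => sub { },
    deliver_safe => sub { },
    cleanup      => sub { },
);
print join(' -> ', @order), "\n";
```

Keeping the order in one list makes it obvious where "deliver safe" happens
and leaves a single place to log and unlock at the end.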


Explode.pl does not have a package declaration! Is it being included
into multiple namespaces?? No, only the main mailscanner. Still bad, though.
Should probably think about what goes where:
	High-level:
		Generic "sendmail.pl" stuff (queue handling, message handling)
		Spam stuff
		AV stuff (sweep.pl, disinfect.pl)
	Not sure:
		MIME stuff (explode.pl)
		Other message interpretation stuff - TNEF, archives... (also explode.pl?)
	Low-level 'utility' stuff:
		MTA-specific stuff
		logging (logger.pl)
		locking (lock.pl)
		per-scanner support (subdirectory?)

The MIME + related stuff could be split into high-level + low level, or could be
just low-level I suppose.



Possible objects:
	 mailscanner
	 queue (in, out)
	 message
	 avscanner
	 spamscanner
	 mta
	 envelope
	 header
	 body
	 body-part

point a mailscanner obj. at a queue path
instantiates mta queue
processes mta queue
 -> passes queue to spamscanner, avscanner (which flag statuses)
 -> delivers them depending on status
loops for a bit
re-execs

Potentially inefficient but clear.





In current situation, why delete spam early when you have to track it
anyway - why not flag + delete later?

Why not just pass one big hash around with all the others in it?
Which would be kind of like a 'mailscanner' object...




*************

Config


types of variable:

directly set in config file, or not
boolean, simple text, multiline text, multiline array, separated array, simple hash (comma-sep), 'special' (other).
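
A sketch of converting raw config values by declared type, covering a few of
the types listed above. parse_config_value, the type names, the accepted
boolean spellings, the comma separator for arrays and the "key=value" syntax
for the simple hash are all assumptions for illustration.

```perl
use strict;
use warnings;

# Convert a raw config string according to its declared type.
sub parse_config_value {
    my ($type, $raw) = @_;
    if ($type eq 'boolean') {
        return 1 if $raw =~ /\A(?:yes|true|1)\z/i;
        return 0 if $raw =~ /\A(?:no|false|0)\z/i;
        die "Bad boolean value '$raw'\n";
    }
    return $raw if $type eq 'text';         # simple text, used as-is
    if ($type eq 'array') {                 # separated array
        return [ split /\s*,\s*/, $raw ];
    }
    if ($type eq 'hash') {                  # simple hash: "k=v, k=v"
        my %h;
        for my $pair (split /\s*,\s*/, $raw) {
            my ($k, $v) = $pair =~ /\A([^=]+)=(.*)\z/s
                or die "Bad hash entry '$pair'\n";
            $h{$k} = $v;
        }
        return \%h;
    }
    die "Unknown config type '$type'\n";
}
```

Dying on an unknown type or a malformed value follows the same rule as the
scanner parsers: don't guess.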