

Archives


Why does a message get split into multiple messages with no headers?

If you are processing UUCP mailbox files, messages are separated by a line starting with "From " (i.e., the word "From" followed by a space). Some mail software prefixes such lines in message bodies with a `>' to prevent MUAs from incorrectly treating them as message separators. However, some mail software does not.

To avoid incorrect separator detection, many MUAs apply a stricter test than just "From ". MHonArc, by default, treats any line starting with "From " as a message separator, which can lead to incorrect message termination if a From line in a message body has not been escaped with a `>'.

To fix the problem, use the MSGSEP resource to instruct MHonArc to use a stricter test when detecting a message separator. The following MSGSEP resource setting is known to work well:

<MsgSep>
^From \S+\s+\S+\s+\S+\s+\d+\s+\d+:\d+:\d+\s+\d+
</MsgSep>

In case you have message separators with quoted local parts in the address part of the separator, you can use the following:

<MsgSep>
^From\s+(?:"[^"]+"@\S+|\S+)\s+\S+\s+\S+\s+\d+\s+\d+:\d+:\d+\s+\d+
</MsgSep>

Test these settings against your own mailboxes before using them in a production environment.
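One way to try a separator pattern before production use is to run it against a few sample lines. The following Python sketch does this for the two patterns above. The sample lines and addresses are made up for illustration, and the sketch assumes that Python's re module handles these particular patterns the same way Perl does.

```python
import re

# The two MSGSEP patterns shown above, copied verbatim.
STRICT = re.compile(r'^From \S+\s+\S+\s+\S+\s+\d+\s+\d+:\d+:\d+\s+\d+')
QUOTED = re.compile(
    r'^From\s+(?:"[^"]+"@\S+|\S+)\s+\S+\s+\S+\s+\d+\s+\d+:\d+:\d+\s+\d+')

# Hypothetical sample lines: two separators and one unescaped body line.
SAMPLES = [
    'From ehood@example.com Sat May 14 02:29:37 2005',
    'From "john doe"@example.com Sat May 14 02:29:37 2005',
    'From what I can tell, this is just body text.',
]

def is_separator(pattern, line):
    """Return True if the line would be treated as a message separator."""
    return pattern.match(line) is not None

for line in SAMPLES:
    print(is_separator(STRICT, line), is_separator(QUOTED, line), line)
```

Running real separator lines from your own mailboxes through such a check shows quickly whether a pattern is too strict or too loose for your mail software.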

If this fails, you can try the CONLEN resource available in v2.0 and later. The CONLEN resource, when set, tells MHonArc to utilize the Content-Length fields in the message head. If your MTA defines this field accurately, then you can utilize this feature. Sun Solaris' delivery agent will define the Content-Length field for messages delivered to local users.

If you use Procmail to filter your mail, you can try the following Procmail recipe (contributed by Christopher Lindsey):

However, one can add a Content-Length: header with everyone's favorite tool, procmail. :) Here's a recipe borrowed from David Tamkin about 9 moons ago:

  :0fhw # B won't help; size conditions ignore H and B flags on the :0 line
  * ! ^Content-Length:.*[0-9]
  * 1^1 B ?? >1
  | formail -a "Content-Length:  $="

So if you want to count on Content-Length, the message could be piped into procmail with a specific procmailrc file which would do this counting and then call MHonArc.

If sendmail is your system's MTA and you use Procmail as your local delivery agent, you can configure things to have Content-Length defined for all locally delivered mail. The following technique is contributed by Jason L Tibbitts III:

I use Procmail as my local delivery agent. I have the following extremely disgusting settings in my .mc (M4 config) file to add in a Content-Length: header, but I'm not sure if I would recommend that anyone actually think of using this:

define(`LOCAL_MAILER_FLAGS',`SPfhn9Z')
define(`LOCAL_MAILER_ARGS',`procmail -a $h -d $u')

LOCAL_CONFIG
#
# Add fake Content-Length Header for local mailer
# This is corrected by Procmail
# Note that the Z flag is used here; if Z is ever defined, this
#  will break something
H?Z?Content-Length: 0000000000

If possible, try to avoid relying on the use of Content-Length since it is hard to guarantee that it is set properly. When CONLEN is specified, MHonArc will read at least the number of bytes specified by Content-Length before checking for the message separator, as defined by the MSGSEP resource. Therefore, the value of Content-Length can be less than the actual message size, and message extraction will work as expected. However, if Content-Length for a message has a value that is larger than the actual size, MHonArc may include the content of the next message as part of the current message.
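The interaction between Content-Length and the separator test can be illustrated with a toy splitter. This Python sketch is not MHonArc's code: it uses the default "From " test, counts characters rather than bytes, and the mailbox content is invented. It only models the rule described above, namely that the separator is not checked until at least Content-Length characters of the body have been read.

```python
import re

CONLEN_RE = re.compile(r"content-length:\s*(\d+)", re.IGNORECASE)

def split_messages(lines, use_conlen=False):
    """Split mailbox lines into messages (lists of lines, separator excluded)."""
    messages, i, n = [], 0, len(lines)
    while i < n and not lines[i].startswith("From "):
        i += 1                          # skip anything before the first separator
    while i < n:
        i += 1                          # step over the separator line itself
        msg, conlen, read = [], 0, 0
        while i < n and lines[i].strip():   # header runs up to the blank line
            m = CONLEN_RE.match(lines[i])
            if m:
                conlen = int(m.group(1))
            msg.append(lines[i])
            i += 1
        if i < n:
            msg.append(lines[i])        # blank line that ends the header
            i += 1
        while i < n:                    # body
            may_check = not use_conlen or read >= conlen
            if may_check and lines[i].startswith("From "):
                break
            read += len(lines[i])
            msg.append(lines[i])
            i += 1
        messages.append(msg)
    return messages

# Invented mailbox: the first body contains an unescaped "From " line.
BODY1 = "Hello,\nFrom here on things get odd.\nBye.\n"
MBOX = (
    "From alice@example.com Sat May 14 02:29:37 2005\n"
    "Subject: one\n"
    f"Content-Length: {len(BODY1)}\n"
    "\n" + BODY1 + "\n"
    "From bob@example.com Sat May 14 03:00:00 2005\n"
    "Subject: two\n"
    "\n"
    "Second body.\n"
)
LINES = MBOX.splitlines(keepends=True)

print(len(split_messages(LINES)), "pieces without Content-Length")
print(len(split_messages(LINES, use_conlen=True)), "pieces with Content-Length")
```

In this model, a Content-Length that is too small merely falls back to the separator test, while one that is too large would delay the test past the start of the next message.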


Can I move a message from one archive to another?

No. In order to achieve the same effect, you must add the original, unprocessed, message to the destination archive, then remove the appropriate HTML version of the message from the source archive.


Can I reconstruct a database from the HTML messages?

Yes. v2.3 of MHonArc introduced a utility program called mha-dbrecover. It gets installed with the other MHonArc files during the installation process. See the documentation for usage information.


Is it safe to add messages to an archive as they are received?

Yes. MHonArc performs archive locking to prevent multiple MHonArc processes from writing to an archive at the same time. This locking allows MHonArc to be used safely to add messages as they are received.
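As a general illustration of how this kind of locking can work, the following Python sketch uses an atomically created lock directory. It is a generic technique and is not taken from MHonArc's source; the lock name example.lck and the retry parameters are hypothetical.

```python
import os
import tempfile
import time
from contextlib import contextmanager

@contextmanager
def archive_lock(archive_dir, tries=10, delay=0.5):
    """Hold an exclusive lock on archive_dir for the duration of a with-block."""
    lock_path = os.path.join(archive_dir, "example.lck")  # hypothetical name
    for _ in range(tries):
        try:
            os.mkdir(lock_path)     # atomic: fails if the lock is already held
            break
        except FileExistsError:
            time.sleep(delay)       # another process holds the lock; retry
    else:
        raise TimeoutError("archive is locked: " + lock_path)
    try:
        yield lock_path
    finally:
        os.rmdir(lock_path)         # release the lock

with tempfile.TemporaryDirectory() as archive:
    with archive_lock(archive) as lock:
        print("holding", os.path.basename(lock))
```

A second process that tries to take the same lock waits and retries, so two updates are never applied to the archive at once.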

NOTE:

As an archive increases in size, performing updates as a message is received takes more processing time. Therefore, for large archives, you may need to do updates through a periodic batch process (like via cron(8)) to avoid time-out problems.


So it is safe. How do I do it?

Many users use Procmail <http://www.procmail.org/> to call MHonArc to archive messages. Procmail provides the ability to preprocess mail as it arrives to do selective processing and automated tasks with your mail.

For illustrative purposes, the following simple example shows a possible way of archiving messages as they arrive without using a tool like Procmail. This example assumes you are on a Unix-based system using sendmail as the mail transfer agent. Please refer to documentation about sendmail if you are not familiar with it (sendmail, 2ed, from O'Reilly is an excellent source).

The approach shown here uses a .forward file in the home directory of the account whose mail you want archived. For this example, let's assume it is my account. Here is how to set up the .forward file to invoke MHonArc on incoming mail:

\ehood, "|/home/ehood/bin/webnewmail #ehood"

NOTE:

The "\ehood" tells sendmail to still deposit the incoming message in my mail spool file. The "#ehood" Bourne shell comment is needed to ensure the command is distinct from any other user's command. Otherwise, sendmail may not invoke the program for you or the other user.

webnewmail is a Perl program that calls MHonArc with the appropriate arguments. A wrapper program is used instead of calling MHonArc directly to keep the .forward file simple, but you can call MHonArc directly if you want. Here is the code to the webnewmail program:

#!/usr/local/bin/perl
# Edit above path to point to where perl is on your system.

##	Specify a package to protect names from MHonArc.

package WebNewMail;

##	Edit to point to installed mhonarc.

$MHonArc = "/home/ehood/bin/mhonarc";

##	Define ARGV (ARGV is same across all packages).
##	Edit options as required/desired.

@ARGV = ("-add",
	 "-quiet",
	 "-outdir", "/home/ehood/public_html/newmail");

##	Just require mhonarc, this prevents the overhead of a
##	fork/exec.

require $MHonArc;

The webnewmail program has to have the executable bit set. This is achieved by using "chmod a+x webnewmail".

NOTE:

For better scalability and resource usage, the author recommends calling MHonArc from a facility like cron, which is provided on Unix-based operating systems. For those unfamiliar with cron, it is a daemon that allows the execution of commands on a scheduled basis.
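A batch update of this kind could be driven by a small script that cron runs on a schedule. The Python sketch below is one possible shape for such a script. The function names and the dry_run helper are made up for illustration; only the -add, -quiet and -outdir options and the practice of passing a mailbox file as an argument come from the examples in this FAQ.

```python
import os
import subprocess
import tempfile

def build_add_command(mhonarc, outdir, mbox):
    """Build the command line for adding a mailbox file to an archive."""
    return [mhonarc, "-add", "-quiet", "-outdir", outdir, mbox]

def run_batch_update(mhonarc, outdir, mbox, runner=subprocess.run):
    """Run one batch update; do nothing if the mailbox is missing or empty."""
    if not os.path.isfile(mbox) or os.path.getsize(mbox) == 0:
        return None
    return runner(build_add_command(mhonarc, outdir, mbox), check=True)

def dry_run(cmd, check=True):
    """Stand-in runner for testing: return the command instead of running it."""
    return cmd

# Dry-run demonstration with a throwaway mailbox file.
with tempfile.TemporaryDirectory() as spool:
    demo_mbox = os.path.join(spool, "incoming.mbox")
    with open(demo_mbox, "w") as fh:
        fh.write("From ehood@example.com Sat May 14 02:29:37 2005\n\nHi\n")
    print(run_batch_update("/home/ehood/bin/mhonarc",
                           "/home/ehood/public_html/newmail",
                           demo_mbox, runner=dry_run))
```

Leaving runner at its default executes the command for real; the paths shown are the ones used in the webnewmail example and would need to be adapted.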


How can I do it with Majordomo lists?

Here is a template for archiving messages as they arrive for a Majordomo list to include in sendmail's aliases file:

xxxx:                "|/usr/lib/majordomo/wrapper resend -l xxxx xxxx-outgoing"
xxxx-outgoing:       :include:/var/lib/majordomo/lists/xxxx, xxxx-mhonarc
xxxx-request:        list-admin-address
owner-xxxx:          list-admin-address
xxxx-owner:          list-admin-address

xxxx-mhonarc:        "|/usr/lib/majordomo/wrapper mhonarc -add -quiet -outdir /home/httpd/html/yyyyyyy -rcfile rcs.mrc -stderr /var/log/mhonarc" 

Replace placeholder text (such as xxxx, yyyyyyy, and list-admin-address) with what is appropriate for your configuration.

In order to run MHonArc with Majordomo's wrapper, the program has to be in the same directory where the Majordomo programs are located. An easy way to ensure this is to create a symbolic link in Majordomo's program directory that points to where MHonArc is installed. For example:

prompt> ln -s /usr/bin/mhonarc /usr/lib/majordomo/mhonarc

Make sure /usr/bin/mhonarc is readable and executable by the majordomo user. Something like the following can be done to ensure this:

prompt> chmod 755 /usr/bin/mhonarc

If you redirect stderr to a logfile, the logfile must be owned by the majordomo user and be writable by the majordomo user. The directory for that logfile must exist.

The MHonArc-archive directory must be owned by the majordomo user and must have the minimum access permission 755. Group ownership does not matter.
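The permission requirement can be checked with a few lines of Python. This is only a sketch: the helper name has_min_mode is made up, it checks permission bits only, and it does not verify that the directory is owned by the majordomo user.

```python
import os
import stat
import tempfile

def has_min_mode(path, minimum=0o755):
    """Return True if path has at least the permission bits in minimum."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & minimum == minimum

# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as archive:
    os.chmod(archive, 0o755)
    print(has_min_mode(archive))
```

Such a check could be run against the archive directory (for example, the directory given to -outdir) before enabling the alias.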


Can I get MHonArc to filter messages to different archives?

No. This is outside MHonArc's scope. You can write your own filter, using the method described in the previous question, to scan the message header and invoke MHonArc with the proper arguments. Or you can use a tool like Procmail <http://www.procmail.org/>.
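A home-grown filter of the kind mentioned above might look like the following Python sketch. The routing table, the directory names and the function names are all hypothetical; the sketch only shows the idea of scanning the message header and choosing MHonArc arguments based on what is found.

```python
import email

# Hypothetical routing table: (header name, substring to look for, archive dir).
ROUTES = [
    ("Sender", "owner-mhonarc@", "/home/ehood/public_html/mhonarc-list"),
    ("List-Name", "widget", "/home/ehood/public_html/widget"),
]
DEFAULT_OUTDIR = "/home/ehood/public_html/newmail"

def choose_outdir(raw_message):
    """Pick an archive directory based on the message header."""
    msg = email.message_from_string(raw_message)
    for header, needle, outdir in ROUTES:
        if needle.lower() in msg.get(header, "").lower():
            return outdir
    return DEFAULT_OUTDIR

def mhonarc_command(raw_message, mhonarc="/home/ehood/bin/mhonarc"):
    """Build the MHonArc command line for one incoming message."""
    return [mhonarc, "-add", "-quiet", "-outdir", choose_outdir(raw_message)]

SAMPLE = "Sender: owner-mhonarc@example.org\nSubject: test\n\nBody text\n"
print(mhonarc_command(SAMPLE))
```

A real filter would read the message from standard input and pipe it to the chosen command, for instance with subprocess.run(cmd, input=raw_message, text=True).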

NOTE:

You may want to check out the following: mharc at <http://www.mhonarc.org/release/mharc/>: Mharc is a web-based mail archiving system for multiple mailing lists using Procmail, MHonArc, and Namazu.

Here are some messages from users about using Procmail:

... some text deleted ...

Here is what I use in .procmailrc to archive the mhonarc list:

NEWDATE="`/usr/bin/date +%Y-%m`"
MHONARC_MBOX="/local/mail/lists/mhonarc/$NEWDATE.mbox"
:0: $MHONARC_MBOX$LOCKEXT
* ^Sender:.*owner-mhonarc@
{
        :0 c
        $MHONARC_MBOX

        :0 c
        | /local/mail/mhonarc-1.2.2/mailarchive -add mhonarc "$NEWDATE"
}

Mailarchive is nothing more than a wrapper around mhonarc with my long
list of options.

Achim
P.S. Procmail itself comes with an example manual page. It's worth
     looking into it.

You can actually dispense with the wrapper if you use environment
variables to pass options to MHonArc, but I'm sure Achim has a good
reason for doing it his way.  Just for the purposes of comparison,
here's how I do it:

eeeweb% cat .procmailrc
#Set on when debugging
VERBOSE=off
#Replace `mail' with your mail directory (Pine uses mail, Elm uses Mail)
MAILDIR=$HOME/Mail
#Directory for storing procmail log and rc files
PMDIR=$HOME/.procmail
#Path and options for mhonarc
MHONARC='/dcs/packages/infosys/bin/mhonarc -add -quiet -umask 022 -idxfname index.html'
:0
* ^Originator:.*@classes.uci.edu
{
  MHHOME=$HOME/classarc
  LOGFILE=$PMDIR/classlists.log
  INCLUDERC=$PMDIR/rc.classlists
}
:0 E
{
  MHHOME=$HOME/mail-arc
  LOGFILE=$PMDIR/otherlists.log
  INCLUDERC=$PMDIR/rc.otherlists
}

and then in the file .procmail/rc.classlists or rc.otherlists (depending
on the Originator: of the message), lots of the following:

# Procmail Entry for uci-www
:0 E
* ^TOuci-www
{
  :0 c
  uci-www/.

  :0
  |$MHONARC -rcfile $MHHOME/uci-www/0-rcfile.html -outdir $MHHOME/uci-www
}

Eric D. Friedman
friedman@uci.edu
... some text deleted ...

I use procmail to drive mhonarc archives from Majordomo.  I set up a
single pseudouser and drive several archives from the one pseudouser. 

Here's a sample .forward file:

"|/usr/ucb/rsh cappuccino \"set IFS=' '; exec
/usr/local/procmail/bin/procmail #widget\""

Another example is:

"|/bin/csh -c \"set IFS=' '; exec /usr/local/procmail/bin/procmail
#widget\""

Two reasons to use the "rsh cappuccino":
1. doesn't require the user to be able to login to server, although
   the username must still be valid
2. gets the processing load off the mail server

Here's an example .procmail recipe:

LOGFILE=$HOME/procmail_errors
LOGABSTRACT=all
LOCKEXT=.lock
VERBOSE=on
UMASK=003

# widget: list short description
:0 H
* ^List-Name: widget
{
  # The rotate call (under construction) does archive rotation
  # leave commented!
  #:0c i
  #| /home/web-arch/bin/rotate /usr/local/web/webarchive/widget

  # Put the mail in the mailbox, which is used by archiver to re-generate
  # the html indexes
  :0 cA
  /usr/local/web/webarchive/widget/current/mbox

  # The mhonarc call examines mbox, turns the mail messages into .html
  # documents, and compiles the indexes.
  # -reverse -treverse\
  :0 ia
  | /usr/local/mhonarc/bin/mhonarc \
    -idxfname index.shtml \
    -tidxfname threads.shtml \
    -rcfile widget.rc \
    -outdir /usr/local/web/webarchive/widget/current \
    /usr/local/web/webarchive/widget/current/mbox

}

I have a directory per archive, and put the current period in directory
"current".  Then I have an index page per archive that indexes the
periods, plus gives information about the list and how to
subscribe/unsubscribe.  The widget.rc file resides in the pseudouser's
home directory.

Note the 
* ^List-Name: widget
I put the following in the majordomo list's config file:

message_headers   <<  END
List-Name: widget
END

This adds the "List-Name" header to messages, which is what procmail
filters for.

Hope this helps

Paul McKinley
Unix SysAdmin Contractor

Does MHonArc support the "no archive" flag in messages?

Yes, in version 2.4 or later, via the CHECKNOARCHIVE resource.

If using an earlier version, or if you are already doing some preprocessing, you can use a pre-processor like Procmail to do the filtering. Here is a message sent to the MHonArc mailing list:

> Subscribers who don't want their messages to be archived
> could add a "no archive" flag within their mail.

The most common way to do this is by checking for the existence
of an 'X-no-archive: yes' or 'Restrict: no-external-archive' header.

> As I'm invoking MHonArc through a procmail recipe I guess
> it's possible to do this within the recipe.

Very easy:

   # If people don't want to be archived, then remove their
   # message
   :0
   * ^(X-no-archive: yes|Restrict: no-external-archive)
   /dev/null

Chris
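If your preprocessing is done by a script rather than Procmail, the same check can be made there. The following Python sketch tests for the two headers named in the message above. The function name is made up, and the matching is a case-insensitive prefix test intended to mirror the Procmail condition.

```python
import email

def no_archive_requested(raw_message):
    """Return True if the message header asks not to be archived."""
    msg = email.message_from_string(raw_message)
    x_no_archive = msg.get("X-No-Archive", "").strip().lower()
    restrict = msg.get("Restrict", "").strip().lower()
    return (x_no_archive.startswith("yes")
            or restrict.startswith("no-external-archive"))

print(no_archive_requested("X-no-archive: yes\nSubject: hi\n\nprivate\n"))
print(no_archive_requested("Subject: hi\n\npublic\n"))
```

A wrapper script would call MHonArc only when this function returns False.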

Is it safe to specify -add when no archive exists?

Yes. If MHonArc sees that no archive exists when performing an add, it will automatically create the archive.

WARNING:

If using MHonArc versions 2.4, or earlier, make sure the file maillist.html (or the value of the IDXFNAME resource) does not exist if no archive exists and -add has been specified. Otherwise, unpredictable output of the maillist.html file may result if maillist.html is not in the proper format.


Why are there "jumps" in message numbers?

Big gaps in the message number sequence may occur if you have defined the MAXSIZE resource and you have MHonArc rescan a mail folder to add new messages. The problem occurs when MHonArc reads in messages that will automatically get deleted due to MAXSIZE. (The messages subject to automatic deletion are the oldest ones.) If the input contains old messages that will get deleted at the end of processing, the old messages will still use up message numbers, since the messages to be deleted are not determined until all input is read. Since MHonArc does not keep information about deleted messages, if the messages are fed into MHonArc again, the "jumping" will occur again (and the jump will get larger with each additional update).

To avoid the problem, try to pass only new, never-processed messages to MHonArc instead of having MHonArc rescan the same mail folder for new messages. Another approach is to set either the EXPIREAGE or EXPIREDATE resource (available in v2.0 beta 2, or later). These work as an alternative to MAXSIZE and help prevent message number jumping, since expiration of a message is checked when it is initially read (bypassing the assignment of a message number).
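The jumping effect with MAXSIZE can be reproduced with a toy model. The Python sketch below is a simplification and not MHonArc's actual logic: message-ids are plain strings, numbering starts at 1, "oldest" means earliest position in the folder, and the folder is rescanned in full on every update.

```python
def update(archive, next_num, folder, maxsize):
    """Rescan folder (oldest first) and add unseen messages to archive.

    archive maps message-id -> message number. Returns the next free number.
    """
    for msgid in folder:
        if msgid in archive:
            continue                    # already archived: skipped
        archive[msgid] = next_num       # number is assigned as soon as it is read
        next_num += 1
    # Only after all input is read are the oldest messages removed.
    excess = max(0, len(archive) - maxsize)
    for msgid in sorted(archive, key=folder.index)[:excess]:
        del archive[msgid]              # deleted messages are not remembered
    return next_num

archive, next_num = {}, 1
folder = ["m1", "m2", "m3", "m4", "m5"]
history = []
for new_msg in [None, "m6", "m7"]:
    if new_msg:
        folder.append(new_msg)          # one new message arrives
    next_num = update(archive, next_num, folder, maxsize=3)
    history.append(sorted(archive.values()))
    print(history[-1])
```

Each rescan hands fresh numbers to the old messages that were deleted earlier, so the gap before each new message grows from run to run.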


Why do some messages get re-added each time MHonArc processes a mail folder?

This condition may occur when you have MHonArc examine the same folder periodically to add any new message. If there are messages in the folder without message-ids, then those messages will be re-added each time MHonArc runs.

Why? Well, MHonArc uses message-ids for determining if a message has been archived, or not. Therefore, if a message-id is missing for a message, then MHonArc believes it is new.

In general, mail has message-ids. They get assigned by MTAs. However, if messages are generated by a CGI program, or other non-mail-specific software, then the program in question should create a message-id. Otherwise, you will need to move already-processed messages into a different area so MHonArc does not read them again.

NOTE:

In MHonArc v2.4 and later, and if you have the Digest::MD5 module installed, MHonArc will compute the MD5 digest of message headers without message-ids. This allows MHonArc to skip the message in subsequent add operations.
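The idea of falling back to a digest of the header can be sketched as follows. This Python example only illustrates the principle; the key format, the function names and the exact bytes that get hashed are assumptions and are not meant to match what MHonArc computes internally.

```python
import email
import hashlib

def archive_key(raw_message):
    """Return the message-id, or an MD5 digest of the header if there is none."""
    msg = email.message_from_string(raw_message)
    msgid = msg.get("Message-ID", "").strip()
    if msgid:
        return msgid
    header_block = raw_message.split("\n\n", 1)[0]
    return "md5:" + hashlib.md5(header_block.encode("utf-8")).hexdigest()

def add_new(seen, raw_messages):
    """Return the messages whose key is not yet in seen, recording their keys."""
    added = []
    for raw in raw_messages:
        key = archive_key(raw)
        if key in seen:
            continue                # already archived: skip it
        seen.add(key)
        added.append(raw)
    return added

NO_ID = "From: cgi@example.com\nSubject: form data\n\nvalue=1\n"
WITH_ID = "Message-ID: <1@example.com>\nSubject: hello\n\nHi\n"

seen = set()
print(len(add_new(seen, [NO_ID, WITH_ID])))   # first run
print(len(add_new(seen, [NO_ID, WITH_ID])))   # rescan of the same folder
```

With the fallback key in place, a rescan of the same folder no longer re-adds the message that lacks a message-id.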

A related problem is messages showing up again in the archive after you have deleted them with RMM. MHonArc does not keep track of deleted message-ids. Therefore, if you want to make sure that a message will not reappear in the archive after being explicitly deleted via RMM, make sure to remove the message from the input source.


How do I remove messages from an archive?

Automatic removal can be done via the EXPIREAGE or EXPIREDATE resources (available in v2.0 beta 2, or later) or the MAXSIZE resource.

Explicit message removal can be done with the RMM resource. Please read the RMM resource page for more information and examples.


Can I convert an archive back to mailbox format?

Anthony W developed a Perl program called mhn2mbox for converting archives back into mailbox format. A copy of the script is included in the contrib directory of the MHonArc distribution.




$Date: 2005/05/15 02:29:37 $
MHonArc
Copyright © 1997-2001, Earl Hood, mhonarc@mhonarc.org