class MlmmjArchiver::Archiver

Archiver class. Point it at the target directory that should hold your web archive, add the mailing lists (MLs) to process, and start archiving via archive!. The temporary MHonArc RC file it generates can be customized through the arguments passed to ::new.

Note that archiving for the web is a two-step process. First, the mails in mlmmj’s archive folder are split into a directory structure that allows processing them month by month rather than all at once, which makes the resulting web archive easier to navigate. In the second step, all these month directories are passed to mhonarc, which converts them to HTML and stores the result in the final directory.
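
The month-by-month split of the first step can be pictured with a small, self-contained sketch. It is not the library's implementation: the helper name group_by_month and the shape of its input (a hash mapping a mail's file name to the time it was sent) are assumptions made purely for illustration.

```ruby
# Hypothetical helper (not part of MlmmjArchiver): groups mail file names
# by the [year, month] in which each mail was sent.
# +messages+ maps a file name to a Time.
def group_by_month(messages)
  messages
    .group_by { |_name, time| [time.year, time.month] }
    .transform_values { |pairs| pairs.map(&:first) }
end

buckets = group_by_month(
  "1" => Time.new(2014, 3, 5),
  "2" => Time.new(2014, 3, 9),
  "3" => Time.new(2014, 4, 1)
)
buckets[[2014, 3]] # => ["1", "2"]
buckets[[2014, 4]] # => ["3"]
```

Each bucket would then correspond to one month directory that the second step hands to mhonarc.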

Constants

ARCHIVE_DIR

Path relative to ML root containing the mails.

CONTROL_FILE

Path relative to ML root containing the file that requests the web archiving.

MHONARC

Path to the mhonarc executable.

MRC_DEFAULTS

Default values for the MHonArc RC file.

MRC_TEMPLATE

Template for generating the temporary MHonArc RC file.

Public Class Methods

new(target, rc_args = {})

Create a new Archiver that stores its HTML mails below the given target directory. rc_args customizes the generated MHonArc RC file. It is a hash that accepts the following keys (default values are given in parentheses):

header (“<p>ML archive</p>”)

HTML header to prepend to every page. $IDXTITLE$ is replaced by the title of the respective index.

tlevels (8)

Number of levels to nest threads before flattening.

archiveadmin (postmaster@example.org)

Email address of the archive administrator.

checknoarchive (true)

If set, adds <CHECKNOARCHIVE> to the rc file. Otherwise adds <NOCHECKNOARCHIVE>.

searchtarget (nil)

If set, a link called “search” that points to the location specified here is displayed next to the index links.

stylefile (“/archive.css”)

CSS style file referenced from the generated HTML pages.

mhonarc (“/usr/bin/mhonarc”)

Path to the mhonarc executable to create the archive.

cachedir (nil)

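Path to a directory in which the mails, sorted into month directories, are stored. Pointing this at permanent storage speeds up archiving for large MLs. If it is not set, a temporary directory is created and removed again when the process exits.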
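
How user-supplied values combine with the defaults can be sketched as follows. The hash SKETCH_DEFAULTS is a stand-in assembled only from the defaults documented above, with symbol keys assumed; the real MRC_DEFAULTS constant may differ or contain further keys. The helper effective_rc_args is hypothetical and only mirrors the MRC_DEFAULTS.merge(rc_args) call in the constructor.

```ruby
# Stand-in for MRC_DEFAULTS, built from the documented defaults only.
# Assumption: the real constant uses symbol keys like these.
SKETCH_DEFAULTS = {
  header: "<p>ML archive</p>",
  tlevels: 8,
  archiveadmin: "postmaster@example.org",
  checknoarchive: true,
  searchtarget: nil,
  stylefile: "/archive.css",
  mhonarc: "/usr/bin/mhonarc",
  cachedir: nil
}.freeze

# Hypothetical helper: keys given by the caller override the defaults,
# every other key keeps its default value.
def effective_rc_args(rc_args = {})
  SKETCH_DEFAULTS.merge(rc_args)
end

args = effective_rc_args(tlevels: 4, searchtarget: "/search")
args[:tlevels]      # => 4
args[:searchtarget] # => "/search"
args[:stylefile]    # => "/archive.css"
```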

# File lib/mlmmj-archiver/archiver.rb, line 77
def initialize(target, rc_args = {})
  @target_dir   = Pathname.new(target).expand_path
  @mailinglists = []
  @mutex        = Mutex.new
  @rc_args      = MRC_DEFAULTS.merge(rc_args)
  @debug        = false
  @inotify_thread = nil
  @mhonarc      = rc_args[:mhonarc] || MHONARC

  if rc_args[:cachedir]
    @sorted_target = Pathname.new(rc_args[:cachedir]).expand_path
  else
    @sorted_target = Pathname.new(Dir.mktmpdir)
    at_exit{FileUtils.rm_rf(@sorted_target)}
  end

end

Public Instance Methods

<<(path)

Like add_ml, but returns self for method chaining.

# File lib/mlmmj-archiver/archiver.rb, line 114
def <<(path)
  add_ml(path)
  self
end
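
The difference between the two return values can be seen in a minimal stand-in class. ListCollector is hypothetical and performs no archiving; it only reproduces the shape of add_ml and << from the listings here, and the paths are made up for illustration.

```ruby
# Hypothetical stand-in for the Archiver, reduced to the two methods
# discussed here.
class ListCollector
  attr_reader :lists

  def initialize
    @lists = []
  end

  # Like add_ml: returns the internal array (the result of Array#push).
  def add_ml(path)
    @lists.push(path)
  end

  # Like <<: returns self, so calls can be chained.
  def <<(path)
    add_ml(path)
    self
  end
end

collector = ListCollector.new
collector << "/var/spool/mlmmj/list-a" << "/var/spool/mlmmj/list-b"
collector.lists.size # => 2
```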

add_ml(path)

Add a mlmmj ML directory to process.

# File lib/mlmmj-archiver/archiver.rb, line 106
def add_ml(path)
  dir = Pathname.new(path).expand_path
  debug("Adding ML directory: #{dir}")

  @mailinglists.push(dir)
end

archive!()

Process the mails of every added ML directory that contains the CONTROL_FILE. MLs without that file are skipped.

# File lib/mlmmj-archiver/archiver.rb, line 169
def archive!
  @mutex.synchronize do
    rcpath = generate_rcfile

    @mailinglists.each do |path|
      control_file = path + CONTROL_FILE
      next unless control_file.file?

      process_ml(@sorted_target + path.basename, @target_dir + path.basename, rcpath)
    end
  end
end
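
The control-file check can be illustrated in isolation. In this sketch, the value of SKETCH_CONTROL_FILE is an assumption (the actual value of CONTROL_FILE is not shown on this page), and lists_requesting_archive is a hypothetical helper that mirrors the `next unless control_file.file?` line above.

```ruby
require "pathname"

# Assumed value for illustration only; see CONTROL_FILE for the real one.
SKETCH_CONTROL_FILE = "control/webarchive"

# Hypothetical helper: returns only those ML directories that contain
# the control file, i.e. the ones that would actually be archived.
def lists_requesting_archive(ml_dirs)
  ml_dirs
    .map { |dir| Pathname.new(dir) }
    .select { |dir| (dir + SKETCH_CONTROL_FILE).file? }
end
```

A list owner would thus opt in to web archiving by creating that file inside the ML directory, and opt out again by deleting it.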

debug_mode=(val)

Enable/disable debugging output.

# File lib/mlmmj-archiver/archiver.rb, line 96
def debug_mode=(val)
  @debug = val
end

debug_mode?()

True if debugging output is enabled, see debug_mode=.

# File lib/mlmmj-archiver/archiver.rb, line 101
def debug_mode?
  @debug
end

preprocess_mlmmj_mails!()

Iterates over all mailing lists and copies new messages into the intermediate month directory structure.

# File lib/mlmmj-archiver/archiver.rb, line 157
def preprocess_mlmmj_mails!
  @sorted_target.mkpath unless @sorted_target.directory?

  @mutex.synchronize do
    @mailinglists.each do |path|
      hsh = collect_messages(path + ARCHIVE_DIR)
      split_messages_into_month_dirs(hsh, @sorted_target + path.basename) # path.basename is the ML name
    end
  end
end

stop_watching_mlmmj_mails!()

Terminate the watching thread started by watch_mlmmj_mails!.

# File lib/mlmmj-archiver/archiver.rb, line 151
def stop_watching_mlmmj_mails!
  @inotify_thread.terminate
end

watch_mlmmj_mails!()

The more elegant variant of preprocess_mlmmj_mails!. Instead of repeatedly scanning the ML directories for new mails, it uses inotify so that Linux notifies us whenever a new file is added to an ML's archive directory. For this method to work, rb-inotify must be available on your system (otherwise a NotImplementedError is raised).

# File lib/mlmmj-archiver/archiver.rb, line 124
def watch_mlmmj_mails!
  raise(NotImplementedError, "This is only possible with rb-inotify!") unless defined?(INotify)

  @inotifier = INotify::Notifier.new

  @mailinglists.each do |path|
    archive_dir = path + ARCHIVE_DIR

    @inotifier.watch(archive_dir.to_s, :create) do |event|
      next unless File.file?(event.absolute_name)
      next unless event.name =~ /^\d+$/

      debug "Got a new mail: #{event.name}"
      sleep 2 # Wait for the file to be fully written

      @mutex.synchronize do
        mail = Mail.read(event.absolute_name)
        FileUtils.cp(event.absolute_name, @sorted_target + path.basename + mail.date.year.to_s + mail.date.month.to_s)
      end
    end
  end

  debug "Watching MLs via inotify."
  @inotify_thread = Thread.new{@inotifier.run}
end
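
The two decisions made inside the inotify callback, namely which file names count as mails and where a mail is copied to, can be tried out without rb-inotify. The helpers numeric_mail_name? and sorted_dir are hypothetical names for this sketch, and the paths in the usage lines are made up; the logic follows the listing above.

```ruby
require "pathname"

# Hypothetical helper: true for names consisting only of digits,
# the same filter the callback applies to event.name.
def numeric_mail_name?(name)
  !!(name =~ /^\d+$/)
end

# Hypothetical helper: destination directory as built in the callback,
# <sorted target>/<ML name>/<year>/<month>. The month is not zero-padded.
def sorted_dir(sorted_target, ml_path, date)
  Pathname.new(sorted_target) +
    Pathname.new(ml_path).basename +
    date.year.to_s +
    date.month.to_s
end

numeric_mail_name?("42")     # => true
numeric_mail_name?("42.tmp") # => false
sorted_dir("/var/cache/sorted", "/var/spool/mlmmj/mylist", Time.new(2014, 3, 5)).to_s
# => "/var/cache/sorted/mylist/2014/3"
```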