Archiver class. Point it at the target directory under which your web archive should be placed, add the mailing lists (MLs) to process, and start archiving via archive!. You can influence the temporary MHonArc RC file that is used by passing arguments to ::new.
Note that archiving for the web is a two-step process. First, the mails in mlmmj’s archive folder need to be split into a directory structure that allows processing them month by month instead of all at once, because this gives an easier overview of the web archive. In the second step, all these month directories are passed to mhonarc, which converts them to HTML and stores them in the final directory.
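The first step can be pictured with a small, self-contained sketch. It is not the library's implementation: the helper names `mail_date` and `split_into_month_dirs` are hypothetical, the Date header is parsed with Ruby's standard `Time.rfc2822` as an assumption, and the `<year>/<month>` layout simply mirrors the path built in the source of watch_mlmmj_mails!.

```ruby
require "fileutils"
require "pathname"
require "time"

# Hypothetical helper: returns the Time from the first Date header of a
# mail file, or nil if there is none or it cannot be parsed.
def mail_date(path)
  # Only look at the header block, which ends at the first empty line.
  headers = File.foreach(path).take_while { |line| !line.chomp.empty? }
  header  = headers.find { |line| line.match?(/\ADate:/i) }
  return nil unless header

  Time.rfc2822(header.sub(/\ADate:\s*/i, "").strip)
rescue ArgumentError
  nil
end

# Hypothetical helper: copies every mail in +source+ into
# +dest+/<year>/<month>/ so each month can be processed on its own.
def split_into_month_dirs(source, dest)
  dest = Pathname.new(dest)

  Pathname.new(source).children.select(&:file?).sort.each do |mail|
    date = mail_date(mail)
    next unless date # skip files without a usable Date header

    month_dir = dest + date.year.to_s + date.month.to_s
    month_dir.mkpath
    FileUtils.cp(mail, month_dir + mail.basename)
  end
end
```

Each resulting month directory could then be handed to mhonarc separately, which is the second step described above.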
Path relative to ML root containing the mails
Path relative to ML root containing the file that requests the web archiving.
Path to the mhonarc executable.
Default values for the MHonArc RC file.
Template for generating the temporary MHonArc RC file.
Create a new Archiver that stores its HTML mails below the given target directory. rc_args allows you to customize the MHonArc RC file that is used. It is a hash that takes the following arguments (the values in parentheses denote the default values):
HTML header to prepend to every page. $IDXTITLE$ is replaced by the title of the respective index.
Number of levels to nest threads before flattening.
E-Mail address of the archive administrator.
If set, adds <CHECKNOARCHIVE> to the rc file. Otherwise adds <NOCHECKNOARCHIVE>.
If this is set, displays a link called “search” next to the index links that links to the location specified here.
CSS stylesheet file to reference from the generated HTML pages.
Path to the mhonarc executable to create the archive.
Path to a directory where the sorted mails are stored. Setting this to a permanent location speeds up the archiving process for large MLs; if it is not set, a temporary directory is created and removed again at exit.
# File lib/mlmmj-archiver/archiver.rb, line 77
def initialize(target, rc_args = {})
  @target_dir     = Pathname.new(target).expand_path
  @mailinglists   = []
  @mutex          = Mutex.new
  @rc_args        = MRC_DEFAULTS.merge(rc_args)
  @debug          = false
  @inotify_thread = nil
  @mhonarc        = rc_args[:mhonarc] || MHONARC

  if rc_args[:cachedir]
    @sorted_target = Pathname.new(rc_args[:cachedir]).expand_path
  else
    @sorted_target = Pathname.new(Dir.mktmpdir)
    at_exit{FileUtils.rm_rf(@sorted_target)}
  end
end
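The constructor combines two patterns: caller-supplied options are merged over a defaults hash, and the cache directory falls back to a temporary one. The following sketch reproduces both in isolation. `EXAMPLE_DEFAULTS`, its keys, and the two method names are hypothetical stand-ins chosen for illustration; the real defaults live in MRC_DEFAULTS.

```ruby
require "fileutils"
require "pathname"
require "tmpdir"

# Hypothetical defaults for illustration; the real keys are in MRC_DEFAULTS.
EXAMPLE_DEFAULTS = { stylesheet: "/archive.css", thread_levels: 8 }.freeze

# Caller-supplied values win over the defaults; missing keys fall back.
def effective_rc_args(rc_args = {})
  EXAMPLE_DEFAULTS.merge(rc_args)
end

# Use the given cache directory, or fall back to a temporary directory
# that is removed when the Ruby process exits.
def sorted_target_for(rc_args = {})
  if rc_args[:cachedir]
    Pathname.new(rc_args[:cachedir]).expand_path
  else
    dir = Pathname.new(Dir.mktmpdir)
    at_exit { FileUtils.rm_rf(dir) }
    dir
  end
end
```

Because `Hash#merge` returns a new hash, the defaults themselves are never modified, so several archivers could share them safely.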
Like add_ml, but returns self for method chaining.
# File lib/mlmmj-archiver/archiver.rb, line 114
def <<(path)
  add_ml(path)
  self
end
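To show why returning self matters, here is a minimal stand-in class with the same two methods. `ListCollector` and the example paths are hypothetical; the sketch only mirrors the add_ml and << bodies shown in the source and does not load the library.

```ruby
require "pathname"

# Hypothetical stand-in that mirrors only add_ml and <<.
class ListCollector
  attr_reader :mailinglists

  def initialize
    @mailinglists = []
  end

  # Returns the internal array, so calls cannot be chained on the collector.
  def add_ml(path)
    @mailinglists.push(Pathname.new(path).expand_path)
  end

  # Returns the collector itself, so several lists can be added in one go.
  def <<(path)
    add_ml(path)
    self
  end
end

collector = ListCollector.new
collector << "/var/spool/mlmmj/dev" << "/var/spool/mlmmj/users"
```

With add_ml alone, the second call in the chain would be sent to the array of directories rather than to the collector.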
Add an mlmmj ML directory to process.
# File lib/mlmmj-archiver/archiver.rb, line 106
def add_ml(path)
  dir = Pathname.new(path).expand_path
  debug("Adding ML directory: #{dir}")
  @mailinglists.push(dir)
end
Process all the mails of all added ML directories. MLs that do not contain the control file (see CONTROL_FILE) are skipped.
# File lib/mlmmj-archiver/archiver.rb, line 169
def archive!
  @mutex.synchronize do
    rcpath = generate_rcfile

    @mailinglists.each do |path|
      control_file = path + CONTROL_FILE
      next unless control_file.file?

      process_ml(@sorted_target + path.basename, @target_dir + path.basename, rcpath)
    end
  end
end
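The opt-in check in archive! can be sketched on its own. The value of `EXAMPLE_CONTROL_FILE` below is an assumption made for illustration (the actual relative path is defined by the CONTROL_FILE constant and is not shown here), and `archivable_lists` is a hypothetical helper name.

```ruby
require "pathname"

# Assumed value for illustration only; see the CONTROL_FILE constant
# for the path the library really checks.
EXAMPLE_CONTROL_FILE = "control/webarchive"

# Hypothetical helper: keeps only those ML directories that contain the
# control file, i.e. the lists that requested web archiving.
def archivable_lists(mailinglists, control_file = EXAMPLE_CONTROL_FILE)
  mailinglists
    .map { |dir| Pathname.new(dir) }
    .select { |dir| (dir + control_file).file? }
end
```

Under this scheme, a list administrator would enable archiving for one ML by creating the control file in that ML's directory, and disable it again by deleting the file.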
Enable/disable debugging output.
# File lib/mlmmj-archiver/archiver.rb, line 96
def debug_mode=(val)
  @debug = val
end
True if debugging output is enabled, see debug_mode=.
# File lib/mlmmj-archiver/archiver.rb, line 101
def debug_mode?
  @debug
end
Iterates over all mailinglists and copies new messages into the intermediate month directory structure.
# File lib/mlmmj-archiver/archiver.rb, line 157
def preprocess_mlmmj_mails!
  @sorted_target.mkpath unless @sorted_target.directory?

  @mutex.synchronize do
    @mailinglists.each do |path|
      hsh = collect_messages(path + ARCHIVE_DIR)
      split_messages_into_month_dirs(hsh, @sorted_target + path.basename) # path.basename is the ML name
    end
  end
end
Search the given mailing list for a specific search term. The return value is an array of paths relative to the HTML directory of the given ML. query may be a regular expression or simply a string to check for; a string is matched case-insensitively.
# File lib/mlmmj-archiver/archiver.rb, line 186
def search(mlname, query)
  html_dir = @target_dir + mlname
  results = []

  html_dir.find do |path|
    next unless path.file?
    next unless path.basename.to_s =~ /^\d+\.html$/

    # Check if the file content matches
    content = File.read(path)
    if query.kind_of?(Regexp)
      result = content =~ query
    else
      result = content.downcase.include?(query.downcase)
    end

    # If it did, remember it for returning
    results << path.relative_path_from(html_dir) if result
  end

  results
end
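The matching rules of search are easiest to see on a small directory tree. The sketch below is a standalone reimplementation for illustration, not the library method: `search_html` is a hypothetical name, it takes the HTML directory directly instead of an ML name, and it sorts its results so the output is predictable.

```ruby
require "pathname"

# Hypothetical standalone variant of the search logic: scans only files
# named like "123.html" and returns their paths relative to html_dir.
def search_html(html_dir, query)
  html_dir = Pathname.new(html_dir)
  results  = []

  html_dir.find do |path|
    next unless path.file?
    next unless path.basename.to_s.match?(/\A\d+\.html\z/)

    content = path.read
    hit = if query.is_a?(Regexp)
            content.match?(query)                    # regexp is used as given
          else
            content.downcase.include?(query.downcase) # string ignores case
          end

    results << path.relative_path_from(html_dir) if hit
  end

  results.sort
end
```

Note the asymmetry: the string "release" would match a page containing "Release", while the regular expression /release/ would not unless it carries the `i` flag.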
Terminate the watching thread started by watch_mlmmj_mails!.
# File lib/mlmmj-archiver/archiver.rb, line 151
def stop_watching_mlmmj_mails!
  @inotify_thread.terminate
end
The more elegant variant of preprocess_mlmmj_mails!. Instead of polling all mails and testing whether they are there, use inotify to have Linux notify us when a new file is added to the ML directory. For this method to work, rb-inotify must be available on your system (otherwise a NotImplementedError is raised).
# File lib/mlmmj-archiver/archiver.rb, line 124
def watch_mlmmj_mails!
  raise(NotImplementedError, "This is only possible with rb-inotify!") unless defined?(INotify)

  @inotifier = INotify::Notifier.new
  @mailinglists.each do |path|
    archive_dir = path + ARCHIVE_DIR
    @inotifier.watch(archive_dir.to_s, :create) do |event|
      next unless File.file?(event.absolute_name)
      next unless event.name =~ /^\d+$/
      debug "Got a new mail: #{event.name}"
      sleep 2 # Wait for the file to be fully written

      @mutex.synchronize do
        mail = Mail.read(event.absolute_name)
        FileUtils.cp(event.absolute_name, @sorted_target + path.basename + mail.date.year.to_s + mail.date.month.to_s)
      end
    end
  end

  debug "Watching MLs via inotify."
  @inotify_thread = Thread.new{@inotifier.run}
end