Date: Sun, 5 Feb 2012 17:47:49 +0200
From: Lauri Kokkonen <lauri.u.kokkonen@...il.com>
To: owl-dev@...ts.openwall.com
Subject: Re: blists status?

On Thu, Feb 02, 2012 at 07:42:12AM +0400, Solar Designer wrote:
> blists is not (yet) part of Owl, so this is slightly off-topic for this
> mailing list.  That said, I don't mind discussing it in here briefly
> before we decide on next steps.  And we might in fact include blists
> into Owl after we get a web server into Owl base tree (there's almost no
> point in having blists without a web server).
> 
> It is impressive how you analyzed blists before bringing this topic up.
> Are you using blists somewhere?

blists is a small (around 3500 LOC) and well-defined set of programs, so
for a coder it is quite straightforward to get a basic understanding of
its inner workings.

I do, or rather did (before it had index pages with Subjects and Froms),
run blists on a server that mostly hosts services for a couple of friends
doing hobbyist embedded projects etc. But nowadays we use a wiki for most
of our "important" discussions.

I quite like the "indexed mailbox" design of blists, that is, generating
HTML pages on the fly instead of duplicating the mailbox as (all?) the
major open source programs such as MHonArc and Hypermail seem to do.

> [...]

> > For new features I would personally like to see threaded view.
> 
> Yes, it's one of the desirable features.
> 
> > This should be possible to add without changes to the index format.
> 
> Currently, the threads are "flattened" - each message only has one
> thread-prev and one thread-next message - that's in the index file.
> (Yes, this means that thread-prev is not necessarily the message that
> the current message is a direct reply to.  It may also be a message
> posted to the same thread after the replied-to message was posted.)
> 
> So for a flat threaded view, everything is easy (and this may be an
> initial development milestone).
> 
> For a tree threaded view, we may either scan the flattened thread and
> match instances of msgid_hash vs. irt_hash (in bit) or enhance the index
> format and record more-direct pointers up and down in the tree in the
> index.  I currently think that the former approach is preferable, but
> this may need further consideration.

Ah. I have studied the index format and the code that flattens threads
more thoroughly now. It seems that once you have the head of a thread
and reversed irt pointers, you basically have the tree, and then it is
only a matter of recursively traversing it to get a tree threaded view.
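
A minimal sketch of that traversal, to make the idea concrete. It does
not use the real blists index layout: struct thread, MAX_MSGS, and the
function names are hypothetical, and it assumes each message's irt
pointer has already been resolved to the position of the replied-to
message within the thread.

```c
#define MAX_MSGS 64
#define NONE (-1)

/* Hypothetical in-memory view of one thread; not the blists index format. */
struct thread {
	int n;                      /* number of messages, in posting order */
	int parent[MAX_MSGS];       /* irt pointer: replied-to message, or NONE */
	int first_child[MAX_MSGS];  /* reversed irt pointers, built below */
	int next_sibling[MAX_MSGS];
};

/* Reverse the irt pointers: give every message a list of its direct replies. */
void build_children(struct thread *t)
{
	int i;

	for (i = 0; i < t->n; i++)
		t->first_child[i] = t->next_sibling[i] = NONE;

	/* Walk backwards so that each child list ends up in posting order. */
	for (i = t->n - 1; i >= 0; i--) {
		int p = t->parent[i];

		if (p == NONE)
			continue;
		t->next_sibling[i] = t->first_child[p];
		t->first_child[p] = i;
	}
}

/*
 * Depth-first traversal starting at msg. Records the display order and
 * the nesting depth of each visited message; returns the updated count.
 */
int walk(const struct thread *t, int msg, int depth,
    int *order, int *depths, int count)
{
	int c;

	order[count] = msg;
	depths[count] = depth;
	count++;
	for (c = t->first_child[msg]; c != NONE; c = t->next_sibling[c])
		count = walk(t, c, depth + 1, order, depths, count);
	return count;
}
```

Calling build_children() once and then walk() from the head of the
thread yields the messages in tree order, with depths[] available for
indentation when generating the HTML.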

Storing reversed irt pointers in the index is probably not possible
because a message can have any number of replies, so it would be
difficult to keep the index elements at a fixed size. Storing plain irt
pointers is possible and would lessen the processing bit has to do, but
I'm not sure if that is worth it.

I guess there might be even more options for enhancing the index format.

> > Another often seen
> > feature is author view but I am not so sure about it; do you want to
> > build a rich set of features or go with a compact piece of software?
> 
> We need to strike a balance.  Besides code size, also important are code
> complexity and potential security risks.  Merely adding a new display
> mode is likely low-risk, whereas e.g. adding a search feature is higher
> risk (extra user input).  That said, author view was not requested so
> far, so it is not on the agenda, whereas search was requested, so it is.
> Also, maybe having "advanced search" would eliminate the need for author
> view (if there's such need at all).
> 
> > A search feature would be wonderful. Probably some kind of search index that
> > links the search term space to individual messages has to be built for
> > this. There are many different options on how to do this, and some problems
> 
> Right.  abc looked into using Xapian for this - in fact, he experimented
> with it briefly in a modified blists tree.  He would probably proceed to
> implement it for real by now, but we agreed on other priorities
> initially - such as index pages with Subjects and Froms (already in
> blists 1.0), "Recent messages" lists (in CVS), support for different
> character encodings (in CVS for headers, not yet implemented for bodies).
> 
> > like the order of mbox index entries possibly changing (if I have understood
> > correctly),
> 
> Why would it change?  Are you possibly referring to the qsort() call?
> If so, that is something abc ran into too and we discussed ways around
> it - but I forgot the details (may need to dig up chat logs).
> 
> > so that it probably needs a few candidate implementations and
> > serious testing.
> 
> Definitely.  And frankly, I am concerned about running a version of bit
> linked against Xapian or whatever and passing untrusted user input from
> the web into that library...  I am not sure how to deal with this best.

Okay. I would like to be able to search messages by author, by subject
and by message body.

This is my initial idea for building a search index from scratch: For
each word store pointers to all messages that contain that word and
encode these pointers as ranges of messages in time or in thread. So
store the first message and the number of following messages that also
contain that word.
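
A rough sketch of that encoding, assuming the messages containing a
given word are available as a sorted, duplicate-free array of message
numbers. struct range, encode_ranges() and ranges_contain() are names
made up for this illustration and are not part of blists.

```c
#include <stddef.h>

/* One run of consecutive message numbers that all contain the word. */
struct range {
	unsigned first;  /* first message number in the run */
	unsigned extra;  /* how many following messages also contain the word */
};

/*
 * Compress a sorted, duplicate-free array of message numbers into ranges.
 * Returns the number of ranges written to out (at most n).
 */
size_t encode_ranges(const unsigned *msgs, size_t n, struct range *out)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (count &&
		    msgs[i] == out[count - 1].first + out[count - 1].extra + 1) {
			/* Extends the current run by one message. */
			out[count - 1].extra++;
		} else {
			/* Starts a new run. */
			out[count].first = msgs[i];
			out[count].extra = 0;
			count++;
		}
	}
	return count;
}

/* Membership test: binary search over the sorted, non-overlapping ranges. */
int ranges_contain(const struct range *r, size_t count, unsigned msg)
{
	size_t lo = 0, hi = count;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (msg < r[mid].first)
			hi = mid;
		else if (msg > r[mid].first + r[mid].extra)
			lo = mid + 1;
		else
			return 1;
	}
	return 0;
}
```

How well this compresses depends on how often a word appears in runs of
adjacent messages, which is one of the things that would need
statistics from real mailboxes.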

For example, searches like the following would be quite
straightforward:
  word1 AND word2 AND NOT word3

For searches like "word1 word2" (a phrase) we would need to do:
  word1 AND word2
and then fetch the message bodies to check if the words are next to
each other.
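
Both steps could look roughly like the sketch below. It makes some
simplifying assumptions that are mine, not blists': each word's message
set is expanded into a small fixed-size bitmap (NWORDS is arbitrary),
words are runs of alphanumeric characters, and matching is
case-sensitive. All function names are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

#define NWORDS 4  /* bitmap covers NWORDS * 64 = 256 message numbers */

void set_bit(uint64_t *map, unsigned msg)
{
	map[msg / 64] |= (uint64_t)1 << (msg % 64);
}

int test_bit(const uint64_t *map, unsigned msg)
{
	return (int)((map[msg / 64] >> (msg % 64)) & 1);
}

/* out = a AND b AND NOT c, i.e. "word1 AND word2 AND NOT word3" */
void and_and_not(const uint64_t *a, const uint64_t *b, const uint64_t *c,
    uint64_t *out)
{
	int i;

	for (i = 0; i < NWORDS; i++)
		out[i] = a[i] & b[i] & ~c[i];
}

/*
 * Second pass for a search like "word1 word2": returns 1 if w1 is
 * immediately followed by w2 somewhere in the message body.
 */
int words_adjacent(const char *body, const char *w1, const char *w2)
{
	size_t len1 = strlen(w1), len2 = strlen(w2);
	int prev_was_w1 = 0;
	const char *p = body;

	while (*p) {
		const char *start;
		size_t len;

		/* Skip separators, then take one run of alphanumerics. */
		while (*p && !isalnum((unsigned char)*p))
			p++;
		start = p;
		while (isalnum((unsigned char)*p))
			p++;
		len = (size_t)(p - start);
		if (!len)
			break;
		if (prev_was_w1 && len == len2 && !memcmp(start, w2, len))
			return 1;
		prev_was_w1 = (len == len1 && !memcmp(start, w1, len));
	}
	return 0;
}
```

Only the messages that survive the bitmap step need their bodies
fetched for the adjacency check, which keeps the expensive part small.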

But this would certainly need a lot more time, consideration, testing,
statistics and so on to become something useful.

> > Do you have other TODO items for blists?
> 
> Yes.  I have a cryptic TODO list originally intended for myself only
> (before abc got involved in blists development).  To name a few items
> (not yet mentioned above): support for downloads of attached files, RSS
> or Atom feed, highlight search keywords based on Referer (such as from
> Google search), and many smaller enhancements.

Sounds good.

So, I could start by doing something small first, like writing comments
before a few functions in mailbox.c, then proceed to write a flat
threaded view and maybe remove duplicate code in html.c if needed.

Lauri
