Computer people like hierarchies. File system folders, HTML/XML/SGML, Java packages, class hierarchies, Usenet groups, the Windows registry, domain names, IP address assignment, software version numbers (1.1.5 comes before 1.10), GUI widget container hierarchies, URL paths, and hierarchical menus are some examples. Hierarchies are expressed using tree data structures, and trees are pretty cool, so we tend to reach for them when we see a new problem. It’s a structure taught to all computer scientists. It’s an old familiar friend.

Sometimes a tree isn’t the best structure for the problem. On the Internet you see some things that are not hierarchies, like email addresses (user@domain), the web (a directed graph structure), and IP routing tables. Sriram’s post got me thinking about overusing tree structures, and in particular, the A <i> b <b> c </i> d </b> e problem in HTML.

HTML uses a containment model: when you write <abc>xyz</abc>, xyz is contained “inside” the abc element. In a containment model, A <i> b <b> c </i> d </b> e is an error, because the i element closes while the b element inside it is still open. In contrast, Emacs and XEmacs use an overlay model. Instead of a containment relationship, an overlay is attached to any span of text, and overlays can overlap. In an overlay model, A <i> b <b> c </i> d </b> e poses no problem: b should be rendered in italics, c in bold italics, and d in bold. The text selection is another thing that is hard to express in a tree structure. In Emacs, it’s just another overlay.
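To make the overlay idea concrete, here is a minimal sketch in Python. It is not how Emacs implements overlays; the function names parse_overlays and styled_runs are made up for this illustration, and the parser only handles bare tags like <i> and </i> with no attributes. It turns the markup into plain text plus a list of (start, end, tag) spans, then groups the text into runs that share the same set of styles.

```python
import re
from itertools import groupby

def parse_overlays(markup):
    """Split simple <tag>...</tag> markup into plain text and a list of
    (start, end, tag) overlays. Tags may overlap; they need not nest."""
    pieces = []      # plain-text pieces seen so far
    pos = 0          # length of the plain text so far
    open_at = {}     # tag name -> offset where it was opened
    overlays = []
    for token in re.split(r"(</?\w+>)", markup):
        m = re.fullmatch(r"<(/?)(\w+)>", token)
        if m is None:
            # Ordinary text: keep it and advance the offset.
            pieces.append(token)
            pos += len(token)
        elif m.group(1) == "":
            # Opening tag: remember where this overlay starts.
            open_at[m.group(2)] = pos
        else:
            # Closing tag: the overlay covers [start, pos).
            start = open_at.pop(m.group(2))
            overlays.append((start, pos, m.group(2)))
    return "".join(pieces), overlays

def styled_runs(text, overlays):
    """Group the text into runs covered by the same set of overlays."""
    def styles_at(i):
        return frozenset(tag for start, end, tag in overlays if start <= i < end)
    runs = []
    for styles, group in groupby(range(len(text)), key=styles_at):
        indices = list(group)
        runs.append((text[indices[0]:indices[-1] + 1], sorted(styles)))
    return runs

text, overlays = parse_overlays("A <i> b <b> c </i> d </b> e")
for run, styles in styled_runs(text, overlays):
    print(repr(run), styles)
```

The overlapping i and b spans are just two independent entries in a flat list, so nothing has to be nested or repaired. A text selection could be added as one more (start, end, "selection") entry.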

But overlays don’t seem to be the right fit for larger structures, like paragraphs, sections, and documents. For those, containment makes more sense. Within a block, however, overlays make sense. I think the difference may be between things that can be nested (a div can be inside another div) and things that can’t (it makes no sense to put an i inside another i), but I’m not really sure.

An argument can be made that supporting A <i> b <b> c </i> d </b> e properly at the expense of no longer having a uniform model (which is used by DOM, XSLT, etc.) isn’t worth it, but that argument should be made explicitly. Every use of a tree structure should be justified, because there is a natural bias towards trees among computer folk.
