Monday, September 25, 2006

In my last post, I described the new implementation of my web pages and hinted that there was a new design as well. You can compare the old design to the new design to see:

  • There's a lot less color. This is probably the biggest change. I like bold colors, but there was too much of it; the color overwhelmed the page and made the headers (which used the same color) harder to pick out. Section headers are now black (with just a small square of color); the page border is now subdued.
  • The text is more tightly packed. (I changed the leading from 1.5 to 1.25.) I have mixed feelings about this; I may change it back.
  • Preformatted text matches the line spacing of the main text as much as possible. I'm a little skeptical of this rule, as it seems to result from limitations in printing presses, but a friend of mine who really knows his typography recommends it.
  • There's a print-friendly version of the style using @media screen style rules to enclose sections designed for regular screens and @media print style rules to enclose sections designed for printing. The navigation bar, sans-serif body font, shadows, and colored backgrounds are removed for printing. I didn't write any rules for other media types.
  • There's more whitespace on the left, except to the left of headings.* This makes headings a little easier to pick out.
  • The page contents have a soft shadow around them; preformatted text has a soft shadow inside.*
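The print-friendly version works by separating the screen rules from the print rules. A sketch of what that looks like (the selectors and specific rules here are illustrative, not my actual stylesheet):

```css
/* Applies only on screen: sans-serif body, navigation bar visible. */
@media screen {
  body { font-family: sans-serif; }
  #nav { display: block; }
}

/* Applies only in print: serif body, navigation removed. */
@media print {
  body { font-family: serif; }
  #nav { display: none; }
}
```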

The items with a * are enabled by the implementation; they were not something I could easily do with CSS alone. Each section is now a container with both a title and contents:

<x:section title="Header">
    Paragraph 1.
    Paragraph 2.
</x:section>

In HTML, sections are implicit: they are whatever occurs between headers. In my XML content, sections are explicit. That way I can apply a style to the section, such as adding a margin to the left. This is something I've wanted through various redesigns: sometimes I want a border; sometimes I want a horizontal line separating sections; sometimes I want them to be a different color. Now that each section is marked, I can choose a combination of HTML tags and CSS rules to achieve the effect I want.
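As a sketch, the transformation for sections might look like this (the element and class names here are illustrative, not my actual stylesheet):

```xml
<!-- Hypothetical XSLT template: turn each <x:section> into a div
     with an h2 heading, so CSS can style the section as a unit. -->
<xsl:template match="x:section">
  <div class="section">
    <h2><xsl:value-of select="@title"/></h2>
    <xsl:apply-templates/>
  </div>
</xsl:template>
```

A single CSS rule such as `div.section { margin-left: 2em; }` then applies the chosen effect to every section at once.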

I treated the document in the same way as the sections. In HTML, the header and footer are inside the document body. This means any style rules that apply to the body contents (such as margins) also apply to the header and footer. I often want the header and footer to span the entire width of the document, so this makes the HTML messier. With XSLT, I can create a new document body (with <div>) and move the header and footer outside of it. To get the shadow effect, I use three divs, each with a different border color. Preformatted text gets surrounded by three divs as well. XSLT lets me inject new elements that are used purely for formatting.
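The shadow wrapping can be sketched the same way (the element and class names are placeholders):

```xml
<!-- Hypothetical XSLT template: wrap the document body in three
     nested divs; CSS gives each a different border color, which
     reads as a soft shadow. -->
<xsl:template match="x:body">
  <div class="shadow-outer">
    <div class="shadow-middle">
      <div class="shadow-inner">
        <xsl:apply-templates/>
      </div>
    </div>
  </div>
</xsl:template>
```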

I tried a serif body font, but it was too much of a change for me, and serifs don't render as well on low-DPI screens. There are still a few things to clean up, including the rendering in Internet Explorer, but overall I'm pretty happy with the style.


Sunday, September 24, 2006

As long as I've been writing web pages, I've experimented with ways to manage them. Long ago I used the C preprocessor to give me server-side includes and simple macros. It was a mess, since the C preprocessor parses single quotes and double slashes (for C character constants and C++ comments), and both of those occur in other contexts in web pages. Later on I built something that could automatically build navigation trees for each page. I also experimented with but never fully adopted a system that would let me write HTML in a simpler syntax. I built tools that would take multiple text files and assemble them into a larger page. I also tried using third party tools, like LaTeX2HTML. I eventually abandoned all of these systems. They were too complex and introduced dependencies between input files, and I then had to manage those dependencies.

The last time I had the web page management itch, I wrote down what I wanted out of any new system I set up:

  • Portable. I want to depend on as few external tools as possible, so that I could run this on a variety of systems, including Windows, Linux, Mac, and the restricted environment where my web pages are hosted.
  • Straightforward. I had abandoned several of my previous systems because they imposed restrictions on what I could put into my document. I'm comfortable with HTML, and I'd like to just write HTML as much as possible. The more abstractions I put in between my system and the final output, the more restrictive and complex it will be.
  • Fast. Several of my previous systems had to analyze all of my documents in order to rebuild any of them. In particular, navigation trees require analyzing other nodes in order to create the links. If I only edit one document, I want my system to regenerate only one HTML file. Therefore this rule requires that I do not put in navigation trees.
  • Simple. I'm lazy. I don't want to write a complicated system to manage my pages. I just want to get the low hanging fruit and not worry about solving all the problems.
  • Static. I have to produce static HTML; I don't have a web host where I can run web apps or CGI scripts.

I ended up learning and using XSLT. I don't particularly like XSLT, but it's a reasonable tool to start with. XSLT treats HTML and XML documents as trees and transforms them by rearranging, erasing, and adding tree nodes.
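For example, the standard XSLT identity transform copies a tree through unchanged; everything else is a matter of adding templates that override it for particular nodes:

```xml
<!-- The classic XSLT identity transform: copy every element,
     attribute, and text node as-is. More specific templates
     override it to rearrange, erase, or add nodes. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```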

The first step was to extract the content out of my existing web pages. Each page had a mix of template and content; the extractor separated them. I had written these pages in different styles over a period of ten years, so the headers, navigation, HTML style, etc. are not consistent. Some of the pages were generated by other tools I had used. While looking at the old HTML, I decided that I would not be able to treat them uniformly. I added to my requirements:

  • Optional. Some pages will use the new system and some pages will not. I will not migrate everything at once.

I wrote XSLT to extract content out of some of my web pages. I grouped the pages by their implementation and style. I handled the extraction in four ways:

  1. If the page already fit my extractor, I left it alone.
  2. If minor changes to the page would make it fit the extractor, I changed the page.
  3. If minor changes to the extractor would make it fit the page, I changed the extractor.
  4. If the page would require major changes to either the page or the extractor, I excluded it.

I thus had a set of pages (some modified), a set of excluded pages, and an extractor. The extracted content was in the form of XML files (mostly HTML, with some XML meta-information). I tested them by inspection; it was hard to tell whether I had gotten things right until the second step: injection. In the second step, I combined the content of each page with a new template I had written, producing HTML. I then compared the new pages with the old pages, side by side, in several browsers, until I was reasonably happy with the results. I iterated rapidly, each time fixing the extractor, the injector, or the old HTML (so that it would extract better). Any pages I couldn't fix at this step I wrote down for later repair.

To summarize, during this stage of development I had old pages, an extractor, a set of content pages (the results of the extractor), an injector, and new pages (the results of the injector). Some pages I had excluded from the entire process, and others I had marked as needing repair. I also had a set of things I'd like to do that XSLT couldn't handle.

After testing the output extensively, I was finally ready to make the switch. Tonight I replaced the old pages with the new ones. I no longer run the extractor. This means the content pages are no longer being overwritten, so I can now edit those pages instead of the old web pages. I went through the list of pages that needed repairs and fixed them. I tested every page on the new site, fixed up a few minor leftover problems, and pushed the pages to the live site.

I'm much happier with the new system. It's not a series of hacks and it's not custom code. It's using XML and a very simple shell script. It runs on Windows, Mac, and Linux. There is still more to do though. Not all of my pages use the new system or my new style sheet. Some parts of my site, including the blog, will continue to use an external content management system, so I will apply my new stylesheet and template without using the XML injector. There are a few more minor features that I want to implement, and there are more pages to clean up. I'm not in a rush to do any of this; it'll probably take a month or two. I've also been reluctant to edit my pages until now, because any changes I made had to be duplicated on the development pages (which I had modified to make the extractor work). Now that the new pages are up, I can resume working on the content of my pages.

I do recommend that people look into XSLT, but I think it's not sufficient for most needs. It does handle a large set of simple cases though; I'll fill in the gaps with some Python or Ruby scripts. If you find things on my site that don't work properly, let me know; I'm sure there are bugs remaining.


Friday, September 15, 2006

In the age of the search engine, my naming advice: make sure people can search for your product. That means don't use a common word (you have to compete with all the pages already using it): Word, Excel, Windows, Apple, Office, Backpack, Ask, Live. Don't use a misspelled common word (users, upon hearing the name, will not know how to spell your variant): Novell, Digg, Topix, Froogle. Don't use capitalization or punctuation to stand out (search engines and search engine users often ignore capitalization and punctuation): C#, .NET. Don't use names that aren't even in Unicode (the artist formerly known as "the artist formerly known as Prince"). It's okay but not great to use a common word that isn't commonly used on the web: Amazon, Dell, Ta-Da, Macintosh, Mac, Safari, Basecamp. It's fine to use misspellings of uncommon words, but you need to make sure people learn your spelling: Google, Flickr. It's better to mash together two easily spelled words: PlayStation, MicroSoft, WordPerfect, SketchUp, HotMail, FireFox; or to attach a short distinguishing mark to a common word: eBay, 43Things, GMail, 3Com, iPod, XBox, 30Boxes, 37Signals. If you're going to coin a brand-new word, make sure it's easy to spell once you hear it: Netscape, Akamai, Comcast, Honda, Lego. Don't use a name that's a subset of another name unless the product is a variant of the shorter name: Mac Pro, MacBook, MacBook Pro.

Don't pick a name like Zune unless you have enough marketing muscle to teach everyone that it's Zune, not Zoon. Try searching for zoon and see what you get. As of this post, the first page of search results doesn't mention Microsoft Zune at all. Every search for zoon is a lost chance to get a customer.

Many of the above names succeeded despite being bad because they got started before the web became big. But if you're starting a brand new product and call it "Word", I think you'll be shooting yourself in the foot. If you make it hard for people to find you, you'll have fewer customers and you'll make it hard to get the network effects that would bring in even more customers.

It would be interesting to see if there's a correlation between names and popularity, but it would probably be impossible to study this without examining parallel universes.