This document attempts to address stylistic points of HTML composition, both at the document and the web level. It is available on the Web at http://www.ology.org/tilt/cgh/ (if you are reading this via a mirror, you may want to check the original to make sure you're seeing an up-to-date version).
Disclaimer: This document is neither finished, nor is it even all that current. It reflects my thinking during the summer of 1995, which is the last time that I had the luxury of spending three months thinking about these kinds of things. The web changes rapidly; in some ways this document is now hopelessly out of date. In other ways, though, it still retains some valuable insights (at least, in my humble opinion). You'll have to be the judge, as the reader, because I have neither the time nor the inclination these days to try to keep this document updated. This text is still largely the same text I wrote that summer, with occasional grammatical and spelling fixes. Caveat emptor!
New: This is version 2.0.20 (the old version 1 is no longer available, as of Nov 17 1997). Now that Web Weaving is on shelves near you, it seems appropriate for me to get off my duff and feed all of the changes back into this document. See "Some History," below, for more information on what the heck I'm talking about.
This document is divided into two main sections. The first section discusses the document -- it should be recognizable as the revised version of the original CGH. It discusses good practices to follow in creating your documents, common errors and things to avoid when composing HTML, and finally, a brief treatment of style sheets, which provide a mechanism for greater control over how a document is rendered. The second section is brand new -- it discusses style issues regarding your Web as a whole. How it is divided and organized, how it is interlinked and intertwined; these are the issues under consideration here.
This is not a beginner's guide; check the "For More Information" section for pointers to more basic works, as well as for more advanced references and tutorials. It is designed for the HTML author who has learned the basics, and is ready to start thinking about the more advanced aspects of Web document design.
Note: I'm not finished spiffing up this new version yet, but it's good enough to be presentable, and I'd rather have the information available than have it languish for lack of final polishing. At the very least, I still need to:
Unfortunately, the life of a grad student is not all cheese and wine (very little of it, in fact), so these will have to come at a later date. Besides, with the publication of Web Weaving (see the History section below for background), it seems an appropriate time to also re-update this document, so I won't let a little thing like a busy schedule stand in my way.
I wrote the first version of "Composing Good HTML" in January of 1994. At this point, the Web was just starting to explode, and Mosaic was the browser on the tip of everyone's mouse. Being one of the strange few who used Lynx as well as Mosaic (as well as Emacs-W3, when I was feeling cocky), I noticed that different browsers dealt with incorrect usage of HTML with varying degrees of success. When I pointed this out, the solution suggested to me was to write a "lint" for HTML that would point out common errors in documents. In preparation for this, I started making a list of common errors, and turned that list into a human-readable document. That document became "Composing Good HTML."
About that time the semester started, so I made the document publicly available, and asked for comments and criticism. I got both, in spades! I corrected errors (including a plethora of spelling and grammatical errors), added some new sections, and revised pieces of existing sections. But, all in all, CGH didn't really change much, even though things like Netscape and HTML 3.0 (let alone Java and VRML) have snuck up in the meantime.
In January of 1995, Carl Steadman, Tyler Jones, and I got together with the idea of writing a book about the Web (this was before the current explosion of the market, so you'll pardon our naivete). Rather than writing a book about HTML, we decided to write a book about creating and maintaining an entire site -- including the stylistic points in CGH as a starting point. The book is called Web Weaving, and it appeared on bookshelves on December 18th, 1995. The book is published by Addison-Wesley.
The side effect of all of this is that it gave me a reason to revise CGH to reflect current practices for inclusion in Web Weaving. And now that we've finally finished our book, this also means that the changes in CGH are getting fed right back into the online version. Which, I'm proud to say, is still freely available (and better than ever, I'd like to think). What you see here is, by and large, Chapters 11 and 12 from Web Weaving, edited so that they stand alone better. While I'd certainly recommend you read Web Weaving for a full treatment on all the issues involved in building and maintaining your Web site (and because every author hopes that his words will be read), Composing Good HTML remains (I hope!) a useful resource for HTML authors (and now Web designers) who want a slightly more sophisticated treatment of the stylistic issues involved in, well, weaving your web.
I never did get around to writing that "lint" program, though.
The World Wide Web has been a wildly successful experiment. It has filled a need for both information users and for information providers: a tool which allows information to be deployed to a wide variety of people over wide geographic distances, regardless of what kind of computer they may be running. All that is required to publish information is any one of a number of Web servers, and all that is required to view that information is any one of a number of Web clients. This is both an opportunity and a challenge. This document discusses the ways in which you construct your markup so that it is readable and usable for a wide range of browsers.
HTML provides a device-independent way of describing information. The elements of HTML describe what your information is, not how it should be displayed. This is a subtle point, and perhaps the most important one presented here. HTML will let you describe this piece of information as a header, or that piece of information as an address. It will not let you describe this text as being in 24-point Helvetica, right justified. Your challenge is to provide professional page layout and design without using the traditional tools of professional page layout and design. Sound like a paradox? Not really. All it involves is a bit of trust.
The trust you must have can be summarized by the following rule:
With the current diversity of clients for the Web (and we can only expect to see more), it has become important to write HTML that will look good on any client, and not just on the specific client which the author may have access to. You must trust your markup. There is no way to anticipate how every browser will (differently) render your HTML. If you follow this rule you will get the best possible rendering with all browsers, instead of for just one browser.
To this end, there are a few solutions. One approach is software based -- a "lint"-like program for catching semantic errors in HTML, and perhaps even correcting them. Two good examples of this are the W3C HTML Validation Service and WebLint. Another approach is the one taken by this document -- a style guide which points out common errors one might make in the composition of HTML, and recommends good practices to follow.
Bear in mind when following these guidelines that your document may not end up looking the best it possibly can on a particular browser. However, it also will not look ugly on any browser, which is the risk you take by disregarding these recommendations and tweaking your markup code for, say, Netscape. Unfortunately, Netscape may render things differently from Lynx which may render things differently from Mosaic, and so on and so forth -- and even within a particular browser, a user may have chosen font or style preferences different from the ones which you might assume. What these guidelines should do, if followed, is make for a better presentation for the most browsers (instead of the best presentation for only one) -- and ensure that your documents reach the widest audience possible.
Things contained in this section are good practices for the generation of any HTML document. Specifically, this would include anything which should routinely be done in the creation of documents for the benefit of both reader and author.
There are at least three major flavors of HTML currently in practice as this is being written: HTML 2.0, HTML 3.0, and the Netscape extensions to HTML 2.0. HTML 2.0 is the closest thing to current practice that is available, and can be assumed to be "safe" for all browsers.
On the other hand, the HTML 3.0 and the Netscape extensions are not widely implemented, let alone standardized. Under most circumstances, this would be a good reason not to use them until they were more widely available, but there is the mitigating circumstance that all of the Netscape extensions (and some of HTML 3.0, most notably tables) are supported by one of the most popular Web browsers ... Netscape!
What should be done about this? Many Web authors take the approach that, since most people use Netscape, it's acceptable to use the Netscape elements, even if it is to the detriment of people using other browsers. Others take the approach that nothing more than HTML 2.0 should ever be used, which means that any benefit which might be derived from these enhancements is lost.
The best road is a middle approach. Two good rules of thumb are:
The FONT element changes the font size of text in the Netscape Navigator, but not some other clients (when I first wrote this, no other client supported this. These days, several others do as well, including IBM WebExplorer and Microsoft Internet Explorer). However, other clients will simply ignore tags they do not understand, so the text in the FONT element will still be readable. On the other hand, if the MATH element is ignored by a browser, the browser will display gibberish.
In general, try to think about the effect that the non-standard elements will have if they are not recognized. These elements can be used intelligently, and on browsers that recognize them, can dramatically enhance the presentation of your page. If it is not possible to use the elements in such a way that rendering is still good on all clients, think about providing multiple copies of the document (for instance, providing a version of the table using the PRE element), and possibly using content negotiation on the server to provide the reader with the correct version of the document.
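As a rough sketch of these two cases (the sentence and the opening hours are invented for illustration): the FONT markup degrades gracefully, because a browser that ignores the tags still shows the words, while the PRE block is one way to offer tabular data to browsers that lack table support.

```html
<!-- A browser that does not recognize FONT skips the tags
     and renders the sentence at its normal size. -->
<P>The library is <FONT SIZE="+2">closed</FONT> on Sunday.</P>

<!-- A fallback version of a small table, laid out with PRE so the
     columns line up in any browser's fixed-width font. -->
<PRE>
Day        Opens   Closes
Saturday   10:00   16:00
Sunday     closed  closed
</PRE>
```

Neither fragment depends on the reader's browser: the worst case is plain text at the default size.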
A final thought on the subject: try to avoid banners in your document that claim that your document is "Enhanced for Netscape" or "Enhanced for HTML 3.0" (or the rapidly more prevalent "Enhanced for Microsoft's Internet Explorer." Ugh.) Rather, try to build your document so that if a reader reads it in (for example) Netscape, it will be obvious that it uses the new elements to good effect ... and if a reader reads it in another browser, they can remain blissfully unaware of what they cannot see, and still be impressed by what they do see.
One problem which faces anyone trying to find information using the Internet is the question of "authoritativeness." The relative ease with which WWW servers can be set up and populated with information means that the traditional checks of the publishing process cannot act to filter out information which is inaccurate or misleading. In addition, it can often be hard to tell how current information found online is, or how actively it is maintained and updated.
One thing which you can do to assist Web users is to sign and date all documents in your infostructure, so that people viewing the documents can form some impression of the authority of the document (i.e., how recent it is, and how reliable the information provider is). This is not a complete solution, but it is a large step forward.
For example:
<HR>
Last modified: March 6, 1995
<ADDRESS>
<A HREF="http://cs.cmu.edu/~tilt/">James Eric Tilton</A><BR>
<A HREF="mailto:tele@ology.org">tele@ology.org</A>
</ADDRESS>
Some notes about this example:
The mailto: link anchors the document to the mail address of its creator. The mailto: URL specifies an e-mail address. Most browsers support this, allowing the reader to send e-mail to the address specified. This can be a useful way to get feedback. In addition, the mailto: link is separated from the link to the home page by a <BR>, so that the two links can be easily distinguished.
Another option for signing a document is to encode information about the author in the document's header information. You can do this by including a LINK element of type made in your HEAD element. For example:
<HEAD>
<TITLE>This is my Title</TITLE>
<LINK REV="made" HREF="mailto:author@some.site.org">
</HEAD>
This example uses the LINK element, which may be unfamiliar to you. This element is equivalent to the A element; that is, it provides a link to some other object. However, since it is part of the HEAD information (which is information about the document, rather than part of the document itself), this is a link from the entire document to another object. (Anchors, on the other hand, are links from some small subset of the document, like a word or a phrase, to another document.) This link, like most other HEAD information, is typically not displayed by a browser, or followable by a reader.
The fact that it is not displayed does not make it useless, however. Many browsers, such as Lynx, supply a "reply to author" function. The information about who the author is comes from using the LINK as above. Other applications which can make use of the information include Web spiders and other maintenance tools, which can benefit from having authority information in machine-readable format.
The format of the LINK element is the same as that of the A element. Notice the use of the REV attribute, which describes this relationship as a REVerse relationship of the type made. This means that this document was made by the object at the other end of the anchor.
One promise of the widespread availability of personal computers has been the lessening of our reliance on paper. In some ways, this promise has been realized; many trees (and municipal landfills) are no doubt grateful that many of us are now committing our words to e-mail instead of to a handwritten or typewritten letter or memo. On the other hand, until video display technology produces results indistinguishable from paper, we will no doubt continue to print things out. It's hard to curl up with a notebook at night, especially if it has a coaxial cable jutting out the back of it. Because of this, many people will want to print out the documents which you have provided electronically. In effect, they will want to take the document you have woven into a part of a web, and make it into a standalone document.
Fortunately, HTML is well-suited to this. A document in HTML can theoretically be rendered in many more formats besides simply on a screen. Print is one obvious alternative, although speech and Braille are also possible and desirable. We bring this up because it is important to consider ways other than on-screen that a reader may encounter your documents. Given that, thinking about your document as something that might be printed can be a very useful tool for creating documents that aren't tied to the specific requirements of a browser or display hardware.
One of the advantages of the World Wide Web over similar infosystems, like Gopher, is that the Web makes no distinction between what is a menu and what is a document. For instance, in Gopher, a document is "dead" -- it can't lead anywhere, and, in order to continue exploration, a reader must return one step back to a menu. In the same vein, a Gopher menu provides only limited information about where links lead to: often a menu item must be retrieved and explored before any sense can be made of whether it is appropriate to what a reader seeks.
On the other hand, a Web document is "live" -- there's no clear dividing line between a menu container and its contents. This is a liberating distinction, as a document can now be as verbose as necessary in providing context for links. Consider the difference between these two documents in Figures 1 and 2.
[Figure 1: A menu list without context (Lynx)]
[Figure 2: A prose description of resources (Lynx)]
The second example is much more satisfying, because it is more than simply a list of pointers. Instead, an effort has been made to integrate the list into prose that is (presumably) better tied into the subject of the document as a whole.
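As a hedged illustration of the difference in markup (the subject and the file names are invented, and this is not a reproduction of the figures), the same two links can be presented as a bare menu or woven into prose:

```html
<!-- A bare menu: the reader must follow each link to learn what it holds. -->
<UL>
<LI><A HREF="breeds.html">Breeds</A>
<LI><A HREF="feeding.html">Feeding</A>
</UL>

<!-- The same links in prose that explains where each one leads. -->
<P>If you are choosing cattle for a small farm, start with the
<A HREF="breeds.html">guide to common dairy and beef breeds</A>,
then read about <A HREF="feeding.html">winter feeding schedules</A>
before you buy.</P>
```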
This is not to say that it is always preferable to force what is more naturally a menu into prose for the sake of prose. If you are creating a document that serves as a jumping-off point to other resources, your readers might not want to get into the thick of text to find the resource they're searching for. In this case, a definition list may be more appropriate, as shown in Figure 3. This is a nice compromise, giving context without becoming buried in a forest of words.
[Figure 3: A menu using a definition list (Lynx)]
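A menu of this kind might be marked up roughly as follows. This is only a sketch: the entries and file names are hypothetical.

```html
<!-- Each DT holds the link itself; each DD gives a line of context. -->
<DL>
<DT><A HREF="breeds.html">Guide to Breeds</A>
<DD>Descriptions of common dairy and beef breeds.
<DT><A HREF="feeding.html">Winter Feeding</A>
<DD>Schedules and quantities for the cold months.
</DL>
```

The reader can scan the DT lines as a menu and still get a sentence of explanation for each entry.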
When creating documents, make sure that your links are meaningful -- that is, that they avoid online-specific references, and that they don't detract from readability. The text of your links should flow well in the context of the rest of your text, and your text should also be able to stand alone as a printable document. You should at all costs avoid the "Click Here" syndrome, as shown in Figure 4.
[Figure 4: The "Click Here" Syndrome (Arena)]
Figure 4 is also bad because it refers to "clicking", which assumes that everyone is using a mouse with their browser, which is not always the case. A much better alternative is demonstrated in Figure 5.
[Figure 5: Meaningful Link Text (Arena)]
Another point to consider about the choice of words selected for link text ("information about cows", in this example) is that often this link text may be what is used as information for a reader's bookmark or hotlist entry. When the word "here" is used as link text, the hotlist may become cluttered with entries that read only "here", instead of with information about what the link is actually about.
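The contrast can be sketched in markup as follows. The surrounding sentences and the file name cows.html are assumptions made for illustration; only the link text "information about cows" comes from the example discussed above.

```html
<!-- The "Click Here" syndrome: the link text says nothing by itself,
     and a hotlist entry made from it would read only "here". -->
<P>For information about cows, click
<A HREF="cows.html">here</A>.</P>

<!-- Meaningful link text: the sentence reads well on paper,
     and the link text describes its destination. -->
<P>The farm also keeps
<A HREF="cows.html">information about cows</A>.</P>
```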
Headers provide a useful way to provide an outline for your document. Headers of level 1 (H1) indicate major points, while headers of level 2 (H2) provide sub-topics to those points, and so on and so forth. It is important to remember that the purpose of these headers is not to provide specific kinds of fonts or layout, but rather to organize a document into sections. To that end, here are some recommendations about heading usage:
An H3 element should not follow an H1 element directly.
EM or B.
H2 or H3 merely because it appears to provide the correct size and bolding of fonts on the browsers used by local readers. On another browser, that same text may be incredibly grotesque and large, not providing the desired effect at all. Figures 6 and 7 demonstrate this effect.
[Figure 6: Expected headline rendering (Arena)]
[Figure 7: Unexpected headline rendering (Netscape)]
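One reading of these recommendations, sketched in markup (the headings and sentences are invented, and the use of EM for emphasis is an assumption about what the recommendations intend):

```html
<!-- Headings descend one level at a time: H1, then H2, then H3. -->
<H1>Keeping Cattle</H1>
<H2>Feeding</H2>
<H3>Winter Feeding</H3>
<P>Hay should be stored under cover.</P>

<!-- Emphasis inside running text uses EM, not a heading
     chosen for the size it happens to have in one browser. -->
<P>Check the water supply <EM>every day</EM> in freezing weather.</P>
```

Because the headings describe structure rather than size, each browser is free to render them in whatever way suits its display.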