
Composing Good HTML (section 4 of 4)

The little picture: a document corpus

Many of the decisions you make on a site-wide level to organize content carry over to the management of "documents", whether they be single pages of HTML, or a collection of such pages which cover a single topic. These things include such obvious carry-overs as having an overview of the information presented within the document available to the reader at the "top" page, or expected entry point; making links available at appropriate points (usually, at the tops or the bottoms of pages) to bring the reader back to the overview for the document; and keeping your collection of documents uniform in terms of both content and form.

Much of the management of documents, though, is the management of links. Hypertext is all about links -- this should be patently obvious to most. But producing hypertext is all about managing links from the perspective of your potential reader. Too often, Web documents fail because they do not manage links effectively. Some deliver screenful after screenful of ever-scrolling text. Others provide index-card-sized groupings of hypertext which link in a myriad of directions to other index-card-sized groupings of hypertext. Neither end of the spectrum lets the reader navigate the content easily: in one case, one becomes disoriented in a sea of text; in the other, in an ocean of links. Worse yet, documents can become so overseasoned with random and senseless connections to every possible place that the reader becomes lost in a sea of text and links!

The key to managing links in your documents (besides simply verifying that they are correct) is to organize them into classifications, and to employ links of various classifications in a reasonable and intelligent way. The next few sections describe some of the various classifications of links.

Footnotes

There are two traditional purposes for footnotes: for bibliographic references, and for further commentary and/or elaboration of points within the main text. Links to short explanatory text within a hypertext document can be useful to readers, if it's clear from context that the link is a digression.

Within your documents, the "footnote" style of link should be regarded as an explanatory link which elaborates on the current discussion without drawing the reader away from the main text for long. A footnote will draw the reader away temporarily, explain something, and then allow the reader to return to the main flow of text. A footnote might offer further links to more detailed explanations. The footnote itself, however, is usually nothing more than a brief explanation or glossary-style definition.

You can achieve this effect by context: link from a phrase (as in the lemming example below) to a short explanation or parenthetical remark that explains the text in question. If you are trying to achieve a more traditional effect, you can also use numbered note references. Either surround a number with brackets ([1]), or use the SUP element in HTML 3 (<SUP>1</SUP>).

HTML 3 also defines the FN element for use in footnotes, which, "when practical, [should be] rendered as pop-up notes":

<P>Nothing is certain about the <A HREF="#fn1">lemmings</A>,
other than that they left as they came, with nothing but a silly grin and
some lemon pies.

<FN ID="fn1">Lemmings: Small rodents that like to leap off of
cliffs if necessary for retrieving a really nice lemon pie.</FN>

Whole documents

Where the footnote provides brief elaboration, the link to a "whole document" (whether it be a single document, or to the entry point for a collection of documents) provides a whole new potential area of exploration. This is the most common sort of link, which provides a connection between your document and the outside world.

This sort of link should be used with care. It has the potential to draw your reader completely away from your document, by providing supplementary information that takes longer to read than the original document. It is better to use footnote-style links for explanation and elaboration, and from there to use links to outside documents to provide further reference information for the curious (and insatiable) reader. Another danger is that of peppering your document with random hypertext links that a reader feels she must follow, without actually providing further explanations or further reading that's germane to the context or the point of your own document.

On the other hand, if you are referring directly to another on-line document, this is the kind of link to use. By providing direct access to supplementary material for your readers, you can give them as much or as little detail as they are willing to plow through.

Indices

Another form of link is the index. Unlike the previous two classifications, which provide further information for the reader as they advance through the text, the index allows the reader to enter the text from whatever point she desires, so that she can get right to the meat of what she is interested in. An index allows the reader to cut through the author's pre-designed tour of the information, and get right to that vital information on wildebeest's dietary habits.

There are several variations on this. The most popular is the full-text searchable index. It allows readers to query a database of keywords and retrieve those portions of your text which contain those keywords. Several software packages provide full-text searching capability, and the WN server has searching built in.
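A real site would rely on one of those packages (or on WN's built-in search). To illustrate only the idea behind full-text searching, here is a minimal Python sketch of an inverted index. It assumes each page has already been reduced to plain text. The names `build_index` and `search` are hypothetical. A production engine would also handle markup, word stemming and ranking, which this sketch ignores.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: each word maps to the set of pages containing it.

    `pages` is a dict of {page_name: plain_text}.
    """
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return, sorted, the pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    # Intersect the page sets for each query word (an AND search).
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


pages = {
    "lemmings.html": "Lemmings leap off cliffs for a really nice lemon pie.",
    "wildebeest.html": "Notes on the dietary habits of wildebeest.",
    "pies.html": "A lemon pie recipe, with no rodents involved.",
}
index = build_index(pages)
print(search(index, "lemon pie"))
print(search(index, "dietary habits"))
```

Requiring every query word to match keeps result lists short. This helps with the large-collection problem described next, though it cannot replace well-chosen keywords.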

Another variation is often found in books: an enumerated list of keywords. This differs from an index where the reader supplies the keywords: here the author can select keywords that are particularly useful for finding information. This matters, because picking proper keywords can be an arcane art, sometimes requiring intimate knowledge of the contents of the collection being searched. Especially if the collection is a large one, most keywords will return a large number of documents which may be only partially related to what the reader had in mind.
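Once the keywords have been chosen, presenting them is mechanical. The Python sketch below assumes a hypothetical author-maintained mapping from each keyword to the pages (and anchors) it should point to. It renders that mapping as an alphabetical HTML definition list. The function name and the sample keywords are illustrative only.

```python
from html import escape


def keyword_index_html(keywords):
    """Render an author-chosen keyword list as an alphabetical HTML index.

    `keywords` maps each keyword to a list of (link_text, url) pairs.
    """
    lines = ["<DL>"]
    # Sort case-insensitively so "Lemmings" files next to "lemon pie".
    for keyword in sorted(keywords, key=str.lower):
        lines.append(f"<DT>{escape(keyword)}")
        for link_text, url in keywords[keyword]:
            lines.append(f'<DD><A HREF="{escape(url)}">{escape(link_text)}</A>')
    lines.append("</DL>")
    return "\n".join(lines)


keywords = {
    "wildebeest, diet of": [("What wildebeest eat", "wildebeest.html#diet")],
    "Lemmings": [("Lemmings (footnote)", "notes.html#fn1")],
    "lemon pie": [("Recipe", "pies.html"), ("Lemming motives", "notes.html#fn1")],
}
print(keyword_index_html(keywords))
```

Because the list is hand-picked, one keyword can point at several places (as "lemon pie" does here). A full-text search cannot offer this kind of curated cross-reference.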

Yet another variation provides even more refinement and selection: the table of contents. A table of contents is a form of index, organized by broad topic. Consider providing not just one, but multiple tables of contents for your documents, especially if there is more than one reasonable way in which to read the information.
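If your headings carry ID attributes, you can generate a table of contents rather than maintain it by hand. Re-sorting the same headings then gives you a second table of contents almost for free. The following Python sketch uses the standard library's `html.parser`. The class and function names are hypothetical, and it assumes one page at a time with properly closed heading elements.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record (level, id, text) for every H1..H6 element in a page."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # [level, id, text_parts] while inside a heading

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._current = [int(tag[1]), dict(attrs).get("id"), []]

    def handle_endtag(self, tag):
        if self._current is not None and tag == f"h{self._current[0]}":
            level, anchor, parts = self._current
            self.headings.append((level, anchor, "".join(parts).strip()))
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)


def table_of_contents(html_text, page_url=""):
    """Return (depth, text, href) entries in document order."""
    collector = HeadingCollector()
    collector.feed(html_text)
    collector.close()
    return [
        (level - 1, text, f"{page_url}#{anchor}" if anchor else page_url)
        for level, anchor, text in collector.headings
    ]


page = """<H1 ID="top">Lemmings</H1>
<P>Small rodents of legend.
<H2 ID="diet">Diet</H2><P>Mostly lemon pie.
<H2 ID="cliffs">Cliffs</H2><P>A long way down."""

# One table of contents in reading order, a second one alphabetical.
by_order = table_of_contents(page, "lemmings.html")
by_title = sorted(by_order, key=lambda entry: entry[1].lower())
for depth, text, href in by_order:
    print("  " * depth + f"{text} -> {href}")
```

Other orderings (by date, by audience, by topic tag) would follow the same pattern, provided the pages record that information somewhere the script can read it.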

Portability Between Server Platforms

One of the advantages of HTML, which most Web documents consist of, is that HTML is based upon a number of other clearly defined, widely supported, non-proprietary formats, such as ISO Latin-1 and Internet Media Types (itself based on MIME). This approach makes it much more likely that, a decade from now, your documents will not be part of some legacy system which is, at best, difficult to maintain and expand.

If your documents do have that kind of lifespan, however, it's probable that they will reside on multiple hosts in that timeframe: perhaps concurrently, in the case of popular sites which are mirrored. A little attention to the requirements of different filesystems during the initial planning of your site could save a lot of time spent renaming files and links in the future.

About filesystems: some make the argument that Web servers should sit atop databases, instead of filesystems; databases certainly allow non-hierarchical relationships between pieces of content and make it easier to provide "dynamic" documents (documents which alter their appearance or content based upon the user accessing the data or other conditions) than traditional filesystem-based approaches. By the time this book sees print, there will certainly be several HTTP-serving database systems which address many of the issues raised here "automatically".

There are some very compelling reasons for using a database over a filesystem. A database-oriented system might be used:

- to maintain links as documents move and change;
- to track documents as they grow old, alerting maintainers to update the documents periodically so that they do not suffer "bit-rot";
- to generate multiple representations of a collection of information dynamically (allowing your readers to order your document collections in ways that make sense to them).

However, a database approach is not required to get some of this functionality; other tools exist that do these sorts of things. Examples include MOMspider and the W3C HTML Validation Service; Chapter 7 of Web Weaving covers these sorts of tools in more detail.
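Dedicated tools such as MOMspider go well beyond anything shown here. As a much smaller illustration of the same idea, here is a Python sketch that checks only the relative links in a single local HTML file against the filesystem. The function name is hypothetical. It assumes pages are stored as ISO Latin-1 text, and it neither fetches remote URLs nor verifies that fragment targets exist.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class LinkCollector(HTMLParser):
    """Gather the values of every HREF and SRC attribute in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def broken_local_links(html_path):
    """Return the relative links in `html_path` whose target file is missing.

    Absolute URLs (http:, mailto:, ...), site-root paths such as "/index.html",
    and same-page fragments such as "#fn1" are skipped.
    """
    collector = LinkCollector()
    with open(html_path, encoding="latin-1") as f:
        collector.feed(f.read())
    collector.close()

    base_dir = os.path.dirname(os.path.abspath(html_path))
    broken = []
    for link in collector.links:
        parts = urlsplit(link)
        # Only plain relative paths can be checked against the local disk.
        if parts.scheme or parts.netloc or not parts.path or parts.path.startswith("/"):
            continue
        target = os.path.join(base_dir, unquote(parts.path))
        if not os.path.exists(target):
            broken.append(link)
    return broken
```

Running a check like this over every page in a collection (for example, by walking the document tree with `os.walk`) gives a crude but cheap report of links that have gone stale. It is the kind of low-cost maintenance the next paragraphs argue for, without tying your content to any single vendor.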

But this automation may not come cheap: there will always be a learning curve to mastering any system, proprietary or non-proprietary, and the skills learned from managing a proprietary system are not easily transferred to other systems. You, as an information provider, must rely on your database solutions vendor to understand your needs and continue to build the feature-set of the system to satisfy them as you develop and grow. You may be risking the future of your documents -- by marrying your content to a single-vendor methodology -- for some short-term gains in manageability and ease of publishing content.

Please keep these sorts of considerations in mind: a fear of ours is that the Web, as it moves forward almost exponentially, may lose any sense of history as links fail and documents drop out of view because the cost of maintenance and "keeping up" has grown too great. Pick simple solutions over complex ones.

Naming Space

Historically, most Web servers have been Unix-based and have used the naming space associated with that operating system. Many servers have since been developed for other platforms, however. As you create documents, it is prudent not to tie them to one platform's naming conventions in a way that makes it difficult to move them to another platform.

Some filesystems have naming spaces which are case-sensitive. Unix is a good example of an OS which would consider "document.html" a different file from "Document.html". Other filesystems, such as that of the Mac OS, make no such distinction: both names would refer to the same file. For the sake of portability, it's probably best to keep all the file and directory names within your web structure lowercase. An added benefit is that this makes your URLs much easier to communicate. When case is significant, an all-lowercase URL is much easier to read over the phone than one which mixes uppercase and lowercase characters.

Some filesystems require file extensions to properly type files. Servers running under the Mac OS could serve up files with proper Content-type headers based upon the file's creator and file type stored in the file's resource fork; other filesystems use extensions to do this typing. It's always wise to use the appropriate file extension for the content type (such as .gif for GIF files) whenever possible.

Some filesystems are restricted to a limited number of significant characters. DOS and Windows, of course, only allow eight characters, plus three characters for the file extension. Windows 95 and NT eliminate this restriction. Apart from DOS and older Windows, filenames under 32 characters should be fairly cross-platform. If you think your files may ever need to live on a DOS or Windows server, you may need to restrict yourself to 8 + 3 character filenames.

Almost all filesystems define special characters. Each operating system allows certain special characters in filenames while disallowing others; the Mac OS, for example, allows slashes in file names, while Unix doesn't. It's best to avoid all characters except the letters a through z, the numbers 0 through 9, and the underscore, hyphen, and period.
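These rules are easy to check mechanically before files go onto a server. The following Python sketch applies them to individual names. The function names are hypothetical. The 32-character and 8 + 3 limits come from the text above. Python's `mimetypes` table is used only as a stand-in for whatever extension-to-type mapping your own server is configured with.

```python
import mimetypes
import re
from collections import defaultdict

# Lowercase letters, digits, underscore, hyphen and period only.
SAFE_NAME = re.compile(r"[a-z0-9_.-]+")
# At most 8 characters, optionally a period and at most 3 more (DOS 8 + 3).
DOS_8_3 = re.compile(r"[a-z0-9_-]{1,8}(\.[a-z0-9_-]{1,3})?")


def filename_problems(name, is_dir=False, dos_8_3=False):
    """Return a list of portability problems with one file or directory name."""
    problems = []
    if name != name.lower():
        problems.append("contains uppercase letters")
    if not SAFE_NAME.fullmatch(name.lower()):
        problems.append("contains characters other than a-z, 0-9, _ - .")
    if len(name) >= 32:
        problems.append("is 32 characters or longer")
    if dos_8_3 and not DOS_8_3.fullmatch(name.lower()):
        problems.append("does not fit the DOS 8 + 3 pattern")
    # mimetypes stands in for the server's extension-to-Content-type table.
    if not is_dir and mimetypes.guess_type(name)[0] is None:
        problems.append("has no recognized file extension")
    return problems


def case_collisions(names):
    """Group names that differ only in case (the Mac OS sees each group as one file)."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


for name in ["lemmings.html", "Lemming Pies.HTML", "readme"]:
    print(name, filename_problems(name))
print(case_collisions(["Document.html", "document.html", "pies.html"]))
```

Pass `dos_8_3=True` only if DOS or Windows 3.x hosting is a real possibility. Otherwise the check rejects many perfectly reasonable names.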

Developing Content

Uniqueness

Uniqueness may not be seen as an important design goal at first glance: after all, uniqueness -- not duplicating efforts by creating or compiling same or similar content -- may appear to be more of a community issue than an organizational one.

Providing a unique resource, however, increases traffic to your site and adds to the authoritativeness of your content (see below). Such a resource will also require ongoing support. A popular, unique resource can have a spill-over effect on the other content you provide on your site, especially if your site has a consistent feel and character.

In addition, redoing what has already been done elsewhere can add to frustration on the part of readers. Providing yet another list of exciting online resources means that there is simply more of the same sort of content available, which readers must then evaluate and compare to other such resources. Providing a unique resource (or a resource in short supply) means that you are adding to the content of the network, instead of duplicating it.

How can you check your content for uniqueness? There are many search mechanisms on the Web, such as Lycos. You can also check relevant newsgroups and mailing lists. (Chapter 10 of Web Weaving covers these sorts of issues in more detail.)

You can also produce your content so that it leans towards providing unique, value-added content: instead of simply providing a list of poetry sites, say, you could provide a list of poetry resources which you find particularly compelling, with descriptions of why you think they are compelling. Adding value and content means that you are being a good network citizen, leaving the community with more than you found it with.

Authoritativeness

Authoritativeness has always been a fallacy, except when read as author-itativeness; whatever claims to authority you or your organization have ultimately boil down to status and reputation within the community. One becomes a reputable source not by being non-refutable, but by putting a stamp on what you write; by claiming authorship, and, thereby, author-ity.

This means that readers must take greater responsibility for critically analyzing what documents they come across. But it also means that you must be responsible in establishing credentials for what you claim, providing source material and raw data to justify your conclusions.

In some sense, this is the end result of all of the things we discuss here (and in Web Weaving). In building and maintaining your infostructure, what you are aiming for is authoritativeness. That means creating documents which are well thought out and well designed, which do not become stale or inaccurate, and which remain both internally and externally consistent. Your mission now is to use the tools we have provided you with to place the stamp of authority and relevance on your own works, and to truly create infostructures on the Web which are compelling and creative. Good luck!

For More Information

There already exist documents on the Web which address this same topic, and perhaps in more detail. For definitive reference information you may wish to check the HTML specifications from the World Wide Web Consortium (W3C). For a more detailed discussion of HTML composition style, you should also check the Style Guide (especially the section on device-independent formatting), which is also from the W3C.

If you're looking for a good document for learning the basics of HTML, you will want to check out the Beginner's Guide to HTML, from NCSA.

Also useful is the Bibliography from Web Weaving, from Addison-Wesley (as soon as this is placed on-line, I'll put a link to it here).

For more up-to-date and very astute thoughts on making your Web pages more usable and readable, check out Jakob Nielsen's www.useit.com.

Finally, the somewhat creatively-minded among you can draw inspiration from this page's evil twin, Composing Evil HTML. Officially, I don't endorse any of these techniques. Unofficially ... well, let's just say someday I intend to buy Andrew several beers.

Acknowledgements

I'd like to thank all of you who have visited this document and commented on it, suggesting fixes, clarification, and even new sections. You know who you are (even if I managed to lose your addresses in the flood of information)! It is, in some senses, always a work in progress and is always amenable to suggestion, modification, and repair. I appreciate your help!

We (the authors of Web Weaving) would especially like to thank the folks at Addison-Wesley for helping us turn all of this into much more than I, at least, ever thought it would be. There's something just so satisfying about actually holding a book, hypertext be damned.

Copyright © 1994-1998 by Eric Tilton. Permission is granted for individual use and reproduction provided that this document remains intact, with this copyright message clearly visible. Commercial use and reproduction rights are held by Addison-Wesley. This document may not be resold or redistributed for compensation of any kind without prior written permission from Addison-Wesley; contact me for details. Parts of this document appear in a revised form in Web Weaving (ISBN 0-201-48959-7), a book by Eric Tilton, Carl Steadman, and Tyler Jones, published by Addison-Wesley in 1996.

The upshot is, this document has always been meant as a public service, and will remain a public service. I hope you've found it to be useful; I've had fun providing it for your use.

Last modified: Jul 13, 1998

James "Eric" Tilton, HTML Guru Wannabee and Occasional Author, tele@ology.org

(and with most of the Web style considerations contributed by Carl Steadman, Guy Who Doesn't Suck, carl@freedonia.com)
