Common Errors

This section details common errors in HTML composition that may lead to documents which are not fully device-independent. The behavior of these errors is undefined, so certain browsers may render them as intended, but not all browsers are guaranteed to do so. Therefore, these mistakes should be avoided, even if your browser of choice renders your documents correctly.

Paragraph Element Errors

The use of the paragraph element (P) can be confusing. When HTML was first introduced, <P> served as a paragraph separator, not as an end-of-paragraph marker, a confusion which originally prompted this document. However, more recent versions of the specification (HTML 2.0 and later) have changed this behaviour.

The currently recommended use is to place <P> at the beginning of each paragraph; for example:

<P> In this paragraph, our hero discovers that he really likes
baloney sandwiches. He also listens to some disco, and has a lovely
beverage. Ah, if only all paragraphs were this exciting!

This is in contrast to previous usage, where the <P> was usually placed at the end of the paragraph.

Still, in certain contexts, use of <P> should be avoided, such as directly before any other element which already implies a paragraph break. To wit, the <P> element should not be placed before the headings, HR, ADDRESS, BLOCKQUOTE, or PRE.

It should also not be placed immediately before a list element of any stripe. That is, a <P> should not be used to mark the end-of-text for <LI>, <DT> or <DD>. These elements already imply paragraph breaks.

Caveats

Some clarifications on the above might be in order. One concerns the difficulty browsers have in rendering appropriate white space. While it is true that all of the elements mentioned above imply a paragraph break, this only occasionally means that they also imply white space between sections -- this depends on the browser. So, while you might feel inclined to add a <P> in order to fix white space problems, please think twice and avoid it if you can.

Also, when using the glossary list (DL), please try to avoid using multiple DDs (definitions of terms) in order to provide multiple entries for a term (DT). Instead, use a <P> marker between paragraphs in a definition.

All clear now?

Character and Entity Reference Errors

Simply put, a character reference and an entity reference are ways to represent information that might otherwise be interpreted as a markup tag. For instance, in order to represent <P> in this text, I had to use &lt;P&gt; in my raw HTML. There are currently four entities for this purpose in HTML (lt, gt, amp, and quot), as well as several entities which allow encoding of the ISO Latin-1 Character Set.

The most common error in the use of references is to leave off the trailing semicolon. Also, no additional spaces are needed before or after the entity/character reference.
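As a rough illustration of what these references do, here is a minimal sketch in Python, using the standard library's html module. Neither the sample string nor the choice of Python comes from this guide; they are only assumptions for the sake of the example.

```python
import html

# Sample text (an assumption for illustration) that a browser would
# otherwise read as markup.
raw = "<P> & friends"

# Replace the markup-significant characters with entity references.
# Note that every reference ends with a semicolon.
escaped = html.escape(raw, quote=False)
print(escaped)

# Decoding the references gives back the original text.
print(html.unescape(escaped) == raw)
```

The escaped form is what belongs in the raw HTML; the decoded form is what the reader should see.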

URL Errors

Another misunderstood aspect of HTML is in the composition of URLs.

Directory Reference Errors

One grey area involves references to directories. It is possible to request an index of a directory from an HTTP server. The typical response from the server is to either return a pregenerated index document (which is often the document "index.html" in the referenced directory), or to construct an HTML document on the fly which contains a listing of all files in the directory. However, when making such a directory reference, it is important to make sure to have a trailing slash on the URL. That is, if you were to request the index of the directory which this document resides in, you would want to refer to it as http://www.cs.cmu.edu/~tilt/, not as http://www.cs.cmu.edu/~tilt.

Some servers are able to catch these errors, and provide redirection to the proper URL, but it's best to get the URL right in the first place -- notably because not all browsers support transparent redirection.
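The trailing slash also affects how links inside the returned document are resolved. The sketch below uses Python's urllib.parse.urljoin, which follows the current URL resolution rules; that browsers resolve links in exactly this way is an assumption, and the helper name resolve is made up for the example.

```python
from urllib.parse import urljoin

def resolve(base, link):
    """Resolve a relative link against the URL of the containing document."""
    return urljoin(base, link)

# With the trailing slash, "~tilt/" is treated as a directory and the
# link is looked up inside it.
print(resolve("http://www.cs.cmu.edu/~tilt/", "euphonium.html"))

# Without it, "~tilt" is treated as a file name, so the link replaces it.
print(resolve("http://www.cs.cmu.edu/~tilt", "euphonium.html"))
```

The second call lands at the top of the server rather than inside the directory, which is rarely what the author meant.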

Not Using Fully Qualified Domain Names

Problems can arise when the hostnames in URLs aren't fully qualified. In local networks, you can usually refer to your own machines simply by their names -- for instance, here at Willamette we refer to our local WWW server as "www". However, the server's FQDN (fully qualified domain name) is "www.cs.cmu.edu". The FQDN provides enough information that any host, anywhere on the Internet, can find this particular machine. (It's like trying to find all the Vermeers in New York :).

What happens is that an HTML author might construct a link that looks like this:

<A HREF="http://www/~tilt/metanoia/">Metanoia -- A Change In Spirit</A>

which produces a link to Metanoia -- A Change In Spirit that will only work for people in the local network which that machine is on. A correct link would look like this, instead:

<A HREF="http://www.cs.cmu.edu/~tilt/metanoia/">Metanoia</A>

which would allow all of you who are interested in Metanoia to actually follow the link.
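A quick way to catch links like the first one is to look for a dot in the hostname. This is a minimal sketch in Python; the function name has_qualified_host is hypothetical, and the test is only a heuristic, since a dot suggests that a domain is present but does not prove the name resolves anywhere.

```python
from urllib.parse import urlsplit

def has_qualified_host(url):
    """Return True if the URL's hostname contains at least one dot."""
    host = urlsplit(url).hostname or ""
    # Ignore a leading or trailing dot so that "www." does not count.
    return "." in host.strip(".")

print(has_qualified_host("http://www/~tilt/metanoia/"))
print(has_qualified_host("http://www.cs.cmu.edu/~tilt/metanoia/"))
```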

This leads almost directly into:

Improper Use of Relative URLs

Finally, a brief section on relative URLs. It is possible to construct a "relative" URL, which has its advantages.

However, relative URLs can also break things.

A relative URL is a URL which doesn't contain all the necessary parts of a "full" URL (scheme, host, path information). There's a large number of things which might fit this description! The browser will try to assume the parts that have been "left out" by using the information from the URL of the document which contains the link. However, not all browsers will make these assumptions in the same way. Here's a short list of what's "safe" and "unsafe" (based on experience, and not on a specification anywhere -- unfortunately).

Safe: Same directory relative URLs
A reference to a document in the same logical directory (such as <A HREF="strict-html-gp.html">Good Practices</A>) is safe. This kind of reference, roughly speaking, contains no "/"'s.
Safe: Same server relative URLs
A reference to a document in the same server (such as <A HREF="/~tilt/">Eric's Hyplan</A>) is also safe. This kind of reference, roughly speaking, will begin with a "/". (It will also be semi-absolute, in that it starts at the top of that server's directory structure...)
Unclear: Most other kinds of relative URLs
References such as <A HREF="~tilt/euphonium.html"></A> can be dangerous -- sometimes browsers will interpret that as meaning "go up one directory level, find the directory '~tilt', and then find 'euphonium.html' in it." And sometimes they won't.

Currently, I don't understand this problem well enough to speak about it. I will try and get a canonical answer when next I have the energy to update this document.

Unsafe: "file://localhost/..."
It's also possible to have a reference to "file://localhost/some/file/pathname". This refers to the file at that path on the local host of whoever is browsing the document, which is why a reference to <A HREF="file://localhost/etc/motd"></A> will display the message of the day on your machine, not the message of the day on my machine. Unless you know what you are doing, these references will really mess up your documents.
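To see how one particular implementation fills in the missing parts, here is a small sketch using Python's urllib.parse.urljoin. The base URL is hypothetical, and the results show only how this library resolves the links under the current rules; they say nothing certain about what any given browser will do.

```python
from urllib.parse import urljoin

# Hypothetical URL of the document that contains the links.
BASE = "http://www.cs.cmu.edu/~tilt/cgh/index.html"

LINKS = [
    "strict-html-gp.html",        # same directory
    "/~tilt/",                    # same server
    "~tilt/euphonium.html",       # the "unclear" case
    "file://localhost/etc/motd",  # file on the reader's own machine
]

def resolve_all(base, links):
    """Map each link to the absolute URL that urljoin produces for it."""
    return {link: urljoin(base, link) for link in links}

for link, target in resolve_all(BASE, LINKS).items():
    print(f"{link:28} -> {target}")
```

In this implementation the "unclear" link is simply looked up beneath the current directory, and the file reference is left untouched because it is already a full URL.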

(This sub-section isn't written very well, I fear. If anyone has any better copy, I'll gladly put it here instead. -et/April 7, 1994)

Missing Quotes in Start Tags

One common error that I used to make all the time (I use Marc Andreessen's html-mode.el for Emacs these days -- I had to learn Emacs, but now it's so much easier to write HTML!) was to leave off a quote in my start tags. For example, this reference to the euphonium, king of instruments, should look like:

<A HREF="http://www.cs.cmu.edu/~tilt/euphonium.html">

but I would often use

<A HREF="http://www.cs.cmu.edu/~tilt/euphonium.html>

instead. I suppose by the end of that huge URL, I'd forgotten it was supposed to be quoted. The behaviour of browsers upon encountering this varies -- some display a proper link, but you can't follow it, while others actually eat up huge portions of the following text, thinking it to be part of the URL.
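If you don't have an editor mode to help, even a crude check will catch this mistake. The sketch below, in Python, simply counts the double quotes in a start tag; the function name has_balanced_quotes is hypothetical, and the check assumes attribute values are quoted with double quotes only.

```python
def has_balanced_quotes(start_tag):
    """Return True if the tag contains an even number of double quotes."""
    return start_tag.count('"') % 2 == 0

good = '<A HREF="http://www.cs.cmu.edu/~tilt/euphonium.html">'
bad = '<A HREF="http://www.cs.cmu.edu/~tilt/euphonium.html>'

print(has_balanced_quotes(good))
print(has_balanced_quotes(bad))
```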

Missed End Tags

Many of the HTML elements contain information within them. For example, <em>emphasized text</em> would be rendered as emphasized text. There is a start tag (<EM>), some content (which may include text, and in some cases, other nested elements), and an end tag (</EM>, indicated by the </). A common mistake is to miss the / in the end tag. All elements, except empty elements and those whose end tag is optional (see next paragraph), must be terminated by an end tag -- otherwise, undefined behavior may occur.

Some HTML elements are empty, such as <HR>, and a few others, such as <P>, may omit the end tag (the HTML 2.0 specification provides more information about element content). In these cases, there is no need for an end tag.
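A missed end tag can be found mechanically by keeping track of which elements have been opened and not yet closed. Here is a minimal sketch using Python's html.parser module; the class and function names are hypothetical, and the set of elements that need no end tag holds only the two named above, so it is an assumption rather than a complete list.

```python
from html.parser import HTMLParser

# Assumption: only the two elements named in the text; not a complete list.
NO_END_TAG = {"p", "hr"}

class EndTagChecker(HTMLParser):
    """Record elements that are opened but never closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag names in lower case.
        if tag not in NO_END_TAG:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened element with this name, if any.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i:]
                break

def unclosed_elements(markup):
    """Return the names of elements still open at the end of the markup."""
    checker = EndTagChecker()
    checker.feed(markup)
    checker.close()
    return checker.open_tags

print(unclosed_elements("<EM>emphasized text</EM>"))
print(unclosed_elements("<EM>emphasized text<EM>"))
```

With the / missing, the second <EM> is read as another start tag, so two elements are left open instead of none.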



Copyright © 1994, 1995 by Eric Tilton. Permission is granted for individual use and reproduction provided that this document remains intact, with this copyright message clearly visible. Commercial use and reproduction rights are held by Addison-Wesley, and this document may not be resold or redistributed for compensation of any kind without prior written permission from Addison Wesley -- contact me for details. Parts of this document appear in a revised form in the upcoming book, Web Weaving (ISBN 0-201-48959-7), by Eric Tilton, Carl Steadman, and Tyler Jones, to be published by Addison-Wesley. Look for it in a bookstore near you!

The upshot is, this document has always been meant as a public service, and will remain a public service. I hope you've found it to be useful; I've had fun providing it for your use.


Last modified: Jan 14, 1996

James "Eric" Tilton, HTML Guru Wannabee, tilt+@cs.cmu.edu