This section details common errors in HTML composition that may lead to documents which are not fully device-independent. The behavior resulting from these errors is undefined, so some browsers may render them as intended, but not all browsers are guaranteed to do so. Therefore, these mistakes should be avoided, even if your browser of choice renders your documents correctly.
The use of the paragraph element (P) can be confusing. When HTML was first introduced, <P> served as a paragraph separator, not as an end-of-paragraph marker; a confusion which originally prompted this document. However, more recent versions of the specification (HTML 2.0 and later) have changed this behaviour.
The current recommended use of the P element is to place it at the beginning of each paragraph; for example:
<P> In this paragraph, our hero discovers that he really likes baloney sandwiches. He also listens to some disco, and has a lovely beverage. Ah, if only all paragraphs were this exciting!
This is in contrast to previous usage, where the <P> was usually placed at the end of the paragraph.
Still, in certain contexts, use of <P> should be avoided, such as directly before any other element which already implies a paragraph break. To wit, the <P> element should not be placed before the headings, HR, ADDRESS, BLOCKQUOTE, or PRE.
It should also not be placed immediately before a list element of any stripe. That is, a <P> should not be used to mark the end-of-text for <LI>, <DT> or <DD>. These elements already imply paragraph breaks.
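As a rough illustration of the placement rules above, here is a minimal sketch. The heading and list text are invented for the example; only the position of each <P> matters.

```html
<!-- No <P> before the heading: a heading already implies a paragraph break. -->
<H2>Lunch Plans</H2>
<P> Our hero considers his options for lunch.
<P> He settles on the following:
<!-- No <P> directly before the list, and none at the end of each LI. -->
<UL>
<LI> A baloney sandwich
<LI> A lovely beverage
</UL>
<!-- No <P> before the rule either. -->
<HR>
```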
Some clarifications on the above might be in order. One concerns how difficult it is for a browser to render appropriate white space. While it is true that all of the elements mentioned above imply a paragraph break, this only sometimes means that they also imply white space between sections -- this depends on the browser. So, while you might feel inclined to add a <P> in order to fix white space problems, please think twice and avoid it if you can.
Also, when using the glossary list (DL), please try to avoid using multiple DDs (definitions of terms) in order to provide multiple entries for a term (DT). Instead, use a <P> marker between paragraphs in a definition.
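A minimal sketch of that advice, with placeholder term and definition text: one DT, one DD, and a <P> separating the paragraphs inside the definition.

```html
<DL>
<DT> Euphonium
<DD> First paragraph of the definition goes here.
<!-- A second paragraph in the same definition: use <P>, not another DD. -->
<P> Second paragraph of the same definition goes here.
</DL>
```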
All clear now?
Simply put, a character reference and an entity reference are ways to represent information that might otherwise be interpreted as a markup tag. For instance, in order to represent <P> in this text, I had to use
&lt;P&gt;
in my raw HTML. There are currently four entities for this purpose in HTML, as well as several entities which allow encoding of the ISO Latin-1 Character Set.
The most common error in the use of references is to leave off the trailing semicolon. Also, no additional spaces are needed before or after the entity/character reference.
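To make this concrete, here is a short sketch with made-up sentences. It shows well-formed references next to the missing-semicolon mistake; how the incorrect line renders will vary by browser.

```html
<!-- Correct: each reference ends with a semicolon, with no extra spaces. -->
To start a paragraph, type &lt;P&gt; in your document.
Baloney &amp; cheese is a &quot;classic&quot; sandwich.
<!-- An entity reference and a character reference for the same Latin-1 letter. -->
Caf&eacute; and caf&#233; should look the same.

<!-- Incorrect: the trailing semicolons have been left off. -->
To start a paragraph, type &ltP&gt in your document.
```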
One grey area involves references to directories. It is possible to request an index of a directory from an HTTP server. The typical response from the server is to either return a pregenerated index document (which is often the document "index.html" in the referenced directory), or to construct an HTML document on the fly which contains a listing of all files in the directory. However, when making such a directory reference, it is important to make sure to have a trailing slash on the URL. That is, if you were to request the index of the directory which this document resides in, you would want to refer to it as http://www.cs.cmu.edu/~tilt/, not as http://www.cs.cmu.edu/~tilt.
Some servers are able to catch these errors, and provide redirection to the proper URL, but it's best to get the URL right in the first place -- notably because not all browsers support transparent redirection.
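In link form, the difference is a single character. The link text below is a placeholder:

```html
<!-- Preferred: the trailing slash marks this as a directory reference. -->
<A HREF="http://www.cs.cmu.edu/~tilt/">Index of this directory</A>

<!-- Risky: this only works if the server redirects to the URL above. -->
<A HREF="http://www.cs.cmu.edu/~tilt">Index of this directory</A>
```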
Problems can arise when the hostnames in URLs aren't fully qualified. In local networks, you can usually refer to your own machines simply by their names -- for instance, here at Willamette we refer to our local WWW server as "www". However, the server's FQDN (fully qualified domain name) is "www.cs.cmu.edu". The FQDN provides enough information that any host, anywhere on the Internet, can find this particular machine. (It's like trying to find all the Vermeers in New York :).
What happens is that an HTML author might construct a link that looks like this:
<A HREF="http://www/~tilt/metanoia/">Metanoia -- A Change In Spirit</A>
which produces a link to Metanoia -- A Change In Spirit that will only work for people in the local network which that machine is on. A correct link would look like this, instead:
<A HREF="http://www.cs.cmu.edu/~tilt/metanoia/">Metanoia</A>
which would allow all of you who are interested in Metanoia to actually follow the link.
This leads almost directly into:
Finally, a brief section on relative URLs. It is possible to construct a "relative" URL, which gives you the following advantages:
However, relative URLs can also break things.
A relative URL is a URL which doesn't contain all the necessary parts of a "full" URL (scheme, host, path information). There's a large number of things which might fit this description! The browser will try to assume the parts that have been "left out" by using the information from the URL of the document which contains the link. However, not all browsers will make these assumptions in the same way. Here's a short list of what's "safe" and "unsafe" (based on experience, and not on a specification anywhere -- unfortunately).
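To make the "left out" parts concrete, here is a minimal sketch. It assumes the document containing the links is the euphonium page at http://www.cs.cmu.edu/~tilt/euphonium.html. The comments describe what a browser would usually be expected to request, not what every browser is guaranteed to do.

```html
<!-- Scheme and host are left out; the path is relative to the directory
     of the containing document. Expected to resolve to
     http://www.cs.cmu.edu/~tilt/metanoia/ -->
<A HREF="metanoia/">Metanoia</A>

<!-- The same target as a full URL, which leaves nothing to be assumed. -->
<A HREF="http://www.cs.cmu.edu/~tilt/metanoia/">Metanoia</A>
```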
Currently, I don't understand this problem well enough to speak about it. I will try and get a canonical answer when next I have the energy to update this document.
(This sub-section isn't written very well, I fear. If anyone has any better copy, I'll gladly put it here instead. -et/April 7, 1994)
One common error that I used to make all the time (I use Marc Andreessen's html-mode.el for Emacs these days -- I had to learn Emacs, but now it's so much easier to write HTML!) was to leave off a quote in my start tags. For example, this reference to the euphonium, king of instruments should look like:
<A HREF="http://www.cs.cmu.edu/~tilt/euphonium.html">
but I would often use
<A HREF="http://www.cs.cmu.edu/~tilt/euphonium.html>
instead. I suppose by the end of that huge URL, I'd forgotten it was supposed to be quoted. The behaviour of browsers upon encountering this varies -- some display a proper link, but you can't follow it, while others actually eat up huge portions of the following text, thinking it to be part of the URL.
Many of the HTML elements contain information within them. For example,
<em>emphasized text</em>
would be rendered as emphasized text. There is a start tag (<EM>), some content (which may include text, and in some cases, other nested elements), and an end tag (</EM>, indicated by the </). A common mistake is to miss the / in the end tag. All elements (except empty elements, see next paragraph) must be terminated by an end tag -- otherwise, undefined behavior may occur.
Some HTML elements are empty, such as <HR> and <BR> (the HTML 2.0 specification provides more information about element content). If this is the case, there is no need for an end tag. Note that under HTML 2.0, <P> is no longer empty: it is a container whose end tag may be omitted.
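A small sketch contrasting the two cases, using an invented sentence:

```html
<!-- EM has content, so it needs both a start tag and an end tag. -->
This is <EM>emphasized text</EM> in a sentence.

<!-- HR is empty: it has no content and takes no end tag. -->
<HR>
```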
The upshot is, this document has always been meant as a public service, and will remain a public service. I hope you've found it to be useful; I've had fun providing it for your use.
James "Eric" Tilton, HTML Guru Wannabee, tilt+@cs.cmu.edu