Since HTML (and also SGML) is designed to be a device-independent language for describing the content of documents, most of its elements aren't intended to give the author direct control over how the final page layout will look. The major exceptions to this are the character highlighting elements.
There are two types of character highlighting elements: physical and logical. The physical styles involve things like "italic font" and "boldface", while the logical styles are things like "emphasis", "citation", and "strong". It is strongly recommended that you employ the logical styles rather than the physical styles in your documents. Using the I element to render text in italics will only be effective on browsers which are capable of displaying italics, and not all browsers are. It is far better to encode semantic content (to describe things in terms of logical styles) and then allow the browser to display that semantic structure as best it can, given its display capabilities.
So, instead of
<I>italics</I>
you might use
<EM>emphasized</EM>
or
<CITE>citation</CITE>
and instead of
<B>bold</B>
you might use
<STRONG>strong</STRONG>
This also leaves the possibilities open in the future for more sophisticated uses of these semantic encodings, which have much more inherent meaning than font styles like bold or italic. For example, the Lycos indexing system can take advantage of semantic encoding to create abstracts of documents.
Note: Before you stop using B and I altogether, here's another viewpoint to consider.
One argument against logical character styles is that they turn out to be a bottomless pit: a fruitless attempt to define logical styles for every possibility. Physical styles, combined with the context of the text in which they are placed, seem to provide a much richer set without a huge number of tags. Consider the large space of context that can be implied with only the typographical conventions of bold or italic. The only problem is that this contextual space needs a human being to interpret it, which would make some kinds of computer-based rendering (speech synthesis, for example) difficult, if not impossible.
The title of this section is somewhat facetious, but only somewhat. It's more and more obvious from current Web development efforts that the main attraction of the Web is not hypertext, and it's not an easy interface; the main attraction is the flashy graphics and the alluring promise of multimedia. We shall heroically refrain from commenting on whether this is a good or a bad thing, for the fact remains that online multimedia is here to stay. What we will comment on are the issues that must be considered in order to use multimedia to best effect.
The first set of issues revolves around the false sense of page design one can get by using inline images. An early example of this was one of the first commercial forays into the Web: a graphic design house which advertised professional layout services for online brochures. They spent quite a bit of time designing graphic images of just the right width to achieve page-layout effects like right justification and centering, and created a page which was fairly well designed. However, they got bitten because this design relied on the browser's window being the default width for X Mosaic. With a wider window, the carefully aligned logo in the upper right corner was immediately followed by the image that should have been left-justified on the following line.
Current browsers implement some better forms of layout control for images. For example, an author can specify the way in which text will flow around an image with the ALIGN attribute of the IMG element. Figures 8 and 9 exemplify this; the former has no text-flow information, and the latter does. This is not perfect: using the ALIGN attribute can cause strange stair-stepping effects if there is not enough text separating two images, as figure 10 illustrates. If the desired effect is images with captions, a table is probably the best approach for layout purposes (figure 11).
[Figure 8: IMG without the ALIGN attribute (Netscape)]
[Figure 9: IMG with the ALIGN attribute (Netscape)]
[Figure 10: Stair-stepping due to ALIGN (Netscape)]
[Figure 11: Using TABLE for layout (Netscape)]
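A minimal sketch of the two layout techniques discussed above. The image file names (/icons/euphonium.gif, /icons/tuba.gif) and the caption text are hypothetical. The first fragment lets text flow around a left-aligned image; the second uses a TABLE so that each caption stays directly beneath its image, whatever the width of the reader's window.

```html
<!-- Text flows down the right-hand side of a left-aligned image -->
<IMG SRC="/icons/euphonium.gif" ALT="[Euphonium]" ALIGN=LEFT>
The euphonium is the king of instruments. This paragraph wraps
around the image instead of starting below it.

<!-- Images with captions: one row of images, one row of captions -->
<TABLE>
<TR>
  <TD><IMG SRC="/icons/euphonium.gif" ALT="[Euphonium]"></TD>
  <TD><IMG SRC="/icons/tuba.gif" ALT="[Tuba]"></TD>
</TR>
<TR>
  <TD>The euphonium</TD>
  <TD>The tuba</TD>
</TR>
</TABLE>
```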
Another consideration is unnecessary duplication of effort. Many authors swear by colored bullets and colorful horizontal rules, implementing both effects with inlined images rather than with the structural markup (list items and HR) that already provides them. Doing this can leave the portion of your audience which is unable (or unwilling) to view inlined images out of the loop, and can also negate some of the benefits provided by structural markup. There is also an unexpected side effect to using many small images: the way in which Web clients currently retrieve documents requires that a separate connection to the Web server be opened for each image. The time spent negotiating this connection may actually be longer than the time spent retrieving the image itself. Consider whether the effect achieved by the "enhanced" layout justifies the cost.
Another concern is the size of images. With the increasing popularity of the Internet at home, more and more users are purchasing dial-up connections of one sort or another. These may be of the strict "shell-account" variety, which means that your readers will not see images at all, or of the SLIP/PPP variety, which means that your readers will often receive only 14,400 bits of information per second. This is not a large number, and huge images can take minutes to load. Bear this in mind when selecting images: will the image take so long to load that your reader will go somewhere else rather than wait?
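As a rough worked example, assume a hypothetical 100-kilobyte image (1 kilobyte = 1024 bytes, 8 bits per byte) arriving over a 14,400 bit-per-second link, ignoring protocol overhead:

```latex
t = \frac{100 \times 1024 \times 8\ \text{bits}}{14{,}400\ \text{bits/s}}
  = \frac{819{,}200}{14{,}400}\ \text{s} \approx 57\ \text{s}
```

Protocol overhead only adds to this, so nearly a minute for a single image of that size is, if anything, an optimistic figure.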
The image size issue can be alleviated in several ways. First, the increasing popularity of the JPEG format means that images can be compressed to much smaller sizes, which dramatically speeds up image loading. Even better results can be achieved by using fewer colors (gray scale rather than full 24-bit color, for example). Another approach is to use a small set of navigational icons which appear on every page in your Web. Most browsers now cache documents and images; if you use the same icons and refer to them by the same URL (perhaps by maintaining an /icons directory on your Web server), the reader will only incur the cost of downloading them once.
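A sketch of the shared-icon approach, assuming a hypothetical /icons/ directory on the server and hypothetical page names. Because a server-relative URL such as /icons/next.gif resolves to the same absolute URL from any page on the server, a browser that caches images can satisfy later requests for the icon without downloading it again.

```html
<!-- The same icon URLs on every page: each icon is fetched once, then cached -->
<A HREF="chapter1.html"><IMG SRC="/icons/prev.gif" ALT="[Previous Page]"></A>
<A HREF="index.html"><IMG SRC="/icons/up.gif" ALT="[Contents]"></A>
<A HREF="chapter3.html"><IMG SRC="/icons/next.gif" ALT="[Next Page]"></A>
```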
Also, when using the IMG element, don't forget to use the ALT attribute as well. The ALT attribute allows alternate text to be specified for an inlined image. This is especially useful for images that have specific meaning (such as images that link to other documents), as that meaning can be lost on those who do not have images loaded. For example:
<IMG SRC="http://www.miskatonic.edu/icons/next.gif">
can be better represented with the addition of the following ALT attribute:
<IMG SRC="http://www.miskatonic.edu/icons/next.gif" ALT="[Next Page]">
as shown in figures 12 through 16.
[Figure 12: The Document As Expected (Netscape)]
[Figure 13: Inlined Images Off/No ALT Tag (Netscape)]
[Figure 14: Text Browser/No ALT Tag (Lynx)]
[Figure 15: Inlined Images Off/ALT Tag Supplied (Netscape)]
[Figure 16: Text Browser/ALT Tag Supplied (Lynx)]
Finally, don't rely entirely on image maps and graphic logos to build your site. There are a few sites which have almost no textual content whatsoever; when visited by readers who do not (or cannot) load images, there is no information available. This is not to say that image maps must be avoided altogether. Instead, provide alternative means of navigation which supplement the image map, such as explanatory text which follows your map.
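One way to supplement an image map, sketched under assumptions: the map here is a server-side (ISMAP) image map, and the imagemap program path, the image, and the destination pages are all hypothetical. The line of ordinary links that follows gives readers without images the same destinations.

```html
<!-- Graphical navigation: a server-side image map, with ALT text -->
<A HREF="/cgi-bin/imagemap/campus"><IMG SRC="/images/campus.gif" ALT="[Campus Map]" ISMAP></A>

<!-- Textual navigation offering the same destinations -->
<P>[<A HREF="/library/">Library</A>]
[<A HREF="/admissions/">Admissions</A>]
[<A HREF="/departments/">Departments</A>]
```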
This section details common errors in HTML composition that may lead to documents which are not fully device-independent. The behavior resulting from these errors is undefined, so some browsers may render them as intended, but not all browsers are guaranteed to do so. Therefore, these mistakes should be avoided, even if your browser of choice renders your documents correctly.
These errors are, for the most part, artifacts of "raw" HTML authoring. Web development has suffered from a lack of good authoring tools, a situation which is only now beginning to be rectified. Many of these errors involve typos or simple mistakes, although others deal with more fundamental conceptual problems.
The use of the paragraph element (P) can be confusing. When HTML was first introduced, <P> served as a paragraph separator, not as an end-of-paragraph marker; this confusion is what originally prompted this document. However, more recent versions of the HTML 2.0 specification, and later specifications, have changed this behavior.
The current recommended use of the P element is to be placed at the beginning of paragraphs; for example:
<P> In this paragraph, our hero discovers that he really likes baloney sandwiches. He also listens to some disco, and has a lovely beverage. Ah, if only all paragraphs were this exciting!
This is in contrast to previous usage, where <P> was usually placed at the end of the paragraph.
Still, in certain contexts use of <P> should be avoided, such as directly before any other element which already implies a paragraph break.
To wit, the <P> element should not be placed before the headings, HR, ADDRESS, BLOCKQUOTE, or PRE. It should also not be placed immediately before a list element of any stripe. That is, a <P> should not be used to mark the end of text for <LI>, <DT>, or <DD>. These elements already imply paragraph breaks.
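A short example of the placements described above, with hypothetical content: <P> begins each ordinary paragraph, no <P> is placed directly before the heading or the UL, and none is used to end a list item.

```html
<H2>Lunch</H2>
<P>Our hero is partial to baloney sandwiches.
<UL>
<LI>Baloney on rye
<LI>Baloney on white
</UL>
<P>He washes them down with a lovely beverage.
```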
Some clarifications on the above might be in order. One is the difficulty of rendering appropriate white space by a browser. While it is true that all of the elements mentioned above imply a paragraph break, this only occasionally means that they also imply white space between sections; that depends on the browser. So, while you might feel inclined to add a <P> in order to fix white space problems, please think twice and avoid it if you can.
Also, when using the glossary list (DL), please try to avoid using multiple DDs (definitions of terms) to provide several paragraphs of definition for a single term (DT). Instead, use a <P> tag between the paragraphs of a definition.
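For example, a definition with two paragraphs might be marked up as follows (the wording of the definition is illustrative): a single DD holds both paragraphs, separated by <P>.

```html
<DL>
<DT>Euphonium
<DD>A tenor-voiced brass instrument with valves.
<P>Widely regarded, at least by its players, as the king of instruments.
</DL>
```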
All clear now?
Simply put, a character reference and an entity reference are ways to represent information that might otherwise be interpreted as a markup tag. For example, consider the rendered HTML document in figure 17.
[Figure 17: Properly escaping character entities (Arena)]
The source which produces this document, which uses entities, looks like:
In order to represent the &quot;&lt;P&gt;&quot; in this text, I had to use &amp;lt;P&amp;gt; in my raw HTML.
In this example, the &lt; becomes "<", the &gt; becomes ">", the &quot; becomes a quotation mark, and the &amp; becomes "&" (which is needed in order to represent the text &lt; in the document without the text being turned into "<"). There are currently four entities for this purpose in HTML, as well as several entities which allow encoding of the ISO Latin-1 character set.
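To make the distinction between the two kinds of reference concrete: an entity reference names a character, while a character reference gives its number in the document character set (ISO Latin-1 for HTML 2.0). In the sketch below the text is illustrative; the first line renders as <P> & "quoted", and the last two lines both render as the word "café".

```html
<!-- The four markup-significant entities: renders as <P> & "quoted" -->
&lt;P&gt; &amp; &quot;quoted&quot;

<!-- Entity reference, then numeric character reference, for e-acute (233 in ISO Latin-1) -->
caf&eacute;
caf&#233;
```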
The most common error in the use of entities is to leave off the trailing semicolon. Also, no additional spaces are needed before or after the entity/character reference. Here are some examples of incorrect usage:
Doug &amp Chris went out for a walk.
A paragraph break can be represented with &quote; &lt; P &gt; &quote;
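Can you spot the errors in the above examples? They are:
"&amp" needs to have a semicolon after it ("&amp;").
"&quote;" should be "&quot;" (this is subtle and annoying, much like the Unix system call creat()).
The spaces around the entities are not needed: "&quote; &lt; P &gt; &quote;" should be "&quot;&lt;P&gt;&quot;", which renders as "<P>".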
Another misunderstood aspect of Web document composition is in the creation of URLs.
One grey area involves references to directories. It is possible to request an index of a directory from an HTTP server. The typical response from the server is to either return a pre-generated index document (which is often the document "index.html" in the referenced directory), or to construct an HTML document on the fly which contains a listing of all files in the directory. However, when making such a directory reference, it is important to make sure to have a trailing slash on the URL. That is, if you were to request the index of my home page, you would want to refer to it as http://www.cs.cmu.edu/~tilt/, not as http://www.cs.cmu.edu/~tilt.
Many servers are able to catch these errors, and provide redirection to the proper URL, but it's best to get the URL right in the first place -- notably because not all browsers support transparent redirection. Also, getting this correct the first time means it will take less time for the page to be loaded; your readers won't have to wait through the time needed to open two (or more) HTTP connections.
Problems can arise when the hostnames in URLs aren't fully qualified. Within a local network, a machine can often be referred to simply by its host name. For example, the domain miskatonic.edu might have in it a WWW server with the host name www. Readers within that domain can refer to the machine by this name. However, the server's fully qualified domain name is www.miskatonic.edu. This fully qualified domain name provides enough information that any host, anywhere on the Internet, can find this particular machine.
What happens is that an HTML author might construct a link that looks like this:
<A HREF="http://www/~tilt/metanoia/">Metanoia -- A Change In Spirit</A>
which produces a link to "Metanoia -- A Change In Spirit" that will only work for people on the same local network as that machine. A correct link would look like this instead:
<A HREF="http://www.cs.cmu.edu/~tilt/metanoia/">Metanoia -- A Change In Spirit</A>
which would allow all of the readers who are interested in Metanoia -- even those living in Freedonia -- to actually follow the link.
Along those same lines, be careful in using URLs of the "file:" scheme. It's possible to have a reference to file://localhost/some/file/pathname. This refers to the named file on the local host of whoever is browsing the document. That is why a reference to <A HREF="file://localhost/etc/motd">the message of the day</A> will display the message of the day on your machine, not the message of the day on my machine. However, this makes several assumptions about your reader's local machine and network which you probably shouldn't be making. Unless you know what you are doing (and probably even then), references of this type will really mess up your Web.
One common error, especially with the current lack of widely available and useful authoring tools, is to leave off a quote in the attributes of tags. For example, this reference to the euphonium, king of instruments, should look like:
<A HREF="http://www.cs.cmu.edu/~tilt/euphonium/">
but people composing "raw" HTML from a text editor will often instead type
<A HREF="http://www.cs.cmu.edu/~tilt/euphonium/>
It's likely that by the end of that huge URL, the author had forgotten it was supposed to be quoted. The behavior of browsers upon encountering this varies -- some display a proper link, but you can't follow it, while others actually eat up huge portions of the following text, thinking everything up until the next quotation mark to be part of the URL.
Many of the HTML elements contain information within them. For example, <EM>emphasized text</EM> would be rendered as emphasized text. There is a start tag (<EM>), some content (which may include text and, in some cases, other nested elements), and an end tag (</EM>, indicated by the </). A common mistake is to miss the / in the end tag. All elements (except empty elements, described below) must be terminated by an end tag; otherwise, undefined behavior may occur.
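As a small illustration (the text is hypothetical), nested elements are closed in the reverse order of their start tags, and each end tag carries its /:

```html
<!-- Correct: STRONG is closed before the EM that contains it -->
<EM>emphasized <STRONG>and strong</STRONG> text</EM>

<!-- Incorrect: the / is missing, so the second <EM> is another start tag, not an end tag -->
<EM>emphasized text<EM>
```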
Some HTML elements are empty, such as <BR> and <HR>, and so need no end tag. A few other elements, such as <P>, have an optional end tag (the HTML 2.0 specification provides more information about element content).
In general, the use of white space around element tags should be avoided. For example, if white space immediately follows a start tag, the style changes implied by that element may be applied to the initial space as well. For instance,
You really should <A HREF="http://www.cs.cmu.edu/~tilt/"> CZeCh THIZ 0uT </A> !
would be rendered in Netscape as shown in figure 18, and in Lynx as shown in figure 19.
[Figure 18: Improper use of whitespace (and spelling and punctuation, too) (Netscape)]
[Figure 19: Improper use of whitespace (Lynx)]
On some browsers, there may be white space around the anchor, which adds unwanted unsightliness to the rendering, and may lessen the impact of the document. (This comment really applies to white space immediately following start tags, and immediately preceding end tags.)
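For comparison, here is the example above with the white space moved outside the anchor (and the spelling repaired), so that the link text begins and ends with a visible character:

```html
You really should <A HREF="http://www.cs.cmu.edu/~tilt/">check this out</A>!
```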