Index-Based Hyperlinks

John H. Hartman
Todd A. Proebsting
Rajesh Sundaram

Department of Computer Science
The University of Arizona
Tucson, AZ 85721

Abstract

We propose a new mechanism for implicitly specifying hyperlinks in HTML documents using indices. Indices are dictionaries that associate keys (words or phrases) with one or more attributes. Indices maintain these key/attribute bindings over all or part of a document, and are used by browsers to create hyperlinks dynamically. Indices may also include bindings of other indices, in a hierarchical fashion. We argue that indices are both simpler and more general than the current HTML hyperlink mechanism. We have developed a prototype browser that uses index-based hyperlinks.

1. Introduction

In HTML, a textual hyperlink exists as an explicit binding between a portion of text and a URL, created by the document's author. A hyperlink indicates that the author believes there is a relationship between the text and the URL, usually that the URL's target elaborates on the text. Typically, an HTML browser will highlight the hyperlink's text, allowing the reader to see the hyperlinks in a document. Clicking on a hyperlink transfers the browser's focus to the associated URL. Some browsers (e.g., Netscape Navigator) will display the URL when the cursor is over the hyperlink, providing a hint to the reader of what is at the other end.

We refer to HTML-style hyperlinks as explicit hyperlinks, because the author of the document must create an explicit binding between text and a URL. Thus explicit hyperlinks create a tension between a document's author and its reader: the reader prefers more hyperlinks because they increase the usefulness of the document, while the author prefers fewer because they require effort to create and maintain. As a result, the typical HTML document is likely to have too few links for its reader, and too many for its author.

There are many other drawbacks to HTML's explicit hyperlinks:

The underlying problem with HTML hyperlinks is that they merge mechanism and policy into a single (inadequate) facility. A hyperlink consists of a single text string and a single URL (a limited mechanism), and selecting the text string always causes the URL to be displayed (a limited policy). A clean separation of mechanism and policy is needed to solve the problems inherent in HTML hyperlinks.

2. Implicit Hyperlinks

We propose implicit hyperlinks that use indices to solve the problems with HTML's explicit hyperlinks. The basic idea is that an index associates attributes with text phrases. The index therefore represents the mechanism behind hyperlinks: a collection of phrases, each with an associated list of attributes. The browser implements the policy behind the mechanism, i.e. what action should be taken based upon the user's actions and the attributes associated with the text. For example, an index may associate a "URL" attribute with each phrase; the browser can then use this attribute to display the appropriate page when the phrase is selected. Indices are not restricted in the attributes they contain, however. It is up to the browser to decide what it does with a particular attribute, if anything.

Our indices are similar to dictionaries (e.g., Webster's), thesauri (e.g., Roget's), book indices, and footnotes. Indices also share properties with symbol tables for statically scoped programming languages. Finally, indices have some similarities to search engines.

In its simplest form, an index is a collection of phrases and associated attributes. A simple entry might be:

which binds the phrase "Sumatra" to the given URL. Note that the index does not specify how this information is to be used (the policy), only that the URL is related to the phrase. The browser implements a policy, for example by displaying the page at the URL when the phrase is selected.

The bindings contained in the index simplify the task of turning "normal" text into hypertext. When an index with the above entry is associated with a text document, all instances of the word "Sumatra" become hyperlinks. The reader can select any of them to view the associated URL. Thus the index relaxes the tension between the author and reader: the author need only document the relationship between phrase and URL once, turning all instances of the phrase into hyperlinks.
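The binding described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the prototype's implementation: the dictionary layout, the function name find_implicit_links, and the example.org URL are all hypothetical.

```python
import re

# Hypothetical index: each phrase maps to a dictionary of attributes.
# The URL is a placeholder, not one taken from the paper.
index = {
    "Sumatra": {"URL": "http://example.org/sumatra.html"},
}

def find_implicit_links(text, index):
    """Return (start, end, phrase, attributes) for every occurrence of an indexed phrase."""
    links = []
    for phrase, attributes in index.items():
        # Match the phrase only on word boundaries.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text):
            links.append((match.start(), match.end(), phrase, attributes))
    # Report links in document order.
    return sorted(links, key=lambda link: link[0])

text = "The Sumatra project home page. Sumatra is also an island."
links = find_implicit_links(text, index)
```

A single entry in the index yields a hyperlink for each occurrence of the phrase, which is the property that relieves the author of marking up every instance by hand.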

The simple format of the index and the clean separation of mechanism and policy allow for much more complex browser behavior than is possible using HTML hyperlinks. For example, one could associate a textual description with an entry:

The policy for handling the description is implemented by the browser. For example, our prototype browser displays the description of an entry when the cursor is placed over text matching the phrase. The description makes it easier for the reader to decide if the suggested URL is worth viewing.
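One way to picture the split between mechanism and policy is as a pair of browser-side handlers that inspect an entry's attributes. The sketch below is hypothetical: the attribute names "URL" and "description", the handler names, and the placeholder values are assumptions made for illustration, and the prototype may be organized quite differently.

```python
# Hypothetical entry with two attributes; both values are placeholders.
entry = {
    "URL": "http://example.org/sumatra.html",
    "description": "Home page of the Sumatra project",
}

def on_hover(attributes):
    """Policy: return the description to show while the cursor is over the phrase, if any."""
    return attributes.get("description")

def on_select(attributes):
    """Policy: navigate to the URL when the phrase is selected; ignore entries without one."""
    url = attributes.get("URL")
    if url is not None:
        return ("navigate", url)
    return ("ignore", None)
```

The index only supplies attributes; a different browser could ignore "description" entirely or act on attributes this sketch does not know about.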

3. Indirect Indices

So far we have discussed indices as if they were bound one-to-one to documents, but much of their power is realized if we relax this restriction. Quite often a set of pages will have many hyperlinks in common. For example, many of the pages in the Sumatra project's web site will contain references to the project's members, and ideally each of these references would be a hyperlink. Clearly, indices should be first-class objects, separable from the documents that use them and able to be reused and recombined in interesting ways not envisioned by their creators.

At its simplest, this functionality allows an index to be created separately from a document's text, and used by specifying its URL. For example,

    <index URL="http://www.cs.arizona.edu/sumatra/sumatra.index">

        The Sumatra project...

    </index>

causes the specified index to be used for the given text. The indirect access of an index through a URL allows several pages to share the same index, reducing the overhead of creating and maintaining hyperlinks. An author can create a single index for multiple pages, defining the hyperlink information only once. Furthermore, should this information change (e.g. a page's URL changes), only the index needs to be updated.

Indices can also be merged by specifying multiple URLs in the "index" command. This causes the browser to treat the indices as a single index. The policy for handling multiple matching entries is browser-specific; our prototype browser displays a menu when the reader selects text with multiple matches. The reader then selects which URL to display, if any. The ability to merge indices allows a single document to obtain its hyperlinks from many sources.
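Merging can be sketched as follows, assuming (hypothetically) that each index is a Python dictionary from phrase to attributes. The function names and the example.org URLs are illustrative only and do not come from the prototype.

```python
from collections import defaultdict

def merge_indices(*indices):
    """Combine several indices into one; a phrase may end up with several entries."""
    merged = defaultdict(list)
    for index in indices:
        for phrase, attributes in index.items():
            merged[phrase].append(attributes)
    return dict(merged)

def choices_for(phrase, merged):
    """URLs a browser could offer in a menu when the phrase is selected."""
    return [entry["URL"] for entry in merged.get(phrase, []) if "URL" in entry]

# Two hypothetical indices that both define "Sumatra".
project_index = {"Sumatra": {"URL": "http://example.org/project/sumatra.html"}}
geography_index = {
    "Sumatra": {"URL": "http://example.org/geo/sumatra.html"},
    "Java": {"URL": "http://example.org/geo/java.html"},
}
merged = merge_indices(project_index, geography_index)
```

Keeping every matching entry, rather than letting one index overwrite another, leaves the choice among them to the browser's policy.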

4. Hierarchical Indices

Another useful way of combining indices is to organize them into a hierarchy. Each level in the hierarchy contains entries specific to a particular context, with the index at the top of the hierarchy the most specific for the document, and the one at the bottom the least specific. For example,

    <index URL="moby.index">

        Call me Ishmael. Some years ago...

    <index URL="indonesia.index">

        Those narrow straits of Sunda divide Sumatra from Java;...

    </index>

        ...and the great shroud of the sea rolled on as it rolled
        five thousand years ago.

    </index>

creates a two-level hierarchy for the novel Moby Dick. The entire text uses the index "moby.index", containing hyperlinks related to the novel in general, whereas the portion of the novel that describes the passage of the Pequod through Indonesia also uses an index related to that area of the world. Thus the Indonesia index is at the top of the hierarchy and contains definitions for that context, while the Moby index is at the bottom and contains definitions for the novel as a whole. This scoping allows the browser to use the most-specific definition for a phrase, e.g. if the user selects "Sunda", the index "indonesia.index" is searched for a match prior to searching "moby.index".
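The scoping rule resembles symbol-table lookup in a statically scoped language. A minimal sketch, assuming each nested "index" element contributes one dictionary to a chain ordered from innermost to outermost; the entries and descriptions are invented for illustration.

```python
def lookup(phrase, scope_chain):
    """Search indices from most specific (innermost) to least specific (outermost)."""
    for name, index in scope_chain:
        if phrase in index:
            return name, index[phrase]
    return None

# Hypothetical contents for the two indices named in the example above.
moby_index = {
    "Ishmael": {"description": "The narrator"},
    "Sunda": {"description": "A strait mentioned in the novel"},
}
indonesia_index = {
    "Sunda": {"description": "Strait between Sumatra and Java"},
    "Sumatra": {"description": "An island in Indonesia"},
}

# Text inside the nested element sees both indices; the rest sees only moby.index.
inner_scope = [("indonesia.index", indonesia_index), ("moby.index", moby_index)]
outer_scope = [("moby.index", moby_index)]
```

Within the inner scope, "Sunda" resolves to the entry in indonesia.index, while "Ishmael" falls through to moby.index.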

Index hierarchies can also be created by having one index include another. For example, the Moby index described above may contain entries specific to Moby Dick, but also include another index with entries for Herman Melville in general.

A hierarchy of indices greatly simplifies the task of creating and maintaining hyperlinks. Consider the homepage for a university research project. That project would likely have its own index for project-specific terms and phrases. The project's index could in turn specify the use of the governing department's index, containing department-specific definitions. This index could then include the college's index, which could include the university's index, and so on.
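The chain of includes in the university example can be modeled as a linked list of indices. The class below is a hypothetical sketch: the single "include" field, the index names, and the entries are our assumptions, and the paper does not specify how inclusion is represented.

```python
class Index:
    """An index with its own entries and an optional, more general included index."""

    def __init__(self, name, entries, include=None):
        self.name = name
        self.entries = entries
        self.include = include

    def lookup(self, phrase):
        """Return (index name, attributes) for the most specific match, or None."""
        seen = set()
        index = self
        # Follow the include chain, guarding against accidental cycles.
        while index is not None and id(index) not in seen:
            seen.add(id(index))
            if phrase in index.entries:
                return index.name, index.entries[phrase]
            index = index.include
        return None

# Hypothetical chain: project -> department -> university.
university = Index("university", {"Tucson": {"description": "City where the campus is located"}})
department = Index("department", {"faculty": {"description": "Department faculty list"}}, include=university)
project = Index("project", {"Sumatra": {"description": "Project overview"}}, include=department)
```

The project's author maintains only the project-specific entries, yet a lookup from the project index also reaches entries defined by the department and the university.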

Organizing indices into a hierarchy improves the lot of both a document's author and its reader. The author need only create document-specific hyperlinks, store them in the document's index, and include a more general index in the document's index. The reader is now able to access any phrase in the document that has a match in any of the indices. The document becomes much more useful to the reader, while requiring less effort on the part of the author.

Sometimes a phrase selected by a reader will match indices at several levels. We envision the browser using the highest match in the hierarchy, i.e. the most specific match. Our prototype browser implements this policy, but allows users to select matches at lower levels of the hierarchy. Allowing the user to select which match to use is a very powerful feature; imagine, for example, that a document's hierarchy has an English dictionary at its base. The reader can see the definition of a word simply by selecting the word and using the match in the dictionary. The reader does not need to go outside the hypertext paradigm to obtain the needed information, and the author needs to expend very little effort to add this functionality.
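The policy of defaulting to the most specific match while still exposing the others can be sketched as below. The hierarchy is assumed to be a list ordered from most to least specific; the index names and descriptions are hypothetical.

```python
def all_matches(phrase, hierarchy):
    """Return every match for the phrase, most specific index first."""
    return [(name, index[phrase]) for name, index in hierarchy if phrase in index]

# Hypothetical three-level hierarchy with a dictionary at its base.
hierarchy = [
    ("project.index", {"Sumatra": {"description": "The Sumatra research project"}}),
    ("indonesia.index", {"Sumatra": {"description": "An island in Indonesia"}}),
    ("dictionary.index", {"island": {"description": "Land surrounded by water"}}),
]

matches = all_matches("Sumatra", hierarchy)
default = matches[0] if matches else None   # the most specific match
alternatives = matches[1:]                  # lower-level matches the reader may choose instead
```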

Figure 1, below, illustrates how hierarchical indices could be used. The selection of the phrase "Sumatra" causes matches in both the project-specific index and a more general index pertaining to Indonesia. The browser queries the user as to which selection it should use.

Figure 1.

5. Prototype

We have developed a prototype browser using HotJava [HotJava, 1996]. It implements the functionality described above, including implicit hyperlinks, indirect indices, and hierarchical indices. We have not done any performance testing, as it is still very much a prototype.

6. Future Work

We are currently working on extending our index mechanism in a variety of ways:

7. Related Work

The idea of hypertext has been around for a long time. Dictionaries and encyclopedias can be viewed as hypertext in which textual nodes are joined by referential links.

The origin of hypertext in the context of its present usage is attributed to Bush [Bush, 1945]. He proposed a system called Memex in 1945, the essential feature of which was its ability to tie two items together.

The term "hypertext" was coined by Ted Nelson in 1965. Since the early 1960s, numerous hypertext systems have been implemented. The NLS system [Engelbart, 1963] and Xanadu [Nelson, 1980] were some early efforts. More recent systems include Guide [Brown, 1987], Apple's HyperCard, Xerox's NoteCards, and Intermedia [Meyrowitz, 1986].

Index-based hyperlinks have conceptual similarities with dynamic hypertext. Dynamic hypertext can change its structure (the way the nodes are linked) in real time, as opposed to the static and explicit model, where the links and nodes must be specified at creation time. This concept of dynamically adaptable structures is called virtual structures. Halasz identified virtual structures, computation, and extensibility/tailorability as some of the issues to be addressed by next-generation hypertext systems [Halasz, 1988].

In the Trellis system developed by Stotts and Furuta, a hypertext document is considered to have two layers: a fixed underlying information structure that is created by the hypertext author, and a flexible structure that can be generated dynamically and is tuned to the user's requirements [Stotts & Furuta, 1991].

There has been less work, however, on hypertext in the context of the World Wide Web. Currently, HTML uses a static, explicit model for hyperlinks.

8. Conclusion

Index-based implicit hyperlinks provide a clean separation of hypertext mechanism and policy. Indices are the mechanism underlying hypertext, providing a mapping from text phrases to attributes. The browser implements the policy for using the attributes. This yields a powerful hypertext system in which indices can be accessed indirectly, merged into larger indices, and layered hierarchically. Documents gain the full set of hyperlinks readers demand for convenient browsing, while the author's burden of creating and managing links is reduced.

References

[Bieber, 1991] Michael Bieber. Issues in Modeling a "Dynamic" Hypertext Interface for Non-Hypertext Systems, Proceedings of Hypertext '91, ACM Press, 1991.

[Brown, 1987] Peter J. Brown. Turning Ideas into Products: The Guide System, Hypertext '87 Proceedings, November 1987.

[Bush, 1945] Vannevar Bush. As We May Think, The Atlantic Monthly, July 1945.

[Engelbart, 1963] Douglas C. Engelbart. A Conceptual Framework for the Augmentation of Man's Intellect, Vistas in Information Handling, Volume 1, Spartan Books, Washington D.C., 1963.

[Halasz, 1988] Frank Halasz. Reflections on NoteCards: Seven Issues for the Next Generation of Hypermedia Systems, Communications of the ACM, July 1988.

[HotJava, 1996] Sun Microsystems, Inc. HotJava User's Guide, http://java.sun.com/HotJava/UsersGuide/users.html

[Meyrowitz, 1986] Norman K. Meyrowitz. Intermedia: The Architecture and Construction of an Object-Oriented Hypermedia System and Applications Framework, OOPSLA '86 Proceedings.

[Nelson, 1980] Ted Nelson. Replacing the Printed Word: A Complete Literary System, Information Processing '80, 1980.

[Stotts & Furuta, 1991] P. David Stotts and Richard Furuta. Dynamic Adaptation of Hypertext Structure, Proceedings of Hypertext '91, ACM Press, 1991.




