British Library Research and Innovation Report 103

 
The Bradford OPAC 2 (BOPAC2)

Managing and Displaying Retrievals from a Distributed Search in Z39.50

F.H.Ayres, L.P.S.Nielsen, M.J.Ridley
Department of Computing
University of Bradford

British Library Research and Innovation Centre 1998


© The British Library Board 1998

The opinions expressed in this report are those of the authors and not necessarily those of the British Library.

RIC/G/342

ISBN 0 7213 9710 8

ISSN 1366-8218

British Library Research and Innovation Reports are published by the British Library Research and Innovation Centre and may be purchased as photocopies or microfiche from the British Thesis Service, British Library Document Supply Centre, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, UK.
  


Abstract

This report describes work on the BOPAC2 project, funded by the British Library Research and Innovation Centre, from September 1996 to January 1998. The system is a World Wide Web front end that allows simultaneous access to a number of library catalogues via Z39.50. The system is designed to make access to large and complex retrievals simpler: similar records are clustered together, and retrievals may be sorted in a number of ways and by different criteria. The design, development and evaluation of the system are described along with suggestions for future work.

Acknowledgements

We would like to acknowledge the help of the following people:

Table of Contents

Abstract

Acknowledgements

1. Introduction

1.1. Management Summary

2. Background

2.1. Related Items and Bibliographic Relationships

2.2. Large retrievals and Complex retrievals

2.3. The BOPAC2 Project

3. BOPAC2 Design and Development

3.1. Use of Existing Z39.50 Packages

3.2. Bradford OPAC 1

3.3. Overall System Architecture

3.4. The Europagate WWW-to-Z39.50 Gateway

3.5. Availability and Capability of Z39.50 Targets

3.6. Java Applet Design and Development

3.7. HCI issues

3.8. Testing and Refinement

4. Large and Complex Sets in Z39.50

4.1. The Networked Environment

4.2. The Z39.50 Approach

4.3. Changing the Query

4.4. Choosing Specific Records with Z39.50

4.5. Other Facilities for Reducing the Quantity of Material Transmitted

4.6. Modelling the BOPAC2 approach in Z39.50

5. Using BOPAC2

5.1. Search and Retrieval (Europagate software)

5.2. Viewing Retrievals with the Java applet

6. Evaluation and User Feedback

6.1. Comparison with other OPAC's

6.2. Usage

6.3. Evaluation with Testers and Feedback Questionnaire

7. Conclusions

7.1. Clumps

7.2. Further Work

8. References and Links

9. Appendices


1. Introduction

The aim of the BOPAC2 project was to investigate the issues in managing large and complex retrievals received via Z39.50 searches, including searches of multiple databases. The Project aimed to build on the work done in the BOPAC1 project. In BOPAC1 retrievals were organised as "manifestations" of particular works; that is, all the different versions of "Bleak House" by Charles Dickens or "Chemical Engineering" by Coulson and Richardson would be grouped together. In BOPAC1 such related records were grouped together in the system's test database. In BOPAC2, since the records would be received from databases as a result of queries sent using the Z39.50 protocol, it would not be possible to group records in advance as had been done in BOPAC1. Records would instead have to be "clustered" together as they were received. This meant that the BOPAC2 system would have to deal with the consequences of differences in cataloguing practices between Z39.50 servers. It also meant that the system would have to be more flexible than BOPAC1, since it was clear that not all differences between records could be overcome automatically.

The Z39.50 protocol for communicating with databases provides a uniform means of querying remote databases. This is a general purpose protocol, but its use with bibliographic databases for the transfer of MARC records has been the leading application area in the protocol's development. It has commonly been seen as a means by which one can query a remote database but using a more familiar local interface. Z39.50 server software is now often being provided with modern library database systems.

This means that many catalogues are now much more easily accessible. The problem remains of finding an effective way to search them.

The system was originally planned to be a PC based Z39.50 client with a graphical front end, developed from that built for BOPAC1. The growth of the World Wide Web and Java as a powerful programming language for Web applications led the Project to utilise the general design features and lessons of BOPAC1 but use them within a Web based application to make it more widely available.

Making BOPAC2 a World Wide Web application has opened it up to a wider audience than is possible with many research projects. The system has been announced on a number of mailing lists and via a number of different Web sites. It has also been demonstrated to a number of groups. The system will remain available for use after the completion of the Project and a version of this document will also be available on the WWW.

These are available as links from the BOPAC Home Page [1].

1.1. Management Summary

The BOPAC2 Project was initially funded by BL RIC for a 12 month period from 1st September 1996 to 30th September 1997. The Project was later granted an extension to 31st January 1998. The Project team was M.J.Ridley (manager), F.H.Ayres (part-time research fellow) and L.P.S.Nielsen (research assistant, full time Sept 96 - Sept 97, part time Oct 97 - Jan 98).

The Project's aim was to investigate the issues of large and complex retrievals from Z39.50 searches. The Project's workplan was based on developing work and ideas from the BOPAC1 project. The original workplan envisaged a sequence of: Surveying existing Z39.50 clients; System design and development for retrievals from a single target; Testing and evaluation of single target retrievals; System design and development for retrievals from multiple targets; Testing and evaluation of multiple target retrievals. The original workplan was aimed at the development of a PC based client system with a graphical user interface, on the lines of that from BOPAC1, although a subsidiary project task was the investigation of alternative architectures for the system.

In the first months of the Project, whilst evaluating existing Z39.50 software, it became clear that a number of important developments were taking place that the Project needed to take account of. These were the growth of the WWW and, in particular, the availability on the Web of library catalogues and other databases (such as search engines). Allied to this was the appearance of Java as a powerful tool for developing Web based applications. A specific development was the release of the Z39.50 - Web gateway software from the Europagate project, which supported retrievals from multiple targets. Previously multiple target support had not been easily available and had hence dictated the workplan.

In light of the developments above, which are explained in detail in other sections of the report, a revised workplan was developed. This entailed modification of the Europagate software, so that single and multiple targets would be supported from the start. The system was also to be built in Java, which would enable a system with the functionality originally envisaged to be available via the WWW. This would mean that the system could be made much more widely available for testing. These revised plans and progress on them were presented to the advisory committee in early 1997.

By Spring 1997 a working version of the system was made available on the WWW, but its URL was not made public. This enabled the Project to elicit feedback from a number of interested experts with particular library and cataloguing expertise. This was input into the system development process until Autumn 1997. From Autumn 1997 till the end of the Project, further system development, apart from bug fixes, was suspended to provide a fixed basis for testing and evaluation. In this phase, the system was made public by announcing it on a number of mailing lists, newsgroups and Web sites, and an online questionnaire was provided. During the course of the Project we had been experimenting with and monitoring a number of Z39.50 targets.

The set of targets was also kept constant during the test period. One important factor in the original workplan had been testing with users at Bradford. To allow this to take place Z39.50 software had been installed on the Bradford University Library system at the end of 1996. However this system was not in a satisfactory operational state till Summer 1997. The Project extension allowed us to have an extended period of testing with Bradford users. This was done with a tailored front end making the Bradford and Leeds University libraries and British Library Document Supply Centre catalogues available. A similar front end was also installed for Leeds users.

The system has been widely used in the course of the Project, with user feedback from the USA, Australia and Japan as well as across Europe, and will remain in place and operational after the end of the Project. In the course of the Project, the system was demonstrated at a number of events including the Conference on Principles and Development of AACR in Toronto. The Project team also met and corresponded with a number of other projects working in similar areas, such as the ONE, UNIVerse and Riding projects.

2. Background

The Bradford OPAC 2 (referred to as BOPAC2) is the successor to Bradford OPAC 1 [2]. BOPAC 1 used a small demonstrator catalogue of records obtained from the bibliographic utilities to illustrate a new kind of bibliographic control based on the idea of "manifestation sets". BOPAC2 is concerned with how information retrieved from remote catalogues via the Z39.50 Information Retrieval Protocol can be managed and displayed.

2.1. Related Items and Bibliographic Relationships

The Project is concerned with the bringing together of related items in the catalogue display. Since the days of Lubetzky and the Paris Principles it has been argued that catalogues should bring together items related to the same work or the same author [3]. However the question of what actually constitutes a work is a difficult one and has been discussed at length over the years. The Anglo-American cataloguing codes [56] have never formally defined what a work is, but have incorporated various rules to deal with specific cases. A working definition of a work can be stated in terms of the main entry, but the rules about the main entry are ambiguous, which makes it difficult for OPAC's to fulfil the second objective of the catalogue as defined by the Paris Principles and to collocate related items. Revisions to cataloguing practice have made the situation worse by emphasising the title main entry over uniform title [5], [6] and leaving the use of uniform title optional.

There is also the argument that the general idea of the second objective (i.e. the bringing together of related items) is even more important now than it was when it was first posited. The ever-growing size of catalogues, the globalisation and combining of union catalogues, new media, and the networked information infrastructure which facilitates distributed electronic documents: all these factors make it all the more important for the user to be able to identify related items [8]. A number of different bibliographic relationships have been identified [9], any of which may be of relevance to the end-user looking for related items. These relationships have been expressed within the cataloguing rules in various different ways [10]. In addition, there is a great deal of bibliographic relationship information buried in the general notes tag (500). Thus the problem of machine-extraction of bibliographic relationship information from the current stock of MARC records is a difficult one. This has led to calls for a fundamental re-appraisal of the catalogue and the cataloguing process with a view to the incorporation of effective linking information into the catalogue record ([11], [12], [13], [15], [16] and [17]). The ambiguities and changes in cataloguing practice with respect to collocation make it difficult for existing catalogues to fulfil the second objective as it was surely intended [14].

Meanwhile the ground beneath our feet is shifting as the Z39.50 information retrieval protocol begins to encroach into the world of library catalogues. Until recently Europe (including the UK) has lagged behind the USA in the implementation of Z39.50 [53] but it is beginning to catch up now and there are a number of Projects taking place [55]. In many cases Z39.50 is being installed as a means of inter-operating catalogues to create a distributed library in which the constituent catalogues can be searched in tandem [54]. The current generation of Z39.50 targets operating in the UK have few if any facilities for collocating their retrieval set. Version 3 of Z39.50 does allow the retrieval to be sorted, which at least brings it up to roughly the same level as most conventional OPAC's, but no further.

2.2. Large retrievals and Complex retrievals

Another problem with OPAC displays is that of large retrievals, a problem likely to be exacerbated by the encroachment of the Z39.50 information retrieval protocol. In widening access to remote catalogues Z39.50 will inevitably lead to larger and larger retrievals. The solution to the problem of large retrievals is obvious: the retrieval must be reduced or organised in such a way that extraneous material can be disregarded. The big question is how. Plus, in a system where the "front-end" and the "back-end" are separated (as with Z39.50) another question is where: back-end or front-end.

Analysis of OPAC usage has shown that large numbers of hits are frequently generated [18] and that this causes problems for users. It is a problem not just for subject-based searches but also for title/author searches or known item searches. In general, conventional OPAC's lack good facilities to process large retrievals. Users can limit the retrieval by some secondary filter criteria such as language or date of publication, but often this part of the interface is not very friendly. In particular, they are not usually told enough about the retrieval to be able to formulate a sensible filter. For example a user may want to filter the retrieval by publication date but he/she cannot find out what dates are represented in the retrieval. Experimental systems such as OASIS [19] provide the user with information about their retrieval and have a feedback mechanism to enable them to reduce it better.

Martha Yee [4] points out that "A number of cataloguing theorists including Lubetzky have argued that known-item searches are looking for a work rather than a particular edition of a work." Yet retrievals (even from a known item author/title search) may contain many works, only some of which are closely related to the required item [14]. Therefore the obvious way to reduce the size of a known-item retrieval is by organising or restricting it according to how closely the items relate to the required item, so that the user can home in on what they want. Indeed Hickey described a system which assists the user by outlining the retrieval's overall structure into a hierarchical tree and found (not surprisingly) that it works best for bibliographically closely related items [20]. Svenonius also outlined how an OPAC display might cluster equivalent records [52].

2.3. The BOPAC2 Project

BOPAC2 draws together some of the relevant features described in previous work and applies them to a distributed searching environment using Z39.50. It also introduces some new concepts related to the display of related records and works. It demonstrates how some of these ideas can be made to work within the user interface. The interface is extremely interactive and includes a number of tools (such as Find) with which the user can manipulate the retrieval to discover relationships between the records.

2.3.1. Clusters and Clustering

BOPAC2 has a clustering "engine" whose job it is to build and re-build clusters of records from the retrieval. A cluster can be thought of as an entity which contains (or more accurately points to) at least one record and which has a label which names the cluster. In BOPAC2 a cluster is typically displayed with the number of records it contains followed by the label. In general the cluster's label describes what the records have in common in terms of one field (typically a MARC subtag). An example might be a cluster of 3 records labelled "Penguin" which share the same publisher name (tag 260a).

If a record has several values in a MARC tag (i.e. a repeating subtag) then each value has its own cluster. For example a record with two publishers will have two clusters pointing to it: one for one publisher and one for the other. Sometimes clusters may be constructed by spanning across several MARC subtags. For example, a record with a main entry author (100) and an added entry author (700) may have two author clusters pointing to it: one from the main entry and one for the added entry (see below). These examples show one of the key features of this clustering approach: it can provide several different routes to the same record.
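As an illustration only, the idea of clusters that point to records can be sketched in a few lines of Python. This is not the BOPAC2 code (which is a Java applet). The function names build_clusters and display_lines, the dictionary layout, the record identifiers and the sample records (including the "Example" author and publisher) are all assumptions made for the example.

```python
from collections import defaultdict

def build_clusters(records, tags):
    """Map each distinct field value to the ids of the records that carry it.

    `records` maps a record id to {tag: [values]}; `tags` lists the tags to
    span (for example main entry and added entry author together).
    """
    clusters = defaultdict(list)
    for rec_id, fields in records.items():
        seen = set()  # avoid listing a record twice in the same cluster
        for tag in tags:
            for value in fields.get(tag, []):
                if value not in seen:
                    seen.add(value)
                    clusters[value].append(rec_id)
    return dict(clusters)

def display_lines(clusters):
    """Render each cluster as 'count label', the form described above."""
    return [f"{len(ids)} {label}" for label, ids in sorted(clusters.items())]

# Hypothetical sample retrieval: tag numbers follow the examples in the text.
records = {
    "r1": {"100": ["Dickens, Charles"], "260a": ["Penguin"]},
    "r2": {"100": ["Dickens, Charles"], "700": ["Example, Editor"],
           "260a": ["Penguin", "Example Press"]},
    "r3": {"100": ["Example, Editor"], "260a": ["Penguin"]},
}

publisher_clusters = build_clusters(records, ["260a"])
author_clusters = build_clusters(records, ["100", "700"])
```

In this sketch record r2 is reachable through two publisher clusters and two author clusters, which is the "several different routes to the same record" property described above.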

BOPAC2 draws together records where the contents of the fields match. In comparing the fields only alphanumeric characters are significant; spaces and punctuation marks are ignored, as is the case of the letters. This is not a particularly sophisticated matching technique but it does ride over many minor formatting anomalies. It must be stressed that for this Project clusters are derived from the bibliographic records by algorithms. In BOPAC2 clusters are a display entity, not a cataloguing entity. There is a strong argument that bibliographic records should be linked more effectively, and that such links would improve the interface on systems such as BOPAC2 which could use the links, but this was not the prime purpose of the BOPAC2 Project. BOPAC2 clusters are not always perfect. Clusters convey information about the retrieval and provide a way to navigate through it. Often, the process of building clusters of records which share a common value serves to highlight the odd ones which ought to be in the cluster but are not (perhaps because of a breakdown in authority control).
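The matching rule just described (only alphanumeric characters are significant, case is ignored) might be sketched as follows. The names match_key and cluster_by_key are hypothetical, and the exact treatment of accented or non-Latin characters in BOPAC2 is not stated in this report, so Python's own definition of "alphanumeric" is simply assumed here.

```python
def match_key(value):
    """Reduce a field value to the characters that count when matching:
    alphanumerics only, with case folded."""
    return "".join(ch for ch in value if ch.isalnum()).casefold()

def cluster_by_key(values):
    """Group field values whose match keys are identical."""
    clusters = {}
    for value in values:
        clusters.setdefault(match_key(value), []).append(value)
    return clusters

# Hypothetical variants of one heading, plus a genuinely different one.
headings = [
    "Chemical engineering",
    "CHEMICAL  ENGINEERING.",
    "Chemical engineering /",
    "Chemical engineering : an introduction",
]
grouped = cluster_by_key(headings)
```

The first three headings fall into one cluster because they differ only in case, spacing and punctuation; the fourth stays separate because its alphanumeric content differs.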

2.3.1.1. Use of Main Entry

It has been argued that the use of main entry in the MARC format is anachronistic because the modern OPAC works through access points rather than entries [21]. BOPAC2's display looks like a list of entries but in fact it is a list of clusters derived from an access point. In effect, it is a list of all possible outcomes of a search on that access point. In computing these clusters the distinction between main and added entry is ignored. The contents of both main and added entry authors or titles are displayed together and the user follows whichever they want by name. If an author appears as the main entry on one record and as an added entry on another record, both will be clustered together in the display.

BOPAC2 does make use of the main entry in creating its work clusters (see below).

2.3.2. Clustering versus Collocation

Most OPAC's attempt to fulfil the second objective by collocation. BOPAC 2 has a different approach; it clusters related records together. How do collocation and clustering compare? Whereas collocation is essentially a sort on the retrieval, clustering goes further and adds extra layers of information on top of the retrieval. This has a number of advantages:

2.3.3. Work Clusters

BOPAC2 brings together items which are related to the same work by clustering around two fields "work author" and "work title". The contents of these fields are derived from the MARC tags. The algorithm for this was developed by observation; by looking at typical retrievals. It was refined several times during the course of the Project as more problems cropped up. Often changes to the algorithm to fix one problem caused another problem with another retrieval, so the end result is a pragmatic compromise which seems to work well in most cases. It could almost certainly be improved and made more sophisticated.

These algorithms make use of the implicit prioritisation of the main entry/added entry corresponding fields in the MARC record. An unusual feature is that work author and title are derived from the bibliographic record using the title and author query terms. The display is adapted to some extent to match the users' query. This means that the work title and author technically apply only to one particular retrieval. In another retrieval the same record could (theoretically) produce a different work title or author. In practice, this tends to happen if the record is retrieved in one search under uniform title, and then in another search under title main entry - i.e. where either one or other title meets the user's criteria.

Both work title and author fields are derived in two steps:

  1. Search through a list of possible MARC tags looking for one which matches the relevant query term.
  2. If nothing found in step 1, search another list of MARC tags and take the first tag found in the record.

2.3.3.1. Work Title

Step 1 attempts to match the title query term (if there is one) against uniform title and then main entry title.

Step 2 takes the first available tag from uniform title then main entry title.

Thus uniform title takes priority over main entry title unless the main entry title matches the title query term.

2.3.3.2. Work Author

Step 1 attempts to match the author query term (if there is one) against main entry personal author, corporate author then conference.

Step 2 takes the first available tag from main entry personal author, corporate author then conference, followed by added entry personal, corporate author then conference.
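The two-step derivation of work title and work author can be sketched as below. This is an illustrative reading of the description above, not the Project's own code. The field names (uniform_title, main_personal and so on) are hypothetical stand-ins for the relevant MARC tags, and "matches the query term" is assumed here to mean that the normalised query appears somewhere in the normalised field, which the report does not specify.

```python
def match_key(value):
    """Alphanumerics only, case folded (the matching rule used for clusters)."""
    return "".join(ch for ch in value if ch.isalnum()).casefold()

def derive_work_field(record, query_term, match_tags, fallback_tags):
    """Step 1: first tag in match_tags whose value matches the query term.
    Step 2: otherwise the first tag in fallback_tags present in the record."""
    wanted = match_key(query_term) if query_term else ""
    if wanted:
        for tag in match_tags:
            value = record.get(tag)
            if value and wanted in match_key(value):
                return value
    for tag in fallback_tags:
        value = record.get(tag)
        if value:
            return value
    return None

# Hypothetical field names standing in for the MARC tags named in the text.
TITLE_TAGS = ["uniform_title", "main_entry_title"]
AUTHOR_MAIN = ["main_personal", "main_corporate", "main_conference"]
AUTHOR_ADDED = ["added_personal", "added_corporate", "added_conference"]

def work_title(record, title_query):
    return derive_work_field(record, title_query, TITLE_TAGS, TITLE_TAGS)

def work_author(record, author_query):
    return derive_work_field(record, author_query, AUTHOR_MAIN,
                             AUTHOR_MAIN + AUTHOR_ADDED)

# Made-up record: uniform title wins unless the main entry title matches.
sample = {"uniform_title": "Selections", "main_entry_title": "Hard times"}
title_with_query = work_title(sample, "hard times")
title_without_query = work_title(sample, None)
```

With the sample record, a title query of "hard times" yields the main entry title, whereas no title query (or one that matches neither field) yields the uniform title. The same record can therefore produce a different work title in a different retrieval.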

2.3.3.3. Query term matching tags

Usually the algorithm derives a title or author tag which matches the query term, and this explains to the user why the cluster is in the retrieval. In most Z39.50 retrievals there are also a significant number of records for which the derived work title and author do not match the query term; usually because the original query hit on other tags such as added entry or subject. In order to explain why the record was hit BOPAC2 searches through a list of possible MARC tags searching for any which match the query term. This matching tag is included in the label of the work cluster along with the title and author.

If there is a title query term the following subtags are tested in sequence until one matches:

  1. Uniform title
  2. Main entry title
  3. Other title information
  4. Uniform title added entry
  5. Title added entry
  6. Parallel title
  7. Translated title
  8. Varying form of title
  9. Second level title
  10. Parallel title second level
  11. Series title
  12. Personal name main entry title subtag (USMARC only: 100$t)
  13. Personal name added entry title subtag (USMARC only: 700$t)
  14. Uniform title subject heading
  15. Title subject heading
  16. Personal name subject name-title
  17. Corporate name subject heading
  18. Conference subject heading
  19. Statement of responsibility

If there is an author query term the following subtags are tested in sequence:

  1. Personal name main entry
  2. Corporate name main entry
  3. Conference name entry
  4. Title main entry
  5. Other title information
  6. Statement of responsibility
  7. Personal name added entry
  8. Corporate name added entry
  9. Conference added entry
  10. Personal name subject heading
  11. Corporate name subject heading
  12. Conference subject heading

These lists were developed and refined during the Project but would probably benefit from systematic testing, and may benefit from being extended to include other tags. Some obvious candidates would be the notes: title information (tag 514) or edition and history (tag 503). Although the notes are unlikely to be the source of the index which actually caused the record to be hit in the remote database, they may still provide a valuable explanation to the end user in cases where, for example, the title of a work has changed.
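The search for a tag that explains why a record was hit can be sketched as a scan down an ordered list. The sketch below uses the title list given above. The function name find_matching_tag, the use of the list's descriptive names as dictionary keys, the substring test and the sample record are all assumptions for illustration.

```python
def match_key(value):
    """Alphanumerics only, case folded."""
    return "".join(ch for ch in value if ch.isalnum()).casefold()

# Order copied from the title list in the text; the keys are descriptive
# labels rather than real MARC tag numbers.
TITLE_TAG_ORDER = [
    "Uniform title", "Main entry title", "Other title information",
    "Uniform title added entry", "Title added entry", "Parallel title",
    "Translated title", "Varying form of title", "Second level title",
    "Parallel title second level", "Series title",
    "Personal name main entry title subtag",
    "Personal name added entry title subtag",
    "Uniform title subject heading", "Title subject heading",
    "Personal name subject name-title", "Corporate name subject heading",
    "Conference subject heading", "Statement of responsibility",
]

def find_matching_tag(record, query_term, tag_order):
    """Return (tag, value) for the first tag, in list order, with a value
    containing the query term; None if no listed tag matches."""
    wanted = match_key(query_term)
    if not wanted:
        return None
    for tag in tag_order:
        for value in record.get(tag, []):
            if wanted in match_key(value):
                return tag, value
    return None

# Made-up record hit on a subject heading rather than on its own title.
record = {
    "Main entry title": ["Critical essays on the industrial novel"],
    "Title subject heading": ["Hard times"],
    "Statement of responsibility": ["with an essay on Hard Times by an example editor"],
}
explanation = find_matching_tag(record, "hard times", TITLE_TAG_ORDER)
```

Here the subject heading is reported rather than the statement of responsibility because it comes earlier in the list, so the order of the list determines which explanation the user sees.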

2.3.4. Clustering on other OPAC's

Several OPAC's have the capability to collapse the results of the search into some kind of clustered display. Hickey [20] describes a sophisticated system with similar capability to BOPAC2 which reduces a large retrieval by clustering around various fields such as Author, Title, Publisher, and Date. It appears to be more successful than the URICA system in Bradford University library, which also has some clustering capability. The URICA system often fails to cluster records completely or fails to explain why they were not clustered. For example a search for title "Chemical Engineering" on the Bradford University library OPAC produces the following list of headings:

---------------------- heading selection ---------------------
The following titles have been found (total 8):

No. works hdg
1.    1  [T]  Chemical engineering
2.   18  [T]  Chemical engineering
3.    1  [T]  Chemical engineering : a universal code as an aid to chemical systematics
4.    1  [T]  Chemical engineering
5.    1  [T]  Chemical engineering : introductory aspects
6.    1  [T]  Chemical engineering : an introduction
7.    1  [T]  Chemical engineering
8.    7  [T]  Chemical engineering

It is far from clear what the user is supposed to make of this. In fact, upon examining the full records, it turns out that heading 2 contains manifestations of Coulson and Richardson's "Chemical Engineering". Heading 7 also contains a manifestation of this work.

BOPAC2 differs from these systems, not only from the user's perspective (in terms of the information that is displayed) but more fundamentally in terms of the purpose of the clusters and the way they are constructed. The "collapsed headings" displays shown by other OPAC's go some way towards meeting the display collocation objective, but this is not their only or primary purpose. Such a display (which is in fact quite rare) is essentially a development of the browsable headings display (which is more common). In a list of browsable headings the user is shown a relevant extract from the index of titles or authors (depending on the query) along with numbers of entries under each heading. An example is shown below for a search for title "Hard Times" at Trinity College Dublin.

Title                                                    Number of Titles
   1. Hard tennis courts : Hints for groundsmen        (Title-Gen)            1
   2. Hard terms : unemployment and supplementary b... (Title-Gen)            1
   3. Hard Texas trail                                 (Title-Gen)            1
   4. Hard things and soft things                      (Title-Gen)            1
   5. The hard thorn                                   (Title-Gen)            1
   6. The hard time bunch                              (Title-Gen)            2
 >>>
   7. Hard times                                       (Title-Gen)           33
   8. Hard times : an authoritative text, backgroun... (Title-Gen)            1
   9. Hard times: an oral history of the Great Depr... (Title-Gen)            1
  10. 'Hard Times' and culture                         (Title-Gen)            1
  11. Hard times and easy terms : and other tales b... (Title-Gen)            1
  12. Hard times at Batwing Hall                       (Title-Gen)            1
  13. Hard times by Charles Dickens                    (Title-Gen)            1

Looking at these browsable lists of headings, the legacy of the card catalogue is obvious. What is shown here depends heavily on the cataloguer's choice of main entry and other decisions about which headings to list an item under. In other words, these displays are governed by the filing rules which are of little relevance in the OPAC world. They may assist with retrieval, but are still essentially descriptive and relate to the catalogue as a whole rather than to the retrieval set itself.

BOPAC2's purpose is not to describe the catalogue's filing sequence but to allow the user to investigate the retrieval. Its clusters are created and displayed within the context of the retrieval set itself, rather than the whole catalogue. The information displayed to the user is adjusted to reflect the size and content of the retrieval. The interactive sorting and keyword search make it possible for the user to discover relationships between the records within the retrieval. The clusters and the display are matched to the user's search terms. In a distributed Z39.50 environment, in which the retrieval may contain items from several different databases, this approach is an essential step towards a display which truly fulfils the collocation function of the catalogue.

3. BOPAC2 Design and Development

3.1. Use of Existing Z39.50 Packages

The intention of the Project was to develop a Z39.50-based front-end application to investigate user interface issues. It was decided that, as far as possible, public-domain Z39.50 components from other research projects should be re-used in this Project. The first Bradford OPAC project [2] had already developed techniques for clustering bibliographic records. What was required, therefore, was a package which could accept a search expression, apply it to one or more Z39.50 targets, and deliver the results as MARC records.

The Project began with a short survey of public-domain Z39.50 component packages, in an attempt to identify one which could be used as the basis for the BOPAC2 application. A survey of Z39.50 clients has been carried out by Andrew Wood [22], although this is more from the perspective of the end-user rather than the developer. From this survey, and other information gathered from the Z39.50 Maintenance Agency Page [23], several likely-looking Z39.50 packages were downloaded (mostly from Juha Hakala's archive of public-domain Z39.50 packages [51]) and installed. The packages examined were:

  1. DBV-OSI II from Crossnet Systems Ltd. and Die Deutsche Bibliothek
  2. CanSearch from Software Kinetics Ltd., developed for the National Library of Canada
  3. Zdemo, a simple client from OCLC
  4. YAZ (Yet Another Z39.50 toolkit) from Index Data
  5. The Stanford Z39.50 client by Harold Finkbeiner, Stanford University
  6. Isite from Clearinghouse for Networked Information Discovery and Retrieval (CNIDR)

Most of these systems, with the exception of YAZ, are fully-fledged Z39.50 origins in their own right, with their own user interfaces. None of the packages could deliver exactly what was required; they either did too little or too much. It was clear that for the purposes of the Project, some modifications would need to be made. At the start of the Project, therefore, the simplest and smallest package (Zdemo) seemed the best choice.

3.2. Bradford OPAC 1

BOPAC2 is based on the principle of linking together bibliographic records to improve the effectiveness of the display. The BOPAC 1 demonstrator prototype was a complete miniature OPAC, including a demonstrator database of bibliographic records, an author/title search component as well as the display component. It was implemented entirely in the Microsoft Access RDBMS.

Although the fundamental concepts explored in BOPAC 1 remain in BOPAC2, the environment and the purpose of the two projects are quite different. Whereas BOPAC 1 contained its own rudimentary search and retrieve mechanism, BOPAC2 uses Z39.50. BOPAC 1 was a prototype demonstrator system. It was decided therefore not to attempt to re-use the prototype software directly. The display features and clustering algorithms developed in BOPAC 1 were the starting point for the design of BOPAC2.

3.3. Overall System Architecture

The decision about the architecture of the system was closely allied to the choice of Z39.50 subsystem (see above). There appeared to be four possible approaches:
  1. A standalone Windows PC-based origin (like Znavigator or BookWhere).
  2. A WWW-to-Z39.50 Gateway using HTML forms and CGI (like the Library of Congress Z39.50 Gateway).
  3. A WWW-to-Z39.50 Gateway plus an interactive Java applet (like the Java-ified version of Willow).
  4. A Web browser plug-in or the use of Active X technology.
Option 2 was dismissed on the grounds that the response time over the Internet was too slow and erratic to enable us to create a truly interactive front end. The white paper for the Java Willow project [24] describes the problems users have using a remote CGI-based Web OPAC. Internet congestion often makes it impossible to use Web OPAC's outside their home country. Anyone in Europe who has tried to search the Library of Congress web site in the afternoon can testify to this.

Of the remaining options, option 3 was the preferred choice. Other options (a standalone Windows program, a browser plug-in, or the use of Active X) would have limited BOPAC2 to one platform (Windows) or one web browser. This would have limited accessibility given that UNIX is used widely on the Bradford campus and at many other universities. The Java option would encompass the PC, UNIX and the Mac.

Also, and perhaps more importantly, with the standalone program or the browser plug-in, users would have had to download and install the software for themselves. This would have entailed formally releasing the software and would have made it difficult or impossible to make minor corrections to the code afterwards. The advantage of the Java route is that there is no need for users to download any software by hand. Every time the code is updated, it is automatically refreshed on the user's workstation the next time he/she runs a search. Thus there is no need to "release" the software as such. The software can evolve whilst it is being tested, and problems can be continuously corrected.

This arrangement was also convenient for the Project team, one of whom (Fred Ayres) was employed on a part-time basis and was working from home. Equipped with a modem and web browser he was able to participate fully in the development process and to give his reaction to changes to the software, as well as experimenting with the Z39.50 targets at weekends when the Internet was more responsive.

3.3.1. Java Issues

One problem with Java is that it is a 32-bit language and does not run very well under Windows 3.1. At the start of the Project, it was not clear how serious a problem this would be, for two reasons. Firstly, it was not known how many users or librarians would be working with Windows and how many would be using UNIX. Secondly, of the Windows users, it was not known how many PC's were running Windows 95/NT (which can run Java) as against Windows 3.1 (which cannot).

When the Project was first proposed early in 1996, Java had support from every corner of the computing industry and seemed almost certain to become ubiquitous. It was assumed that during the lifetime of the Project, many libraries and campuses would upgrade their Windows 3.1 systems so that they were capable of running Java. In fact, at the end of the Project, Windows 3.1 is still apparently very common. The future of Java on the PC is rather less certain, with Microsoft and Sun Microsystems wrestling for control of the language, but at the moment it remains an effective cross-platform technology. BOPAC2 was the first Z39.50 origin in the world to use Java. As the Project proceeded, other larger projects such as Willow [24] also developed Java interfaces. CIMI [25] are demonstrating a Java-based target created by Blue Angel Technologies [26] who also have their own Java product, MetaStar. Librarians are even using traditional "green screen" Telnet-based OPAC's via Java [27] which suggests that Java-capable browsers are spreading onto library desktops.

It is also worth pointing out that, unlike conventional Z39.50 clients such as BookWhere [28] or ZNavigator [29], the Java-based architecture of BOPAC2 is ideally suited to an intranet. Network bandwidth, always a problem on the Internet, is a more controllable problem over a localised intranet. For libraries, there are strong arguments in favour of using Network Computers rather than a bank of expensive PC's [30].

3.4. The Europagate WWW-to-Z39.50 Gateway

Early on in the Project the WWW-to-Z39.50 gateway software from the Europagate project [31] was released into the public domain. On inspection this seemed to provide a good basis for the Project. It was (and still is) the only public-domain WWW-to-Z39.50 gateway capable of searching multiple targets simultaneously, which was something that the Project had hoped to achieve. Although the source code itself is not commented, there was enough in the accompanying documentation to show that the required modifications would be possible.

The Europagate WWW-to-Z39.50 gateway was installed on a separate web server in the Department of Computing. Installation involved changing several lines in the main Makefile to adapt it to the department's UNIX environment. After a few attempts, the gateway was up and running. Initially, the only modification made to the system was to change the list of available Z39.50 targets to include some known UK targets.

Although the simultaneous search component of the software was only experimental, it proved reasonably stable in use. A few bugs in the CGI scripts caused problems but these were fixed with the help of Adam Dickmeiss at Index Data. The Europagate CGI scripts have been created by Index Data using their other toolkits, YAZ and IrTcl [32]. The Tcl scripts use standard Tcl, IrTcl (an extension of Tcl to handle Z39.50 z-associations) plus some extra custom Tcl commands. Modifying these scripts to suit the purposes of the Project was relatively difficult because they were not documented and had to be "reverse-engineered". These difficulties were complicated by the way in which the gateway joins stateless HTTP to stateful Z39.50. It uses a set of concurrent interacting UNIX processes and FIFO's (named pipes) which occasionally refused to "die" if the Tcl scripts were buggy. This meant that bugs in the Tcl scripts could leave rogue processes running on the departmental web server. Following two occasions where these rogue processes crashed the web server machine, it was decided to isolate the Europagate gateway onto its own web server. Since then, it has proved fairly stable.

One problem which was anticipated concerned the evaluation of the system. Users would find it difficult to distinguish those components of the interface which were derived from the Europagate software and those created as part of BOPAC2. The Project would have to evaluate both the Europagate and its own software together since it would be virtually impossible for users to separate these two in their own minds. Users would inevitably respond to the system as a whole. This did indeed prove to be a problem, and some of the feedback that we received was about the Europagate software.

Related to this was the question of the overall "look and feel" of the interface. The Europagate software, as supplied, uses standard HTML forms to control both search and display of results. The Project's assertion was that the display side needed to be genuinely interactive, but that such interactivity cannot be achieved with HTML forms and that a Java applet was more suitable. The result was that the interface was divided into two distinct parts: the first part derived from the Europagate system and the second part the Java applet created within the Project. These two parts remain fairly distinct, and although it is clear how to proceed from one to the other the interface does not seem to "flow" smoothly.

Since BOPAC2 is not specifically about Z39.50 as such, it seemed logical to re-use as much of the existing Z39.50 resources as possible, even if the result was apparently disjointed. Europagate provided all the required Z39.50 communications subsystem plus the search interface. This left the Project free to concentrate its limited time on the core area: the processing and display of the results.

Having made the decision to use Europagate, and to use as much of it as possible, the overall structure became clear. It would start as a standard WWW-to-Z39.50 gateway installed and running in the Department of Computing at the University of Bradford. It would communicate with users via the Web and a Java applet, and with Z39.50 targets via Z39.50.

3.5. Availability and Capability of Z39.50 Targets

At the beginning of the Project there were hardly any Z39.50 targets available in the UK, although there were many in the US. The Project used UK targets as much as possible because of the much faster and more reliable response through JANET. As anticipated, more targets came on stream as the Project progressed, but at the time of writing there are still only 6 or 7 targets available. Others such as eXplore Z39.50 from SLS [33] are available on subscription, but these were not used in this Project. It is hoped that other development projects will accelerate the installation of more Z39.50 targets over the next 12 months.

The Europagate software, and most of the targets used, operated according to Z39.50-1992 rather than Z39.50-1995. Some facilities of Z39.50-1995 were available on some targets but since the Project was concerned with searches across several targets, and since the Europagate software offers few of the new features from the 1995 standard, Z39.50-1992 was the lowest common denominator. Whilst some targets in the USA (e.g. the Library of Congress) offer a spectacular range of access points, others are more limited. The Z39.50 target at Bradford University, for example, can search only personal authors, not corporate authors or conferences.

3.5.1. Z39.50 Targets Accessible through BOPAC2

The list of targets accessible via BOPAC2 included all known, working systems in the UK and Ireland, the two "famous" US catalogues, and one smaller US university. As more UK targets appeared these were added to the list. By the end of the Project the list of targets was as follows:

At the start of the Project it was difficult to find details of any Z39.50 targets outside the Anglophone countries. Although a number of EU-funded projects were taking place they did not appear to have got as far as delivering working Z39.50 targets. The only targets available were the union catalogues BibSys (Norwegian academic libraries) and DanBib (Danish public and academic libraries).

3.5.2. Interoperability between Targets and range of Access Points

The targets varied in terms of the access points that they provided to their underlying catalogue, and in terms of their adherence to the Z39.50 standard and recommendations (such as Bib-1). Both of these variations hamper interoperability and prevent effective distributed searching, as well as making life difficult for users. It was felt that BOPAC2 should present (as far as possible) a standard search interface across the different targets.

The Europagate software, when working across multiple targets, chooses one of them and presents the user with its access points (Bib-1 Use attributes). The hope is that these access points can be applied to the other targets in the search, but of course this is not always the case. Faced with a search across two targets, X and Y, the Europagate software may mask out access points from target X if they are not supported by target Y. Or it may offer access points from target X even if they are not supported by target Y. This is a pragmatic if haphazard compromise; hopefully the Explain facility (which was introduced in Z39.50 version 3) will improve matters in the future.

Europagate can map the same access point name (e.g. "personal author") onto different Bib-1 attribute combinations on different targets. This is a useful facility which makes it possible to present the user with standard names for access points across targets which implement those access points using different combinations of Bib-1 attributes. Thus it was possible to present a fairly standard set of easily-understood named access points across all the targets; these are discussed in detail below.
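The name-to-attribute mapping can be pictured as a small per-target lookup table. The following Java sketch is illustrative only and is not taken from the Europagate code: the class `AccessPointTable`, the target names `targetA` and `targetB`, and the `Use=...` string notation are all assumptions made for the example. The Use attribute numbers are Bib-1 values 1004 (Author name personal) and 1 (Personal name).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-target table: access point name -> Bib-1 attribute combination. */
class AccessPointTable {
    // target name -> (access point name -> attribute combination, e.g. "Use=1004")
    private final Map<String, Map<String, String>> byTarget = new LinkedHashMap<>();

    void define(String target, String accessPoint, String attributes) {
        byTarget.computeIfAbsent(target, t -> new LinkedHashMap<>())
                .put(accessPoint, attributes);
    }

    /** Returns the attribute combination, or null if the target lacks the access point. */
    String lookup(String target, String accessPoint) {
        Map<String, String> points = byTarget.get(target);
        return points == null ? null : points.get(accessPoint);
    }
}

class AccessPointTableDemo {
    public static void main(String[] args) {
        AccessPointTable table = new AccessPointTable();
        // "targetA" and "targetB" are made-up target names.
        table.define("targetA", "Personal author", "Use=1004");
        table.define("targetB", "Personal author", "Use=1");
        // The user sees one name; each target receives its own attributes.
        System.out.println(table.lookup("targetA", "Personal author"));
        System.out.println(table.lookup("targetB", "Personal author"));
    }
}
```

A lookup that returns null marks an access point that would have to be masked out for that target.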

Among the UK targets in particular, many had only just been installed when they were added to BOPAC2, and it was necessary to rely on software vendors' information about their systems, which was not always accurate. In these cases the only way to find out precisely how a particular target responded was by experiment, but there was not enough time to conduct exhaustive investigations of every target.

3.5.2.1. Title search

All the targets included in BOPAC2 offered title search via Use attribute 4. Most offered Structure attribute 2 (word) and 6 (word-list) for keyword in title search. If both word and word list were available on a target, word list was used for the "Title Contains" access point. Some targets also treated word and word-list searches identically. Other attribute types such as relation, position, etc. were not set explicitly unless they appeared to be necessary. Some targets seemed to return an error automatically if any search attempted to specify attributes other than Use and Structure.
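The selection rule for the "Title Contains" access point can be sketched as follows. This is a minimal Java illustration, not the Project's Tcl code. The query is rendered in the prefix notation used by the YAZ toolkit (`@attr type=value`, where type 1 is Use and type 4 is Structure); the report does not say how queries were actually expressed, so that notation, the class name and the method names are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class TitleQueryBuilder {
    static final int USE_TITLE = 4;            // Bib-1 Use attribute for title
    static final int STRUCTURE_WORD = 2;       // Bib-1 Structure: word
    static final int STRUCTURE_WORD_LIST = 6;  // Bib-1 Structure: word list

    /** Prefers word list over word; returns -1 if the target supports neither. */
    static int chooseStructure(Set<Integer> supportedStructures) {
        if (supportedStructures.contains(STRUCTURE_WORD_LIST)) return STRUCTURE_WORD_LIST;
        if (supportedStructures.contains(STRUCTURE_WORD)) return STRUCTURE_WORD;
        return -1;
    }

    /** Builds a title query; other attribute types are deliberately left unset. */
    static String titleContains(String term, Set<Integer> supportedStructures) {
        StringBuilder query = new StringBuilder("@attr 1=").append(USE_TITLE);
        int structure = chooseStructure(supportedStructures);
        if (structure != -1) {
            query.append(" @attr 4=").append(structure);
        }
        // Note: quotes inside the term are not escaped in this sketch.
        return query.append(" \"").append(term).append('"').toString();
    }
}

class TitleQueryBuilderDemo {
    public static void main(String[] args) {
        Set<Integer> both = new HashSet<>(Arrays.asList(2, 6));
        Set<Integer> wordOnly = new HashSet<>(Arrays.asList(2));
        System.out.println(TitleQueryBuilder.titleContains("hamlet", both));
        System.out.println(TitleQueryBuilder.titleContains("hamlet", wordOnly));
    }
}
```

Leaving relation, position and similar attributes out of the query reflects the observation that some targets rejected anything other than Use and Structure.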

One or two targets offered more specific title searches such as uniform title or collective title but these were not usually included for the sake of simplicity; in the BOPAC2 model such precise distinctions can be added in after the retrieval phase. In the case of the British Library Document Supply Centre, Series Title was included as a separate access point for precise known-item searches.

3.5.2.2. Author search

Whilst title searching was relatively uniform across the targets, author search was more of a problem. Bib-1 defines a number of Use attributes for author of which the most popular is 1003 (Author) which (in theory) includes personal and corporate author as well as conferences. However a number of targets did not offer Use attribute 1003. Some offered instead separate access Use attributes 1004 (Author name personal), 1005 (Author name corporate) and 1006 (Author name conference). Others offered Use attributes 1 (Personal name), 2 (Corporate name) and 3 (Conference name). Bib-1 specifies that "Personal name" includes subject tags whereas "Author name personal" does not, but most targets seemed to ignore this distinction and in practice Use attributes 1004-1006 could be considered equivalent to attributes 1-3. Several targets such as the one at Bradford University library offered only personal name author search. One target, given a combined author & title search, seemed to ignore the author term and to use only the title. Some targets came to the attention of the Project team with no form of author search at all; but these were not included in the targets available via BOPAC2.
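One way to cope with this variation is to try the candidate Use attributes in a fixed order of preference for each target. The Java sketch below assumes such an order (1003, then 1004, then 1); the order, the class name and the convention of returning -1 for an unusable target are assumptions for illustration and are not described in the Europagate documentation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class AuthorUseChooser {
    // Assumed preference: combined Author (1003), then Author name personal (1004),
    // then Personal name (1), treated here as equivalent in practice.
    static final int[] PREFERENCE = {1003, 1004, 1};

    /** Returns the first supported Use attribute, or -1 if the target has no author search. */
    static int choose(Set<Integer> supportedUseAttributes) {
        for (int use : PREFERENCE) {
            if (supportedUseAttributes.contains(use)) {
                return use;
            }
        }
        return -1;
    }
}

class AuthorUseChooserDemo {
    public static void main(String[] args) {
        Set<Integer> personalOnly = new HashSet<>(Arrays.asList(1, 4, 7));
        Set<Integer> titleOnly = new HashSet<>(Arrays.asList(4));
        System.out.println(AuthorUseChooser.choose(personalOnly)); // personal name search only
        System.out.println(AuthorUseChooser.choose(titleOnly));    // no author search at all
    }
}
```

A target for which the method returns -1 corresponds to the targets that were left out of BOPAC2 because they offered no author search.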

3.5.2.3. Subject search

The Project was concerned primarily with author/title retrievals, but many of the targets offered Use attribute 21 (Subject) and this was included as an access point. If possible, precise match (Structure attribute 1) and keyword in subject heading (Structure attribute 2 or 6) approaches were offered. Some targets did not support the use of the Structure attribute with a subject search; in these cases it was assumed that subject search was phrasal.

3.5.2.4. ISBN, ISSN and other access points

Most targets offered ISBN search and this was included in BOPAC2. By and large, ISBN searches worked with few problems uniformly across all the targets, which is to be expected. Some targets offered ISSN and if available this was included in BOPAC2. The Library of Congress offered a very wide range of access points and a few of these were included in BOPAC2 for experimentation.

3.5.2.5. Explain Facility

Before the Explain facility was introduced in Z39.50-1995 there were two ways in which the origin would decide which access points to offer the user to formulate their query. Some systems, particularly standalone products such as BookWhere [28], offer the user the same access points irrespective of the target. The obvious problem with this is that the user probably has no idea which access points are supported by the target, and the only way to find out is by trial and error.

BOPAC2's approach incorporates knowledge about the targets' search semantics into the origin to generate appropriate access points. This is a function of the Europagate software, and the Europagate gateway itself [31] works the same way. Other systems such as the Library of Congress Z39.50-WWW gateway [35], use a similarly adaptive interface. This is obviously a better approach but it involves extra administration in updating the origin every time the targets add new access points.

The Explain facility solves the problem by forcing the origin to interrogate the target to find out what attributes (and thus access points) are available. Explain is a version 3 facility and was not used in BOPAC2. Explain encapsulates the semantics of the target, and different targets will have different semantics. Under Z39.50 version 3, therefore, the origin should interrogate the relevant targets using Explain, identify the set of available attributes on each target, and then compute the intersection of those sets so as to present the user with suitable access points which will be meaningful across all the targets (or at least most of them).
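The intersection step can be sketched in a few lines of Java. The example does not perform any Explain requests; the per-target sets of Use attributes are hard-coded, and the class name, method name and target names are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class AccessPointIntersection {
    /** Returns the Use attributes supported by every target (empty if there are no targets). */
    static Set<Integer> commonUseAttributes(Map<String, Set<Integer>> attributesByTarget) {
        Set<Integer> common = null;
        for (Set<Integer> attributes : attributesByTarget.values()) {
            if (common == null) {
                common = new TreeSet<>(attributes);   // start from the first target
            } else {
                common.retainAll(attributes);          // keep only shared attributes
            }
        }
        return common == null ? new TreeSet<>() : common;
    }
}

class AccessPointIntersectionDemo {
    public static void main(String[] args) {
        // Hypothetical targets; in a version 3 origin these sets would come from Explain.
        Map<String, Set<Integer>> targets = new LinkedHashMap<>();
        targets.put("targetA", new TreeSet<>(Arrays.asList(4, 7, 21, 1003)));
        targets.put("targetB", new TreeSet<>(Arrays.asList(1, 4, 7)));
        targets.put("targetC", new TreeSet<>(Arrays.asList(4, 7, 21)));
        // Only title (4) and ISBN (7) are supported by all three.
        System.out.println(AccessPointIntersection.commonUseAttributes(targets));
    }
}
```

A strict intersection can be very small when one target is poorly equipped, which is why the text allows for access points that are meaningful on "at least most" of the targets.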

3.5.2.6. Target search behaviour

During the development and testing of BOPAC2 with different targets it was found that they often interpreted the same combination of Bib-1 attributes in different ways. This is not surprising because the Bib-1 semantics document [36] only suggests, but does not prescribe, how the attributes should be interpreted by the target. Often the behaviour of the target in response to a particular combination of attributes seemed to depend on the capability of the underlying database engine, but there was insufficient time to go into this in detail. Some targets, for example, ignore the Structure attribute and always execute a keyword search even if an exact phrase match was requested. Others seem to fall back to less precise criteria if the specified criteria produce no hits so that if, for example, a phrasal search produced no hits they execute a keyword search instead. Whilst this might seem sensible, the Z39.50 model suggests that it is the origin, rather than the target, that should initiate a new query in this way.
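For comparison, a fallback initiated by the origin might look like the Java sketch below. This is not something the report says BOPAC2 implemented. The `TargetSearch` interface is a hypothetical stand-in for a real Z39.50 search, and the choice of Structure attribute 1 for the phrase query and 6 (word list) for the keyword query is an assumption.

```java
/** Hypothetical stand-in for one search against a target; returns the hit count. */
interface TargetSearch {
    int hits(int structureAttribute, String term);
}

/** Records which query finally produced the result. */
class SearchOutcome {
    final int structureAttribute;
    final int hits;

    SearchOutcome(int structureAttribute, int hits) {
        this.structureAttribute = structureAttribute;
        this.hits = hits;
    }
}

class FallbackOrigin {
    static final int STRUCTURE_PHRASE = 1;
    static final int STRUCTURE_WORD_LIST = 6;

    /** Tries a phrase search first; only if it finds nothing does the origin send a keyword search. */
    static SearchOutcome searchWithFallback(TargetSearch target, String term) {
        int phraseHits = target.hits(STRUCTURE_PHRASE, term);
        if (phraseHits > 0) {
            return new SearchOutcome(STRUCTURE_PHRASE, phraseHits);
        }
        int keywordHits = target.hits(STRUCTURE_WORD_LIST, term);
        return new SearchOutcome(STRUCTURE_WORD_LIST, keywordHits);
    }
}

class FallbackOriginDemo {
    public static void main(String[] args) {
        // A fake target that finds nothing for phrases but three records for keywords.
        TargetSearch fake = (structure, term) -> structure == 1 ? 0 : 3;
        SearchOutcome outcome = FallbackOrigin.searchWithFallback(fake, "origin of species");
        System.out.println(outcome.structureAttribute + " -> " + outcome.hits + " hits");
    }
}
```

Because the origin issues the second query itself, it knows which criteria produced the hits and can tell the user, which is not possible when a target silently relaxes the query.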

The approach taken in the Project, which was less dependent than other Z39.50 origins on the Z39.50 search mechanism itself, was more robust and better at smoothing out the differences between different targets. The user is shielded from the complexity of Z39.50's search and retrieve mechanism. The result is a friendlier and more accessible distributed search interface.

3.6. Java Applet Design and Development

Having decided to use a Java applet it was necessary to decide how to develop the Java software. It was established that this was likely to be a complex applet and that a sophisticated Java Integrated Development Environment would be required. Symantec Café was chosen following several recommendations. It proved a useful and highly productive development tool.

3.6.1. Appearance and Sequencing

The starting point for the design of the Java applet was the Project's predecessor, BOPAC 1. BOPAC 1 used a sequence of overlapping windows, and whilst this approach would have been possible in Java, it would have resulted in a screen full of windows outside the boundary of the web browser. For clarity, and to follow the general approach taken by other Java applets at the time, it was decided to create the interface within a predefined area of the web browser.

Nevertheless there were lessons to be learned from the design of the BOPAC1 interface sequence and other GUI OPAC's. Hildreth [37] suggests three facilities that would be useful in a GUI OPAC:

  1. Character-based helpful prompts displayed on the screen
  2. Flexible movement among various levels of a search as they are presented in multiple windows.
  3. The ability to gather in works related to a work on display, or works linked to a displayed heading or call number.
These suggestions are described in the context of a conventional OPAC rather than a Z39.50 origin and hence apply to the search interface as well. In this Project searching is carried out by the Europagate software and little could be done to modify its interface. BOPAC2's interface was designed in some ways to resemble the "screen-by-screen" approach of traditional OPAC's but with "Back" and "Forward" navigation buttons to enable the user to retrace their steps if necessary; the whole model was designed to resemble a web browser. The third of these suggestions is the most relevant, since it is one of the core principles of the whole Project.

3.6.2. Working Dataset

Having established how the software would look, the next stage was to decide how it would obtain and process its working dataset, the combined set of Z39.50 retrievals created by the Europagate gateway. This dataset is made up of records, each record consisting of two components:

These records would be stored in a file on the web server at Bradford by the modified Europagate software. The Java applet would download this file and parse it into a data structure on the workstation. Because Java applets cannot store data persistently on the user's workstation it would be necessary to hold all the data in memory.

This meant that the workstation's memory capacity would limit the size of the retrieval set which could be processed. Network delays would also cause problems for users in downloading large retrieval sets; but it was impossible to predict how serious these problems were going to be.

3.6.3. Multiple MARC Formats

One of the problems with many existing Z39.50 origins is that they can only deal with one MARC format (typically USMARC). However many UK-based targets (including Bradford University) deliver UKMARC records. It was clear that BOPAC2 retrievals could potentially contain a mixture of (at least) USMARC and UKMARC records, and perhaps other MARC flavours as well. The problem was how to handle heterogeneous sets of MARC records. Other Europe-wide projects have got around this problem by mapping the various flavours of MARC onto a single standard MARC format. This mapping is applied before the results are displayed.

BOPAC2 took a more object-oriented approach by defining a series of classes. First, the base class BibliographicRecord was defined. The generic MARC class was a subclass of BibliographicRecord, and inherited from MARC were the specific classes UKMARC and USMARC. MARC tags were represented by a name (e.g. "main entry title") rather than a tag number ("245a") so that they were not tied to any one particular MARC format. Other MARC formats can be incorporated as subclasses of the generic MARC class. Other bibliographic formats (not based on MARC) can be incorporated as subclasses of BibliographicRecord. Whether this approach is better or worse than simple format conversion is not clear, but it does demonstrate another way of tackling the problem which could become popular as more and more projects use object-oriented software development.
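The class hierarchy described above might be sketched in Java as follows. The class names come from the text; everything else (method names, the way subfield data is stored, and the field name "local note" with its tags "901a" and "902a") is a placeholder invented for the example and does not reflect real UKMARC or USMARC tag assignments or the Project's actual code.

```java
import java.util.HashMap;
import java.util.Map;

/** Base class: callers ask for fields by name, never by format-specific tag. */
abstract class BibliographicRecord {
    abstract String get(String fieldName);
}

/** Generic MARC record: stores values by tag and lets subclasses map names to tags. */
abstract class MARC extends BibliographicRecord {
    private final Map<String, String> valuesByTag = new HashMap<>();

    void put(String tag, String value) {
        valuesByTag.put(tag, value);
    }

    /** Returns the tag for a named field in this MARC flavour, or null if unknown. */
    abstract String tagFor(String fieldName);

    @Override
    String get(String fieldName) {
        String tag = tagFor(fieldName);
        return tag == null ? null : valuesByTag.get(tag);
    }
}

class UKMARC extends MARC {
    @Override
    String tagFor(String fieldName) {
        switch (fieldName) {
            case "main entry title": return "245a";
            case "local note": return "901a";   // placeholder tag, illustration only
            default: return null;
        }
    }
}

class USMARC extends MARC {
    @Override
    String tagFor(String fieldName) {
        switch (fieldName) {
            case "main entry title": return "245a";
            case "local note": return "902a";   // placeholder tag, illustration only
            default: return null;
        }
    }
}

class MarcFormatsDemo {
    public static void main(String[] args) {
        UKMARC uk = new UKMARC();
        uk.put("901a", "held at store");
        USMARC us = new USMARC();
        us.put("902a", "held at store");
        // The display code uses one name for both formats.
        BibliographicRecord[] mixed = {uk, us};
        for (BibliographicRecord record : mixed) {
            System.out.println(record.get("local note"));
        }
    }
}
```

The display code only ever sees `BibliographicRecord`, so a retrieval containing a mixture of formats needs no conversion pass before it is shown.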

3.6.4. The BOPAC2 Object Model

Whilst it is not the purpose of this report to describe the whole design of the system in detail, it is worth identifying some of the system's most important classes and their relationships. This should give the reader some idea of the architecture.

The BOPAC2 interface consists of the combined retrieval set (combined from the various Z39.50 targets) and of views of that set. Each view represents a subset of the records in the combined retrieval set displayed in a specific format. Each time the user interacts with the system, a new view is generated. The new view contains a subset of the records of the current view. So for example the first view the user sees contains the whole combined retrieval set and may consist of several works. If the user chooses one of these works, a new view is generated containing the subset of records which pertain to that work, displayed in full or partial record format.

There are many different classes for the different views; all are ultimately derived from a single base class (SetView). Another class (ClusterView) acts as a base class for viewing clusters. Specific subclasses (e.g. AuthorClusterView, PublisherClusterView) are then derived from these.

The interface contains a linked list of views. As the user works with the system they generate new views which are added to this list. The "back" and "forward" buttons move the user back and forth within the list. The interface is stateful and each view records a snapshot of the state. As the user retraces their steps back through previous views, the previous states are restored.
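The view list with its "back" and "forward" behaviour can be sketched as a simple history object. In this Java illustration the class `SetView` is reduced to a description and a list of record identifiers, and the class `ViewHistory` is hypothetical. One detail is an assumption modelled on web browsers rather than stated in the report: generating a new view after going back discards the views that lay ahead.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified view: a description plus the records it displays (a snapshot of state). */
class SetView {
    final String description;
    final List<String> recordIds;

    SetView(String description, List<String> recordIds) {
        this.description = description;
        this.recordIds = recordIds;
    }
}

/** Hypothetical history of views with browser-style navigation. */
class ViewHistory {
    private final List<SetView> views = new ArrayList<>();
    private int position = -1;   // index of the view currently displayed

    /** Adds a newly generated view after the current one and displays it. */
    void show(SetView view) {
        // Assumption: views ahead of the current position are dropped, as in a browser.
        while (views.size() > position + 1) {
            views.remove(views.size() - 1);
        }
        views.add(view);
        position = views.size() - 1;
    }

    SetView current() {
        return position < 0 ? null : views.get(position);
    }

    /** Moves back one view if possible and returns the view now displayed. */
    SetView back() {
        if (position > 0) position--;
        return current();
    }

    /** Moves forward one view if possible and returns the view now displayed. */
    SetView forward() {
        if (position < views.size() - 1) position++;
        return current();
    }
}
```

A typical sequence would be `show(allRecords)`, then `show(oneWork)` when the user picks a work, after which `back()` restores the first view exactly as it was.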

There is a class for the rather abstract concept of a cluster. A cluster object consists of a label (which names the cluster) and a set of contained records. This is used in different ways throughout the system. A cluster might for example be labelled with an author name and may contain records of items written by that author. Another cluster may be labelled with the date range (e.g. 1900-1990) and contain records with publication dates in that range.
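A cluster of this kind, and a routine that builds clusters from a retrieval set, might be sketched as below. This is a minimal Java illustration and not the Project's clustering algorithm: the `BibRecord` fields, the `Clusterer` class and the labelling functions are assumptions, and the sample records are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Minimal record for the example; real records carry many more fields. */
class BibRecord {
    final String author;
    final String title;
    final int year;

    BibRecord(String author, String title, int year) {
        this.author = author;
        this.title = title;
        this.year = year;
    }
}

/** A cluster: a label naming it, plus the records it contains. */
class Cluster {
    final String label;
    final List<BibRecord> records = new ArrayList<>();

    Cluster(String label) {
        this.label = label;
    }
}

class Clusterer {
    /** Groups records by whatever label the supplied function derives from each record. */
    static Collection<Cluster> cluster(List<BibRecord> records,
                                       Function<BibRecord, String> labeller) {
        Map<String, Cluster> byLabel = new TreeMap<>();   // clusters sorted by label
        for (BibRecord record : records) {
            byLabel.computeIfAbsent(labeller.apply(record), Cluster::new)
                   .records.add(record);
        }
        return byLabel.values();
    }
}

class ClustererDemo {
    public static void main(String[] args) {
        List<BibRecord> retrieval = Arrays.asList(
                new BibRecord("Author A", "Title 1", 1925),
                new BibRecord("Author A", "Title 2", 1975),
                new BibRecord("Author B", "Title 3", 1980));
        // The same Cluster class serves for author clusters and date-range clusters.
        for (Cluster c : Clusterer.cluster(retrieval, r -> r.author)) {
            System.out.println(c.label + ": " + c.records.size());
        }
        for (Cluster c : Clusterer.cluster(retrieval,
                r -> r.year < 1950 ? "1900-1949" : "1950-1999")) {
            System.out.println(c.label + ": " + c.records.size());
        }
    }
}
```

Passing the labelling rule as a function is one way to reuse a single cluster class "in different ways throughout the system"; the report achieves variety through view subclasses such as AuthorClusterView.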

Figure 1. Some typical routes through the BOPAC2 interface

3.6.5. Java Language Issues

Java is a very new language, and at the time the Project was proposed it had only just appeared on the scene. There were a lot of unknowns, particularly with respect to performance. How big would the code be? How much memory would it require, how fast would it run? Would it really work across platforms, and how much would performance vary? As the Project proceeded, the answers to these questions became clearer.

Java's cross-platform promise turned out to be rather wide of the mark. Sun Microsystems (the inventors of Java) promise "write once, run anywhere" but the reality was more a case of "write once, debug everywhere". It seemed that each time the system was tested on a new platform, a new web browser, or a new version of a web browser, it had to be modified. The problems were nearly always to do with the display and the AWT peer classes. Both Netscape and Internet Explorer suffer from bugs in their early implementations of the AWT peer classes. Internet Explorer, for example, could not handle scroll bars correctly. There were also problems with X-Windows in monochrome and differences between X-Windows and Windows 95/NT.

Symantec Café was purchased with the original Java Development Kit (JDK 1.0.2). JDK 1.0.2's Abstract Windowing Toolkit (AWT) was notoriously weak, and substantial effort was required to make it display text in a useful way. During the development phase a new JDK (JDK 1.1) was released by Sun Microsystems, with an improved AWT. However it took some time for Symantec to incorporate the new JDK into Café, and the older Java-capable web browsers (e.g. Netscape version 3) can only handle JDK 1.0.2. Later browsers can also handle some parts of the new JDK but it was decided, in the interests of accessibility, to stick to JDK 1.0.2 as the lowest common denominator. The only facility from JDK 1.1 that was used was JAR archiving. Some web browsers can use this feature to cache the Java applet on the workstation, so improving performance.

3.7. HCI issues

In the BOPAC1 project it was possible to have a clear interface design aim for the system since that was to run on a PC under Windows. The aim was to make the system behave, as much as possible, in a standard Windows fashion so that its behaviour would, hopefully, be intuitive in many regards to any user who was familiar with other Windows based applications. With BOPAC2's Web based interface there was no clear corresponding paradigm to fit into. The WWW has opened up new areas for HCI work and, whilst some general lessons (such as limiting the number of user options) can be transferred from previous HCI work, there are other considerations that are more Web specific. This is even more the case with Java based Web applications.

Searching for the keywords "Java" and "applet" in the online proceedings of ACM CHI 96 and 97, one of the premier conferences on human computer interaction, gives an indication of how much, or how little, work has been done on HCI aspects of Java based systems. In the 96 proceedings searching for applets produced no hits and searching for Java, only one. That article in fact only discussed Java as a future option, saying that had it been available earlier it would have been a way round some HTML limitations. By contrast searching the 97 proceedings produces 4 hits for applets and 6 for Java.

From the 97 proceedings it is clear that some general lessons for Java development are starting to emerge; many of these are related to general WWW issues. In general designers must be conscious that they "don't know the capabilities or configuration of the applet user's machine" [38]. This parallels the general situation with the WWW where you don't know the user's browser's capabilities, i.e. whether they have images on, what colours they are using and what screen size. Another important factor in Web applications in general, and one that is being studied by HCI practitioners, is that the time for responses and interactions is often not under control (the "World Wide Wait" issue, as it is often known).

Although there are a number of tools such as validators and "linters" for web pages, similar tools for applets are still developing. The Project tackled these issues by testing the system on as large a number of browsers (including different versions of the same browser) and operating systems as possible. This highlighted problems with the Java AWT toolkit in some situations and led us to make sure that features such as the use of colour were duplicated by the use of different fonts so that a similar effect would be seen on colour and monochrome screens. This was important to us if we were to realise the cross platform potential of Java, so we did not want to specify any system constraints, such as particular browser or screen size, other than Java support. This strategy would seem to have been successful: the one platform that we did not have facilities to test on was the Macintosh, but one questionnaire respondent commented that he was pleased to see it work first time on a Mac.

The policy undertaken by the Project was to use a fairly simple single Java applet of a modest screen size and concentrate the controls at the top of the applet. This meant that most users would see all the controls on start up and have a standard view of the application. One area of interface design that we were conscious of, and was commented on by users, was the lack of consistency between the Java applet and the HTML pages derived from the Europagate software that controlled the searching part of the system. The Europagate software has a standard "HTML forms and CGI" look and will look familiar to users who have used other non Java based Web interfaces to databases. Rewriting this part of the system was beyond the terms of reference of this project, but would be an obvious step forward in creating a complete easy to use system with a coherent "look and feel".

3.7.1. Optimisation

Initial prototypes worked well but turned out to be frustratingly slow on some workstations. Experiments showed that this was because the garbage collector was taking up most of the CPU time trying to free up memory. Java was eating memory in spades. The problem was that identical copies of strings and other objects were being created over and over again within the Java Virtual Machine. The code was optimised to eliminate this as far as possible, but because Java does not have explicit memory-allocation primitives, it was impossible to prevent unnecessary duplication of data structures completely. Fortunately most 32-bit Windows workstations now have enough RAM to cope with the applet without running out of memory, unless the retrieval set is extremely large (around 1000 records or more).
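The report does not say how the duplication was removed. One common technique for this kind of problem is to keep a pool of canonical strings, so that every occurrence of, say, the same author name refers to a single object. The Java sketch below illustrates that technique only; the class `StringPool` is hypothetical, and it uses collection classes that postdate JDK 1.0.2 (an applet of that period would have used `java.util.Hashtable`).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pool that hands back one shared instance per distinct string value. */
class StringPool {
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the first instance seen with this value, remembering it if it is new. */
    String canonical(String value) {
        String existing = pool.putIfAbsent(value, value);
        return existing == null ? value : existing;
    }

    int size() {
        return pool.size();
    }
}

class StringPoolDemo {
    public static void main(String[] args) {
        StringPool pool = new StringPool();
        // Two separately allocated strings with the same content, as a parser might produce.
        String first = new String("Author A");
        String second = new String("Author A");
        String a = pool.canonical(first);
        String b = pool.canonical(second);
        System.out.println(a == b);        // both refer to the same object
        System.out.println(pool.size());   // only one copy is retained
    }
}
```

Once the parser routes every field value through such a pool, the duplicate copies become garbage immediately instead of accumulating in the retrieval set, which reduces the work left for the garbage collector.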

3.8. Testing and Refinement

The intention of the Project as proposed was to combine the evaluation of the software with a refinement phase, so that end-user responses and comments could be used to refine the interface during the latter part of the software development cycle. Due to circumstances beyond the Project's control (mainly the delay in configuring the Bradford Library Z39.50 target) it was not possible to run this as originally intended.

Once the first prototype version had been largely debugged, a list of potential evaluators was drawn up to test the system.

4. Large and Complex Sets in Z39.50

This section describes how the Z39.50 Information Retrieval protocol handles large result sets and result sets in which many of the records are related or "belong together". It goes on to show how the clustered form of presentation demonstrated in BOPAC2, which at the moment takes place outside the realm of Z39.50, can be modelled directly into Z39.50 itself. It is intended primarily for those who are interested in the details of Z39.50 - other readers may prefer to move on to the next section.

Z39.50 has a number of facilities for reducing a large result set. Most of these are concerned with displaying the results in brief or reduced form so that the user can select which records to retrieve in full. None of these facilities, however, can successfully fulfil the second objective of the catalogue. The main reason for this is that the model of the result set, as a group of unrelated items, does not reflect the real situation in which many of the retrieved records will be related in some way. Were these relationships to be modelled in the result set, they would provide the best mechanism for selecting relevant records to retrieve. This section describes how such relationships can be represented in the Z39.50 result set as they are in the BOPAC2 retrieval, using a cluster model.

In cases where the origin searches just one target, the use of such clusters would be more efficient than the traditional brief record/full record approach suggested by Z39.50. It would also lead to a more useful and informative user interface at the origin. In the case of a distributed search, a "meta-target" model is suggested as a way of clustering retrievals from distinct, remote catalogues.

4.1. The Networked Environment

In the networked environment large result sets present more than just a problem of presentation. The limited bandwidth of the network means that there is a limit on the amount of material that can be transmitted from the remote catalogue to the end user within a reasonable time. When using the Internet environment, transmission rates vary all the time and are impossible to predict accurately, and the delays over HTTP have as much to do with the number of files as with the size of each file. Sometimes, one large file can be transmitted faster than several small files, sometimes it is the reverse, sometimes it makes no difference. It depends on the nature of the connection, the prevailing traffic, local or national caching, and other factors beyond the control of the user.

Currently the Z39.50-1995 standard has emerged as the standard for interoperability between library catalogues. Whilst Z39.50 is by no means limited to bibliographic applications, such applications form a major part of current development work. The following discussion will focus on Z39.50 purely in the bibliographic domain, from the point of view of librarians and (most importantly) remote catalogue users. The aspects of Z39.50 discussed here do not necessarily apply to the whole Z39.50 standard in general.

Z39.50-1995 is "within the application layer of the OSI network model" [39]. As such, pedantically speaking, it is not really concerned with physical networking issues such as bandwidth and transmission rates. However, as with many similar protocols in the application layer (FTP, HTTP), some aspects of the standard have clearly been developed with a view to the unpredictable and limited bandwidth of the underlying network layer, which in practice is nearly always TCP/IP. In Z39.50 terms, bandwidth affects the quantity of APDU's transmitted between origin and target, and the size of individual APDU's.

4.2. The Z39.50 Approach

From the beginning (version 1), the designers of Z39.50 were aware of the problem of large result sets. Their approach has been to try to minimise the amount of material retrieved from the remote database, reasoning that retrieval is likely to generate more material than anything else. So from the start Z39.50 separated the Search and Retrieval facilities. The origin can interrogate the target's databases and conduct a search without actually retrieving anything. Because the result set is retained at the target, and is keyed on position for random access, the origin can retrieve only the records (and possibly only the parts of those records) that it needs. In theory, there is no need to transmit redundant material and no wasted network traffic or unnecessary load on the Z39.50 server resources.

Z39.50 does not specifically address the question of how the origin decides what to retrieve from the target. Usually, the origin consults the user who makes a decision based on information supplied by the target as part of previous operations. This is a crucial step in the whole search and retrieve cycle. The user is interacting with the process, deciding what to retrieve, and he/she must have enough information to make this decision. So how does Z39.50 help?

4.3. Changing the Query

One of the most common ways to reduce the size of the result set, and to improve its precision, is simply to rewrite the query. This is common in conventional OPAC's. In Z39.50 terms there are several ways in which a query can be modified, and the Z39.50 model encourages this approach. The SearchResponse tells the origin how many records are in the result set, so that it can attempt to reduce it before initiating a Present operation. Some Z39.50 targets support a wide range of "minor" access points (such as publication date, publisher, language, and edition) which may be used in conjunction with title or author.

The major disadvantage of this approach is that the user is asked to reformulate their search, perhaps add terms to the search, without knowing anything about the large result set they have generated. For example, how can the user be expected to filter the results by publication date when he/she may have no idea of the range of dates in the result set? This may explain why, on conventional OPAC's, few people bother to reformulate their query to reduce the number of hits [18]. In the Z39.50 environment the user is potentially working with an unfamiliar catalogue, which may or may not have the capability to filter the result set, and he/she is even less likely to attempt to reduce the result set by reformulating the query.

4.4. Choosing Specific Records with Z39.50

There are several aspects of the Present service which, potentially, may assist users in deciding which records to examine in detail. Among these are the Summary Record Syntax and the Generic Record Syntax, which allows the origin to address and retrieve specific elements within a record. Few if any Z39.50 targets support these, but all must (according to the standard) support "Brief" element sets. Brief record elements are not defined, but are one of the most common mechanisms for providing the user with a summary of the content of the result set. Other facilities include the Simple Unstructured Text (SUTRS) record syntax (widely implemented), and the Sort facility for sorting the result set (not widely implemented).

4.4.1. Brief Records

Typically, at the moment, a Z39.50 origin will issue a SearchRequest on a target, issue a PresentRequest to retrieve brief records for the first "chunk" of the result set, and then display these in some way to the user. The user decides by inspection which records they wish to see in full. The origin then issues another PresentRequest to retrieve these as Full records. The origin may then go on to retrieve more chunks of the result set, again interacting with the user to decide which of these records to retrieve in full.
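The cycle described above can be sketched as follows. This is only an illustration in Python: `MockTarget` is a hypothetical in-memory stand-in for a Z39.50 target, not a real client library, the sample records are invented, and the choice of author and title as the Brief elements is an assumption (the standard does not define them).

```python
class MockTarget:
    """Stand-in for a Z39.50 target that keeps the result set server-side."""

    def __init__(self, database):
        self._database = database
        self._result_set = []

    def search(self, term):
        # Search builds the result set but returns only its size.
        term = term.lower()
        self._result_set = [r for r in self._database
                            if term in r["title"].lower()]
        return len(self._result_set)

    def present(self, start, count, element_set="B"):
        # Result-set positions are 1-based. "B" and "F" are the
        # Brief and Full element set names.
        chunk = self._result_set[start - 1:start - 1 + count]
        if element_set == "F":
            return [dict(r) for r in chunk]
        # What goes into a Brief record is this mock's own choice.
        return [{"author": r["author"], "title": r["title"]} for r in chunk]


# Invented sample data.
database = [
    {"author": "Example, Ann", "title": "Sample gardens", "date": "1990"},
    {"author": "Other, Bob", "title": "Gardens of the sample", "date": "1980"},
    {"author": "Third, Cy", "title": "More gardens", "date": "1975"},
    {"author": "Fourth, Di", "title": "Sample rivers", "date": "1960"},
]

target = MockTarget(database)
hit_count = target.search("gardens")           # nothing retrieved yet
first_chunk = target.present(1, 2, "B")        # brief records 1 and 2
chosen_position = 2                            # the user's selection
full_records = target.present(chosen_position, 1, "F")
```

The point of the sketch is that the full record crosses the network only for the position the user selected; the rest of the result set stays at the target.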

4.4.1.1. Brief Record contents

The user must make use of the information in the Brief records to decide whether or not to retrieve a Full record. The crucial question is whether Brief records provide the user with enough information to make this decision. The answer, unfortunately, is far from clear. Since the precise content of "Brief" records is not defined within the Z39.50 standard, it can (potentially) vary from one target to another (and particularly from one library system vendor to another).

Targets that handle MARC records mostly seem to include the same core elements (author, title, and publication details) in their Brief records. Those that use unstructured records such as SUTRS (see later) are less predictable and their Brief records are unstructured. Potentially, these variations can cause problems when conducting a search across several targets. The Z39.50 ATS1 profile [40] could have pinned down the content of the Brief element set, but it did not.

The content of Brief records is determined purely syntactically. They are simply a subset of the elements of the "Full" record. If there is vital information outside the scope of the Brief element set, the user will not see it. This is the cause of a common problem of most Z39.50 systems, where the target hits items which do not appear to match the user's original query. Brief records may not necessarily contain the elements which explain to the user why the record was hit. Examples of this are shown in Appendix 1.
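A small sketch may make the problem concrete. The record below is invented, and the element names and the choice of Brief elements are assumptions made for illustration; real targets decide these for themselves.

```python
# An invented full record. The query term appears only in the notes.
full_record = {
    "author": "Example, Ann",
    "title": "Collected essays",
    "publication": "Bradford : Sample Press, 1998",
    "notes": "Includes an essay on lighthouses",
}

# Assumed Brief element set: a fixed, purely syntactic subset.
BRIEF_ELEMENTS = ("author", "title", "publication")


def matches(record, term):
    """True if the term occurs in any element of the record."""
    term = term.lower()
    return any(term in value.lower() for value in record.values())


def brief(record):
    """Cut the record down to the Brief elements, whatever the query was."""
    return {name: record[name] for name in BRIEF_ELEMENTS if name in record}


hit_on_full = matches(full_record, "lighthouses")
hit_explained_by_brief = matches(brief(full_record), "lighthouses")
```

Here the record is a legitimate hit, but nothing in its Brief form shows why, so to the user it looks spurious.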

4.4.1.2. Repetition in Brief Records

Because Brief records are derived from a given set of elements, one record will often repeat data in another record, particularly when the MARC tags mapped by the Use attribute overlap with those in the Brief element set. This happens, for example, in an author/title search.

Appendix 1 again shows examples of such repetition. The problem is particularly bad where the search is for famous works which have been published in several different editions. It is a well-known problem on conventional OPAC's [14]. The extent of the repetition will depend on the scope and content of the database, the type of search, the search terms, and other factors.

4.4.2. SUTRS Records

Brief records arise from an element specification which must, according to the standard, be supported by all targets. The SUTRS record syntax is supported by some (not all) Z39.50 origins and targets, including for example the COPAC Z39.50 target [41]. SUTRS is essentially a display format: it allows the target to control the way in which records are displayed to the end user. Again, their format is not defined in the standard.

SUTRS is a Full record format, so the records generally contain more information than Brief records. Yet SUTRS records are smaller and less detailed than MARC records, so they can be transmitted more rapidly over a limited bandwidth network. Potentially, SUTRS records could be regarded (from the user's point of view) as a compact compromise between Brief records and MARC records. In addition, a clever target could ensure that each record contains one element which explains why the record was retrieved.

Unfortunately, few if any origins or targets at the moment work in this way. SUTRS records are typically used as a substitute for MARC records rather than a step towards them. They usually resemble the back-end catalogue's full record display, loosely modelled on the ISBD. The format of full record displays varies quite considerably among conventional OPAC's [42]. Further research is needed to determine how SUTRS records vary across Z39.50 targets, but the main point of SUTRS is that the format is left up to the target. It is unlikely that the format of SUTRS will ever be standardised sufficiently that it can be used for distributed searching.

SUTRS is a record syntax whereas Brief is an element set. Thus it is possible to have Brief records derived from SUTRS. SUTRS-based targets can return Brief records, but the indications are that this makes the problems even worse (see the COPAC example in Appendix 1).

4.4.3. Summary Records

Z39.50-1995 defines a summary record syntax which contains the essential bibliographic details such as author, title, publication details, etc. This was developed in order to harmonise with WAIS, to provide brief record summaries in a standardised format. Unfortunately, at the time of writing, no Z39.50 targets or origins appear to use the Summary syntax.

There is considerable semantic overlap between the elements of the Summary syntax and the tags of a MARC record. It seems almost certain that if any bibliographic targets were to use Summary syntax, the records would be machine-generated by selecting specific tags from corresponding MARC records.

4.4.4. Element Specifications

Z39.50-1995 includes the facility to extract specific elements from records in the result set via an elementSpec. Brief and Full records are specific instances of named element sets which were part of version 2 of Z39.50. Z39.50-1995 also permits dynamic elementSpec's defined within the eSpec-1 element specification format. With this, the origin can select specific elements or parts of elements from the database records.

This is a very flexible and powerful facility which enables the origin, rather than the target, to decide which elements will be included in the records returned by the PresentRequest. It gives the origin more control over the content of the records and enables a standardised retrieval pattern across different targets.

However, the origin is still working "blind" because the contents of the elements cannot be determined precisely; they depend ultimately on the cataloguing. Element specifications do not solve the problem of data repetition either, because the origin has to download the elements before it can tell that they contain repeated data. Currently few if any targets or origins implement eSpec-1.

4.4.5. Origin-generated short title list

One of the core ideas of Z39.50 when it first appeared was that it was a true machine-to-machine protocol, rather than a display protocol like HTTP. The great advantage of this approach is that users can use the same application and the same interface to search different targets. They can also examine the results from different targets in the same format. Although Z39.50 does not explicitly mention simultaneous searching of multiple targets (distributed searching), it is an idea which has attracted considerable support in the Z39.50 community [43]. Such distributed searching depends heavily on standardisation of the display format across different targets.

The introduction of an increasing number of generic syntax-oriented facilities to the standard has to some extent diluted the original idea of a pure machine-to-machine protocol. Nevertheless, many origins such as BookWhere [28] and Willow [24] attempt to summarise the results for the user by retrieving a full MARC record and extracting from it appropriate tags to generate something like the conventional OPAC's short title display. Other systems such as ZNavigator [29] and the Library of Congress WWW-to-Z39.50 gateway [35] attempt the same operation using Brief records. This is more risky (because the content of Brief records is not defined within the Z39.50 standard) but it saves network bandwidth and hence improves response time. Fretwell Downing's OLIB system also post-processes the retrieval [44].

In generating the short title display most origins carry out the same familiar transformation. They simply extract a subset of tags from the MARC record. As with Brief and SUTRS formats, this subset of tags may or may not explain why the record was hit. Creating "short titles" at the origin, rather than the target, can deliver a prettier display format to the end user, but in terms of content, there is the same potential for confusion over spurious records.
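The transformation can be sketched in a few lines of Python. The record layout (a dictionary of tags, each holding a list of subfield dictionaries), the choice of tags 100, 245 and 260, and the helper names are all assumptions made for illustration; they are not taken from any of the origins mentioned above.

```python
def first_subfield(record, tag, code):
    """Return the first occurrence of a subfield, or an empty string."""
    for field in record.get(tag, []):
        if code in field:
            return field[code]
    return ""


def short_title(record):
    """Build a one-line display from a fixed subset of MARC-style tags."""
    author = first_subfield(record, "100", "a")   # main entry, personal name
    title = first_subfield(record, "245", "a")    # title proper
    date = first_subfield(record, "260", "c")     # date of publication
    # Trim the trailing ISBD punctuation carried inside the subfields.
    parts = [p.strip(" /:,.") for p in (author, title, date) if p]
    return " / ".join(parts)


# Invented sample record.
record = {
    "100": [{"a": "Example, Ann,"}],
    "245": [{"a": "A sample title /", "c": "Ann Example."}],
    "260": [{"a": "Bradford :", "b": "Sample Press,", "c": "1998."}],
}

line = short_title(record)
```

Whatever the tidiness of the resulting line, it is still built from a fixed set of tags chosen without reference to the query.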

4.4.6. Sort

There is no default sort sequence for the result set, but the origin can ask the target to sort the result set before initiating a Present operation. Typically the target might be able to sort by a Use attribute such as author or publication date. The obvious advantage of this is that the origin may be able to collocate related records at least to some degree, and so improve the readability of the results. Sort does, however, suffer from some disadvantages:

Firstly the keys are not clearly defined. The Sort operation may involve a set of attributes, an element specification, or some other target-specific aliases or database-specific keys.

Secondly, there are loopholes. The sort key may contain multiple Use attributes, in which case the target must decide which one to use. Even if there is only one Use attribute the situation is not clear. For example, bib-1 maps Use attribute 1003 (Author) onto 12 different MARC tags. From these tags, and potentially from repeating subtags within the tags, the target extracts (presumably) one single sort key. The origin cannot predict with any certainty how the target will make this selection. Perhaps the target will choose corporate author in preference to personal author, or perhaps the other way round. This is less of a problem with element specifications, where there is an implied priority, but there is still the question of repeating values. There is no profile detailing how Sort should be used in the bibliographic context, and currently few targets provide the Sort facility.
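The ambiguity can be shown with a small sketch. Two hypothetical targets receive the same request to sort by author, but each resolves the Use attribute to a single key using its own tag priority. The records are invented, only three of the author tags are used, and the priority orders are assumptions; nothing here describes how any real target behaves.

```python
# Invented records carrying author data under different MARC-style tags.
records = [
    {"id": 1, "110": "Zeta Society", "700": "Adams, B."},
    {"id": 2, "100": "Miller, C."},
    {"id": 3, "110": "Alpha Institute", "700": "Young, D."},
]


def sort_key(record, tag_priority):
    """Pick one sort key: the first tag in priority order that is present."""
    for tag in tag_priority:
        if tag in record:
            return record[tag].lower()
    return ""


def sort_by_author(result_set, tag_priority):
    ordered = sorted(result_set, key=lambda r: sort_key(r, tag_priority))
    return [r["id"] for r in ordered]


# Target A prefers personal names; target B prefers corporate names.
order_a = sort_by_author(records, ("100", "700", "110"))
order_b = sort_by_author(records, ("110", "100", "700"))
```

Both targets have honoured the same sort request, yet the sequences differ, and the origin has no way of knowing in advance which one it will get.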

Conventional OPAC's usually make some attempt at collocation, or at least provide a facility to collocate the output, although the results are sometimes far from satisfactory [14]. If the result set is large and unordered (or badly ordered) the user is unlikely to plough on right to the end of it. He/she will probably give up before the end, satisfied with perhaps one or two relevant hits, and may never see other related records further on in the result set.

Currently most Z39.50 origins and targets produce apparently unordered retrievals which exhibit no collocation at all. The Sort facility should help to improve matters, but it is ambiguous and is unlikely to collocate the results any better than a conventional OPAC. In addition, in the Z39.50 environment, the origin typically retrieves records from the target in groups of 20-30 records at a time. If bandwidth is limited, the user has to wait each time to retrieve the next group of records. If the number of records is large, he/she is even more likely to give up early, and may miss relevant records towards the end of the result set because they take too long to download.

Version 4 of Z39.50 will include the new Type 102 Ranked List query which ranks the result set according to a relevancy score, so that the most relevant records will be those at the start of the result set. Such an approach may work for subject-based searching, but is not so applicable to known item author/title searches which are by far the most common type on many OPAC's (see for example Appendix 5).

4.4.7. Conclusion

There are various ways in which the Z39.50 result set can be presented to the user in abbreviated form. Fundamentally, all of these techniques work in the same way, by extracting a fixed, pre-defined subset of tags from the MARC record. This approach has several potential disadvantages. Firstly, it can lead to the retrieval or display of a set of almost identical records. Secondly, it can produce apparently spurious hits, with no explanation for their presence in the result set. These problems are specific instances of more general problems of precision and collocation which have been well documented on conventional OPAC's. It is clear that in the Z39.50 distributed searching model, all these problems persist.

4.4.8. Using Z39.50 for Distributed Searching

Some research has been carried out into users' reactions to searching with Z39.50 [45], but there is very little if any research into users' reactions to a distributed library and distributed searching. This is probably because many of the distributed searching projects exist only in theory at the moment and have not yet been implemented. Currently available working Z39.50 targets, developed by the major library system vendors, often suffer from limited functionality and do not appear to inter-operate or to implement Bib-1 or ATS-1 very well.

At best, the user searching remote networked catalogues with Z39.50 will face the same problems as he/she would face searching his or her own local catalogue. More likely, the display deficiencies will be exacerbated by the user's lack of familiarity with the remote catalogues (although not nearly as much as if he/she were searching the catalogue through its own native interface). At worst, however, the user will have to cope with substantial variation (in terms of search formulation and record formatting) between Z39.50 targets.

It could be argued that a more tightly-defined profile (or full adherence to the profile) will improve inter-operability and enable distributed library projects such as MODELS [46] to come to full fruition. However, Z39.50 seems to be moving in the direction of generalisation rather than specialisation. Lynch, in his paper on the future of Z39.50 [47], points out that it is difficult to reconcile the future (apparent) direction for Z39.50, with the inter-operability and shared semantics required for effective distributed searching. In this Project, Z39.50 is used as a semantically rich protocol.

4.5. Other Facilities for Reducing the Quantity of Material Transmitted

Various other aspects of Z39.50-1995 enable both target and origin to control the amount of material transmitted and so adjust to limitations of bandwidth. These include a number of low-level technical facilities.

4.6. Modelling the BOPAC2 approach in Z39.50

Z39.50 does not offer adequate facilities to end-users trying to select relevant records from their retrieval. BOPAC2 models the relationships between records as clusters. In the basic default model of the Z39.50 result set there is no concept of a relationship between records. Because of this, clustering has to take place within the BOPAC2 interface and outside the realm of Z39.50. But it is possible to represent such relationships and clusters within the standard Z39.50 model. Clusters can be modelled in Z39.50 with the GRS-1 syntax, and could be generated at the target rather than the origin using the same algorithm. This would be significantly more efficient. In fact this approach would be more efficient than Brief records, element specifications, or any of the existing facilities in Z39.50 for reducing or summarising the results.

4.6.1. Representation of a Cluster in Z39.50

Modelling clusters in Z39.50 opens up new possibilities in other non-bibliographic application areas. Any Z39.50-accessible database in which records can be formed into groups might benefit from this model. Examples might be a series of web pages linked to a home page, or a set of associated documents. In fact records of different datatypes could be contained within the same cluster.

Z39.50-1995 [39] defines both the GRS-1 syntax and TagSet-M (the meta-data tagSet [48]). TagSet-M includes an element Record which is designed "to present nested or subordinate records". Each subordinate record may have a schema specified (with the tagSet-M schemaIdentifier) which may be independent of the schema of the record it is contained in, and independent of the other subordinate records. Subordinate records may themselves contain their own subordinate records. Thus, with TagSet-M and GRS-1, it is theoretically possible to model any hierarchical heterogeneous relationships within the target datasets.

The work clusters in BOPAC2 can be seen as a special case of this general model. Currently they are based on MARC records, which have a well-defined schema which is shared between target and origin. However, the elements used in a work cluster could be described by tags in TagSet-G (the general tagSet), which would separate them from any one particular schema. Reference [49] suggests that TagSet-G elements should have generic semantics, which can be made more specific within a particular schema. Thus work clusters could be either generic or specific to one schema.

The work clusters might be modelled in four parts as follows:

  1. An author/title part which could use a combination which identifies the work (TagSet-G elements Title and Author).
  2. A descriptor part which includes the elements which caused the records to be hit (if available) or a description of why the records were hit (TagSet-G element bodyOfDisplay could be used for this).
  3. The number of records in the cluster (an integer).
  4. The contained records (TagSet-M element Record).

The author/title, descriptor and the number of records have been listed because these were the elements created within BOPAC2. Perhaps a more general model would be better: a cluster label, a structured element whose actual content would be determined by the target and could be retrieved by the origin. Such a structure could be used to model clusters around related subject headings, dates, editions, or any other information.
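The four-part structure listed above can be sketched as a simple nested data type. This is a Python illustration only: the class and field names are hypothetical, plain dictionaries stand in for bibliographic records, and no attempt is made to encode the structure in GRS-1.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Cluster:
    """A work cluster: label parts plus the records (or clusters) inside."""
    author: str
    title: str
    descriptor: str          # why the contained records were hit
    members: List[Union["Cluster", dict]] = field(default_factory=list)

    @property
    def count(self):
        # Number of items directly contained in this cluster.
        return len(self.members)

    def leaf_records(self):
        """Yield every record, however deeply the clusters are nested."""
        for member in self.members:
            if isinstance(member, Cluster):
                yield from member.leaf_records()
            else:
                yield member


# Invented example: one work with an inner cluster for one edition.
edition = Cluster("Example, Ann", "Sample gardens", "2nd edition",
                  [{"date": "1995"}, {"date": "1997"}])
work = Cluster("Example, Ann", "Sample gardens", "title keyword: gardens",
               [{"date": "1990"}, edition])
all_dates = [r["date"] for r in work.leaf_records()]
```

Because a member may itself be a cluster, the same structure covers both a flat work cluster and the nested, heterogeneous case described for TagSet-M.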

GRS-1 makes provision for any element to have a hit vector which can be attached to the author/title or the descriptor parts to show where these elements were hit. This could be used, for example, to highlight the position of keywords within the matching title or statement of responsibility.

How would this work in practice? Typically an origin would issue a SearchRequest. The target would either (a) search a database of cluster records, or, (b) search the underlying bibliographic database and generate cluster records on the fly. The origin would initiate a Present operation to retrieve the cluster records which would be displayed to the user. This would look similar to the first screen in BOPAC2. The user would then select appropriate clusters. The origin would then retrieve the records contained in the selected clusters and display them. Depending on what these records were, there could be further dialogue with the user, and more PresentRequests to retrieve other contained records. The user might potentially end up with a varied set containing record clusters, bibliographic or meta-data records, complete documents, abstracts/tables of contents, or other kinds of data.
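The dialogue above can be sketched with a hypothetical target that builds clusters on the fly (option (b)). Everything here is an assumption made for illustration: the class and method names, the invented records, and in particular the grouping key, which is a crude normalisation of author and title and not the BOPAC2 clustering algorithm.

```python
def work_key(record):
    """Crude grouping key: author and title, lower-cased, punctuation dropped."""
    def norm(text):
        return " ".join(text.lower().replace(",", " ").replace(".", " ").split())
    return (norm(record["author"]), norm(record["title"]))


class ClusteringTarget:
    """Hypothetical target that answers a search with cluster records."""

    def __init__(self, database):
        self._database = database
        self._clusters = []

    def search(self, term):
        hits = [r for r in self._database if term.lower() in r["title"].lower()]
        grouped = {}
        for record in hits:
            grouped.setdefault(work_key(record), []).append(record)
        self._clusters = list(grouped.values())
        return len(self._clusters)          # number of clusters, not records

    def present_labels(self):
        # First Present: one small label per cluster.
        return [{"author": m[0]["author"], "title": m[0]["title"],
                 "count": len(m)} for m in self._clusters]

    def present_members(self, position):
        # Second Present: the records inside one chosen cluster (1-based).
        return list(self._clusters[position - 1])


# Invented sample data.
database = [
    {"author": "Example, Ann", "title": "Sample Gardens", "date": "1990"},
    {"author": "Example, Ann.", "title": "Sample gardens", "date": "1995"},
    {"author": "Other, Bob", "title": "Gardens of the sample", "date": "1980"},
    {"author": "Example, Ann", "title": "Sample gardens", "date": "1997"},
]

target = ClusteringTarget(database)
cluster_count = target.search("gardens")
labels = target.present_labels()
members = target.present_members(1)     # the user opens the first cluster
```

Four matching records reach the user as two labels, and the three near-identical records are transmitted only if the user opens their cluster.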

Yet the BOPAC2 cluster model enables these disparate types of data to be presented within an easily-navigable, hierarchical display model. Navigating the BOPAC2 cluster model can be thought of as "opening up" the cluster "to see what's inside". It is an iterative process: successively narrowing scope and increasing detail. The target decides the level of detail that it supplies to the origin; the origin decides how much to show the user.

It must be stressed that the above is not intended to be an exhaustive or optimal proposal. It simply demonstrates that the clustering approach can be modelled generically and fairly easily in Z39.50. An origin and target capable of processing cluster records could engage in a dialogue with the user which would resemble that of BOPAC2, but would be more efficient than the existing Search/Present cycle. The elimination of repetition and redundancy in the retrieval has two benefits. It reduces the total amount of data transmitted, and hence makes better use of limited bandwidth. And it reduces the amount of extraneous material for the user to look through.

4.6.1.1. Duplicate handling in Z39.50: A recent proposal

It is worth pointing out that a proposal similar to the above concept has recently (at the time of writing) been put to the Z39.50 Implementors Group [23] as a means of handling duplication. It is proposed that duplicate records are clustered together by the target and that a "representative record" is generated which covers the duplicates. This is still a proposal so it is likely to undergo many revisions before it becomes part of the standard. When it was discussed on the Z39.50 discussion list, one contributor spoke of the "need to impose structure" on the result set in order to be able to cluster the retrievals in some way. This is essentially what has been proposed in the preceding paragraphs; duplicates can be regarded as a special case of the multiple manifestations of a work [50]. Duplicate records could be modelled as a cluster with the label containing the representative record and any associated information.

Since there is currently no way to handle duplication in Z39.50, BOPAC2 removes duplicates in the retrieval at the origin. The proposed Z39.50-based solution appears to have some disadvantages in that it is difficult to see how clusters of duplicates could be assembled in a distributed search environment. In fact the same argument applies to the general clustering model described here (see below). The other problem with duplicate handling at the target is that although the origin will be able to have some influence on how duplicates are detected, the actual detection algorithms will not form part of the standard and are likely to differ from one target to another; another pothole on the road towards effective interoperability.
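Duplicate removal at the origin can be sketched as follows. The matching rule used here (ISBN when present, otherwise author, title and date) is an assumption chosen for illustration; this section does not describe how BOPAC2 actually detects duplicates, and the records and ISBN values are invented placeholders.

```python
def duplicate_key(record):
    """Key under which two records are treated as duplicates (assumed rule)."""
    isbn = record.get("isbn", "").replace("-", "").replace(" ", "")
    if isbn:
        return ("isbn", isbn)
    return ("atd", record["author"].lower(), record["title"].lower(),
            record.get("date", ""))


def remove_duplicates(records):
    """Keep the first record seen for each key, preserving retrieval order."""
    seen = set()
    unique = []
    for record in records:
        key = duplicate_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Invented retrieval merged from two targets.
retrieval = [
    {"author": "Example, Ann", "title": "Sample gardens", "isbn": "0-000-00000-1"},
    {"author": "EXAMPLE, ANN", "title": "Sample gardens", "isbn": "0000000001"},
    {"author": "Other, Bob", "title": "Old maps", "date": "1980"},
    {"author": "other, bob", "title": "Old Maps", "date": "1980"},
    {"author": "Other, Bob", "title": "Old maps", "date": "1985"},
]

unique_records = remove_duplicates(retrieval)
```

Changing `duplicate_key` changes which records count as duplicates, which is exactly the interoperability concern raised above when each target applies its own detection algorithm.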

4.6.1.2. Distributed Searching and the "meta-target"

Z39.50 does not explicitly include distributed searching within the scope of the standard, and nothing in the standard has been included specifically for distributed searching purposes. Nevertheless this is a popular application of the standard.

At first sight the above proposal might appear awkward in a distributed search context where the origin attempts to fuse together the results from different servers. The merging or fusing together of clusters would require the origin to record where the constituent records within the clusters originally came from. A similar problem arises with the Scan and Sort facilities which do not map well onto a distributed search environment. Europagate does not allow scan across multiple targets.

Another problem is one of performance. The origin must maintain all the connections with the targets whilst the user interacts with them. Across the Internet the reliability of the network may not allow such connections to be maintained for long periods of time. Even if the connections can be maintained their throughput is likely to vary unpredictably from time to time. So a target which responds quickly during initialisation may suddenly slow down or time out for no apparent reason, a phenomenon which has been observed many times during the course of the Project while experimenting with remote Z39.50 targets. It could be argued that BOPAC2's current technique of downloading the result set onto the client workstation may deliver more consistent performance than leaving it on the target.

The simplest arrangement for distributed search is an origin which initiates multiple z-associations to different targets. But this is only one possible arrangement. The MODELS project [46] suggests using a "search broker" which mediates between the origin and the remote targets. With this architecture there would be (at least) two ways to cluster the retrievals. The search broker could fuse together the remote targets' clusters to create "virtual" clusters, which is what the origin would retrieve. Alternatively it could add an extra layer of clusters over the top, creating clusters of clusters. The same approach can be applied to WWW-to-Z39.50 gateways. Because such a mediator system would participate in both search and retrieval, the term "search broker" does not accurately describe it and "meta-target" is suggested as a better term.
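The first option, fusing the targets' clusters into "virtual" clusters, can be sketched as below. The function name, the dictionary layout and the target names are hypothetical, and clusters are matched simply on an identical label, which is an assumption; a real meta-target would need a more tolerant comparison.

```python
def fuse_clusters(results_by_target):
    """Merge clusters with the same label, remembering each record's source."""
    fused = {}
    for target_name, clusters in results_by_target.items():
        for cluster in clusters:
            virtual = fused.setdefault(
                cluster["label"], {"label": cluster["label"], "records": []})
            for record in cluster["records"]:
                # Keep the provenance so that later requests about this
                # record can be routed back to the right target.
                virtual["records"].append({"source": target_name,
                                           "record": record})
    return list(fused.values())


# Invented cluster sets returned by two hypothetical targets.
results = {
    "target-a": [
        {"label": ("example, ann", "sample gardens"),
         "records": [{"date": "1990"}, {"date": "1995"}]},
    ],
    "target-b": [
        {"label": ("example, ann", "sample gardens"),
         "records": [{"date": "1997"}]},
        {"label": ("other, bob", "old maps"),
         "records": [{"date": "1980"}]},
    ],
}

virtual_clusters = fuse_clusters(results)
sources = [entry["source"] for entry in virtual_clusters[0]["records"]]
```

The second option, clusters of clusters, would instead keep each target's cluster intact as a member of an outer cluster rather than merging the record lists.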

There are other interesting possibilities here. Explain databases for stable targets could be cached at the meta-target, reducing the overhead at the start of each z-association. It might be possible to take several origins which are simultaneously searching the same target and multiplex them (using concurrency and the Reference Id) so that they share the same outgoing z-association. Some databases might even be located on the meta-target, leading to the possibility of a mixed architecture: a mixture of union and distributed catalogues.

5. Using BOPAC2

This section describes how BOPAC2 looks to the end user, along with the main issues that were addressed when designing the interface. After reading this the reader is urged to try the system out for themselves [1] if possible. The various components of the interface will be described roughly in the sequence that they appear to the user. Once within the BOPAC2 Java applet, the sequence of events is under the user's control to some extent, and the options available to the user at each stage are outlined here. The interface divides into two parts. In the first part the user activates the search and retrieval under the control of the modified Europagate software. In the second part the Java applet is used to examine the retrieval.

5.1. Search and Retrieval (Europagate software)

5.1.1. Opening page

There are several routes into BOPAC2, each customised for different user communities. Each route has its own opening web page which describes the system as it relates to that community.
  1. The evaluators' page is intended for external evaluators
  2. The Bradford Library user's page is for students and staff in Bradford University library
  3. The Leeds Library user's page is for students and staff in Leeds University library

5.1.2. Select Targets page

Depending on which route the user takes into BOPAC2 he/she will see a list of targets that may be searched. External evaluators see the full list of targets. Bradford University users see Bradford University Library (the default choice), Leeds University library, the British Library Document Supply Centre, and the Library of Congress. Leeds University users see a similar list. The full list of available targets has been described earlier. It includes several UK and Irish universities and some sites in the USA. The user can select one or more of these targets.

5.1.3. Search page

The user then specifies the access point and the query term. Two criteria may be combined via AND or OR. Most Z39.50 origins allow many more criteria, often 5 or 6, to facilitate precision searches. However, since most on-line OPAC's combine two search criteria at the most (typically author and title), it was felt that users familiar with conventional OPAC's would be unlikely to be able to combine more than two criteria together to initiate a precise search. In any case BOPAC2 is designed to work with imprecise retrievals. It was decided therefore that two search criteria would be sufficient. The feedback received so far from end users and librarians supports this view. The search criteria available to the user depend on the capabilities of the target(s) that were chosen previously.
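
As an illustration of a two-criteria search, the sketch below builds a query string in Prefix Query Format (PQF), the notation used by the YAZ toolkit. This is a substitute notation chosen for clarity; the report does not say how the modified Europagate software represents queries internally. The access point names and the function `build_query` are hypothetical, and the numbers are Bib-1 Use attribute values.

```python
# Bib-1 Use attribute values for a few common access points.
USE_ATTRIBUTES = {"title": 4, "isbn": 7, "subject": 21, "author": 1003}


def build_query(first, second=None, operator="and"):
    """Build a PQF query from one or two (access point, term) criteria."""

    def clause(criterion):
        access_point, term = criterion
        if '"' in term:
            raise ValueError("terms containing double quotes are not handled in this sketch")
        return f'@attr 1={USE_ATTRIBUTES[access_point]} "{term}"'

    if second is None:
        return clause(first)
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    return f"@{operator} {clause(first)} {clause(second)}"


if __name__ == "__main__":
    print(build_query(("author", "dickens"), ("title", "bleak house")))
    print(build_query(("title", "hamlet"), ("title", "macbeth"), operator="or"))
```

Limiting the interface to two criteria keeps the query a single operator with two operands, which is the simplest form that still allows AND and OR.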

The user also specifies the maximum number of records to retrieve from each target. Clearly the more records that are requested, the longer the retrieval will take. A default value of 80 was chosen, which works well for UK targets though it is often too high for US sites. It is also possible to retrieve just 1 record from each target, which is useful for checking whether or not an item is held there.

As described earlier Z39.50 targets often respond to terms in unexpected ways and it was clear from the evaluators' feedback that this was causing problems. In an effort to help users to specify their query correctly, some examples and hints were added to this search page.

5.1.4. Retrieval Summary page

The next page summarises the results of the retrieval operations on each target. Targets may report errors if they cannot complete the operation, they may time out, or they may return some records. In each case the number of hits (if any) is reported along with the number of retrievals. If there are any retrievals from any of the targets, the user can click a button to launch the Java applet and examine the retrievals.

5.1.5. On-line Questionnaire

There is also an option to fill in the on-line questionnaire in which users answer questions about their impressions of the interface. The design of this questionnaire and analysis of results are described later. In Bradford University library some copies of the questionnaire were supplied on paper.

5.2. Viewing Retrievals with the Java applet

The remaining sections describe the Java applet through which users examine the retrievals. The applet is capable of displaying a range of different views of different types. Each view completely covers the previous one and the user navigates through them using "back" and "forward" buttons. The overall effect could be described as a "mini" browser.

The interface is based on the idea that the user sees more and more about less and less. The user traces a path through the interface, successively narrowing down the subset of records whilst looking at them in more detail. Ultimately, all paths through the interface end up at the Full View. Although there are a number of different views, most of the interface components remain in the same position from one view to another. Each view is essentially a list of choices, and at each stage the user can highlight the item(s) they want and press the [Look At These] button to see them in more detail. The interface is regular, orthogonal, and consequently easy to learn.

5.2.1. Top toolbar

Along the top of the applet is a toolbar with buttons:
  1. [Back] move back one view (similar to web browser back button)
  2. [Forward] move forward one view
  3. [Restart] move back to start view
  4. [Help] context-sensitive help
  5. [Exit] stops the applet and moves the user to the on-line questionnaire.
  6. [Bigger Font] for partially-sighted users
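
The behaviour of the [Back], [Forward] and [Restart] buttons can be modelled as a simple history list, in the same way as a web browser. The sketch below is an illustration and not the applet's actual code (which is written in Java); the class and method names are hypothetical, and it assumes that opening a new view discards any views that were ahead of the current one.

```python
class ViewHistory:
    """Browser-style history of the views the user has visited."""

    def __init__(self, start_view):
        self._views = [start_view]
        self._position = 0

    @property
    def current(self):
        return self._views[self._position]

    def show(self, view):
        """Open a new view, discarding any views ahead of the current one."""
        del self._views[self._position + 1:]
        self._views.append(view)
        self._position += 1
        return self.current

    def back(self):
        """Move back one view, if there is one."""
        if self._position > 0:
            self._position -= 1
        return self.current

    def forward(self):
        """Move forward one view, if there is one."""
        if self._position < len(self._views) - 1:
            self._position += 1
        return self.current

    def restart(self):
        """Return to the start view."""
        self._position = 0
        return self.current


if __name__ == "__main__":
    history = ViewHistory("Work Cluster View")
    history.show("Partial View")
    history.show("Full View")
    print(history.back())
    print(history.restart())
```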

5.2.2. Work Cluster View

If the size of the retrieval is below the FullRecordThreshold the records are shown in Full View. Normally, however, they are shown in Work Cluster View. In this view the retrievals are clustered into works using BOPAC2's clustering algorithm (see earlier) and shown with title, author, plus whichever part of the records match the search terms. Each work cluster may contain one or more records. The user can highlight one or more work headings and press the [Look at These] button to examine the records clustered within those works in detail. This will lead either to Partial View or to Full View depending on how many records there are in the cluster. Note that because the user can select more than one work heading, he/she can correct "broken" clusters. In cases where items should have been brought together automatically, but were not, the user can bring them together manually.

The Work Cluster View toolbar has the following functions:

5.2.3. Partial View

The partial view shows the selected records arranged by work headings. Within each work heading items are sorted into date order and shown with the publication date, additional author or title information (beyond what is shown in the work heading), and any notes. This provides a compact summary of the work and its manifestations. In the case of multi-volume items such as encyclopaedias which have been catalogued using tag 248, these are listed by volume. Partial View is intended mainly for examining a single work cluster in detail, but it may be useful for several work clusters, particularly if they are related (e.g. two works by the same author).
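
The arrangement used in Partial View can be sketched as a grouping step followed by a sort. The record fields (`work`, `date`, `note`) and the alphabetical ordering of the work headings are assumptions made for this illustration; the applet's real data structures are not described here.

```python
def partial_view(records):
    """Group records by work heading and list each group in date order."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["work"], []).append(record)

    lines = []
    for work in sorted(grouped):
        lines.append(work)
        for record in sorted(grouped[work], key=lambda r: r["date"]):
            lines.append(f"  {record['date']}  {record['note']}")
    return lines


if __name__ == "__main__":
    demo = [
        {"work": "Hamlet / Shakespeare", "date": 1987, "note": "Oxford edition"},
        {"work": "Hamlet / Shakespeare", "date": 1951, "note": "Illustrated"},
        {"work": "Macbeth / Shakespeare", "date": 1990, "note": "With notes"},
    ]
    print("\n".join(partial_view(demo)))
```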

The toolbar for Partial View has the following functions:

5.2.4. Full View

Full view is similar to a conventional OPAC's full record view. BOPAC2's full record format is modelled on the format used by the Bradford University Library OPAC (URICA). Full view does not display holdings or circulation information because this is not retrieved from the targets by the Europagate software. Shelfmarks are displayed if they have been catalogued.

Unlike more line-based OPAC's, Full View can display more than one full record at a time. Full records are automatically displayed whenever the total number of records selected by the user is below a configurable threshold (the FullRecordThreshold), currently set to 4. If at any stage the user selects a cluster with fewer than 4 records in it, these will be displayed as full records.
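
The rule for choosing between views can be written as a short sketch. The function names and the representation of a cluster as a list of records are hypothetical; only the threshold name and its current value of 4 come from the description above.

```python
FULL_RECORD_THRESHOLD = 4  # the FullRecordThreshold, currently set to 4


def initial_view(records, threshold=FULL_RECORD_THRESHOLD):
    """Choose the first view shown when the applet starts."""
    return "Full View" if len(records) < threshold else "Work Cluster View"


def view_for_selection(selected_clusters, threshold=FULL_RECORD_THRESHOLD):
    """Choose the view shown after [Look at These] is pressed.

    The user may select several clusters, so the records in all of them are counted.
    """
    total = sum(len(cluster) for cluster in selected_clusters)
    return "Full View" if total < threshold else "Partial View"


if __name__ == "__main__":
    print(initial_view(["r1", "r2"]))
    print(view_for_selection([["r1", "r2"], ["r3", "r4", "r5"]]))
```

Counting records across all selected clusters is what allows the user to pull a "broken" cluster back together by selecting its parts.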

The toolbar in Full View has only one option, [Find], which is a free-text search the same as in Partial View.

5.2.5. MARC View

A special keystroke brings up a view of the actual MARC records. This facility was hidden from the main toolbars at the request of Bradford University Library staff. The MARC View includes the [Find] free text search which works across the whole MARC record including the tag numbers. Both UKMARC and USMARC are supported.

6. Evaluation and User Feedback

The Project intended to evaluate the system as widely as possible, although from the start it was clear that a full-scale, exhaustive and comprehensive field test under controlled conditions would not be possible within the project's time scale. The testing and evaluation of the system was to have three threads. In the first thread, as soon as the BOPAC system was working the project team tested it with various queries known to give interesting results. Some examples of these formed the basis for the database used for the BOPAC1 project. At this stage the system was not made public but its URL was given to a number of interested parties in the library community in the UK and North America, who tried the system out and provided much useful feedback. This was fed back into the refinement of the system before it was made publicly available.

The second thread was the attempt to compare BOPAC against other OPACs by some objective criteria, as detailed below. The final thread was the user testing done in late 1997 and January 1998, which is analysed below. As noted before, because of problems with the operation of Z39.50 software on the Bradford Library system, this testing was not as extensive as hoped; nonetheless useful results were obtained. Although rigorously controlled testing could not be undertaken, the system was maintained in a stable state to allow comparison of responses throughout this period. Minimal logging information was also kept, which allowed us to investigate some of the comments and problems raised in users' responses to us.

6.1. Comparison with other OPAC's

In addition to sampling users' responses to the system, the success of BOPAC2 in meeting the second objective of the catalogue, bringing together related items, was examined. Allyson Carlyle has developed some useful parameters to measure the effectiveness of collocation in on-line catalogues [14]. Various catalogues were compared by searching for her "worst case scenario" retrievals. In each case the display from the standard on-line catalogue was compared with the output from BOPAC2 to see how effectively items were collocated. This meant choosing databases where the Z39.50 target and the conventional OPAC were related, so that it was possible to create on each of them identical retrieval sets (identical sets of records) which could be displayed so that related items were collocated. This proved surprisingly difficult in practice. In most cases, searching the catalogue via Z39.50 produced a different retrieval from the conventional OPAC. Eventually two catalogues were identified: University College Galway and Trinity College Dublin.

The worst case queries included the following works:

Carlyle uses four parameters to measure the effectiveness of the catalogue:
  1. Interruption number: the number of times the display of relevant records was interrupted by irrelevant records.
  2. Intervening records: the number of irrelevant records that are interspersed among the relevant record set.
  3. Average interruption size: the average size of the interruptions.
  4. Precision: the ratio of relevant records to irrelevant records.

These worst case queries were run on the UC Galway and TCD databases, first via the conventional line-based (Telnet) OPAC and then via BOPAC2. The displays were compared according to the above parameters and the results are shown in Appendix 4. In most cases the two forms of display in fact produced the same result with almost perfect collocation. In some cases BOPAC2 was able to collocate better than the conventional OPAC (especially the Dynix system at University College, Galway). In these cases BOPAC2 was taking the title query terms into account and was able to combine the catalogued uniform title from one record with the main entry title on another.
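
The four parameters can be computed from a display once each record has been judged relevant or irrelevant. The sketch below is one reading of the definitions above and not Carlyle's own procedure. It assumes that only irrelevant records lying between the first and last relevant record count as interruptions, and it takes precision as worded in the text (relevant to irrelevant), which differs from the more common form of relevant records divided by all records retrieved. The function name and the sample display are invented.

```python
def collocation_measures(display):
    """Compute the four measures for a display.

    `display` lists the records in display order: True = relevant, False = irrelevant.
    """
    relevant_positions = [i for i, is_relevant in enumerate(display) if is_relevant]
    relevant = len(relevant_positions)
    irrelevant = len(display) - relevant

    interruptions = 0
    intervening = 0
    if relevant_positions:
        # Only look between the first and last relevant record.
        span = display[relevant_positions[0]:relevant_positions[-1] + 1]
        previous_relevant = True
        for is_relevant in span:
            if not is_relevant:
                intervening += 1
                if previous_relevant:
                    interruptions += 1  # a new run of irrelevant records starts here
            previous_relevant = is_relevant

    return {
        "interruption_number": interruptions,
        "intervening_records": intervening,
        "average_interruption_size": intervening / interruptions if interruptions else 0.0,
        # Undefined (None) when there are no irrelevant records at all.
        "precision": relevant / irrelevant if irrelevant else None,
    }


if __name__ == "__main__":
    sample = [True, True, False, True, False, False, True, False]
    print(collocation_measures(sample))
```

A perfectly collocated display, in which all the relevant records appear together, gives an interruption number of zero.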

This very brief test on two targets indicates that BOPAC2's clustering algorithm works as well as or better than conventional collocation. A more extensive study conducted over a wider range of catalogues would be able to measure the effectiveness of these algorithms more comprehensively and might also be a useful tool for refining them.

6.2. Usage

By the end of the Project, recorded accesses to BOPAC2 numbered some 1500. Before this there were around 100 so BOPAC has been accessed about 1600 times in total. Some people used the system many times, others only once. If the average user is assumed to have tried BOPAC2 four times, then this figure represents 400 different users. It is suspected that many users were put off by the requirement for Java. Many (perhaps the majority) of the libraries in the UK are still running Windows 3.1 PC's which cannot run Java very well if at all. Several respondents to the evaluation questionnaire said they had been stopped at the first hurdle because of this.

The chart below summarises the distribution of different sizes of these 1500 retrievals. Many retrievals are very small but there is a significant proportion of larger retrievals over 65 records.

Retrieval size chart.

6.3. Evaluation with Testers and Feedback Questionnaire

Initially, to test BOPAC2 with users, a request was made on LIS-LINK, the Library and Information Science e-mail list, for people to test the system and to submit comments by e-mail. Requests were also made in person to members of the advisory committee and to several people who had shown interest in our work. The comments were used to refine the system.

Then a feedback questionnaire was designed (based on the British Library OPAC 97 feedback questionnaire) and added as a web form. The questionnaire is shown in Appendix 2. Users were asked to fill the questionnaire in at the end of the session. Library users in local academic libraries in Leeds and Bradford Universities were encouraged to try the system. To this end, each university was given its own special entry point with targets tailored specifically to those university libraries. In Bradford University library a dedicated PC was set up for access to BOPAC2. The URL for BOPAC2 was announced on LIS-LINK, AUTOCAT and the Z39.50 Implementers Workshop e-mail lists and a link was created from the BOPAC2 Project home page and from the NISS list of OPAC's.

6.3.1. Questionnaire results

By the end of the test period 80 questionnaire responses had been filed. Of these 11 were left blank and there were no double respondents. The precise number of users overall is unknown as is the response rate. The lower bound on the response rate (assuming each person accessed the system only once) is 5%. In fact the true figure is probably between 10% and 30%. So the responses