Beowulf : a digital edition

During Michaelmas term, I attended a weekly lunchtime training session run by Emma Huber (Subject Librarian for German) at the Taylor Institution. The ‘Digital Editions Course’ saw a mixture of undergraduates, postgraduates and library staff each work to transcribe, edit and encode an out-of-copyright text to be hosted on a dedicated web portal. Attendees learned the principles and practice behind digitisation and encoding, then spent time on their own during the week applying them. The course would allow us to develop new skills while contributing to the corpus of digitally available texts.

Choosing a text:

Texts would need to be out of copyright. The easiest way to pick something safe is to choose a publication by a single author who has been dead for at least 70 years. The Taylor Institution Library is billed as the library for Medieval and Modern Languages, so the chosen text would have to be from this collection or fit within its themes. Naturally, we wanted to be able to digitise something from the collection at St Anne’s so, after some deliberation, I landed on an 1895 edition of the Old English epic Beowulf, which is held by both St Anne’s and the Taylor.


This 1895 edition is interesting for having been produced by William Morris and his Kelmscott Press. Lavishly put together but with a simpler level of ornamentation than some of the Press’s later productions, it is nevertheless a beautiful object.

The title page of the edition would make for some satisfying colouring in.

Not a scholar in Old English himself, Morris collaborated with a young Cambridge academic, A. J. Wyatt, who was producing a scholarly edition of the manuscript. Morris used a prose translation by Wyatt as a crib for his own translation, while Wyatt’s edition of the Old English text was published in 1894 by Cambridge University Press as Beowulf : edited with textual foot-notes, index of proper names, and alphabetical glossary.


The first step towards creating the edition would be the photography. Lacking tripods and DSLRs, we were given advice about how to take the best pictures with a simple digital camera or mobile phone. Setting the resolution to the highest possible would ensure a more zoomable photo, and creating images with digital preservation in mind would aid their longevity. JPGs use lossy compression and so risk losing quality each time they are edited and re-saved, while a .TIF or a .PNG file can be saved without degrading.

Our simple photography set-up (shown here) allowed photos to be taken from above the level of the desk lamps, thereby eliminating any shadows.

Once the photos were saved, they were copied and converted into .TIF format before being cropped, rotated and adjusted to size as necessary.


An easily overlooked element of digital preservation is the importance of good file-naming practice. In the event of data loss and restoration, a future user might be confronted with a dump of data without any file structure. In such a case, files with names like ‘Image-01’ or ‘P01231’ would give little clue as to their origin and would each need to be inspected individually.

The best naming convention for images of library items is to use their shelf-mark (the sticker on their spine) followed by the page number.  So for example page 17 of the Beowulf text would be saved as ‘821.13_11_P17.TIF’.
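The convention above is simple enough to automate when renaming a batch of photos. The following is only a minimal sketch in Python: the function names are hypothetical, and it assumes the shelf-mark is written as ‘821.13 11’ with a space, which is replaced by an underscore so that the file name contains no spaces.

```python
import re

# Hypothetical pattern for names of the form <shelfmark>_P<page>.<extension>
_NAME_PATTERN = re.compile(r"^(?P<shelfmark>.+)_P(?P<page>\d+)\.(?P<ext>[A-Za-z]+)$")


def image_filename(shelfmark: str, page: int, extension: str = "TIF") -> str:
    """Build a file name from a shelf-mark and a page number."""
    # Assumption: spaces in the shelf-mark become underscores.
    safe_shelfmark = re.sub(r"\s+", "_", shelfmark.strip())
    return f"{safe_shelfmark}_P{page}.{extension}"


def parse_image_filename(name: str) -> tuple[str, int]:
    """Recover the shelf-mark and page number from a file name."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Not a recognised image file name: {name!r}")
    return match.group("shelfmark"), int(match.group("page"))


print(image_filename("821.13 11", 17))            # 821.13_11_P17.TIF
print(parse_image_filename("821.13_11_P17.TIF"))  # ('821.13_11', 17)
```

Because the name can be parsed back into its parts, a file recovered without any folder structure can still be traced to the right book and page.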


To make the images a useful part of the edition, they needed to be accompanied by a transcription.  Optical Character Recognition (OCR) works very well for converting images with standard typefaces but can struggle with anything non-standard. Morris’s artful typeface proved resistant to OCR mapping and so the entire section needed to be manually transcribed.

When producing an edition of a text, one also needs to decide whether the edition will be diplomatic or critical. A diplomatic transcription would record every detail as closely as possible to the original, including inconsistencies of spelling, errors and omissions. A critical edition might instead standardise and/or modernise spelling and punctuation to make a text easier for a modern reader. The Beowulf didn’t require many major policy decisions, but even something as simple as removing the hyphen from a word that is split across two lines is an editorial decision and must be recorded.
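To make the point about recording editorial decisions concrete, here is a small Python sketch that rejoins words split across line ends and keeps a log of every change. It is an illustration rather than the method used on the course: the function name and sample text are invented, and it naively treats every line-final hyphen as a line-break hyphen, so a genuine compound that happens to break at its hyphen would need checking by eye.

```python
def join_line_break_hyphens(lines):
    """Rejoin words hyphenated across line ends.

    Returns the edited lines and a log of (line number, original, joined)
    so that each intervention can be recorded in the edition.
    """
    lines = list(lines)
    log = []
    for i in range(len(lines) - 1):
        current = lines[i].rstrip()
        following = lines[i + 1].lstrip()
        if current.endswith("-") and following:
            # The first word of the next line completes the split word.
            head, _, rest = following.partition(" ")
            first_half = current.split(" ")[-1][:-1]
            lines[i] = current[:-1] + head
            lines[i + 1] = rest
            log.append((i + 1, first_half + "-", first_half + head))
    return lines, log


# Invented sample text, not taken from the Morris edition.
sample = ["a word that runs across two li-", "nes of the printed page"]
edited, decisions = join_line_break_hyphens(sample)
print(edited)     # ['a word that runs across two lines', 'of the printed page']
print(decisions)  # [(1, 'li-', 'lines')]
```

The log is the important part: it gives the editor a list of interventions to review and to document in the edition’s editorial policy.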


Once the transcription had been produced and double-checked against the original, we were introduced to XML, TEI and the concept of encoding. Encoding uses a markup language (XML: Extensible Markup Language) with a recognised framework of rules (TEI: Text Encoding Initiative) to turn a block of text into something that can be understood and manipulated by computer software. To make this part of the process easier, we used XML editing software called oXygen. In theory, though, XML tags could be added in a simple text editor like Notepad.

To illustrate how XML/TEI works: adding <l></l> tags on either side of a line records the text in between as a ‘line’ unit.  Putting <lg></lg> around a group of lines forms a unit called a ‘line group’ or what we would think of as a stanza or verse.
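As a rough illustration of that nesting, the Python sketch below wraps each line in <l> tags and the whole group in <lg> tags, using the standard library’s ElementTree module. The helper name and the placeholder lines are invented for the example, and the TEI namespace that a full TEI document would declare is left out to keep it short.

```python
import xml.etree.ElementTree as ET


def build_line_group(lines):
    """Wrap each line in an <l> element inside a single <lg> element."""
    line_group = ET.Element("lg")
    for text in lines:
        line = ET.SubElement(line_group, "l")
        line.text = text
    return line_group


# Placeholder lines, not text from the edition.
stanza = build_line_group(["First line of verse", "Second line of verse"])
print(ET.tostring(stanza, encoding="unicode"))
# <lg><l>First line of verse</l><l>Second line of verse</l></lg>
```

In practice the tags were typed directly in the XML editor; the sketch only shows how lines and line groups nest.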

<head></head> denotes a heading and <pb/> marks a page break; because a page break has no content of its own, it is written as a single self-closing tag. These tags are all quite straightforward, but some of the more advanced applications of XML/TEI include encoding the rhythm and rhyme scheme of a poem for statistical analysis, or displaying both an expanded and abbreviated form of a word. The tags are interpreted when the document is later fed into front-end display software, and at that point decisions can be made about how to treat each one.
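To give a feel for what ‘deciding how to treat different tags’ might look like, here is a toy renderer in Python. It is not the project’s display software: the sample fragment is invented, the TEI namespace is omitted, and the choice of <choice>, <abbr> and <expan> (standard TEI elements for pairing an abbreviation with its expansion) is an assumption about how such a word would be encoded.

```python
import xml.etree.ElementTree as ET

# Invented fragment for illustration only.
SAMPLE = """<div>
  <head>Sample heading</head>
  <pb n="1"/>
  <lg>
    <l>First line with <choice><abbr>abbr.</abbr><expan>abbreviation</expan></choice></l>
    <l>Second line</l>
  </lg>
</div>"""


def inline_text(element, expand=True):
    """Collect an element's text, keeping one reading from each <choice>."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == "choice":
            wanted = child.find("expan" if expand else "abbr")
            if wanted is not None:
                parts.append(inline_text(wanted, expand))
        else:
            parts.append(inline_text(child, expand))
        parts.append(child.tail or "")
    return "".join(parts)


def render(xml_text, expand=True):
    """Turn the encoded fragment into display lines, one rule per tag."""
    output = []
    for element in ET.fromstring(xml_text).iter():
        if element.tag == "head":
            output.append(inline_text(element, expand).upper())
        elif element.tag == "pb":
            output.append(f"[page {element.get('n')}]")
        elif element.tag == "l":
            output.append("  " + inline_text(element, expand))
    return output


print(render(SAMPLE))
# ['SAMPLE HEADING', '[page 1]', '  First line with abbreviation', '  Second line']
print(render(SAMPLE, expand=False)[2])
#   First line with abbr.
```

The same encoded file produces different displays depending on the rules applied, which is why the display decisions can be left until after the encoding is finished.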

A view from the oXygen XML editor.

The Digital Edition:

After we had carried out our encoding, Emma Huber and her team helped us with any XML corrections and then hosted the editions on the project website.

This edition is hosted on the project website and is designed to be read alongside the digitised edition of A. J. Wyatt’s 1894 Beowulf.

In the final edition, words from Morris’s glossary (which was not itself part of the transcribed extract) have been added as notes with an asterisk.  Users can easily navigate between images and the accompanying transcription by clicking on the page numbers.


Morris, W. & Wyatt, A.J., 1895. The tale of Beowulf.

Wyatt, A.J., 1894. Beowulf.

This article was written by Duncan Jones (Reader Services Librarian).