Technical Details
Technical Approaches and Contributions
Technical support for the Materia Medica in Transit digital publication consists of information structuring and website construction.
Digitization and Information Design of the Michiel Herbal
Elli Mylonas and Cody Carvel of the Center for Digital Scholarship (CDS) at Brown University generated good-quality OCR from scans of the De Toni edition (1940) of the Herbal by Pietro Antonio Michiel. For this phase, work focussed on the first section (Libro Azzurro). The text in the existing PDF of the De Toni edition was not very accurate, so the CDS team used Microsoft Azure to generate new OCR text. Together with Sabrina Minuzzi, the group identified important features of the text and developed a TEI customization for marking it up. Basic structural markup was then applied to the OCR text. Dr. Minuzzi added further details by hand, such as the identification of people and places, links to bibliographical references, and plant identifications.
Files relevant to the encoding process
- Encoding Documentation (Draft Version): https://github.com/emylonas/mat-med/blob/main/assets/xml/DRAFTEncodingDoc.pdf
- Libro Azzurro encoded in XML: https://github.com/emylonas/mat-med/blob/main/assets/xml/michiele-azzurro.xml
- Mat Med TEI customization: https://github.com/emylonas/mat-med/blob/main/assets/xml/tei_MatMed.rng
- ODD file for generating the TEI customization: https://github.com/emylonas/mat-med/blob/main/assets/xml/tei_simplePrint-matMed.odd
Currently the Libro Azzurro has been encoded, and it is possible to automatically generate indices of people and places that are cross-linked with the text. Work continues on identifying and disambiguating people and places. The work that has been done on the Libro Azzurro provides a model for further work on the rest of the Michiel Herbal.
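The index generation described above can be illustrated with a minimal Python sketch. It is not the project's own code. It assumes that each plant entry is a TEI `div` with a hypothetical `type="plant"` attribute and an `xml:id`, and that people and places are tagged with `persName` and `placeName` elements whose `ref` attribute holds the disambiguated identifier. The sample text and its identifiers are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented sample data; the real file is michiele-azzurro.xml.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="plant" xml:id="plant1"><p>Seen by <persName ref="#anguillara">Anguillara</persName>
 in <placeName ref="#padova">Padova</placeName>.</p></div>
<div type="plant" xml:id="plant2"><p>Sent from <placeName ref="#padova">Padoa</placeName>.</p></div>
</body></text></TEI>"""


def build_index(tei_xml, tag):
    """Map each person or place key to the ids of the entries that mention it."""
    root = ET.fromstring(tei_xml)
    index = defaultdict(list)
    for div in root.iter(f"{{{TEI_NS}}}div"):
        # Assumption: plant entries are marked type="plant" and carry an xml:id.
        if div.get("type") != "plant" or div.get(XML_ID) is None:
            continue
        div_id = div.get(XML_ID)
        for el in div.iter(f"{{{TEI_NS}}}{tag}"):
            # Prefer @ref as the key, so spelling variants share one index entry;
            # fall back to the whitespace-normalized text when @ref is absent.
            key = el.get("ref") or " ".join("".join(el.itertext()).split())
            if div_id not in index[key]:
                index[key].append(div_id)
    return dict(index)


people = build_index(SAMPLE, "persName")
places = build_index(SAMPLE, "placeName")
print(people)
print(places)
```

Keying on `ref` rather than on the surface text is what lets the two spellings of the same place in the sample collapse into one index entry, which is why the ongoing disambiguation work matters for the quality of the indices.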
Website Development
To publish the project on the web, it was necessary to find a framework that could handle XML and lend itself to automated regeneration of the site, so that Dr. Minuzzi could easily edit the content and generate new versions. The framework also had to be easily portable, could not rely on significant infrastructure, and had to be as sustainable as possible with little support. TEI Publisher (https://teipublisher.com/index.html) initially seemed to be a good choice, as it is a native TEI online publishing system. However, at the time we were working with it, it had substantial server requirements that would have been cost-prohibitive and did not fit the constraints of most institutional hosting. TEI Publisher is now able to generate a static site, but this was not the case two years ago.
The team decided to use a static site generator in order to have a site that would require no back-end support and so would be easy to host. We chose Jekyll and the Minimal Mistakes theme. Jekyll has the advantage that it is also native to GitHub; it can be set up to generate a new site whenever changes are committed and pushed to GitHub. This enables scholars or content specialists to edit the site or add content and make it live without intervention from technical contributors.
The project received 20 hours of design help from the Brown University Library to create a header graphic and to help with layout and consistency. We added CSS, JavaScript and new layouts to the theme, but have not made changes to its basic structure.
The website consists of three main components: two expository sections on gardens and apothecary shops in Venice, and the Michiel edition. The first two are edited directly in Jekyll using Markdown or HTML as needed. The source files are available to Dr. Minuzzi, but have mostly been put together by Elli Mylonas. The edition, which is encoded as a single document, is segmented by an XSL script so that each plant can be displayed as a single web page in Jekyll, and the native TEI XML is formatted on the client side using the CETEIcean JavaScript library.
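The segmentation step is done in the project by an XSL script, which is not reproduced here. The following Python sketch only illustrates the same idea under stated assumptions: plant entries are TEI `div` elements with a hypothetical `type="plant"` attribute and an `xml:id`, each has a `head` that can serve as the page title, and the Jekyll layout name `plant` is invented. The sketch stops at producing page text and does not show how the fragment is handed to CETEIcean.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented sample data standing in for the single encoded edition file.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="plant" xml:id="plant1"><head>Aloe</head><p>First entry.</p></div>
<div type="plant" xml:id="plant2"><head>Absinthium</head><p>Second entry.</p></div>
</body></text></TEI>"""


def split_plants(tei_xml):
    """Return {filename: page text}, one Jekyll page per plant entry."""
    # Serialize TEI elements in the default namespace (no ns0: prefixes).
    ET.register_namespace("", TEI_NS)
    root = ET.fromstring(tei_xml)
    pages = {}
    for div in root.iter(f"{{{TEI_NS}}}div"):
        plant_id = div.get(XML_ID)
        # Assumption: plant entries are marked type="plant" and carry an xml:id.
        if div.get("type") != "plant" or plant_id is None:
            continue
        head = div.find(f"{{{TEI_NS}}}head")
        title = "".join(head.itertext()).strip() if head is not None else plant_id
        # json.dumps yields a double-quoted string that is also valid YAML.
        front_matter = f"---\nlayout: plant\ntitle: {json.dumps(title)}\n---\n"
        fragment = ET.tostring(div, encoding="unicode").strip()
        pages[f"{plant_id}.html"] = front_matter + fragment + "\n"
    return pages


pages = split_plants(SAMPLE)
for name in sorted(pages):
    print(name)
```

Each value in `pages` could then be written to a file in the Jekyll source tree, so that a commit of the updated edition regenerates one page per plant. Keeping the TEI fragment intact in the page, rather than converting it to HTML at build time, matches the client-side formatting approach described above.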
The website is currently running on GitHub (https://emylonas.github.io/mat-med/) as well as on a Reclaim Hosting test site (https://emylonas.reclaim.hosting/mat-med/). The project is in the process of acquiring the domain name materiamedicaintransit.eu.
Documentation
The project plans to expand both the description of the process and workflow and the encoding documentation on the website.
Future Work
As noted above, the work that has been done to add XML markup to the OCRed De Toni edition serves as a model for the encoding of the other four books. The website generation process is also in place, so more books can be added with very little customization.