"An Easy Method for Beginners in Latin" and macron-insensitive search for Tiddlywiki


As previously mentioned, W- and I are re-typing parts of Albert Harkness’ 1890 textbook "An Easy Method for Beginners in Latin", which was digitized and uploaded to Google Books as a PDF of page images. The non-searchable book was driving W- mad, so we’re typing up the lessons ourselves. It’s a decent way to review, and I hope it will be a useful resource for other people too.

Here’s what we have so far: An Easy Method for Beginners in Latin, Lessons 1-9

We’re starting off with TiddlyWiki because it’s a wiki system that W-’s been using a lot for his personal notes, so he’s familiar with the markup. It’s not ideal: Google doesn’t index it, the file is bigger than it needs to be (0.5MB!), and it depends on JavaScript. It’s a good start, though, and I should be able to convert the file to another format with a little scripting. My first instinct would be to start with Org Mode for Emacs, of course, but we already know what W- thinks of Emacs. ;)

Most of the text was easy to enter. Harkness is quite fond of footnotes, numbered sections, and lots of bold and italic formatting. We’re going to skip the illustrations for now.

Typing all of this in and using it as our own reference, though, we quickly ran into a limitation of the standard TiddlyWiki engine (and really, probably all wiki engines): you had to search for the exact word to find something. In order to find poēta, you had to type poēta, not poeta. That’s because ē and e are two different characters.
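
A quick way to see the mismatch, sketched in plain JavaScript rather than anything TiddlyWiki-specific: ē is the code point U+0113 and e is U+0065, so a literal substring search for the plain spelling cannot match the macron spelling. The sample string below is made up for illustration.

```javascript
// ē (U+0113) and e (U+0065) are distinct characters, so a literal
// search for the plain spelling misses the spelling with a macron.
var text = "po\u0113ta bonus";                        // "poēta bonus"
var foundPlain = text.indexOf("poeta") !== -1;        // plain query
var foundMacron = text.indexOf("po\u0113ta") !== -1;  // exact query
var codePoints = [
  "e".charCodeAt(0).toString(16),
  "\u0113".charCodeAt(0).toString(16)
];
console.log(foundPlain, foundMacron, codePoints.join(" vs "));
```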

We wanted to keep the macrons as pronunciation and grammar guides, but we didn’t want to require people to know or type letters with macrons. Hmm. Time to hack TiddlyWiki.

TiddlyWiki plugins are written in JavaScript. I found a sample search plugin that showed me the basics of what I needed.

I considered two approaches:

  1. Changing the search text to a regular expression that included macron versions of each vowel
  2. Replacing all vowels in the Tiddler texts with non-macron vowels when searching
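
For comparison, the second approach could look something like the sketch below. This is not code from the plugin: `stripMacrons` and `plainContains` are hypothetical helper names, and the idea is simply to reduce both the text and the query to plain vowels before comparing. Note that it has to rewrite every tiddler's text on every search.

```javascript
// Hypothetical helpers (not part of the plugin): map each macron vowel
// to its plain equivalent, then compare plain text against a plain query.
var MACRON_TO_PLAIN = {
  "\u0101": "a", "\u0100": "A",  // ā Ā
  "\u0113": "e", "\u0112": "E",  // ē Ē
  "\u012B": "i", "\u012A": "I",  // ī Ī
  "\u014D": "o", "\u014C": "O",  // ō Ō
  "\u016B": "u", "\u016A": "U"   // ū Ū
};
var MACRON_CLASS = /[\u0100\u0101\u0112\u0113\u012A\u012B\u014C\u014D\u016A\u016B]/g;

function stripMacrons(str) {
  return str.replace(MACRON_CLASS, function (ch) {
    return MACRON_TO_PLAIN[ch];
  });
}

// Case-insensitive substring test on the macron-free forms.
function plainContains(text, query) {
  var haystack = stripMacrons(text).toLowerCase();
  var needle = stripMacrons(query).toLowerCase();
  return haystack.indexOf(needle) !== -1;
}

console.log(plainContains("r\u0113g\u012Bnae", "regina"));
```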

The first approach was cleaner and looked much more efficient, so I chose that route. If the search text contained a macron, I assumed the searcher knew what he or she was doing, so I left the text alone. If the text did not contain a macron, I replaced every vowel with a regular expression matching the macron equivalents. Here’s what that part of the code looked like:

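// Allow bold and italic markers between letters (explained below)
s = s.replace(/(.)/g, "['/]*$1");
if (!s.match(macronPattern)) {
  // Replace every vowel with a class that also matches its macron forms.
  // The g flag covers repeated vowels; the i flag covers capitals in the query.
  s = s.replace(/a/gi, "[aāĀA]");
  s = s.replace(/e/gi, "[eēĒE]");
  s = s.replace(/i/gi, "[iīĪI]");
  s = s.replace(/o/gi, "[oōŌO]");
  s = s.replace(/u/gi, "[uūŪU]");
}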

That got me almost all the way there. I could search for most of the words using plain text (so poeta would find poēta and regina would find rēgīnae), but some words still couldn’t be found.
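
A side note on `String.prototype.replace`, as a small sketch rather than part of the plugin: when the pattern is a regular expression without the `g` flag, only the first match is replaced. For words that repeat a vowel, such as agricola, the substitution therefore needs to be global. The variable names below are illustrative only.

```javascript
// Without g, only the first "a" becomes a character class.
var firstOnly = "agricola".replace(/a/, "[a\u0101]");
// With g, every "a" becomes a character class.
var everyMatch = "agricola".replace(/a/g, "[a\u0101]");

// "agricolā" has a macron on the final vowel only.
var target = "agricol\u0101";
var firstOnlyFinds = new RegExp(firstOnly, "i").test(target);
var everyMatchFinds = new RegExp(everyMatch, "i").test(target);
console.log(firstOnly, everyMatch, firstOnlyFinds, everyMatchFinds);
```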

A further quirk of the textbook is that the characters in a word might be interrupted by formatting. For example, poēt<strong>am</strong> is written as poēt''am'' in TiddlyWiki markup (two single quotes start and end bold text). So I also inserted a regular expression matching any number of ' or / characters (bold or italic markers when doubled) in front of each letter:

s = s.replace(/(.)/g, "['/]*$1");

It’s important to do this before the macron substitution, or you’ll have regexp classes inside other classes.
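
To see why the order matters, here is a minimal standalone sketch of the two steps. The helper names (`addMarkers`, `addVowelClasses`, `buildPattern`, `buildPatternWrongOrder`) are made up for this example and are not part of TiddlyWiki or the plugin. In the wrong order, the marker pattern gets inserted between the brackets of each vowel class, so the resulting expression no longer matches the word.

```javascript
var MARKERS = "['/]*";  // any run of bold ('') or italic (//) marker characters
var HAS_MACRON = /[\u0100\u0101\u0112\u0113\u012A\u012B\u014C\u014D\u016A\u016B]/;
var VOWEL_CLASSES = [
  [/a/gi, "[a\u0101\u0100A]"],
  [/e/gi, "[e\u0113\u0112E]"],
  [/i/gi, "[i\u012B\u012AI]"],
  [/o/gi, "[o\u014D\u014CO]"],
  [/u/gi, "[u\u016B\u016AU]"]
];

// Step 1: allow formatting markers in front of every character.
function addMarkers(s) {
  return s.replace(/(.)/g, MARKERS + "$1");
}

// Step 2: let each plain vowel match its macron forms too.
function addVowelClasses(s) {
  for (var i = 0; i < VOWEL_CLASSES.length; i++) {
    s = s.replace(VOWEL_CLASSES[i][0], VOWEL_CLASSES[i][1]);
  }
  return s;
}

// Markers first, then vowel classes (skipped if the query has a macron).
function buildPattern(query) {
  var s = addMarkers(query);
  return HAS_MACRON.test(query) ? s : addVowelClasses(s);
}

// Wrong order: the markers land inside the vowel classes.
function buildPatternWrongOrder(query) {
  return addMarkers(addVowelClasses(query));
}

var marked = "po\u0113t''am''";  // poēt''am'' as stored in the wiki text
var good = new RegExp(buildPattern("poetam"), "im").test(marked);
var bad = new RegExp(buildPatternWrongOrder("poetam"), "im").test(marked);
console.log(good, bad);
```

This sketch, like the plugin, treats the query as regular expression source, so characters such as `.` or `(` in a query are not escaped.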

That’s the core of the macron search. Here’s what it looks like. I was so thrilled when I got all of this lined up! =)

[Screenshot: macron search in TiddlyWiki]

And the source code:

// Macron Search Plugin
// (c) 2011 Sacha Chua - Creative Commons Attribution ShareAlike 3.0 License
// Based on http://devpad.tiddlyspot.com/#SimpleSearchPlugin by FND

if(!version.extensions.MacronSearchPlugin) { //# ensure that the plugin is only installed once
version.extensions.MacronSearchPlugin = { installed: true };

if(!config.extensions) { config.extensions = {}; }

config.extensions.MacronSearchPlugin = {
  heading: "Search Results",
  containerId: "searchResults",
  btnCloseLabel: "Close search",
  btnCloseTooltip: "dismiss search results",
  btnCloseId: "search_close",
  btnOpenLabel: "Open all search results",
  btnOpenTooltip: "Open all search results",
  btnOpenId: "search_open",

  displayResults: function(matches, query) {
    story.refreshAllTiddlers(true); // update highlighting within story tiddlers
    var el = document.getElementById(this.containerId);
    query = '"""' + query + '"""'; // prevent WikiLinks
    if(el) {
      removeChildren(el);
    } else { //# fallback: use displayArea as parent
      var container = document.getElementById("displayArea");
      el = document.createElement("div");
      el.id = this.containerId;
      el = container.insertBefore(el, container.firstChild);
    }
    var msg = "!" + this.heading + "\n";
    if(matches.length > 0) {
      msg += "''" + config.macros.search.successMsg.format([matches.length.toString(), query]) + ":''\n";
      this.results = [];
      for(var i = 0 ; i < matches.length; i++) {
        this.results.push(matches[i].title);
        msg += "* [[" + matches[i].title + "]]\n";
      }
    } else {
      msg += "''" + config.macros.search.failureMsg.format([query]) + "''\n"; // XXX: do not use bold here!?
    }
    wikify(msg, el);
    createTiddlyButton(el, "[" + this.btnCloseLabel + "]", this.btnCloseTooltip, config.extensions.MacronSearchPlugin.closeResults, "button", this.btnCloseId);
    if(matches.length > 0) { // XXX: redundant!?
      createTiddlyButton(el, "[" + this.btnOpenLabel + "]", this.btnOpenTooltip, config.extensions.MacronSearchPlugin.openAll, "button", this.btnOpenId);
    }
  },

  closeResults: function() {
    var el = document.getElementById(config.extensions.MacronSearchPlugin.containerId);
    removeNode(el);
    config.extensions.MacronSearchPlugin.results = null;
    highlightHack = null;
  },

  openAll: function(ev) {
    story.displayTiddlers(null, config.extensions.MacronSearchPlugin.results);
    return false;
  }
};

// override Story.search()
Story.prototype.search = function(text, useCaseSensitive, useRegExp) {
  var macronPattern = /[āĀēĒīĪōŌūŪ]/;
  var s = text;
  // Deal with bold and italics in the middle of words.
  // This must happen before the vowel substitution below.
  s = s.replace(/(.)/g, "['/]*$1");
  if (!s.match(macronPattern)) {
    // Replace every vowel with the corresponding macron matcher.
    // The g flag covers repeated vowels; the i flag covers capitals in the query.
    s = s.replace(/a/gi, "[aāĀA]");
    s = s.replace(/e/gi, "[eēĒE]");
    s = s.replace(/i/gi, "[iīĪI]");
    s = s.replace(/o/gi, "[oōŌO]");
    s = s.replace(/u/gi, "[uūŪU]");
  }
  var searchRegexp = new RegExp(s, "img");
  highlightHack = searchRegexp;
  var matches = store.search(searchRegexp, null, "excludeSearch");
  config.extensions.MacronSearchPlugin.displayResults(matches, text);
};

// override TiddlyWiki.search() to ignore macrons when searching
TiddlyWiki.prototype.search = function(s, sortField, excludeTag, match) {
    // s is the regular expression built in Story.search();
    // collect every candidate tiddler whose title or text matches it
    var candidates = this.reverseLookup("tags", excludeTag, !!match);
    var matches = [];
    for(var t = 0; t < candidates.length; t++) {
        if (candidates[t].title.search(s) != -1 ||
            candidates[t].text.search(s) != -1) {
            matches.push(candidates[t]);
        }
    }
    return matches;
};

} //# end of "install only once"

To add this to your TiddlyWiki, create a new tiddler and paste in the source code. Give it the systemConfig tag (the case is important). Save and reload your TiddlyWiki file, and the new search should be available.

It took me maybe 1.5 hours to research possible ways to do it and hack the search plugin together for TiddlyWiki. I’d never written a TiddlyWiki plugin before, but I’ve worked with JavaScript, and it was easy to pick up. I had a lot of fun coding it with W-, who supplied plenty of ideas and motivation. =) It’s fun geeking out!
