I’m flying out to the west coast of Sweden today to take part in the annual conference of the Nordic Network for Edition Philologists.
The title of my talk translates as “Text mining and digital authorship: curated archives and semantic markup in the time of large raw-text archives.”
I’m going to be talking about the relationship between the carefully prepared “national editions” of authors such as Ibsen and Strindberg and the large volume of uncurated text in Google Books and similar archives. The former collections, though small, are often exquisitely marked up in detailed TEI/XML, whereas the latter are often raw OCR text dumps from millions of digitized books. Nevertheless, there are some interesting intersections between these two very different kinds of digital text archives.
My talk will be in Swedish, but I hope to develop it further and perhaps present it in English sometime in the next year.