Day One

Tuesday, May 30, 2017

Breakfast (8:45 – 9:30 AM) Jones Room

Session One (9:30 – 11:00 AM) Woodruff Library 312

Self-introductions: Participants and instructors introduce themselves and their research. Instructors explain the course structure.

Session Two (11:15 AM – 12:30 PM) Woodruff Library 312

Web-based tools: PhiloLogic and the Aozora Bunko (Long)

This session will introduce the University of Chicago’s project linking the Aozora Bunko and PhiloLogic, pioneered by Hoyt Long. The Aozora Bunko is an online collection of over 13,000 digitized Japanese texts, including fiction, non-fiction, and poetry. Because of copyright law, all texts date from before 1966, but the corpus includes a wide range of works, such as Shimazaki Tōson’s Yoake mae, the poetry of Kitahara Hakushū, letters by Sakamoto Ryōma, and children’s fiction by Murayama Kazuko.

PhiloLogic is a suite of software developed by the ARTFL Project at the University of Chicago. It is an easy-to-use yet powerful full-text search, retrieval, and reporting system that supports searches combining multiple criteria. One can, for example, retrieve all texts in the Aozora Bunko written by selected authors between 1935 and 1945 that also contain both of the terms 平和 (peace) and 堕落 (degeneration). The system reports results for word frequency, word context (KWIC, or key word in context), and word collocation (which words occur together).

Key topics will include:

  • Word frequencies
  • Collocation
  • KWIC
  • Word occurrence in time series

Lunch Break (12:30 – 1:30 PM) Jones Room

Session Three (1:30 PM – 3:00 PM) Woodruff Library 312

Web-based tools for user-selected texts (Des Jardin and Goss)

Building on the introduction of PhiloLogic, this session will demonstrate how researchers can move beyond the Aozora Bunko to analyze other digitized corpora, such as the e-texts in UVA’s Japanese Text Initiative. Topics will include:

  • Voyant Tools, for general analysis with an English-language interface
  • NINJAL’s 茶まめ series, which includes specialized tokenizers for modern Japanese (現代語), early modern literary Japanese (近代文語), and classical Japanese prose (中古和文)

Coffee Break (3:00 – 3:15 PM) Woodruff Library 303, Emory Center for Digital Scholarship

Session Four (3:15 PM – 4:00 PM) Woodruff Library 312

Small-group work with hands-on support for basic text mining using web interfaces:

  • PhiloLogic and the Aozora Bunko
  • Voyant Tools
  • NINJAL’s 茶まめ series