Japanese Language Text Mining: Digital Humanities Methods for Japanese Studies
This workshop brings together researchers working in computational text analysis and Japanese Studies. Sessions will focus on the challenges specific to the digital analysis of Japanese texts. Topics will include:
- Finding and using web-based corpora, e.g., the Aozora Bunko
- Using web-based analytical tools, e.g., PhiloLogic
- Creating digital collections (corpora), including the challenges of optical character recognition (OCR) for Japanese texts
- Specialized tools for classical, early modern, and modern Japanese grammar
- Methodological principles that underlie standard text mining techniques (e.g., word frequencies, collocation, KWIC, document term matrices, metrics of text similarity)
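For readers new to the techniques named in the last topic, here is a minimal Python sketch of four of them: frequency counts, KWIC (keyword in context), a document-term matrix, and cosine similarity as one possible similarity metric. It is an illustration under stated assumptions, not workshop material. Because Japanese is written without spaces between words, the sketch counts overlapping character bigrams in place of words; word-level analysis would normally rely on a morphological analyzer such as MeCab. The function names and the three toy sentences are hypothetical and chosen only for this example.

```python
from collections import Counter
import math

# Characters that should not appear inside an n-gram (assumed, minimal list).
SKIP = set(" \n\t、。「」")

def char_ngrams(text, n=2):
    """Return overlapping character n-grams, dropping any that contain
    whitespace or punctuation. These stand in for word tokens."""
    grams = []
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        if not (set(gram) & SKIP):
            grams.append(gram)
    return grams

def kwic(text, keyword, width=5):
    """Return (left context, keyword, right context) for every occurrence."""
    hits = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        hits.append((left, keyword, right))
        start = text.find(keyword, start + 1)
    return hits

def document_term_matrix(docs, n=2):
    """Return (vocabulary, matrix): one row per document, one column per n-gram."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy documents, invented for illustration only.
docs = ["吾輩は猫である。", "吾輩は犬である。", "今日は天気が良い。"]

# Frequency counts over character bigrams in the first document.
print(Counter(char_ngrams(docs[0])).most_common(3))

# KWIC for the particle は with two characters of context on each side.
for left, key, right in kwic("吾輩は猫である。名前はまだ無い。", "は", width=2):
    print(f"{left} [{key}] {right}")

# Document-term matrix and pairwise similarity.
vocab, dtm = document_term_matrix(docs)
print(cosine_similarity(dtm[0], dtm[1]))
print(cosine_similarity(dtm[0], dtm[2]))
```

Character bigrams keep the sketch free of external dependencies, but they blur word boundaries; the first two toy sentences score as similar mainly because they share the frame 吾輩は…である, which is the behavior a similarity metric over counts is expected to show.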
The workshop will include two half-day specialized sessions: a tutorial on Japanese language and orthography for digital humanities specialists and a session on basic computational concepts and methods for Japan specialists.
Through the generous support of the Japan Foundation, all participants will be provided with accommodations, travel support (up to $400), and breakfast and lunch during the workshop. We strongly encourage candidates to seek supplemental funding from their home institutions.
Workshop leaders:
- Mark Ravina (Emory University) histmr [at] emory [dot] edu
- Hoyt Long (University of Chicago) hoytlong [at] uchicago [dot] edu
- Molly Des Jardin (University of Pennsylvania) mollydes [at] upenn [dot] edu
Sponsorship and support: