Historiographical Research on Natsume Soseki and Dazai Osamu

Plan for Final Research Project

For the final research project, I am going to analyze the works of early 20th-century Japanese writers. I want to choose two or three authors from among Natsume Soseki, Dazai Osamu, Tanizaki Junichiro and Akutagawa Ryunosuke, so that the project will not be too overwhelming. Many of their works are available on Aozora Bunko, and I have read at least one work by each of them.

Historiographical Research

JSTOR’s Data for Research (DfR) is not a perfect tool for historiographical research on Japanese literature. My search for the keyword “Natsume Soseki” returns, among other things, two documents on Shakespeare from 2000 and 2001.

[Figure: shakespeare]

These outliers, however, are not going to seriously affect the study, since I am only interested in counting word frequency and I have a large data set. The base code I used is from class; I wrote some new code for graphing and for picking frequent words from the documents.

The search term “Natsume Soseki” yields 697 documents. For the vertical axis in all of the following graphs, I used the five-year rolling mean of each word’s percentage, since it gives the smoothest graph.
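As a rough illustration of what a five-year rolling mean does to a yearly percentage series, here is a minimal sketch in base R. The data frame `word_perc` and its values are made up for illustration, and `stats::filter` is used here only as a stand-in that runs without extra packages; the plotting code at the end of this post uses `rollmean` instead.

```r
# Toy data: yearly percentage of one keyword (values are made up for illustration).
word_perc <- data.frame(
  pubyear = 1950:1959,
  translation = c(0.10, 0.30, 0.20, 0.60, 0.40, 0.50, 0.90, 0.70, 0.80, 1.00)
)

# Centered five-year moving average: each year is averaged with the two
# years before it and the two years after it. The first two and last two
# years have no complete window, so they come back as NA.
roll5 <- stats::filter(word_perc$translation, rep(1 / 5, 5), sides = 2)
word_perc$translation_roll5 <- as.numeric(roll5)

print(word_perc)
```

The smoothed column has fewer sharp jumps than the raw one, which is why the lines in the graphs look smooth, but it also spreads a single unusual year over its neighbors.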

Group 1: Translation

Since the majority of documents from JSTOR are in English, I expected many documents to discuss translation. The first group of keywords that I looked for is “translation” and the names of three famous translators.

[Figure: natsume-soseki-translation]

The graph shows that the study of Natsume Soseki’s work in translation rises around 1950. This makes sense, since most of his work was translated after World War II. There are only a few documents from before 1950 in the results. The key term “translation”, although it fluctuates, remains important after 1950. The three other keywords, the translators’ names “McClellan”, “Keene” and “Seidensticker”, appear more often from 1950 to 2000. All three translators were born in the 1920s, so their work is concentrated in the late 20th century.

Group 2: Language

[Figure: natsume-soseki-language]

The keyword “Japanese” is dominant, as expected. Because most of the documents are from Asian studies journals, “Chinese” and “Korean” appear frequently. The line for “English” is close to the line for “Chinese”. If most of the works were about translation, the word “English” would appear more often. Therefore, there may be a large portion of the works that do not directly discuss translation; these documents are probably about general literature or cultural studies.

Group 3: Theme

The four keywords in this graph are “death”, “love”, “moral” and “war”. “Love” and “war” are the most prevalent. “War” also appears in works published during WWII, and its line has several peaks. I do not remember reading much about war in Natsume Soseki’s works, but scholars might want to find the connection between pre-war literature and WWII. I find that “war” is similarly a dominant key term in the search for “Dazai Osamu” in JSTOR’s DfR, although most of his works are not related to war.


[Figure: natsume-soseki-discipline]

Here, I am interested in how scholars interpreted Natsume’s works and their political, social, economic and historical connections. “Political” and “social” appear closely related, since their lines move together. “Economic” falls in importance, while “historical” appears to become more important over time.


[Figure: natsume-soseki-authors]

The search for “Natsume Soseki” in JSTOR does not return documents exclusively about Natsume Soseki. Some documents about other Japanese authors also appear. The graph above shows that “natsume” is consistently above the other authors, except in the years around 1965 and 2000, when “akutagawa” has two peaks. In raw counts, “akutagawa” appears 109 times in total from 1968 to 1972 and 153 times in 2004. The data set is not perfect, but this will not cause serious bias.
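To check whether a peak like this comes from many documents or from just a few, the raw counts can be summed by year and by year range. The following is a minimal sketch in base R; the data frame `doc_counts`, its column names and all of its numbers are hypothetical and are not the actual JSTOR counts.

```r
# Hypothetical per-document counts of one keyword (made-up numbers).
doc_counts <- data.frame(
  pubyear = c(1967, 1968, 1970, 1972, 1985, 2004, 2004),
  count   = c(2, 40, 5, 12, 1, 90, 3)
)

# Total raw occurrences per publication year.
by_year <- aggregate(count ~ pubyear, data = doc_counts, FUN = sum)

# Total within a year range, here 1968 to 1972 inclusive.
in_range <- doc_counts$pubyear >= 1968 & doc_counts$pubyear <= 1972
range_total <- sum(doc_counts$count[in_range])

# Share of that range contributed by its single largest document; a value
# close to 1 means one document is responsible for most of the peak.
top_doc_share <- max(doc_counts$count[in_range]) / range_total

print(by_year)
print(range_total)
print(top_doc_share)
```

If one or two documents account for most of a peak, the peak says more about those documents than about a general trend in the scholarship.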

The graph also shows the correlation between authors. Three of the authors, “murasaki”, “chikamatsu” and “matsuo”, are not from the 20th century. Their lines are in green and blue and do not rise much above 0. The modern authors’ lines are in orange and red, and the lines for “tanizaki”, “dazai” and “kawabata” are close together.

Similar Graphs for the Search for Dazai Osamu


“Keene” is the most prominent of the three translators. He translated Dazai’s “No Longer Human”.


“War” is also a dominant theme, but its peaks around 1960 and 2000 fall at somewhat different times from those in the corresponding graph for Natsume Soseki.


This graph looks better, since “dazai” is more dominant.

Google Ngram

Google Ngram is easy to use and its results are interesting.


All five are 20th-century Japanese writers. From this graph, we can see that their frequency increases from 1950 and has two peaks, in the 1970s and the 1990s. This partially matches the JSTOR graph for Dazai, but differs from the JSTOR graph for Natsume.


The three earlier writers, Murasaki (11th c.), Matsuo (17th c.) and Chikamatsu (17th c.), do not follow the pattern of the 20th-century writers.


A contemporary writer, Murakami (1949– ), does not follow the pattern either.

Part of the Code for Plotting

I had difficulty changing the order of the legend entries, but everything else works fine.

library(zoo)      # provides rollmean()
library(ggplot2)  # provides ggplot(), geom_line(), scale_colour_manual()

# Keep only the columns for the keywords of interest.
keepers <- c("japanese","english","chinese","korean")
Tokugawa.full.smaller <- Tokugawa.full.perc.df[,keepers]
# Replace missing percentages with 0.
Tokugawa.full.smaller[is.na(Tokugawa.full.smaller)] <- 0
# Five-year rolling mean; years at either end without a full window are filled with NA.
Tokugawa.smaller.roll.5 <- data.frame(rollmean(Tokugawa.full.smaller, k=5, fill = list(NA, NULL, NA)))
Tokugawa.smaller.roll.5$pubyear <- Tokugawa.full.perc.df$pubyear
# Map each keyword to a line color.
matching <- c("japanese" = "black","english" = "blue","chinese" = "red","korean" = "green")
ggplot(Tokugawa.smaller.roll.5, aes(x=pubyear)) +
 geom_line(aes(y = japanese, color = "japanese")) +
 geom_line(aes(y = english, color = "english")) +
 geom_line(aes(y = chinese, color = "chinese")) +
 geom_line(aes(y = korean, color = "korean")) +
 scale_colour_manual(name="Keywords",values = matching) +
 xlab("Year") + ylab("Rolling Mean of Percentage over Five Years")
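One possible way to control the order of the legend entries is the `breaks` argument of `scale_colour_manual`, which sets the order in which the keys are listed. The sketch below uses a small made-up data frame (`toy`) in long format rather than the data from this project, so the names `toy`, `keyword_colors` and `legend_order` are only for illustration.

```r
library(ggplot2)

# Toy data in long format (made-up values): one row per year and keyword.
toy <- data.frame(
  pubyear = rep(2000:2002, times = 4),
  keyword = rep(c("japanese", "english", "chinese", "korean"), each = 3),
  perc    = c(3.0, 3.2, 3.1, 1.0, 1.1, 0.9, 1.2, 1.0, 1.1, 0.4, 0.5, 0.3)
)

keyword_colors <- c(japanese = "black", english = "blue",
                    chinese = "red", korean = "green")

# Desired legend order, from top to bottom. Without `breaks`, the entries
# of a discrete colour scale are listed alphabetically.
legend_order <- c("japanese", "english", "chinese", "korean")

p <- ggplot(toy, aes(x = pubyear, y = perc, color = keyword)) +
  geom_line() +
  scale_colour_manual(name = "Keywords", values = keyword_colors,
                      breaks = legend_order) +
  xlab("Year") + ylab("Percentage")
```

Calling `print(p)` draws the plot. The same `breaks = ...` argument should also work with the wide-format code above, since the color labels there are the same four keyword strings.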

