Statistical results on the map

The percentage difference (province scale) between the Manchu and Chinese versions, mapped together with the line of the Coastal Exclusion Policy.
rplot04

The percentage difference (city scale) between the Manchu and Chinese versions, mapped together with the line of the Coastal Exclusion Policy.

rplot05

The code is below.

##--------------------------------------------------##

rm(list = ls())
fileEncoding <- "UTF-8"  ## encoding of the input text files

## set the working directory and load stringr
setwd("~/Desktop/003_PhD/016_Coursework/003_2016 Fall/003_HIST582A/003_Text")
library(stringr)

## scan Chinese and Manchu texts
Chinese.vol.1.txt <- scan("PDHF_Chinese_1.txt", what = "chr", fileEncoding = fileEncoding)
Chinese.vol.2.txt <- scan("PDHF_Chinese_2.txt", what = "chr", fileEncoding = fileEncoding)
Chinese.vol.3.txt <- scan("PDHF_Chinese_3.txt", what = "chr", fileEncoding = fileEncoding)
Manchu.vol.1.txt <- scan("PDHF_Manchu_1.txt", what = "chr", fileEncoding = fileEncoding)
Manchu.vol.2.txt <- scan("PDHF_Manchu_2.txt", what = "chr", fileEncoding = fileEncoding)
Manchu.vol.3.txt <- scan("PDHF_Manchu_3.txt", what = "chr", fileEncoding = fileEncoding)

##--------------------------------------------------##
## [toponym counts] ##
## read tables of place names in Chinese, Manchu, and English
Ch.place.names <- read.table("Chinese_place_names.txt", stringsAsFactors = FALSE)
Man.place.names <- read.table("Manchu_place_names.txt", sep = "\t", stringsAsFactors = FALSE)
Eng.place.names <- read.table("English_place_names.txt", sep = "\t", stringsAsFactors = FALSE)

## add a lower-cased "places" column and extract the unique toponyms
Man.place.names$places <- tolower(Man.place.names$V1)
Ch.place.names$places <- tolower(Ch.place.names$V1)
Eng.place.names$places <- tolower(Eng.place.names$V1)
Ch.toponym <- unique(Ch.place.names$V1)
Man.toponym <- unique(Man.place.names$places)
Eng.toponym <- unique(Eng.place.names$V1)

## paste each volume into one string
Manchu1 <- tolower(paste(Manchu.vol.1.txt, collapse = " "))
Manchu2 <- tolower(paste(Manchu.vol.2.txt, collapse = " "))
Manchu3 <- tolower(paste(Manchu.vol.3.txt, collapse = " "))
Chinese1 <- paste(Chinese.vol.1.txt, collapse = "")
Chinese2 <- paste(Chinese.vol.2.txt, collapse = "")
Chinese3 <- paste(Chinese.vol.3.txt, collapse = "")

## put the three Chinese volumes, plus all volumes combined, into a data frame
## (the combined text is pasted from the strings, not from the data frame)
Chinese_all <- paste(Chinese1, Chinese2, Chinese3, sep = "")
Ch.Texts.df <- data.frame(texts = c(Chinese1, Chinese2, Chinese3, Chinese_all), stringsAsFactors = FALSE)

Ch.Text.metrics <- data.frame(t(data.frame(lapply(Ch.toponym, FUN=function(x) str_count(Ch.Texts.df$texts, x)))))

## add the place names as a column
Ch.Text.metrics$places <- Ch.toponym

## name the four count columns
colnames(Ch.Text.metrics)[c(1:4)] <- c("Chinese1", "Chinese2", "Chinese3", "Chinese_all")

## repeat the same process for the Manchu version
Manchu_all <- paste(Manchu1, Manchu2, Manchu3, sep = " ")
Man.Texts.df <- data.frame(texts = c(Manchu1, Manchu2, Manchu3, Manchu_all), stringsAsFactors = FALSE)

Man.Text.metrics <- data.frame(t(data.frame(lapply(Man.toponym, FUN = function(x) str_count(Man.Texts.df$texts, x)))))

Man.Text.metrics$places <- Man.toponym
colnames(Man.Text.metrics)[c(1:4)] <- c("Manchu1", "Manchu2", "Manchu3", "Manchu_all")

## combine Chinese and Manchu dataframe together
Combined.df <- cbind.data.frame(Ch.Text.metrics, Man.Text.metrics)
Combined.df$Chinese1.perc <- Combined.df$Chinese1/sum(Combined.df$Chinese1)*100
Combined.df$Chinese2.perc <- Combined.df$Chinese2/sum(Combined.df$Chinese2)*100
Combined.df$Chinese3.perc <- Combined.df$Chinese3/sum(Combined.df$Chinese3)*100
Combined.df$Chinese_all.perc <- Combined.df$Chinese_all/sum(Combined.df$Chinese_all)*100
Combined.df$Manchu1.perc <- Combined.df$Manchu1/sum(Combined.df$Manchu1)*100
Combined.df$Manchu2.perc <- Combined.df$Manchu2/sum(Combined.df$Manchu2)*100
Combined.df$Manchu3.perc <- Combined.df$Manchu3/sum(Combined.df$Manchu3)*100
Combined.df$Manchu_all.perc<- Combined.df$Manchu_all/sum(Combined.df$Manchu_all)*100

## show the result: Manchu name, Chinese name, and English name of each place
Combined.df$toponym <- paste(Combined.df[,10], Combined.df[,5], Eng.place.names$places, sep = " ")
Combined.df$toponym

##--------------------------------------------------##
## [Coastal exclusion policy] ##
## note: recent versions of ggmap require a registered API key (see register_google())
## before get_map() and geocode() will return results
library(ggmap)
chinastate.map <- get_map(location = "china", zoom = 10, maptype = "satellite")

cities.cep <- c("廣西壯族自治區欽州市", "廣西壯族自治區北海市合浦縣", "合浦县石城村", "廣東省湛江市遂溪县乾留",
"湛江市雷州市海康港", "湛江市雷州市扶茂", "廣東省湛江市徐聞縣", "廣東省湛江市徐聞縣海安鎮",
"廣東省湛江市雷州市深田村", "廣東省湛江市雷州市", "廣東省湛江市遂溪縣", "廣東省湛江市遂溪縣長坡墩",
"廣東省湛江市吴川市博茂", "廣東省湛江市吳川市", "廣東省茂名市電白區",
"廣東省陽江市陽西縣雙魚村", "廣東省陽江市", "江門市恩平市", "廣東省江門市開平市", "新會區將軍山旅遊區",
"廣東省江門市新會區崖門鎮", "廣東省江門市新會區", "新會區觀音山", "佛山市順德區", "中山市三角鎮",
"中山市馬鞍村", "南沙区小虎山", "深圳市寶安區西鄉", "深圳市大鵬所城", "海丰县琵琶", "廣東省汕尾市海豐縣",
"揭陽市惠來縣", "揭陽市惠來縣靖海鎮", "广东省汕头市潮南区古埕", "廣東省汕頭市潮陽區",
"揭陽市揭東區鄒堂", "廣東省揭陽市", "廣東省潮州市", "潮州市饒平縣", "福建省漳州市詔安縣分水關",
"福建省漳州市詔安縣", "漳州市云霄县油甘公", "漳州市漳浦縣", "漳州市漳浦縣橫口圩", "漳州市龙海市洪礁寨",
"漳州市龍海市海澄鎮", "福建省漳州市龍文區江東橋", "廈門市同安區蓮花村", "廈門市同安區", "廈門市翔安區小盈嶺",
"福建省泉州市南安市大盈", "福建省晉江", "福建省泉州市南安市", "泉州市洛江区洛陽橋",
"泉州市惠安县石任", "泉州市泉港区九峰山", "莆田市荔城區壺公山", "莆田市涵江區江口鎮", "福清市高嶺村",
"福州市福清市", "長樂市岐陽村", "馬尾區閩安村", "福州市連江縣", "連江縣浦口鎮", "蕉城區白鶴嶺",
"福建省寧德市", "福安市洋尾", "福安市小留村", "寧德市福安市", "福鼎市沙埕鎮")
geo.cities.cep <- geocode(cities.cep)
geo.cities.cep.df <- data.frame(geo.cities.cep)

ggmap(chinastate.map) + geom_point(data = geo.cities.cep, aes(x = lon, y = lat)) + xlim(c(108, 122)) + ylim(c(20, 28))

##--------------------------------------------------##
## [map] ##
library(ggmap)
china.map <- get_map(location = "China", zoom = 10, maptype = "satellite")

cities <- c("福建省泉州南安市安平橋", "福建省廈門市同安區丙洲", "湖南省長沙市", "廣東省潮州市",
"中國福建省泉州市惠安縣崇武鎮", "中國福建省三明市永安市大漳山", "福建福州市連江縣定海古城",
"印尼雅加達", "福建省寧德市霞浦縣烽火島", "中國福建省", "福建省福州市", "福建省福州市長樂市新塘",
"江蘇省揚州市邗江區瓜洲鎮", "中國廣東省", "中國貴州省", "福建省漳州市龍海市海澄鎮",
"福建省福州市平潭縣海壇島", "福建省福州市台江區河口新村", "中國湖北省", "廣東省惠州市",
"江蘇省南京市", "廣東省汕尾市陸豐市碣石鎮", "中國江蘇省", "金門縣", "廣東省汕頭市濠江區馬滘",
"福建省莆田市秀嶼區湄洲大道湄洲島", "福建省福州市馬尾區閩安村", "廣東省汕頭市南澳縣",
"浙江省寧波市", "普列莫爾斯基區海參崴", "廣東省汕頭市龍湖區鷗汀", "臺灣省澎湖",
"福建省莆田市秀嶼區平海鎮", "浙江省溫州市平陽縣", "福建省泉州市", "浙江省紹興市",
"福建福州市平潭縣石牌洋", "福建省泉州石井鎮", "福建省泉州市惠安縣獺窟島", "浙江省台州市",
"臺灣台南", "福建省龍岩市長汀縣汀州", "福建省廈門市同安區", "福建省漳州市東山縣",
"福建省泉州市惠安縣", "福建省泉州市晉江市圍頭村", "浙江省溫州市", "福建省漳州市龍海市浯嶼",
"福建省廈門市翔安區斗門", "福建省廈門市", "福建省莆田市", "福建省廈門市集美區潯尾",
"福建省泉州市石獅市永寧鎮", "湖南岳陽縣", "福建省漳州市雲霄縣", "福建省漳州市",
"中國浙江省", "江蘇省鎮江市", "浙江省舟山市", "福建省泉州石獅市")

geo.cities <- geocode(cities)
geo.cities.df <- data.frame(geo.cities)

map.df <- cbind.data.frame(Combined.df, geo.cities.df)

library(maps)
library(mapdata)
library(ggplot2)
world.map <- borders(database = "world")
ggplot() + world.map + coord_quickmap()

world.map <- borders(database = "world", colour = "gray20", fill = "gray60")
ggplot() + world.map +
  coord_map(projection = "gilbert", xlim = c(100, 140), ylim = c(-20, 50)) +
  xlab("") + ylab("") + ggtitle("Percentage difference map")
##--------------------------------------------------##
## [The new try] ##
## mixed provinces and cities ##
## percentage difference (Chinese minus Manchu) in each volume ##
Combined.df$perc1.diff <- (Combined.df$Chinese1.perc - Combined.df$Manchu1.perc)
Combined.df$perc2.diff <- (Combined.df$Chinese2.perc - Combined.df$Manchu2.perc)
Combined.df$perc3.diff <- (Combined.df$Chinese3.perc - Combined.df$Manchu3.perc)
map.df <- cbind.data.frame(Combined.df, geo.cities.df)

## label each place by the version in which it is more frequent (NA when equal)
map.df$type1 <- ifelse(map.df$perc1.diff > 0, "Chinese", "Manchu")
map.df$type1 <- ifelse(map.df$perc1.diff == 0, NA, map.df$type1)
map.df$type2 <- ifelse(map.df$perc2.diff > 0, "Chinese", "Manchu")
map.df$type2 <- ifelse(map.df$perc2.diff == 0, NA, map.df$type2)
map.df$type3 <- ifelse(map.df$perc3.diff > 0, "Chinese", "Manchu")
map.df$type3 <- ifelse(map.df$perc3.diff == 0, NA, map.df$type3)

## scale of each place, in the same order as the "cities" vector
map.df$scale <- c("city", "city", "city", "city", "city", "city", "city", "province", "city", "province", "city", "city", "city",
"province", "province", "city", "city", "city", "province", "city", "city", "city", "province", "city", "city",
"city", "city", "city", "city", "city", "city", "city", "city", "city", "city", "city", "city", "city", "city",
"city", "city", "city", "city", "city", "city", "city", "city", "city", "city", "city", "city", "city", "city",
"city", "city", "city", "province", "city", "city", "city")

## provinces only, Manchu and Chinese ##
## volume 1 ##
vol1bp <- ggplot() + world.map +
  geom_point(data = subset(map.df, scale == "province"), aes(x = lon, y = lat, color = type1, shape = scale, size = abs(perc1.diff))) +
  guides(size = "none") +
  geom_path(data = geo.cities.cep.df, aes(x = lon, y = lat, color = "Coastal Exclusion Policy")) +
  coord_map(projection = "stereographic", xlim = c(112, 123), ylim = c(21, 34)) + ylab("") + xlab("") +
  ggtitle("Provinces in Volume 1")

## volume 2 ##
vol2bp <- ggplot() + world.map +
  geom_point(data = subset(map.df, scale == "province"), aes(x = lon, y = lat, color = type2, shape = scale, size = abs(perc2.diff))) +
  guides(size = "none") +
  geom_path(data = geo.cities.cep.df, aes(x = lon, y = lat, color = "Coastal Exclusion Policy")) +
  coord_map(projection = "stereographic", xlim = c(112, 123), ylim = c(21, 34)) + ylab("") + xlab("") +
  ggtitle("Provinces in Volume 2")

## volume 3 ##
vol3bp <- ggplot() + world.map +
  geom_point(data = subset(map.df, scale == "province"), aes(x = lon, y = lat, color = type3, shape = scale, size = abs(perc3.diff))) +
  guides(size = "none") +
  geom_path(data = geo.cities.cep.df, aes(x = lon, y = lat, color = "Coastal Exclusion Policy")) +
  coord_map(projection = "stereographic", xlim = c(112, 123), ylim = c(21, 34)) + ylab("") + xlab("") +
  ggtitle("Provinces in Volume 3")

## cities only, Manchu and Chinese ##
## volume 1 ##
vol1bc <- ggplot() + world.map +
  geom_point(data = subset(map.df, scale == "city"), aes(x = lon, y = lat, color = type1, shape = scale, size = abs(perc1.diff))) +
  guides(size = "none") +
  geom_path(data = geo.cities.cep.df, aes(x = lon, y = lat, color = "Coastal Exclusion Policy")) +
  coord_map(projection = "stereographic", xlim = c(112, 123), ylim = c(21, 34)) + ylab("") + xlab("") +
  ggtitle("Cities in Volume 1")

## volume 2 ##
vol2bc <- ggplot() + world.map +
  geom_point(data = subset(map.df, scale == "city"), aes(x = lon, y = lat, color = type2, shape = scale, size = abs(perc2.diff))) +
  guides(size = "none") +
  geom_path(data = geo.cities.cep.df, aes(x = lon, y = lat, color = "Coastal Exclusion Policy")) +
  coord_map(projection = "stereographic", xlim = c(112, 123), ylim = c(21, 34)) + ylab("") + xlab("") +
  ggtitle("Cities in Volume 2")

## volume 3 ##
vol3bc <- ggplot() + world.map +
  geom_point(data = subset(map.df, scale == "city"), aes(x = lon, y = lat, color = type3, size = abs(perc3.diff))) +
  guides(size = "none") +
  geom_path(data = geo.cities.cep.df, aes(x = lon, y = lat, color = "Coastal Exclusion Policy")) +
  coord_map(projection = "stereographic", xlim = c(112, 123), ylim = c(21, 34)) + ylab("") + xlab("") +
  ggtitle("Cities in Volume 3")
## this function is from http://kanchengzxdfgcv.blogspot.tw/2016/11/r-ggplot2.html ##
## it draws several ggplot objects on one page, arranged in a grid
multiplot <- function(..., plotlist = NULL, file, cols = 1, layout = NULL) {
  library(grid)
  plots <- c(list(...), plotlist)
  numPlots = length(plots)
  if (is.null(layout)) {
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                     ncol = cols, nrow = ceiling(numPlots/cols))
  }

  if (numPlots == 1) {
    print(plots[[1]])

  } else {
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))
    for (i in 1:numPlots) {
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))

      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}
## provinces in both##
multiplot(vol1bp, vol2bp, vol3bp, cols= 1)

## cities in both##
multiplot(vol1bc, vol2bc, vol3bc, cols =1)

Comparison of Manchu and Chinese versions

There are two versions of the draft of Ping Ding Hai Kou Fang Lue: a Manchu version and a Chinese version. If one had been translated from the other, the two should be identical in content. As is already known, however, they differ. How different are they? Because Manchu and Chinese are linguistically distinct, it is not feasible to compare grammar, sentence structure, or writing style directly. It is possible, however, to compare proper nouns. This text contains three main kinds of proper nouns: toponyms, personal names, and position titles. I therefore analyze, for each kind, the difference in percentage between the two versions, and I also compute Dunning’s log-likelihood and tf-idf for the six places that appear in all three volumes. With these results in hand, I can return to the text and examine more closely where the two versions differ.

Graph 1: Manchu percentage minus Chinese percentage of toponym mentions in Vol. 1

topoynm_vol1

Graph 2: Manchu percentage minus Chinese percentage of toponym mentions in Vol. 2

topoynm_vol2

Graph 3: Manchu percentage minus Chinese percentage of toponym mentions in Vol. 3

topoynm_vol3

Graph 4: Manchu percentage minus Chinese percentage of toponym mentions in all three volumes

topoynm_all

Graphs 1 to 4 show, for Volumes 1 to 3 and for all three volumes combined, the percentage of mentions of each place in the Manchu text minus the corresponding percentage in the Chinese text. Graph 1 suggests that Fujian is much more frequent in Manchu than in Chinese, whereas Dutch is more frequent in Chinese. Graph 2 shows that Fujian, Penghu, and Haitan are more frequent in Manchu, and that Xiamen and Meizhou are more frequent in Chinese. Graph 3 suggests that Taiwan is much more frequent in Manchu, while Penghu is more frequent in Chinese. Overall, Fujian and Taiwan are more frequent in Manchu, and Dinghai, Dutch, and Pinghai are slightly more frequent in Chinese. Among these places, Fujian is easy to explain. In Chinese, each province has an abbreviation (Min, for example, is the abbreviation of Fujian), so the Chinese text can refer to the province without writing its full name. The share of Dutch differs between the Chinese and Manchu versions of Volume 1 because one paragraph, which recounts how the Dutch navy supported the Qing, is worded differently in the two versions.

Graph 5: Manchu percentage minus Chinese percentage of personal-name mentions in Vol. 1

person_vol1

Graph 6: Manchu percentage minus Chinese percentage of personal-name mentions in Vol. 2

person_vol2

Graph 7: Manchu percentage minus Chinese percentage of personal-name mentions in Vol. 3

person_vol3

Graph 8: Manchu percentage minus Chinese percentage of personal-name mentions in all three volumes

person_all

A similar approach applied to personal names also reveals differences between the two versions. Here, however, I make one change. Instead of searching for the full Chinese name (surname plus given name), I search only for the given name, because the text more commonly refers to people by given name alone. More importantly, certain surnames, such as Wang, are also a noble rank in Chinese and Manchu, which caused confusion when I searched for full names. To avoid such false matches, it is more appropriate to search for given names only. Graph 5 suggests that some names in the Manchu text never appear in the Chinese text, and that the people mentioned more frequently in Manchu are Manchus. Conversely, Wan Zhengse, a military commander, appears more frequently in Chinese than in Manchu. Graph 6 shows a similar difference: Wan Zhengse is again mentioned more frequently in Chinese, and the people mentioned more frequently in Manchu are Manchus. Graph 7 shows the same pattern. In short, Manchus are mentioned more frequently in the Manchu text, and Chinese people, including Hanjun bannermen and Han Chinese, are mentioned more frequently in the Chinese text.
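The effect of searching for a surname that doubles as a rank can be illustrated with a few lines of R. This is only a sketch under stated assumptions: the sample sentence, the names in it, and the helper `count_fixed()` are hypothetical and are not taken from the texts analyzed here.

```r
## Minimal sketch: count literal (non-regex) occurrences of a name in a text.
## count_fixed() is a hypothetical helper, not part of the original script.
count_fixed <- function(text, pattern) {
  hits <- gregexpr(pattern, text, fixed = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

## Hypothetical sample: "wang" occurs once as a surname and twice as a rank
sample.text <- "wang guoan wrote to the wang, and the wang replied"

surname.hits <- count_fixed(sample.text, "wang")   # surname and rank are mixed together
given.hits   <- count_fixed(sample.text, "guoan")  # only the person is counted
```

In this toy sentence the surname search returns a count inflated by the rank, while the given-name search counts only the person, which is the motivation for searching given names alone.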

Graph 9: Manchu percentage minus Chinese percentage of position-title mentions in Vol. 1

title_vol1

Graph 10: Manchu percentage minus Chinese percentage of position-title mentions in Vol. 2

title_vol2

Graph 11: Manchu percentage minus Chinese percentage of position-title mentions in Vol. 3

title_vol3

Graph 12: Manchu percentage minus Chinese percentage of position-title mentions in all three volumes

title_all

Finally, the same process applied to position titles, such as governors (dzungdu), commanders (tidu), and generals (jiangjun), yields Graphs 9 to 12, which show that viceroy, marshal, and general are mentioned more frequently in Manchu than in Chinese. The main reason is that these three terms can be replaced by abbreviations in Chinese, and Chinese authors usually prefer the abbreviations when referring to these positions. This also indicates, however, that the Manchu text names titles more precisely and directly.

The first analysis of proper nouns concerns places. Plotting the results of Graphs 1 to 4 on a map gives a clearer visual impression; Graphs 13 to 16 show the result. To some degree, Graphs 13 to 15 display the shift over time, and Graph 16 shows the overall picture across the three volumes.

Graph 13: the percentage difference in volume 1

topoynm_map_vol1

Graph 14: the percentage difference in volume 2

topoynm_map_vol2

Graph 15: the percentage difference in volume 3

topoynm_map_vol3

Graph 16: the percentage difference in three volumes

topoynm_map_all

However, mapping raw statistical results is open to question. To obtain more rigorous results, two methods can be used: Dunning’s log-likelihood and tf-idf. Dunning’s log-likelihood offers an efficient way to compare two texts. When the log-likelihood value (G2) is at least 15.13, the significance value p is less than 0.0001; when G2 is at least 10.83, p < 0.001; when G2 is at least 6.63, p < 0.01; and when G2 is at least 3.84, p < 0.05. Accordingly, Table 1 suggests that Fujian differs significantly between the two versions in each of the first three volumes, whereas Zhejiang, Taiwan, Xiamen, Jinmen, and Haicheng do not differ significantly.

Table 1: Dunning’s log-likelihood (G2) for the six places that appear in all of the first three volumes. The analysis text is the Manchu volumes, and the reference text is the Chinese volumes.

Place      Volume 1   Volume 2   Volume 3
Fujian     15.491     4.5189     5.764
Zhejiang   0.009      1.130      1.590
Taiwan     0.009      0.052      0.498
Xiamen     0.476      0.891      0.384
Jinmen     0.294      0.269      0.384
Haicheng   0.072      0.154      0.128

As the analysis above shows, these six places change in different ways. For Fujian, the difference is largest in Volume 1 and smaller in Volumes 2 and 3, but Fujian remains the most different place in each of the first three volumes. Conversely, Zhejiang and Taiwan become more and more different, although the difference is not significant by Dunning’s log-likelihood. Xiamen, Jinmen, and Haicheng show no significant difference in any of the first three volumes.

Besides Dunning’s log-likelihood, another effective text-mining approach is tf-idf (term frequency-inverse document frequency). The tf-idf values of the six places in the two language versions are shown in Table 3.
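For readers who want to compute G2 themselves, the following is a minimal sketch of the commonly used two-term form of Dunning’s log-likelihood for a single term. The function names and the counts in the example are hypothetical; the corpus sizes behind Table 1 are not given here, so the sketch does not attempt to reproduce those values, and whether Table 1 was computed with exactly this form is an assumption.

```r
## Minimal sketch of Dunning's log-likelihood (G2) for a single term.
## a, b: occurrences of the term in the analysis and reference texts
## c, d: total number of counted tokens in the analysis and reference texts
dunning.g2 <- function(a, b, c, d) {
  e1 <- c * (a + b) / (c + d)   # expected count in the analysis text
  e2 <- d * (a + b) / (c + d)   # expected count in the reference text
  term <- function(observed, expected) {
    if (observed == 0) 0 else observed * log(observed / expected)
  }
  2 * (term(a, e1) + term(b, e2))
}

## p-value from the chi-squared distribution with one degree of freedom
g2.p.value <- function(g2) pchisq(g2, df = 1, lower.tail = FALSE)

## Hypothetical counts: 20 vs 10 occurrences in two texts of 100 tokens each
g2.example <- dunning.g2(20, 10, 100, 100)
p.example  <- g2.p.value(g2.example)
```

With these made-up counts G2 falls below the 3.84 threshold, so the difference would not be significant at p < 0.05; identical counts in texts of equal size give G2 = 0.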

Table 3: tf-idf in three volumes in Manchu (M) and Chinese (C).

Place     Version  Volume 1                Volume 2                Volume 3
                   tf     idf    tf-idf    tf     idf    tf-idf    tf     idf    tf-idf
Fujian    M        0.487  0.720  0.350     0.2    1.610  0.322     0.363  1.012  0.368
          C        0.571  0.560  0.320     0.2    1.610  0.322     0.375  0.981  0.368
Zhejiang  M        0.128  2.054  0.263     0.05   2.996  0.150     0.045  3.091  0.141
          C        0.071  2.640  0.189     0.05   2.996  0.150     0.063  2.772  0.173
Taiwan    M        0.064  2.747  0.176     0.225  1.492  0.336     0.432  0.840  0.054
          C        0.071  2.640  0.189     0.2    1.609  0.322     0.344  1.068  0.076
Xiamen    M        0.167  1.792  0.299     0.25   1.386  0.347     0.068  2.686  0.183
          C        0.143  1.946  0.278     0.275  1.291  0.355     0.094  2.367  0.222
Jinmen    M        0.141  1.959  0.276     0.175  1.743  0.305     0.068  2.656  0.183
          C        0.125  2.079  0.260     0.175  1.743  0.305     0.094  2.367  0.222
Haicheng  M        0.013  4.357  0.056     0.1    2.303  0.230     0.023  3.784  0.086
          C        0.018  4.025  0.072     0.1    2.303  0.230     0.031  3.466  0.108
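As a point of reference, the textbook form of tf-idf can be sketched as follows. The toy documents and the function name `tf.idf()` are hypothetical, and the values in Table 3 may have been computed with a different variant of the formula, so this is an illustration of the general technique rather than a reconstruction of the table.

```r
## Minimal sketch of textbook tf-idf for one term across several documents.
## docs: a named list of character vectors (one vector of tokens per document)
tf.idf <- function(term, docs) {
  ## term frequency: share of the document's tokens that equal the term
  tf <- sapply(docs, function(tokens) sum(tokens == term) / length(tokens))
  ## document frequency: number of documents that contain the term
  df <- sum(tf > 0)
  ## inverse document frequency (natural log); undefined if the term never occurs
  idf <- if (df > 0) log(length(docs) / df) else NA_real_
  data.frame(doc = names(docs), tf = tf, idf = idf, tf_idf = tf * idf)
}

## Hypothetical toy documents made of toponym tokens
toy.docs <- list(
  vol1 = c("fujian", "fujian", "taiwan"),
  vol2 = c("fujian", "xiamen"),
  vol3 = c("fujian", "taiwan", "taiwan", "xiamen")
)

taiwan.scores <- tf.idf("taiwan", toy.docs)
fujian.scores <- tf.idf("fujian", toy.docs)
```

In this toy example "fujian" appears in every document, so its idf and tf-idf are zero, while "taiwan" receives a positive score only in the documents that mention it.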

What can the statistics show? They can tell readers at least two things. First, as mentioned above, if the places are grouped into two clusters, the large-scale cluster (Fujian, Zhejiang, and Taiwan) shows differences that change over time, with Zhejiang and Taiwan becoming increasingly different. Second, the Manchu version may describe large places more precisely than the Chinese version does, while both describe cities and other small-scale places about equally.

Why did the difference for Fujian decrease over time? Compared with the province scale, the government’s attention shifted to the city scale because the war between the Qing and the Zheng had become localized. This explains why so many cities, towns, and villages appear in the second volume. As a result, the Chinese and Manchu versions show a similar tendency, probably because both were written from the same sources.

The analyses reveal many differences. For example, Manchus are mentioned more frequently in the Manchu version, whereas Han Chinese are mentioned more frequently in the Chinese version. In addition, Dunning’s log-likelihood and tf-idf for the six places that appear in all three volumes suggest that the importance of places changes over time. Although the two versions are broadly similar in structure and in the archives they draw on, they differ noticeably. Consequently, neither version appears to be a translation of the other. The comparison of the three major kinds of proper nouns (places, persons, and positions) suggests that the Manchu version is more precise than the Chinese one.

Does Manchu Matter? A Comparison of Ping Ding Hai Kou Fang Lue in Chinese and Manchu

1.   Introduction, Historiography, and Methodology

Is the Manchu-language source merely a copy of the Chinese source? Does it matter for the study of Qing history? This question has been debated for over a century. In this article, I argue that Manchu-language sources not only matter but are at least as important as Chinese sources.

Spoken Manchu was used in northeastern China, also known as Manchuria. In 1587, Nurgaci established a regime, and he became khan of this area in 1589. In 1616, Nurgaci adopted a dynastic title, Jin. During this period, because the government needed a more systematic script to improve political efficiency, Erdeni and G’agai created a Manchu script based on the Mongolian writing system. This script had limited ability to spell non-Manchu personal and place names and, most importantly, could not distinguish the sounds k, g, and h. To distinguish it from the later revised script, this writing system is called Old Manchu.

In 1632, Hong Taiji, Nurgaci’s son, asked Dahai to revise Old Manchu. The new writing system added ten new letters for spelling names and places, distinguished k, g, and h, and standardized the script. Thus, within about 30 years, written Manchu had matured enough to serve as a standard language. When the Qing conquered China, Manchu became the official language for all regions of the empire, including China, Mongolia, Tibet, and the Uyghur regions, until 1911.

In the early 20th century, Japanese scholars noticed the importance and distinctiveness of the Manchu language, and using Manchu-language sources to study Qing history became increasingly important in Japan. In China, by contrast, although some scholars could read Manchu, using Manchu-language sources never became a primary research approach. There are at least three main reasons.

First, because they assumed Sinicization, many scholars paid no attention to the Manchu language. For these scholars, the Manchu part of a document was merely a translation of the Chinese part. Second, Chinese sources far outnumber Manchu-language sources, so reading Manchu seemed unnecessary. Third, in their view, Manchu became less important after the High Qing, as ministers’ ability to use the language gradually disappeared. For these three reasons, Manchu-language sources were neglected for a long time.

In 2004, a new historiographic approach appeared, called the New Qing History or the New Qing Imperial History. Overall, the New Qing History proposes to understand Qing history through three new concepts. First, it rejects Sinocentrism, although, it must be clarified, it does not entirely ignore the importance of Sinicization. Instead of Sinicization, it emphasizes the Manchu elements of the Qing Empire. Second, since the Qing Empire was not a Sinicized empire, it must have had its own distinctive character. In this context, the New Qing History observes that the Qing Empire was in fact an empire like other empires of the early modern period, such as the British, Russian, and Ottoman Empires. In other words, the Qing Empire was not a Chinese empire but a universal empire, of which China was just one part. Third, since the New Qing History emphasizes the Manchu element, the most direct way to engage with the Manchus is to make wide use of Manchu-language sources. For the New Qing historians, a Manchu-language source is an independent text rather than a translated copy of its Chinese counterpart. Admittedly, the New Qing History has produced many meaningful results and works, but a growing number of critics challenge these three concepts. One of the most common criticisms is that the New Qing historians exaggerate the importance of Manchu-language sources.

Against the background of this historiographic debate, this article analyzes a text that exists in both Chinese and Manchu: Ping Ding Hai Kou Fang Lue (the Book of Strategic Record about Suppressing the Pirate, 平定海寇方略). Fang Lue was a literary genre of the Qing period used only by the government. When the Qing Empire defeated an enemy, the government compiled a book recording every detail chronologically on the basis of official archives. Because a Fang Lue was not only a record of historical events but also a proclamation of imperial victory, authority, and prestige, it is reasonable that such a book would be compiled in multiple languages. As far as we know, there were 25 Fang Lue. Among them, Ping Ding Hai Kou Fang Lue is the only one for which no completed version has been found. For the past century, the Chinese version of this Fang Lue was the only known version, and it is merely a draft in four volumes. In 2011, I discovered the Manchu-language version in the Grand Council Archive. It is also a draft and contains only the first three volumes. Even so, the Chinese and Manchu versions are comparable for three reasons. First, they overlap in the first three volumes. Second, they were compiled at the same time. Third, they record the same events. By comparing these two texts, this article therefore explores the relationship between the Chinese and Manchu versions.

As can be seen in Table 1, the Manchu and Chinese texts cover exactly the same period. In other words, the two texts record the same events. This makes sense: since the main purpose of the book is to record history and proclaim imperial prestige, the two texts should have the same content. Precisely because the two texts should be literally the same, any small difference between them is of interest.

Table 1: The period covered in the first three volumes

Volume    Time       Chinese source   Manchu language source
Volume 1  Beginning  March 1679       March 1679
          End        December 1679    December 1679
Volume 2  Beginning  March 1680       March 1680
          End        August 1680      August 1680
Volume 3  Beginning  March 1681       March 1681
          End        November 1682    November 1682

This article uses digital text mining. The first problem is the difference between the two languages in grammar, writing system, and meaning. Because Chinese and Manchu are linguistically different, it is difficult, if not impossible, to compare the texts word by word. Fortunately, as mentioned above, since the two texts record the same events from the same sources during the same period, the numbers of proper nouns, and of place names in particular, should match. I therefore compare the number of occurrences of place names in the two texts to see whether one was translated or copied from the other. I then map the place names mentioned in each text and, by combining geographic, political, and environmental phenomena, look for a bigger picture of the differences between the two texts.

2.   The Comparison of Two Texts

Table 2 suggests that, in Volume 1, every place name except Dutch and Youzhou, which appear equally often in both texts, is more frequent in the Manchu text than in the Chinese text. It is hard to say whether this makes the Manchu text more precise than the Chinese text, but it does suggest that the two texts differ. Table 3 suggests that, in Volume 2, place names are again generally more frequent in the Manchu text, although Xiamen is slightly more frequent in Chinese (11 versus 10), and Kimmen, Haicheng, Nan’ao, Pinghai, and Tongshan have the same frequency in both texts. Here too, then, the two texts differ.

Table 2: The frequency of place names in Volume 1

Order  Name of places  Manchu texts  Frequency  Chinese texts  Frequency
1      Fujian          fugiyan       38         福建           20
2      Xiamen          hiya men      14         廈門           8
3      Kimmen          gin men       11         金門           7
4      Dutch           ho lan        11         荷蘭           11
5      Tingzhou        ting jeo      8          汀州           2
6      Taiwan          tai wan       5          臺灣           4
7      Zhangzhou       jang jeo      5          漳州           3
8      Youzhou         yo jeo        5          岳州           5
9      Chaozhou        coo jeo       5          潮州           4
10     Quanzhou        ciowan jeo    4          泉州           2

Table 3: The frequency of place names in Volume 2

Order  Name of places  Manchu texts  Frequency  Chinese texts  Frequency
1      Haitan          hai tan       15         海壇           12
4      Xiamen          hiya men      10         廈門           11
5      Taiwan          tai wan       9          臺灣           8
9      Fujian          fugiyan       8          福建           4
3      Kimmen          gin men       7          金門           7
8      Penghu          peng hū       6          彭湖           3
2      Haicheng        hai ceng      4          海澄           4
6      Nan’ao          nan oo        3          南澳           3
7      Pinghai         ping hai      3          平海           3
10     Tongshan        tung šan      3          銅山           3

Compared with the previous two volumes, Table 4 shows a different result. Apart from Taiwan, Fujian, and Penghu, the frequencies are identical in the two texts. Taiwan and Fujian, however, are clearly more frequent in the Manchu text than in the Chinese text. Thus, although the Manchu and Chinese texts differ slightly, most place-name frequencies in Volume 3 are the same.

Table 4: The frequency of place names in Volume 3

Order  Name of places  Manchu texts  Frequency  Chinese texts  Frequency
1      Taiwan          tai wan       19         臺灣           11
3      Fujian          fugiyan       16         福建           12
2      Penghu          peng hū       8          彭湖           7
4      Kimmen          gin men       3          金門           3
5      Xiamen          hiya men      3          廈門           3
7      Zhejiang        jegiyang      2          浙江           2
8      Pingyang        ping yang     2          平陽           2
6      Haicheng        hai ceng      1          海澄           1
9      Tongshan        tung šan      1          銅山           1
10     Yungxia         yūn siyoo     1          雲霄           1

Having compared the frequencies of place names in the first three volumes, I find it clear that the two texts differ. The difference is slight, but it exists. Therefore, neither the Manchu text nor the Chinese text is a translation or copy of the other. A graph is an appropriate way to see how different the two texts are.
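The comparison for Volume 1 can be reproduced from the counts in Table 2 with a short R sketch. The data frame and column names are hypothetical, and the percentages here are shares among the ten listed places only, which is an assumption; the graphs may have been normalized differently.

```r
## Frequencies copied from Table 2 (Volume 1)
vol1 <- data.frame(
  place   = c("Fujian", "Xiamen", "Kimmen", "Dutch", "Tingzhou",
              "Taiwan", "Zhangzhou", "Youzhou", "Chaozhou", "Quanzhou"),
  manchu  = c(38, 14, 11, 11, 8, 5, 5, 5, 5, 4),
  chinese = c(20, 8, 7, 11, 2, 4, 3, 5, 4, 2),
  stringsAsFactors = FALSE
)

## raw difference in counts (Manchu minus Chinese)
vol1$diff <- vol1$manchu - vol1$chinese

## share of each place among these ten places, in percent, and its difference
vol1$manchu.perc  <- vol1$manchu / sum(vol1$manchu) * 100
vol1$chinese.perc <- vol1$chinese / sum(vol1$chinese) * 100
vol1$perc.diff    <- vol1$manchu.perc - vol1$chinese.perc

## places with identical counts in both texts
same.places <- vol1$place[vol1$diff == 0]
```

Sorting `vol1` by `diff` or `perc.diff` shows at a glance which places drive the difference between the two texts.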

Graph 1: The line-graph of the difference in the Volume 1
20160929_blog_graphs_1
Graph 2: The line-graph of the difference in the Volume 2

20160929_blog_graphs_2

Graph 3: The line-graph of the difference in the Volume 3

20160929_blog_graphs_3

As can be seen, Graph 1, Graph 2, and Graph 3 suggest that the two texts are different in the most frequent name of places. In other words, the more frequent places are mentioned in text, the more different they are. When a place where are mentioned only few times in either text, this suggests that this place was only becoming significant at a certain moment or event during this period. For example, in the volume 2, Nan’ao, Pinghai, and Tongshan were mentioned only because a minister listed certain places where should be garrisoned. Besides this suggestion, these places were not important; to be specific, they should not be mentioned because they were not even the territory of the Qing Empire due to the Coastal Exclusion Policy. I will discuss this in the next section. Therefore, once the places were mentioned more frequent in the texts, they were highly different. In other words, I can confidently conclude that the two texts are different in terms of the frequency of name of places, even though they recorded the exactly the same period and event.

3.   A big picture of the geographical phenomenon

In the previous section, I left a question open: if the less frequent place names should not appear because of the Coastal Exclusion Policy, why were they still mentioned in the two texts? This question can perhaps be answered by combining text mining with mapping. According to the texts, the first sentence of volume 3 addresses an important event: in the second month of the twentieth year of the Kangxi reign, the Qing Empire decided to repeal the Coastal Exclusion Policy. In other words, the coastal lands abandoned under the policy could again be used by the people and the government. However, the policy had in fact not been successful, because many people had already returned to their hometowns before it was repealed. This was widely known in Fujian but not in other regions.

In other words, volumes 1 and 2 record events from the time when the Coastal Exclusion Policy was in force, whereas volume 3 records the period just after it was repealed. Therefore, I propose to combine the results of volumes 1 and 2 as one case and keep volume 3 as a separate case, in order to discuss the difference between the two texts against this historical background.

As can be seen in Map 1, the coastal area between the line of blue points and the seashore was entirely abandoned by the Qing Empire, so the area was literally not a part of the empire. Therefore, combining the results of text mining with the map may help us understand this history better.

Map 1: The Coastal Exclusion Policy

20160929_blog_cep

Map 2 is drawn by combining Map 1 with the results of Table 2, but I omit the large units, such as Fujian and the Dutch, because I could not locate them as points on the map. As can be seen, almost all the cities mentioned in the text lay beyond the front line, with one exception, Youzhou. In other words, in 1679 and 1680 the most frequently discussed places were located in the area that belonged to neither the Qing nor the Zheng. Using a similar approach, Map 3 shows the results of Table 3 on the map.

Map 2: The frequency of places in the Manchu language in the volume 1 and 2

20160929_blog_volume12_m

Map 3: The frequency of places in Chinese in the volume 1 and 2

20160929_blog_volume12_c

Combining Map 1, Map 2, and Map 3 gives Map 4. The difference between the Manchu and Chinese sources on the map is interesting. The Manchu text mentions these areas, which were not a part of the Qing, more directly than the Chinese text does, and this is probably meaningful. Considering the nature and audience of the Manchu language, the Qing government probably did not want the Chinese general public, who could easily read Chinese but not Manchu, to learn the details of the failure of the Coastal Exclusion Policy. In other words, this difference might imply how the empire controlled people’s minds and their knowledge of the true history.

Map 4: The frequency of places in the Manchu language and Chinese in the volume 1 and 2 under the map of the Coastal Exclusion Policy

20160929_blog_volume12_b

What happened and what changed when the Coastal Exclusion Policy was repealed? In fact, although the government prohibited people from returning to the abandoned areas, a growing number of people still returned to where they had lived before the policy was implemented. As a result, the policy was in reality useless. When the policy was repealed in 1681, people could return to their original hometowns and lands. Following the previous discussion, if cities in the abandoned area are indeed mentioned less frequently and less directly in Chinese than in Manchu because the government attempted to control people’s understanding, then Map 5, Map 6, and Map 7 can explain why the two texts are similar in volume 3: once the Coastal Exclusion Policy had been repealed, it was no longer necessary to hide anything about its failure.

Map 5: The frequency of places in the Manchu language in the volume 3

20160929_blog_volume3_m

Map 6: The frequency of places in Chinese in the volume 3

20160929_blog_volume3_c

Map 7: The frequency of places in the Manchu language and Chinese under the repealed Coastal Exclusion Policy

20160929_blog_volume3_b

Map 8 combines Maps 1 to 7. It appears to support my argument in the previous paragraph. Therefore, I can confidently argue that the Manchu text was more precise, detailed, and direct in mentioning place names than the Chinese text because the government did not want to reveal the failure of the Coastal Exclusion Policy. Although that failure was widely known in Fujian, it was not known in other provinces or in non-Chinese regions such as Mongolia and Tibet. Because the main purpose of editing this book was to proclaim imperial prestige and success, the government had to control the content carefully. The threshold for learning Manchu was higher than for learning Chinese, because Manchu was used only by the upper class. By contrast, Chinese had been the dominant language for over two thousand years. The failure of the Coastal Exclusion Policy could thus be known to a limited extent within the ruling class, but it could not be known by ordinary Chinese people.

Map 8: The frequency of places in the Manchu language and Chinese under the Coastal Exclusion Policy in the first three volumes

20160929_blog_volume123

4.   Conclusion

Approached through the digital humanities, text mining that compares the two language versions of the same book suggests that neither the Manchu nor the Chinese text was a copy or translation of the other. Moreover, the frequency of place names in the Manchu version is slightly more precise than in the Chinese version. Furthermore, given the historical background, the frequency of place names in this book might be closely related to an imperial policy, the Coastal Exclusion Policy. In fact, combining text mining with spatial history shows how the government controlled texts to keep ordinary people from recognizing the failure of the Coastal Exclusion Policy.

Admittedly, I can read both Manchu and Chinese. Frankly, before I used the approach of the digital humanities to analyze these two texts, I believed that the two texts were in fact exactly the same, even though I am a follower of the New Qing History, which means that I did not believe the Manchu sources were translated from the Chinese sources. In this case, I assumed that there was probably a main draft or main author and that the two texts were simply edited from that draft. However, because of the difference in the frequency of place names, I have changed my mind. This also strengthens the idea of the New Qing History: Manchu and Chinese sources should be given equal emphasis in order to establish a broader Qing history.

 

 

 

Does the Manchu language matter?

Introduction

Do you still remember the text in the standard Manchu language, Ping Ding Hai Kou Fang Lue (The Book about Defeating Piracy, 平定海寇方略)? In this blog, I briefly explain the background to the editing of this book, and I analyze and compare the volumes within it. Most importantly, I analyze and compare the versions of this book in two languages, Chinese and Manchu. Based on this analysis, I argue that the Manchu texts and the Chinese texts are different and equally important to know.

During the Qing period in China (1644-1911), the Qing Empire had a tradition of editing books that detailed its victories. This kind of book is called “Fang Lue” in Chinese and “necihiyeme toktobuha bodogon i bithe” in Manchu. The main function of a Fang Lue was to proclaim how powerful and successful the Qing Empire was. In order to spread the empire’s success widely, a Fang Lue was usually edited in Manchu and Chinese, and sometimes in other languages, such as Mongolian.

Ping Ding Hai Kou Fang Lue was edited to record the war between the Qing Empire and the Zheng Regime in Taiwan, which the Qing regarded as pirates. The Zheng Regime was formally created by Zheng Chenggong, also known as Koxinga, during the Ming-Qing transition. However, Koxinga’s father, Zheng Zhilong, was the actual founder of this regime in the late Ming Dynasty. Zhilong was originally a pirate as well as a trader, but he was recruited by the Ming government as an official general in Fujian, a southeastern province of China, to help the Ming Court suppress other pirates in 1627.

A few years later, in 1635, Zhilong defeated the last pirate who resisted him. Because of his contribution during these years, Zhilong was appointed commander of Fujian and became its de facto ruler. During the Ming-Qing transition, Zhilong supported the Ming Court at first, but he eventually decided to surrender to the Qing Empire. He did not, however, bring all his troops and property with him to Beijing.

Instead, Zhilong’s brothers and sons remained in Fujian, holding a remarkably powerful army and navy. Koxinga, Zhilong’s eldest son, was not the most powerful general in the Zheng Regime at this time, but, being half Japanese and trained both as a Japanese samurai and as a Chinese Confucian, he gradually absorbed his relatives’ troops and annexed their territory to enhance his power. By the 1650s, Koxinga not only dominated the Zheng Regime but had also become the most influential and powerful anti-Qing force in China.

However, in 1660, Koxinga overestimated his capacity and attacked the city of Nanjing on the Yangzi River. He failed because of his arrogance and missteps. The next year, he led his navy and army to Taiwan. After a year-long battle with the Dutch East India Company, Koxinga accepted the Dutch surrender, and the Zheng Regime began to rule Taiwan as an anti-Qing base. From 1661 to 1683, the Qing Empire and the Zheng Regime negotiated with each other in an attempt to find a balance that would keep the peace. However, they never reached an agreement.

In 1683, Shi Lang, a former general of the Zheng Regime and by this time the naval commander of the Qing Empire, defeated the Zheng Regime. As a result, Zheng Keshuang, the last king of the Zheng Regime, surrendered to the Qing Empire. This event was extremely important for the Qing Empire. First, the last anti-Qing power finally vanished. Second, the Qing Empire occupied a new territory as its colony. Third, the Qing Empire could now focus on the threat from Inner Asia. This was why the war was worth recording as a Fang Lue.

The Ping Ding Hai Kou Fang Lue’s Manchu language version

There are 25 Fang Lue officially edited by the Qing Empire, and a Fang Lue is arranged chronologically. Among them, however, the Ping Ding Hai Kou Fang Lue is the only one for which no formal version in Chinese has been found. In other words, the surviving Chinese version was a draft. For the past century, this four-volume version was the only one known. In 2011, I was the first person to discover the draft in the Manchu language, although only the first three volumes remain.

First of all, I compare the first and second volumes. Table 1 lists the words that are frequent in volume 1 but not in volume 2. Almost all of them are names of people or places. For example, the first is Fujian, the name of a province in southeastern China. The second most frequent word is wang, which refers to king. In other words, kings were not important in volume 2. Additionally, ceng and gung refer to the same person, Koxinga, and jy and lung refer to Koxinga’s father, Zhilong. These two important people are thus no longer prominent in volume 2. These names and places become less frequent because this Fang Lue was edited chronologically, so the places and people are no longer essential in the period described in volume 2.

Another noticeable difference between the two volumes is that volume 1 contains many terms regarding the emperor, such as hese, dergi, hesei, and wasimbuhangge. Does this indicate that the emperor is less important in volume 2? Yes, it does. This perhaps indicates that volume 1 records the emperor’s orders, whereas volume 2 mainly records the discussions between ministers and generals as well as the battles between the Qing and the Zheng.

Table 1: comparing the difference in the first and second volume.

Order Words English meaning Frequency in Vol. 1 Frequency in Vol. 2
1 fugiyan Fujian 38 8
2 wang king/surname 36 3
3 ni of 34 7
4 gung (name of a person) 33 0
5 ceng (name of a person) 27 4
6 hese emperor’s order 23 7
7 manggi when… 23 7
8 aniya year 22 2
9 jy (name of a person) 22 0
10 lung (name of a person) 20 0
11 hebei discussion’s 19 1
12 sede speak 17 0
13 dergi east/up/Majesty 16 6
14 hesei of the emperor’s order 16 5
15 wasimbuhangge the order from emperor 16 3

Next, I compare the words that are frequent in both volume 1 and volume 2, as can be seen in Table 2. Besides the most frequent auxiliary words, the most frequent words in both volumes usually refer to certain important people or places, such as Wan Zhengse (wan, jeng, and še in Manchu), the most important general (tidu) during this period, and Quanzhou (cuwan jeo in Manchu), the most important area in Fujian.

Table 2: comparing the similarity in the first and second volume.

Order Words English meaning Frequency in Vol. 1 Frequency in Vol. 2
1 be be 242 126
2 de at 131 59
3 i of 127 67
4 jeng (surname) 81 23
5 cooha military/army 80 53
6 cuwan (name of a place) 49 21
7 mederi ocean 44 11
8 seme so/although 41 21
9 jeo prefecture 38 11
10 hūlha bandit 36 19
11 sehe spoke 28 14
12 wan (surname) 27 28
13 men (name of places) 25 19
14 tidu commander 25 24
15 fu (administrative level) 24 19
16 amba big 23 12
17 gin (name of a place) 22 11
18 še (name of a person) 22 19
19 dzungdu viceroy 21 16
20 dahame therefore 20 14

Table 3 shows the most frequent words in volume 2 that are not frequent in volume 3. Besides numbers (minggan, emu, juwe, and ilan) and different forms of gaimbi (gaifi and gaiha), the remaining words are names of people or places. The question here is why gaimbi, meaning “get” in English, appears so frequently. The second volume primarily recounts the battles between the two regimes, so this makes sense, because gaimbi can also mean “occupy a city.” Volume 2 thus in fact discusses how the cities in Fujian were occupied by each side in turn.

Table 3: comparing the difference in the second and third volume

Order Words English meaning Frequency in Vol. 2 Frequency in Vol. 3
1 hai (name of a place) 26 1
2 men (name of places) 19 6
3 še (name of a person) 19 3
4 minggan thousand 15 0
5 tan (name of a place) 15 0
6 gaifi gotten 14 7
7 juwe Two 14 4
8 ilan three 13 3
9 emu one 12 6
10 gaiha got 11 0
11 gin (name of a place) 11 7
12 jeo prefecture 11 0
13 hafan officials 10 4
14 hiya guard 10 3
15 se etc. 10 7

Table 4 shows the words most frequent in both volumes. Besides the auxiliary words, over half of the most frequent words in both volumes refer to names of places or people. Noticeably, surnames such as jeng and u are often among the most frequent in both volumes. This indicates that in the Manchu version the author preferred to write the entire name instead of only the given name. This is very different from the Chinese version, whose author preferred to write only the given name.

Table 4: comparing the similarity in the second and third volume

Order Words English meaning Frequency in Vol. 2 Frequency in Vol. 3
1 be be 126 139
2 i of 67 54
3 de at 59 60
4 cooha military/army 53 38
5 wan (surname) 28 23
6 tidu commander 24 22
7 jeng (surname) 23 8
8 cuwan (name of a place) 21 14
9 seme so/although 21 22
10 amban minister 19 13
11 fu (administrative level) 19 16
12 hūlha bandit 19 16
13 u (surname) 19 8
14 siyūn governor 17 16
15 dzungdu viceroy 16 26
16 hing (name of a person) 15 9
17 dahame therefore 14 18
18 dzu (name of a person) 14 8
19 sehe spoke 14 22
20 gemu together 13 9

The comparison within this book suggests that each volume has its own emphasis because the book was edited chronologically. In particular, the similarities usually concern grammar and certain important places or people. Since the content was edited chronologically, the differences indicate which places, people, and matters were more important in different periods.

The Comparison of the same text in the different language

As mentioned, for over a century the Chinese version was the only one known. Now that the new version in the Manchu language has been discovered, it is important to compare the two versions.

However, Chinese is hard to analyze systematically in this way. Since Chinese is not an alphabetic system of writing, each Chinese character might have multiple meanings, and multiple characters combined together generate different meanings. Because of these features of Chinese characters, I use a different way to analyze and compare the two texts. First, I analyze the Manchu text to obtain the frequency of each word. Then, I search the Chinese version for the Chinese equivalents of the top 20 most frequent Manchu words to see whether the frequencies are similar. So let’s find the most frequent words in volumes 1, 2, and 3 of the Manchu version and check their frequency in the Chinese text.
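The two-step procedure described above might look roughly like the following in base R. This is a minimal sketch under stated assumptions: manchu_sample and chinese_sample are short invented strings, not excerpts from the book, and the lookup list and the helper count_fixed are hypothetical names introduced only for illustration. Where one Manchu word corresponds to more than one Chinese character (as cooha does to 軍 and 兵), the sketch simply adds the counts together, which is my assumption.

```r
# Invented sample strings for illustration only; they are not from the book.
manchu_sample  <- "fugiyan i cooha be unggihe fugiyan i tidu cooha be gaifi"
chinese_sample <- "福建提督率兵福建之兵"

# Step 1: split the Manchu text on whitespace and count each word
manchu_tokens <- strsplit(tolower(manchu_sample), "\\s+")[[1]]
manchu_freq   <- sort(table(manchu_tokens), decreasing = TRUE)

# Step 2: hand-made mapping from Manchu words to Chinese equivalents (hypothetical)
lookup <- list(fugiyan = c("福建"), cooha = c("軍", "兵"), tidu = c("提督"))

# Count literal (non-regex) occurrences of a pattern in a single string
count_fixed <- function(pattern, text) {
  hits <- gregexpr(pattern, text, fixed = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

comparison <- data.frame(
  manchu_word  = names(lookup),
  manchu_freq  = as.integer(manchu_freq[names(lookup)]),
  chinese_freq = unname(sapply(lookup, function(chars) {
    sum(sapply(chars, count_fixed, text = chinese_sample))
  })),
  stringsAsFactors = FALSE
)
print(comparison)
```

Because Chinese is not segmented by spaces, the sketch counts literal substrings instead of tokenizing the Chinese text, so a character that belongs to a different compound word would still be counted.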

Table 7: the comparison of the frequency of words in the volume 1

Order | Words | Frequency | English | Chinese | Frequency in Chinese version
1 | be | 242 | be | |
2 | de | 131 | at | |
3 | i | 127 | of | |
4 | jeng | 81 | (surname) | | 3
5 | cooha | 80 | military/army | 軍/兵 | 軍25/兵51
6 | cuwan | 49 | (name of a place) | | 2
7 | mederi | 44 | ocean | | 46
8 | seme | 41 | so/although | |
9 | fugiyan | 38 | Fujian | 福建 | 20
10 | jeo | 38 | prefecture | | 12
11 | hūlha | 36 | bandit | 賊/寇 | 賊17/寇20
12 | wang | 36 | king | | 22
13 | ni | 34 | of | |
14 | gung | 33 | (name of a person) | | 14
15 | sehe | 28 | spoke | |

 

Graph 1: The comparison of the frequency of words in the volume 1 as a line graph

figure_1

As can be seen, apart from terms that have no direct counterpart in Chinese, such as be, de, and i, jeng, the surname corresponding to Zheng (鄭) in Chinese, rarely appeared in the Chinese text. Similarly, cuwan, referring to Quanzhou (泉州), appeared frequently in the Manchu text but only twice in the Chinese text. Also, fugiyan, referring to Fujian (福建), appeared almost twice as often in the Manchu text as in the Chinese text.

Table 8: the comparison of the frequency of words in the volume 2

Order | Words | Frequency | English | Chinese | Frequency in Chinese version
1 | be | 126 | be | |
2 | i | 67 | of | |
3 | de | 59 | at | |
4 | cooha | 53 | military | 軍/兵 | 軍19/兵64
5 | wan | 28 | (surname)/Taiwan | 萬/灣 | 萬14/灣19
6 | hai | 26 | (name of a place) | | 12
7 | tidu | 24 | commander | 提督 | 32
8 | jeng | 23 | (surname) | | 6
9 | cuwan | 21 | (name of a place) | | 2
10 | seme | 21 | so/although | |
11 | amban | 19 | minister | | 36
12 | fu | 19 | (administrative level) | | 0
13 | hūlha | 19 | bandit | 賊/寇 | 賊29/寇13
14 | men | 19 | (name of places) | | 24
15 | še | 19 | (name of a person) | | 18

 

Graph 2: The comparison of the frequency of words in the volume 2 as a line graph

figure_2

According to Table 8 and Graph 2, jeng in the Manchu text is similarly almost four times as frequent as Zheng in the Chinese text. Also, cuwan, fu, and hai were more frequent in the Manchu text than in the Chinese text.

 

Table 9: the comparison of the frequency of words in the volume 3

Order | Words | Frequency | English | Chinese | Frequency in Chinese version
1 | be | 139 | be | |
2 | de | 60 | at | |
3 | i | 54 | of | |
4 | cooha | 38 | military/army | 軍/兵 | 軍16/兵68
5 | dzungdu | 23 | viceroy | 總督 | 7
6 | ki | 23 | (name of a person) | | 10
7 | šeng | 23 | (name of a person) | | 10
8 | wan | 23 | Taiwan | | 29
9 | yoo | 23 | (surname) | | 6
10 | sehe | 22 | spoke | |
11 | seme | 22 | so/although | |
12 | tidu | 22 | commander | 提督 | 20
13 | ši | 19 | (surname) | | 25
14 | tai | 19 | Taiwan | | 29
15 | dahame | 18 | therefore | | 3

Graph 3: The comparison of the frequency of words in the volume 3 as a line graph

figure_3

As can be seen, Table 9 and Graph 3 suggest that names of places and people were written more completely in the Manchu text than in the Chinese text. This is also apparent in volumes 1 and 2.

Manchu and Chinese are extremely different languages. Manchu belongs to the Altaic languages and is written with a syllabary, just like Japanese. Chinese (Mandarin), by contrast, belongs to the Sino-Tibetan languages and is written with logograms. Therefore, it is hard to compare the frequency of each word in the two texts. However, certain words, especially nouns, are still comparable.

This comparison is meaningful because it relates to a debate between the New Qing History and its opponents. For a long time, Chinese sources have been the dominant sources for the study of Qing history. For these scholars, primarily the opponents of the New Qing History, the Qing was not an empire; instead, the Qing was entirely absorbed into Chinese culture and institutions, so it was simply one of the Chinese dynasties. This perspective is called Sinicization. To support this idea, they claimed that all texts written in Manchu were just copies of the Chinese versions, so the Manchu versions were meaningless because scholars could read the Chinese versions directly.

Is this correct? Let’s look at the new graphs, Graphs 4, 5, and 6, which are modified from Graphs 1, 2, and 3. The main difference is that I omit the terms that appear in the Manchu text but have no counterpart in the Chinese counts, for example auxiliary words. The reason is not that these terms do not exist in Chinese, but that they can be expressed in a thousand possible ways in Chinese, so it is difficult to determine which Chinese words a given Manchu word directly corresponds to without a careful close reading.
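The filtering step behind Graphs 4 to 6 can be sketched as follows, using the volume 1 counts transcribed from Table 7. The object names (vol1, shared, ratio) are introduced here for illustration, and adding the counts of 軍 and 兵 (and of 賊 and 寇) into a single Chinese figure is an assumption of this sketch, not necessarily how the graphs were drawn.

```r
# Counts transcribed from Table 7 (volume 1). NA marks Manchu terms for which
# the table gives no Chinese count. Summing 軍+兵 and 賊+寇 is an assumption.
vol1 <- data.frame(
  word    = c("be", "de", "i", "jeng", "cooha", "cuwan", "mederi", "seme",
              "fugiyan", "jeo", "hūlha", "wang", "ni", "gung", "sehe"),
  manchu  = c(242, 131, 127, 81, 80, 49, 44, 41, 38, 38, 36, 36, 34, 33, 28),
  chinese = c(NA, NA, NA, 3, 25 + 51, 2, 46, NA, 20, 12, 17 + 20, 22, NA, 14, NA),
  stringsAsFactors = FALSE
)

# Keep only the terms that have a count in both texts
shared <- vol1[!is.na(vol1$chinese), ]

# Ratio of Manchu to Chinese frequency for each shared term
shared$ratio <- round(shared$manchu / shared$chinese, 2)
print(shared)
```

Nine terms remain after the auxiliary words are dropped, and the ratio column gives a rough sense of how far the two counts diverge for each of them.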

Graph 4: the terms in both texts in volume 1

figure_1without-noncharacter

Graph 5: the terms in both texts in volume 2

figure_2without-noncharacter

Graph 6: the terms in both texts in volume 3

figure_3without-noncharacter

Do you notice anything? The answer is quite obvious. Even though the same nouns, usually names of places or people, appear in both texts, their frequencies are still significantly different. Can the opponents of the New Qing History still insist that the Manchu versions were just copies of the Chinese versions? I do not think so.

Conclusion

Admittedly, I am not sure whether this comparison is meaningful, but it does suggest a general idea: the Manchu text was usually more precise than the Chinese text. Put another way, Chinese can be more laconic. This might imply that the Manchu language was, to some degree, still less mature than Chinese.

There remains a big question waiting to be answered. Let’s look at Tables 7, 8, and 9. Some terms, such as tidu, fugiyan, and dzungdu, refer directly to a certain place or person. Why, then, do the counts of these terms differ between the Manchu and Chinese texts? According to the comparison and the graphs, the Manchu version and the Chinese version were in effect different. Neither one was just a copy of the other. They were equally important but were addressed to different audiences and purposes.

Consequently, since this comparison has offered a general picture, the next step might be a close reading to explain the detailed differences between the texts in the two languages.

How different are the Old and the standard Manchu language?

The Origin of the Manchu Language

Before I address the origin of the Manchu language, I must introduce a historiographic approach, the New Qing History. The New Qing History emphasizes the importance of Manchu elements in the Qing Empire (1616-1912) by reading non-Han Chinese sources, mainly in Manchu, through the lens of global history. In other words, the New Qing historians regard the Qing as an empire of the early modern period, just like the Ottoman Empire, the British Empire, and so on. To conduct research using the methodology and ideas of this historiography, it is necessary to understand the importance and evolution of the Manchu language.

Figure 1. The image of a document in the standard Manchu language (provided by Cheng-Heng Lu)

Zheng, Koxinga

Manchu was the official language of the Qing Empire and the most widely used language across the empire. As a universal empire, the Qing adopted different institutions to rule different areas efficiently, so the use of local languages in official documents in different areas expresses the essence of the Qing Empire. However, no matter which language was dominant, Manchu was written in parallel. For example, the empire used Chinese and Manchu in China, Mongolian and Manchu in Mongolia, and Tibetan and Manchu in Tibet. Generally speaking, in order to govern its various regions efficiently, the Qing Empire endeavored to translate classics from different languages, such as the Confucian classics from China and the Buddhist classics from Tibet, into Manchu. Therefore, Manchu was undoubtedly the most important and universal language in East Asia, just like Chinese, during the Qing period.

The written language was created in 1599 on the basis of the rules and letters of Mongolian. During this period, it was called the Old Manchu language because it was not yet mature or standardized. For example, Old Manchu could not spell Han Chinese names, because it had not yet established a system for spelling sounds that were not used in colloquial Manchu. Additionally, the letters for “h,” “k,” and “g” were written exactly the same in Old Manchu. Moreover, the grammar was not fully standardized. As the regime gradually grew, this immature language had to be revised. The significant turning point came in 1632, when Dahai, a literary scholar, was ordered to revise Old Manchu into the New Manchu language, also known simply as the Manchu language. After Dahai’s revision, the standard Manchu language, which was widely used later, was mature enough to become the official language of the Court.

The Texts in the Old Manchu language and the Text in the Manchu language

The Old Manchu language was used only from 1599 to 1632, so there are few sources in Old Manchu except Man Wen Lao Dang. Man Wen Lao Dang (滿文老檔, The Old Archive of the Manchu Language) recorded daily events, including political, ethnic, economic, military, and social affairs, before 1644, when the Manchu troops occupied Beijing and established the Qing Dynasty as one of the orthodox Chinese dynasties. Because Man Wen Lao Dang is the principal primary source in Old Manchu, it is the most significant source for understanding the usage of the Old Manchu language.

After the Manchu army occupied Beijing and established the Qing Dynasty in China, an increasing number of archives were written in Manchu, as mentioned above. To be sure, Chinese was still the most important language, but, as mentioned, Manchu was undoubtedly the official language. As a result, Manchu was the only choice when the Court proposed to edit certain texts or books.

Ping Ding Hai Kou Fang Lue (平定海寇方略, The Book about Defeating Piracy), the other text in this analysis, was edited around 1686. This book recorded how the Qing Empire suppressed and occupied Taiwan, which was ruled by the Zheng Regime (1661-1683). Since it was edited to proclaim the victory and sovereignty of the Qing Empire, it was reasonable to compile it in Manchu so that it could be delivered to every corner of the empire. In this sense, the Manchu version was appropriate because Manchu was the shared language of different ethnicities. Meanwhile, the standard Manchu language had been in use for over half a century when Ping Ding Hai Kou Fang Lue was edited. As a result, this book is suitable for analyzing the linguistic usage of standard Manchu in comparison with Man Wen Lao Dang.

Overall, and most importantly, these two texts are the only digital versions in the Manchu languages, the old and the standard. Therefore, although the comparison of the two texts might yield only minor findings, this might be the first time that a study has used the methodology of the digital humanities to study Manchu-language sources.

The Analysis and Comparison of two texts

By using comparative and statistical methods, I found that these two texts offer a surprising number of interesting details. I analyze each text individually here.

Table 1. The frequency of words in Man Wen Lao Dang

Order Words in Manchu language Frequency Meaning in English
1 i 19374 of
2 de 17149 at
3 be 17115 is
4 emu 5049 one
5 han 4960 khan
6 niyalma 4692 people
7 seme 4568 (expletive)
8 juwe 3328 two
9 cooha 2803 military/army
10 juwan 2260 ten
11 weile 2095 affair
12 tere 2029 that/he/she
13 ilan 1883 three
14 gurun 1877 state/country
15 orin 1739 twenty
16 tanggū 1723 hundred
17 morin 1685 horse/the seventh character of Earth Branch
18 ni 1667 of (the previous word ending with “n”)
19 nikan 1576 Chinese
20 ere 1568 this

As can be seen in Table 1, apart from numbers, such as emu, juwe, and ilan, and auxiliary words, such as i, de, and be, the most frequent word is han, which refers to khan. This was the ruler’s official title before the Manchu army invaded China. In other words, during this period, the Qing was more like a khanate than an empire. This makes sense because the Qing became an “empire” only after the second khan, Hong Taiji, defeated the Mongolian army in 1635. To understand the nature of the Qing regime at this time more deeply, let’s look at another term, hūwangdi, which refers to emperor. It appears in this text only 38 times, and every instance refers to the emperor of the Ming Dynasty. The title of the regime’s leader in this period thus indicates that the regime was a khanate rather than an empire. Additionally, although scholars have tried to interpret this regime as being ruled by a tribal council composed of the khan and seven other feudatories, the beile, the frequency of beile in this text is only 1516. Accordingly, the khan might have played a much more important role in this regime than these feudatories.

Additionally, in Table 1, the frequencies of niyalma and nikan are hard to ignore. A close reading shows that niyalma is a general term for all people under the rule of this regime, whereas nikan specifically identifies Chinese. Why is there no term for Manchu? In fact, Manchu, written as manju in this text, appears only 131 times. To be sure, the name Manchu was created to unite all ethnicities in Manchuria after Hong Taiji took control of Mongolia and came to the throne as emperor after 1635. However, the frequency of nikan also indicates an important fact: Chinese were still the majority in this region. This might also explain why the Qing Empire had to establish the Hanjun Eight Banners system to assimilate Chinese into its ruling class.

Table 2. The frequency of words in Ping Ding Hai Kou Fang Lue

Order | Word in Manchu | Frequency | Meaning in English
1 | be | 499 | is
2 | i | 249 | of
3 | de | 238 | at
4 | cooha | 164 | military/army
5 | jeng | 111 | Zheng (surname)
6 | cuwan | 83 | (refers to Quanzhou, name of a city)
7 | seme | 80 | (expletive)
8 | hūlha | 76 | bandit/pirate
9 | wan | 75 | Wan (surname)
10 | mederi | 74 | ocean/sea/marine
11 | tidu | 64 | commander
12 | fu | 59 | city
13 | fugiyan | 57 | Fujian (name of a province)
14 | dzungdu | 54 | governor
15 | ni | 53 | of (after a word ending in “n”)
16 | wang | 50 | king
17 | men | 49 | door (occurs in certain place names, where it usually means port)
18 | jeo | 48 | prefecture
19 | dahame | 46 | because
20 | sehe | 45 | (completed tense)

As Table 2 shows, once auxiliary words such as i, de, and be are set aside, this text overturns the accepted view. The main reason for this argument is the frequency of wan. Wan is a surname, and in this war it refers to only one general: Wan Zhengse. Scholars have generally acknowledged Shi Lang as the most important figure in the defeat of the Zheng Regime. In this text, however, Wan Zhengse is mentioned much more frequently, because he was the general who actually organized and planned the defeat of the Zheng Regime, although Shi Lang later received all the credit.
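Setting the auxiliary words aside before ranking can be sketched in a few lines of R. This is only an illustration built from the first ten rows of Table 2; the vector names `freq` and `auxiliary` are assumptions for this sketch, and the list of auxiliary words contains only the three named in the text.

```r
# Frequencies copied from the first ten rows of Table 2
freq <- c(be = 499, i = 249, de = 238, cooha = 164, jeng = 111,
          cuwan = 83, seme = 80, "hūlha" = 76, wan = 75, mederi = 74)

# Auxiliary words named in the text (an illustrative, incomplete list)
auxiliary <- c("be", "i", "de")

# Drop the auxiliary words and rank what remains
content_words <- freq[!names(freq) %in% auxiliary]
ranked <- sort(content_words, decreasing = TRUE)
print(ranked)
```

With the auxiliary words removed, cooha leads the ranking and wan sits among the most frequent remaining words.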

Since this text concentrates on the war, it makes sense that it mentions many place names. Among the twenty most frequent words, at least five relate to place names. Specifically, jeo refers to two places, Zhangzhou or Quanzhou, and fu likewise refers to Zhangzhou and Quanzhou. Cuwan, by contrast, refers only to Quanzhou. In other words, Quanzhou seems to have been the most important place during this period. This is not surprising, for several reasons. First, Quanzhou was the most important city in southern Fujian. Second, the Fujian navy marshal, the tidu, was garrisoned at Quanzhou. Third, Quanzhou was the name not only of a city but of an entire region. It can therefore be concluded that Quanzhou was the most important area or city during this period.

Comparison of two texts

Admittedly, the two texts differ greatly in scale. The digital Man Wen Lao Dang runs to over 1,500 pages in a Word file, while the digital Ping Ding Hai Kou Fang Lue is only around 30 pages. Nevertheless, from a statistical standpoint, the word frequencies are still informative.
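One common way to compare word counts across texts of very different length is to convert raw counts into relative frequencies, for example occurrences per 1,000 tokens. The sketch below is not the procedure used for the tables above; the token vectors are toy data and the helper `rel_freq` is a hypothetical name introduced only for illustration.

```r
# Toy token vectors standing in for a long and a short text (hypothetical data)
long_text  <- c("han", "de", "cooha", "han", "be", "i", "cooha", "han", "de", "be")
short_text <- c("cooha", "be", "jeng", "be", "cooha")

# Relative frequency of each word per `per` tokens (hypothetical helper)
rel_freq <- function(tokens, per = 1000) {
  counts <- table(tokens)
  freq <- setNames(as.numeric(counts), names(counts))
  sort(freq / length(tokens) * per, decreasing = TRUE)
}

long_rf  <- rel_freq(long_text)
short_rf <- rel_freq(short_text)

# The same word compared across texts of different length
cooha_cmp <- c(long = long_rf[["cooha"]], short = short_rf[["cooha"]])
print(cooha_cmp)
```

In the toy data, cooha occurs twice in each text, but its relative frequency in the short text is double that in the long one, which is the kind of difference raw counts would hide.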

As mentioned, these two texts were written in two “languages.” According to the statistics, however, Old Manchu and standard Manchu were in fact similar, because the same auxiliary words are widely used in both. To be sure, the two languages were not very different. Nevertheless, a comparison of the two texts readily reveals a difference in tense. In Ping Ding Hai Kou Fang Lue, sehe, a completed-tense form, appears frequently because this text was compiled after Taiwan had already been colonized by the Qing Empire. In contrast, Man Wen Lao Dang recorded ongoing dialogues and events as officials reported them. As a result, the completed tense rarely appears in Man Wen Lao Dang.

The comparison also suggests that Old Manchu was probably not an immature language. In fact, the grammar of the two texts is similar. For example, with regard to verbs, Man Wen Lao Dang contains the past tense (-ha, -he), the imperative mood (-kini), the final form (-fi), the conditional form (-ci), the appositive form (-ra, -re, -ro), and the perfective form (-habi), and all of these verb forms appear in Ping Ding Hai Kou Fang Lue as well. Therefore, the difference between Old Manchu and standard Manchu probably does not lie in grammar.
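A rough way to check for these verb forms in a romanized text is to count tokens by their endings. The sketch below is an assumption-laden illustration rather than the method behind the comparison above: the tokens are a small hand-picked sample, and plain suffix matching over-counts, since a noun such as cooha also ends in -ha.

```r
# A few romanized tokens for illustration (hand-picked sample, not corpus data)
tokens <- c("sehe", "cooha", "genefi", "oci", "arara", "gaihabi", "okini", "de")

# One regular expression per verb form listed above
suffixes <- c(past        = "(ha|he)$",
              imperative  = "kini$",
              final       = "fi$",
              conditional = "ci$",
              appositive  = "(ra|re|ro)$",
              perfective  = "habi$")

# Count the tokens that end in each suffix
counts <- sapply(suffixes, function(p) sum(grepl(p, tokens)))
print(counts)
```

Here the past-tense pattern matches both sehe and the noun cooha, so such counts are best treated as a first pass to be confirmed by close reading.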

Conclusion

What can we learn from comparing the two texts? First, the grammar is the same in Old Manchu and in standard Manchu. In other words, the Manchu language, whether old or standard, was a systematic and logical language. This could explain why it was widely used across the vast territory of the Qing Empire for over three hundred years.

Second, both texts focus on military matters, as the frequent appearance of cooha shows. To be sure, both texts discuss military events, especially Ping Ding Hai Kou Fang Lue. Man Wen Lao Dang, although it records many military activities, might also be expected to describe administration or bureaucracy. Even so, military affairs appear to have been the most significant concern of this regime in that period.

Finally, because the two texts differ in purpose and content, they emphasize different terms. In Man Wen Lao Dang, numbers are everywhere because they were used to record days, months, and years. In Ping Ding Hai Kou Fang Lue, by contrast, place names are widely recorded because geography was essential to this text.

Admittedly, comparing these two texts is not entirely appropriate. This, however, reflects a practical constraint: few sources in the Manchu language have been translated or romanized into digital form, although some institutions, such as Manchu Studies at Harvard University, have undertaken such work. Fortunately, these two texts have been digitized, and each represents a different period. With the surge of interest in the New Qing History over the recent decade, the Manchu language has received considerable emphasis. To use the Manchus’ language to study the Manchus’ history, it is necessary to use Manchu-language materials widely as primary sources for Qing history. Once a sufficient body of digital Manchu-language sources becomes available, it could help scholars apply digital methods to Qing history and produce more meaningful research.