Data visualization and exploratory data analysis can help us reach a clearer understanding of the tozama/fudai distinction. The analysis below relies on data from the early 1870s, compiled by the Meiji government as part of its replacement of domains with prefectures (for details see Ravina 1999). Figure 1 uses violin plots to compare the two types of domains by kusadaka, the annual harvest yield of a domain measured in koku of rice. The tozama distribution reveals a long tail, which skews the mean well above the median and the mode, but the lower halves of the tozama and fudai distributions are similar.
Figure 2 uses an annotated box plot to detail the outliers (extreme values). Notably, the four major domains active in the overthrow of the shogunate (Satsuma, Chōshū, Tosa, and Saga) all appear as tozama outliers, while the most dogged defender of the shogunate, Aizu, appears as a fudai outlier.1
What conclusions can we draw from these visualizations? The first is that the long tails skew not only mean values but also our understanding of the distinction between tozama and fudai. The average kusadaka of a tozama domain was roughly 127,000 koku, more than double the average fudai holding of 49,000 koku. But this difference was due primarily to the presence of extremely large tozama domains, such as Kaga, Satsuma, and Kumamoto, with holdings over 500,000 koku. By contrast, there were no fudai with holdings that large. The medians of the two groups are thus quite close (approximately 32,000 koku for fudai and 38,000 for tozama), and the violin plots bulge at around the same points (the estimated modes). Thus, large daimyo were overwhelmingly tozama, but those long tails tell us little about the vast majority of domains. Below 500,000 koku the distributions for tozama and fudai look extremely similar, and in both cases most daimyo held less than 40,000 koku. Thus, while we can assume a very large domain was tozama, we cannot assume that a tozama domain was large. By extension, very large tozama domains were decisive in overthrowing the shogunate, but that tells us little about tozama in general.
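The pull of a long tail on the mean can be checked with a few lines of Python. The sketch below is illustrative only: the two lists are invented values, not the Meiji-era figures, and were chosen so that the medians match the approximate medians quoted above while a single very large holding sits in the tozama list.

```python
from statistics import mean, median

# Hypothetical kusadaka values in koku, invented for illustration only;
# they are NOT the Meiji-era figures discussed in the text.
fudai = [12_000, 20_000, 25_000, 32_000, 40_000, 60_000, 110_000]
tozama = [12_000, 20_000, 30_000, 38_000, 45_000, 60_000, 700_000]


def summarize(values):
    """Return (mean, median) for a list of holdings."""
    return mean(values), median(values)


for label, values in (("fudai", fudai), ("tozama", tozama)):
    avg, mid = summarize(values)
    print(f"{label:>6}: mean = {avg:>9,.0f}  median = {mid:>7,.0f}")
```

In this toy sample one extreme holding lifts the tozama mean far above the fudai mean, while the two medians stay close together. That is the same pattern the violin plots show for the real data.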
The converse holds for distinguishing features of fudai. Some fudai holdings consisted of many small, scattered parcels, and the most fragmented holdings were all fudai investitures, but the vast majority of all domains consisted of one contiguous holding.
Such seeming paradoxes are common. US Senators and NASCAR drivers are overwhelmingly white men, but the vast majority of white men are neither NASCAR drivers nor senators. Simple data visualizations, such as boxplots and violin plots, make it easy to avoid such inferential confusion.
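The confusion amounts to treating "the share of very large domains that were tozama" as if it were "the share of tozama domains that were very large." A minimal sketch of the two proportions follows. It assumes a small invented list of domains (not the historical dataset) and a hypothetical helper named `share`.

```python
# Hypothetical domains as (type, kusadaka in koku); invented for illustration,
# NOT drawn from the Meiji-era data.
domains = [
    ("tozama", 700_000), ("tozama", 550_000), ("tozama", 45_000),
    ("tozama", 38_000), ("tozama", 30_000), ("tozama", 20_000),
    ("tozama", 12_000), ("tozama", 10_000),
    ("fudai", 110_000), ("fudai", 60_000), ("fudai", 40_000),
    ("fudai", 32_000), ("fudai", 25_000), ("fudai", 20_000),
    ("fudai", 15_000), ("fudai", 12_000),
]
LARGE = 500_000  # "very large" threshold in koku, following the text


def is_tozama(row):
    return row[0] == "tozama"


def is_large(row):
    return row[1] > LARGE


def share(rows, condition, given):
    """Proportion of rows meeting `condition` among rows meeting `given`."""
    pool = [r for r in rows if given(r)]
    return sum(1 for r in pool if condition(r)) / len(pool)


p_tozama_given_large = share(domains, is_tozama, is_large)
p_large_given_tozama = share(domains, is_large, is_tozama)
print(f"share of large domains that are tozama: {p_tozama_given_large:.2f}")
print(f"share of tozama domains that are large: {p_large_given_tozama:.2f}")
```

The two proportions answer different questions and can differ sharply. In this invented sample every large domain is tozama, yet only a quarter of tozama domains are large.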
Violin plots use width to mark the number of observations at each level. The thickest part of the plot marks the most common value, while thin sections mark less common values. Technically, violin plots combine a box plot with a kernel density plot.
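The width of a violin at a given level is a kernel density estimate evaluated at that level, mirrored around a central axis. The following sketch computes such widths with a Gaussian kernel using only the standard library. The data are invented, and the bandwidth of 10 is an arbitrary assumption; plotting libraries normally choose a bandwidth automatically.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function that estimates density at x using Gaussian kernels."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data
        )

    return density


# Hypothetical holdings in thousands of koku, invented for illustration.
holdings = [12, 20, 25, 30, 32, 38, 40, 45, 60, 110]
density = gaussian_kde(holdings, bandwidth=10.0)  # bandwidth is an assumption

# Evaluate the density at evenly spaced levels; a violin plot would draw
# this width on both sides of its central axis.
levels = range(0, 121, 10)
widths = {x: density(x) for x in levels}
peak = max(widths.values())
for x in levels:
    bar = "#" * round(30 * widths[x] / peak)
    print(f"{x:>4} | {bar}")
```

The longest bar corresponds to the bulge of the violin (the estimated mode), and the short bars far from the bulk of the data correspond to the thin neck of a long tail.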
Boxplots display a distribution using a “box” and “whiskers.” The box marks the middle 50% of the data, from the 25th percentile to the 75th percentile, a range called the interquartile range (IQR). The “whiskers” are dashed lines extending out from the box. Each whisker extends to the most extreme data point that is no more than 1.5 times the IQR (the length of the box) away from the box. Points beyond the whiskers are outliers and are marked with circles.
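The whisker rule can be written out directly. The sketch below is one possible implementation, not the code behind the figures: `boxplot_stats` is a hypothetical helper, the sample values are invented, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`. Quartile conventions differ between software packages, so other tools may place the box edges slightly differently.

```python
from statistics import quantiles


def boxplot_stats(values, k=1.5):
    """Quartiles, whisker ends, and outliers under the k * IQR rule."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers stop at the most extreme points that lie within the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical holdings in thousands of koku, invented for illustration.
sample = [12, 20, 30, 38, 45, 60, 700]
stats = boxplot_stats(sample)
for name, value in stats.items():
    print(f"{name:>12}: {value}")
```

Here the single very large holding falls beyond the upper fence and is reported as an outlier, while the upper whisker stops at the largest value still inside the fence.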
Frigge, Michael, David C. Hoaglin, and Boris Iglewicz (1989). “Some Implementations of the Boxplot.” The American Statistician 43 (1): 50-54.
McGill, Robert, John W. Tukey, and Wayne A. Larsen (1978). “Variations of Box Plots.” The American Statistician 32 (1): 12-16.