Co-occurrence in G.R.R. Martin's "A Song of Ice and Fire"
The goal of the following analysis is to generate a data set describing the co-occurrence of characters and families in the fantasy saga "A Song of Ice and Fire" (short: ASoIaF) by G.R.R. Martin, i.e. to calculate how often two characters, or one character and a title, are mentioned in close succession. The first step is to choose a set of characters of interest. Any character fulfilling at least one of the following two requirements is considered in our calculations: (1) The character is mentioned at least 200 times. (2) The character is one of the ten most mentioned characters in any of the five published novels of the series.
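The two selection rules above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the function name `select_characters`, its parameters, and the input format (one mention counter per novel) are assumptions.

```python
from collections import Counter


def select_characters(mentions_per_novel, min_total=200, top_n=10):
    """Return the set of characters of interest.

    mentions_per_novel: list with one Counter per novel, mapping a
    character name to its number of mentions in that novel
    (hypothetical input format).
    """
    selected = set()
    totals = Counter()
    for counts in mentions_per_novel:
        totals.update(counts)
        # Rule (2): among the top_n most mentioned characters of this novel.
        selected.update(name for name, _ in Counter(counts).most_common(top_n))
    # Rule (1): at least min_total mentions over all novels.
    selected.update(name for name, total in totals.items() if total >= min_total)
    return selected
```

With the default arguments this applies the thresholds stated above (200 mentions, top ten per novel); smaller values are handy for trying it out on toy data.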
But how do we measure how often a character is mentioned? Many characters appearing in ASoIaF share their forename with other characters. An extreme example of this is the name Jon: there are no fewer than 23 different entries for characters named Jon in the Wiki of Ice and Fire. This makes it hard to recognize an occurrence of a character automatically and has certainly distorted all numbers presented here. The method used here is to count how often the name or any of the aliases of a character is mentioned and to subtract the number of times the name or alias is followed by an expression which indicates that another character is meant. For example: the number of occurrences of the character Bran Stark is the number of instances of the terms "Bran" or "Winged Wolf" followed by a blank or a punctuation mark, minus the number of instances of the term "Bran the Builder".
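The counting rule from the Bran Stark example might look roughly like this in Python. It is a minimal sketch under stated assumptions: the function name `count_mentions`, the exact set of punctuation marks, and the leading word boundary (`\b`) are choices made for illustration and are not taken from the original analysis.

```python
import re

# Assumption: "a blank or a punctuation mark" means whitespace or one of
# these characters directly after the name.
FOLLOWED_BY = r"(?=[\s.,;:!?'\"])"


def count_mentions(text, aliases, exclusions=()):
    """Count name/alias mentions, minus phrases that mean someone else."""
    total = 0
    for alias in aliases:
        # \b (an added assumption) keeps the alias from matching inside
        # a longer word.
        pattern = re.compile(r"\b" + re.escape(alias) + FOLLOWED_BY)
        total += len(pattern.findall(text))
    for phrase in exclusions:
        # Each excluded phrase was counted once above, so subtract it once.
        total -= len(re.findall(re.escape(phrase), text))
    return total
```

For Bran Stark, a call would be `count_mentions(text, ["Bran", "Winged Wolf"], ["Bran the Builder"])`. Note that "Brandon" is not counted, because "Bran" is followed by a letter there.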
This method provides the following data: The length of a bar represents the number of occurrences of the corresponding character; you can split those bars either by alias or by novel. Very infrequently used aliases are grouped together as "other".
Some characters are featured in only some of the five books. The following graph shows how far the occurrences of some characters are spread out across all five books: a score of one means the character is featured in only one of the books, a score of zero means all mentions are evenly distributed over the books. The characters with the most extreme scores are shown below.
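The text above fixes only the two end points of the score (one for a single book, zero for an even spread) and does not state the formula. One possible score with exactly these end points is one minus the normalized Shannon entropy of the per-book shares. The sketch below uses that definition purely as an assumption; the name `concentration_score` is hypothetical and the actual analysis may use a different measure.

```python
import math


def concentration_score(mentions_per_book):
    """1.0 if all mentions fall into one book, 0.0 if evenly spread.

    Assumed definition: 1 - H / log(n), where H is the Shannon entropy
    of the shares of mentions per book and n is the number of books
    (n must be at least 2).
    """
    total = sum(mentions_per_book)
    if total == 0:
        raise ValueError("character is never mentioned")
    n = len(mentions_per_book)
    shares = [m / total for m in mentions_per_book if m > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    return 1.0 - entropy / math.log(n)
```

A character mentioned 10 times in one book and never elsewhere scores 1, a character with the same number of mentions in each of the five books scores 0, and everything else falls in between.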
Families and Characters
Great Houses and their politics make up most of the plot in the series. Here is a comparison of how often the name of a family is mentioned (orange) and how often the names of important characters belonging to that family are mentioned (blue).
After comparing characters to their families comes comparing characters to each other: How often are two characters mentioned in the same context, how often do they co-occur? In the matrix below each row and column belongs to a character; an opaque cell means the two corresponding characters co-occur often. If both characters belong to the same house (or both do not belong to any) their common cell is coloured differently.
You can control not only the order of the characters, but also what kind of co-occurrence is used as data: 1-Cooccurrence and 3-Cooccurrence count all the times two character names are mentioned within the same line or within the same three lines, respectively. Weighted 3-Cooccurrence divides this number by the overall number of occurrences of the character to the left. The resulting matrix is not symmetric: Rickon Stark, for example, is mentioned less often than Bran Stark, while the two co-occur the same number of times. So the cell corresponding to the column of Bran and the row of Rickon has more colour than its counterpart across the diagonal. Chapter-Cooccurrence counts how often each of the two characters is mentioned in the same chapter and multiplies those numbers.
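The line-based variants can be sketched as follows. This is an illustration under assumptions, not the original implementation: it takes one set of character names per line as input, and it treats two mentions as "within the same three lines" when their line numbers differ by less than the window size. The function names `cooccurrence` and `weighted` are hypothetical.

```python
from collections import defaultdict


def cooccurrence(lines, window=1):
    """Count co-occurrences of character names.

    lines: list of sets, one per text line, holding the characters
    mentioned on that line (hypothetical input format).
    window: 1 for 1-Cooccurrence, 3 for 3-Cooccurrence.
    """
    counts = defaultdict(int)
    for i, names_i in enumerate(lines):
        # Pair line i with every line j whose number differs by < window.
        for j in range(max(0, i - window + 1), min(len(lines), i + window)):
            for a in names_i:
                for b in lines[j]:
                    counts[(a, b)] += 1
    return dict(counts)


def weighted(counts, occurrences):
    """Divide each cell by the total occurrences of its row character."""
    return {(row, col): n / occurrences[row] for (row, col), n in counts.items()}
```

The raw counts are symmetric, because every pair of lines is visited in both directions. The weighted scores are not: with `occurrences = {"Bran": 3, "Rickon": 1}`, the cell in Rickon's row and Bran's column is divided by 1, while its counterpart is divided by 3.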
You can also control how the colouring of the cells works. All-linear simply colours them in proportion to their score, while no diagonal-linear ignores all values of characters co-occurring with themselves and gives a fully opaque colour to the strongest co-occurring pair of different characters. All-sqrt and no diagonal-sqrt colour the cells not by their score but by its square root, making weaker co-occurrences easier to see.
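The four colouring modes described above can be expressed as a small opacity function. This is a sketch, not the d3.js code behind the diagram: the function name `opacities`, the lower-case mode strings, and the choice to map the largest value to full opacity are assumptions.

```python
import math


def opacities(scores, mode="all-linear"):
    """Map co-occurrence scores to opacities between 0 and 1.

    scores: dict mapping (row, col) to a non-negative score.
    mode: "all-linear", "no diagonal-linear", "all-sqrt" or
    "no diagonal-sqrt".
    """
    ignore_diagonal = mode.startswith("no diagonal")
    transform = math.sqrt if mode.endswith("sqrt") else float
    values = {
        pair: transform(score)
        for pair, score in scores.items()
        # In the "no diagonal" modes, self co-occurrences are left out.
        if not (ignore_diagonal and pair[0] == pair[1])
    }
    top = max(values.values(), default=0.0)
    if top == 0:
        return {pair: 0.0 for pair in values}
    # The strongest remaining cell becomes fully opaque.
    return {pair: value / top for pair, value in values.items()}
```

With scores of 100 on the diagonal and 25 and 4 off it, the cell with score 4 gets an opacity of 0.04 in all-linear mode but 0.4 in no diagonal-sqrt mode, which is why the square-root modes make weak pairs easier to see.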
The diagram above is based on Mike Bostock's Les Misérables Co-occurrence and was built, like all other diagrams, with d3.js.