As KITAB’s research has shown, passim is a powerful tool for answering a variety of questions about book history and history in general. The algorithm produces a huge amount of data, which we can utilise in our research in many ways. At present, we typically investigate text reuse in the corpus at the level of relationships between pairs of books. That is, we look at statistics that tell us what percentage of one book is reused in another, and we visualise alignments between the two books. This approach works well for understanding how two books are related, but it becomes quite cumbersome once we start comparing multiple books.
To note a couple of examples: what if we wanted to investigate how one particular book was disseminated across tens or even hundreds of later works? Or what if we wanted to know how one particular unit of information (perhaps a narrative about a major event) circulated throughout our corpus? To help deal with these kinds of questions, the team has been developing new applications for understanding multiple-book relationships. In this blog, I will take the opportunity to introduce some of the applications under development. Doing so, however, requires a short tangent on bioinformatics.
The computational analysis of text reuse has developed and advanced alongside that of genomics. David Smith notes this explicitly when he acknowledges that passim utilises the Smith-Waterman alignment algorithm, which is also used in the analysis of genes.[1] Text reuse and genetics are both concerned with problems in sequencing. Put simply, when biologists wish to analyse genes, they convert genetic material into a sequence of letters (A, C, G, T), each of which represents a chemical nucleotide base (adenine, cytosine, guanine, thymine). Three-letter words termed ‘codons’ specify the amino acids that are used by the body to build proteins.[2] There are also ‘stop codons’ that mark the end of a protein-coding sequence. Similarities in sequences of letters between parts of a chromosome, between different chromosomes, or even between the genomes of different animals might indicate (for example) similar biological functionality. In short, important biological questions can be resolved by identifying similar sequences of letters, and alignment algorithms like Smith-Waterman are used to undertake this task.
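To make the idea of local alignment more concrete, here is a minimal sketch of Smith-Waterman in Python, working on short strings of letters. It is an illustration only and should not be read as passim's implementation: the function name `smith_waterman` and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for the example. The function fills a score table in which every cell holds the best score of an alignment ending at that pair of letters, never letting a score drop below zero, and then traces back from the highest cell to recover the aligned passage.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    The scoring values are illustrative assumptions, not passim's settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # letter of a aligned to a gap in b
            left = score[i][j - 1] + gap    # letter of b aligned to a gap in a
            # Flooring at zero is what makes the alignment local, not global.
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        current = score[i][j]
        step = match if a[i - 1] == b[j - 1] else mismatch
        if current == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif current == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Two short sequences that share the passage "ACGT".
    print(smith_waterman("TTACGTAA", "GGACGTCC"))  # (8, 'ACGT', 'ACGT')
```

The same procedure applies whether the letters stand for nucleotide bases or for the characters of a text, which is why an algorithm designed for genes can also pick out a reused passage shared by two books.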
Mathew Barber
Research Associate for KITAB at AKU-ISMC. Mathew Barber's research is concerned with the practice of history writing and remembering in the Medieval Islamicate world. He is interested in particular in history writing under the Fatimids in Egypt, the preservation and dispersal of Fatimid-era historical texts, and what this preservation (or lack thereof) means for how later historians viewed and understood the Egyptian past. He is concerned with answering such research questions using a combination of digital and traditional methods, in particular using computational text reuse detection to identify so-called 'lost' texts.