Blog - Diversifying the OpenITI Corpus, One Text at a Time

By Dr Gowaart Van Den Bossche

Jan 22, 2021

Dr Gowaart Van Den Bossche, AKU-ISMC's Post-Doctoral Research Fellow:

The vast majority of texts in the OpenITI corpus were sourced from three major collections of digital texts originally prepared by organisations based in the Middle East (see Peter Verkinderen’s excellent blog on the largest of them, Shamela). These collections have proven invaluable to researchers, not least because they have facilitated digital forms of analysis, but each of them made choices about which textual materials belonged on its platform and which did not. Shamela, for example, did not originally include any poetry and focused largely on Sunni traditionalism. Another significant omission in all of these collections is the rich corpus of Arabic texts written by and for non-Muslims.

While it would be naive to think that OpenITI is itself unbiased, we do aim to be as inclusive as possible and try to address several of the issues found in other collections. First, we bring the separate collections together and engage all of them in a continuous dialogue on text reuse. Second, we identify gaps in the corpus and try to fill them, either by digitising texts ourselves or by collaborating with other research teams. This has led to the inclusion of texts digitised by other academic projects in fields that are of only marginal interest to the major text collections. Thanks to the concerted efforts of Peter Verkinderen and Lorenz Nigst, we now have a number of agrarian treatises, Greco-Arabic philosophical materials, texts of ancient wisdom, and some materials preserved in the so-called Cairo Geniza. We list all of these collections and individual contributors, together with their URIs, here.

In the meantime, I have also contributed a number of texts I am interested in by using Optical Character Recognition software, by typing up manuscript treatises, and by contacting people who had published texts online. To encourage others to contribute any such materials they may have to our corpus, I will briefly describe here a small set of texts I have contributed to OpenITI in this way. Readers of this blog who would like to contribute textual materials in exchange for text reuse data should get in touch with members of the KITAB team; we will happily accept their offers.
