Try things out: why a digital humanist should have the courage to try out different tools and methods
Final paper for the course "Tools and Methods: Critical Encounters". This publication was produced during my scholarship period at Uppsala University, funded by the Swedish Institute.
“Try things out” and “don’t expect much from the tools” are two powerful messages that Sinclair and Rockwell (2016) address to digital humanities scholars. They matter because the (digital) humanities embody ambiguity, complexity, fluidity, dynamic change, co-dependence, and other features of humanistic phenomena (Drucker, 2016). For a digital humanist, the two phrases relate to the five activities of digital humanities research elaborated by Drucker (2021), especially datafication/modeling, processing/analytics, and presentation/display. Drucker (2021) also emphasizes that digital humanities work sits at the intersection of computational methods and humanities materials. This explains why we need to try various tools (and methods): the digital humanities are expanding massively. For now the field has a close connection with computational linguistics and information science (Luhmann and Burghardt, 2021), but it reaches beyond these two neighbouring fields, mainly because the humanities are a relatively big field and every digital-related field keeps expanding, especially since the early 2000s with the emergence of the digital world (Berry and Fagerjord, 2017).
In terms of scholarly knowledge production, the digital humanities are sometimes regarded as a money-making machine for the humanities, since the humanities do not produce findings that industry can use immediately (Allington, Brouillette, and Golumbia, 2016). The main reason the humanities find it hard to obtain funding is that explaining them in terms of direct economic benefit is almost impossible. At the same time, the humanities are sometimes perceived as a field for elitists: people with a passion for the high culture of literary works, the philosophical mind, and aesthetic value. The public also misunderstands the humanities as a sophisticated yet complex field that mainly benefits an elite, encompasses transdisciplinary methods, and has vague areas between cross-boundary fields. These are several reasons why the humanities seem to struggle to fund their projects: they are seen as ‘only beneficial for specific people’, the humanities elitists, so spending public funds on humanities-related activities is seen as “useless”. In the end, the digital humanities cannot save the humanities, neither from institutional extinction nor from cultural marginalization; digital and traditional methods are complementary to each other (Drucker, 2021). However, Liu (2012) argues that, beyond acting in an instrumental role, the digital humanities can most profoundly advocate for the humanities by helping to broaden the very idea of instrumentalism, technological and otherwise. In other words, we cannot separate the two fields, as they are entangled with each other.
Creativity and inclusivity: how the digital humanities creatively use and inclusively incorporate tools and methods, even from other fields
Many methods do not originate in the digital humanities, and many tools are derived from those methods. Sadly, most of the tools and methods in the digital humanities come from the Global North. They rely extensively on Western approaches and perspectives and are later replicated in the hope of tackling problems in humanities fields from the Global South. The results will not be as good as for the humanities in the Global North. Why? Because the humanities are a diverse field, multiperspectival and nuanced in nature, offering a wide spectrum of knowledge that a single approach cannot cover.
Some notable methods and genres in the digital humanities have been captured and categorized by Burdick, Drucker, Lunenfeld, Presner, and Schnapp (2012), and they are worth mentioning here.
Enhanced critical curation (digital collections, multimedia critical editions, object-based argumentation, expanded publication, experiential and spatial, mixed physical and digital);
Augmented editions and fluid textuality (structured mark-up, natural language processing, relational rhetoric, textual analysis, variants and versions, mutability);
Scale: the law of large numbers (quantitative analysis, text-mining, machine reading, digital cultural record, algorithmic analysis);
Distant/close, macro/micro, surface/depth (large-scale patterns, fine-grained analysis, close reading, distant reading, differential geographies);
Cultural analytics, aggregation, and data-mining (parametrics, cultural mash-ups, computational processing, composite analysis, algorithm design);
Visualization and data design (data visualization, mapping, information design, simulation environments, spatial argument, modeling knowledge, visual interpretation);
Locative investigation and thick mapping (spatial humanities, digital cultural mapping, interconnected sites, experiential navigation, geographic information systems (GIS), stacked data);
The animated archive (user communities, permeable walls, active engagement, bottom-up curation, multiplied access, participatory content creation);
Distributed knowledge production and performative access (global networks, ambient data, collaborative authorship, interdisciplinary teams, use as performance, crowd-sourcing);
Humanities gaming (user engagement, rule-based play, rich interaction, virtual learning environments, immersion and simulation, narrative complexity);
Code, software, and platform studies (narrative structures, code as text, computational processes, software in a cultural context, encoding practices);
Database documentaries (variable experience, user-activated, multimedia prose, modular and combinatoric, multilinear);
Repurposable content and remix culture (participatory Web, read/write/rewrite, platform migration, sampling and collage, meta-medium, inter-textuality);
Pervasive infrastructure (extensible frameworks, heterogeneous data streams, polymorphous browsing, cloud computing); and
Ubiquitous scholarship (augmented reality, web of things, pervasive surveillance and tracking, ubiquitous computing, deterritorialization of humanistic practice).
Figure 1. TAPoR website’s screenshot on 31 October 2022
If we are talking about digital humanities tools, there are a lot of them. Bradley (2019) categorizes digital tools for the humanities into three groups based on their functions and contexts: tools for making, tools for exploring, and tools for thinking. These three categories are related to each other; the writer thinks that they can be drawn as a circle diagram that shows their connections as multidirectional rather than linear.
Let us take a look at TAPoR (Text Analysis Portal for Research) 3.0, an online directory of tools for text analysis and text retrieval. The project is currently led by Geoffrey Rockwell and Milena Radzikowska and is developed by the Arts Resource Centre at the University of Alberta; Stéfan Sinclair was previously a co-lead. TAPoR 3.0 helps researchers to discover text manipulation, analysis, and visualization tools as well as historical tools; read tool reviews and recommendations; learn about papers, articles, and other sources on specific tools; tag, comment, rate, and review collaboratively; and browse lists of related tools in order to discover tools more easily (Sinclair, Rockwell, and Radzikowska, 2022). For version 3.0, the TAPoR team redesigned the portal in order to integrate the DiRT (Digital Research Tools) Directory. Although TAPoR was originally created as a directory only of tools for text analysis (Sinclair, Rockwell, and Radzikowska, 2022), it has evolved through several iterations and now includes various types of tools.
As figure 1 shows, on 31 October 2022 the TAPoR website listed 1,645 tools for text analysis and text retrieval, compared to 491 tools on 29 September 2017 (figure 2): the number more than tripled in about five years. The number of categories also grew from 20 to 22, and some categories were restructured or recategorized.
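The comparison of the two TAPoR tool counts can be checked with a few lines of arithmetic. The sketch below uses only the two counts and dates reported above; the per-year rate assumes smooth compound growth, which is a simplification for illustration and not a claim about how the directory actually grew.

```python
from datetime import date

# Tool counts and dates as reported in the text (TAPoR screenshots).
tools_2017, date_2017 = 491, date(2017, 9, 29)
tools_2022, date_2022 = 1645, date(2022, 10, 31)

# Overall growth factor between the two observations.
growth_factor = tools_2022 / tools_2017

# Elapsed time in years between the two screenshots.
years = (date_2022 - date_2017).days / 365.25

# Assumption: smooth compound growth, used only to express the change per year.
annual_rate = growth_factor ** (1 / years) - 1

print(f"growth factor: {growth_factor:.2f}x over {years:.1f} years")
print(f"implied annual growth: {annual_rate:.1%}")
```

The growth factor is a little above three, which is why "more than tripled" describes the change more accurately than "multiplied by three".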
Figure 2. TAPoR website’s screenshot on 29 September 2017 from archive.ph
These figures show that the digital humanities are growing. As a relatively new field, it evolves and corrects itself within a short span of time. It is alive. Guldi (2018) argues for a critical and interpretive approach to digital tools based on iteration. Building on Guldi, the writer argues that iteration is one of the important steps for enhancing knowledge production in the digital humanities, and that it shows the field to be a creative and inclusive tent in terms of the diversity of its tools.
Voyant Tools and Recogito: an inclusivity perspective on digital humanities tools
After attending several workshops over the past four weeks, the writer decided to explore two tools, Voyant Tools and Recogito. The writer finds these two tools the easiest to use; both are web-based and versatile. Voyant Tools is a web-based text reading and analysis environment (Sinclair and Rockwell, 2022). It is one of the popular tools for text analysis in the digital humanities and is designed to facilitate reading and interpretive practices (Sinclair and Rockwell, 2022). With it we can learn how computer-assisted analysis works and study texts. It is an open-source project: Sinclair and Rockwell (2022) provide the code on GitHub and mention on the website that the tool is inspired by HyperPo, TAPoRware, and TACT (Text Analysis Computing Tools). HyperPo and TAPoRware are the tools most similar to Voyant Tools, but Sinclair and Rockwell (2022) improve on them in the design principles of scalability, ubiquity, and referenceability. Besides these three, Voyant Tools has several other primary design principles: modularity, generalization, domain sensitivity, flexibility, internationalization, performance, separation of concerns, extensibility, interoperability, skinnability, and simplicity.
Voyant Tools can help researchers analyze both small and large corpora; its scalability lets us explore input texts of various sizes. It is also easy to operate, even for newcomers to text analysis, since it has a relatively user-friendly interface and users can experiment with the tools provided on the website, including corpus, document, visualization, grid, and several other tools.
During our latest workshop, several problems were mentioned. One is significant: Voyant Tools works well mostly for Western texts, whose languages are high-resource, but it is not useful for low-resource languages such as those of Global South countries. A one-size-fits-all approach to the knowledge apparatus renders these low-resource texts invisible and cannot serve to correct the problem (Bhattacharyya, 2022).
The writer explored the tool again using a Javanese text in Javanese script: the 2006 translation of the Lord’s Prayer, taken from Javanese Wikisource. Javanese is the largest Austronesian language by number of native speakers that does not have official status. In 2012 there were 85 million native speakers of Javanese (Kozok, 2012), most of whom live in the provinces of Central Java and East Java, Indonesia. Javanese is now written in Latin script; previously it was written in Javanese script. Javanese script is written in scriptio continua, or continuous script: there are no spaces between words, although there is some punctuation at the end of sentences. When the writer put the Javanese text into Voyant Tools, several challenges arose because of the script and its scriptio continua.
Figure 3. Lord’s Prayer text in Javanese script in Cirrus visualization, Voyant Tools
Figure 3 shows that Voyant Tools cannot tokenize the words, since Javanese is a low-resource language and is written in scriptio continua. Voyant Tools apparently has no text corpus for Javanese in Javanese script, so the result is poor: tokens are cut at the wrong places in the sentence and make no sense.
Figure 4. Lord’s Prayer text in Javanese script in Voyant Tools summary part
Figure 4 is a screenshot of the summary panel. The items listed as most frequent words are actually phrases that contain several words each. The document should be counted as 79 words, not 17.
Figure 5. Lord’s Prayer text in Javanese script in Voyant Tools in Contents part
Figure 5 is further proof of how messy the data are: long runs of words are merged into phrases and tokenized at the wrong places. Consequently, the writer cannot use the tool for this text.
Another tool that the writer tried during the workshop is Recogito, an online platform for collaborative document annotation. It provides a personal workspace where users can upload, collect, and organize source documents in the form of text, images, and tabular data (Pelagios Network, 2022). Users can collaborate on annotation and interpretation, and the platform helps make their work more visible on the web. Recogito is an open-source initiative of the Pelagios Network and aims to foster better linkages between online resources documenting the past.
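The undercounting described above can be illustrated with a minimal sketch. It assumes a simple tokenizer that treats whitespace and punctuation as word boundaries; this is a stand-in for illustration and does not reproduce the actual tokenizer of Voyant Tools. The sample is an English line with its spaces removed, used as a hypothetical substitute for a text in scriptio continua, not the Javanese text itself.

```python
import re


def naive_tokenize(text: str) -> list[str]:
    """Split text into runs of word characters.

    Assumption: word boundaries are marked by whitespace or punctuation,
    which holds for spaced scripts but not for scriptio continua.
    """
    return re.findall(r"\w+", text)


# The same two sentences, once with spaces and once written continuously
# (spaces removed inside each sentence, sentence-final punctuation kept).
spaced = "our father who art in heaven. hallowed be thy name."
continuous = "ourfatherwhoartinheaven. hallowedbethyname."

spaced_tokens = naive_tokenize(spaced)
continuous_tokens = naive_tokenize(continuous)

print(len(spaced_tokens), spaced_tokens)
print(len(continuous_tokens), continuous_tokens)
```

With spaces the tokenizer finds every word; without them it returns one long "word" per sentence, which matches the pattern seen in the figures: a word count far below the real one and phrases reported as frequent words. Segmenting such text correctly would need a dictionary or a trained model for the language, which is exactly what low-resource languages tend to lack.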
Figure 6. Indonesian article in English Wikipedia uploaded in Recogito
Recogito is like Google Docs for researchers: it supports open and collaborative work. The writer uploaded the article on Indonesia from English Wikipedia to Recogito and explored the tool. Recogito has several useful features. First, the “relations” feature shows the relation between, for example, a place and a name: we click an annotated word and link it to another word. Second, the “map” feature lets us explore the annotated words. In the screenshot below, for example, the border of Indonesia is shown in green, and we can see the related sentences that contain “Indonesia” as the highlighted annotated word.
Figure 7. When the word “Indonesia” is annotated, Recogito’s map feature shows the Republic of Indonesia.
We can also check the annotation statistics, the edit counts, and even the edit timeline, and download the annotations, places, relations, and annotated document in several open formats, from CSV to JSON.
Figure 8. Annotation statistics.
However, Recogito does not give the best possible result for annotated places. Several places in Indonesia are not detected. Another problem is that the Recogito database does not list ambiguous items in a preferred order: Palau as a country appears in third or fourth place, below the comune of Palau in Sardinia, Italy. This bias arises because Recogito was at first used extensively for Global North texts, especially Latin and Greek texts. Maybe we should take a different approach and start to de-colonize maps soon, as Presner and Shepard (2016) said, using crowdsourcing as one effort to make map data more inclusive.
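One way to think about the ordering problem is as a re-ranking step over the candidate places for an ambiguous name. The sketch below is purely illustrative: the `Candidate` class, the `feature_type` labels, and the priority table are hypothetical and do not reflect Recogito's data model, gazetteers, or API. It only shows that a preferred order is a design choice that can be made explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    feature_type: str  # hypothetical labels, e.g. "country" or "settlement"
    description: str


# Hypothetical preference: countries before regions before settlements.
TYPE_PRIORITY = {"country": 0, "region": 1, "settlement": 2}


def rerank(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates by feature type; unknown types go last.

    sorted() is stable, so candidates of the same type keep their
    original relative order.
    """
    fallback = len(TYPE_PRIORITY)
    return sorted(candidates, key=lambda c: TYPE_PRIORITY.get(c.feature_type, fallback))


# Illustrative input using the two places named in the text, in the
# problematic order (the comune listed above the country).
results = [
    Candidate("Palau", "settlement", "comune in Sardinia, Italy"),
    Candidate("Palau", "country", "island country in the Pacific"),
]

reranked = rerank(results)
for candidate in reranked:
    print(candidate.name, "-", candidate.description)
```

A fixed priority table is itself a perspective, so a more inclusive design might let the annotating community or the document's context decide the order, in line with the crowdsourcing suggestion above.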
Final remarks
Tools and methods in the digital humanities are iterated and improved continuously. Even researchers such as Terras and Nyhan (2016) iterate, by digging back into the history of Father Busa’s female punch card operators, bringing them to the fore, and showing the world how women’s contributions were neglected in the past. This story about the digital humanities seems promising. However, since the digital humanities have so far focused on the Global North context, the Global South remains, in many respects, invisible. There is some hope, as the digital humanities have been rising as a new and interesting field in recent years. Encouraging researchers from the Global South to join the digital humanities is one option for tackling the inclusivity issue. Diversity, open data, and collaboration are the keys to the digital humanities. Opening up data, including tools and research works, enables knowledge sharing among digital humanities scholars, the public, and open data enthusiasts. Maybe in the future the digital humanities tent will grow bigger than before, if we multiply our efforts and invite more people, as digital humanities projects usually entail an enormous amount of work (Ramsay, 2012). Similarly, Foka, Cocq, Buckland, and Gelfgren (2020) said that digitization can contribute to the development of knowledge, but it should not be an end in itself, and neither should data visualization. There is thus no end point to the growth of the digital humanities, which means that digital humanities projects will keep expanding. In the end, digital humanities projects will be more diverse and will accommodate various modified tools and methods suited to the needs of both digital and traditional humanities research. And, as digital humanities scholars, we need not be afraid to try out various tools, and we should not expect much from the currently available tools; maybe they can help us to explore, or even to build our own specific tools for our projects.
References
Allington, Daniel, Sarah Brouillette, and David Golumbia (2016). Neoliberal Tools (and Archives): A Political History of Digital Humanities. Available: https://lareviewofbooks.org/article/neoliberal-tools-archives-political-history-digital-humanities [Retrieved 2022-10-01]
Berry, David M. and Anders Fagerjord (2017). Digital Humanities: Knowledge and Critique in a Digital Age. Cambridge: Polity Press, p. 1-102, 103-135, 136-150.
Bhattacharyya, Sayan (2022). Epistemically Produced Invisibility, in Global Debates in the Digital Humanities, Domenico Fiormonte, Paola Ricaurte, and Sukanta Chaudhuri (ed.). Minneapolis: University of Minnesota Press, p. 3-14.
Bradley, John (2019). Digital Tools in the Humanities: Some Fundamental Provocations? Digital Scholarship in the Humanities, 34(1), p. 13-20.
Burdick, Anne, Johanna Drucker, Peter Lunenfeld, Todd Presner, and Jeffrey Schnapp (2012). Digital_Humanities. Cambridge: MIT Press.
Drucker, Johanna (2021). The Digital Humanities Coursebook: An Introduction to Digital Methods for Research and Scholarship. United Kingdom: Taylor & Francis Group.
Drucker, Johanna (2016). Graphical Approaches to the Digital Humanities, in A new companion to digital humanities, Susan Schreibman, Ray Siemens, and John Unsworth (ed.). Chichester: John Wiley & Sons, p. 238-250.
Foka, Anna, Coppélie Cocq, Philip I. Buckland, and Stefan Gelfgren (2020). Mapping Socio-Ecological Landscapes: Geovisualization as Method, in Routledge International Handbook of Research Methods in Digital Humanities, Kristen Schuster and Stuart Dunn (ed.). London: Routledge, ch. 13.
Guldi, Jo (2018). Critical Search: A Procedure for Guided Reading in Large-Scale Textual Corpora. In Journal of Cultural Analytics (20 December 2020).
Javanese Wikisource (2022). Javanese text version of Lord’s Prayer in Voyant Tools by Sinclair, Stéfan and Geoffrey Rockwell (2016). Text was taken by the writer from Javanese Wikisource, 2006’s translation version. Available: https://voyant-tools.org/?corpus=615c14ac8e761f61f1137539d92f1467 [Retrieved 2022-11-01]
Kozok, Uli (2012). How many people speak Indonesian? Available: https://ipll.manoa.hawaii.edu/indonesian/2012/03/10/how-many-people-speak-indonesian/ [Retrieved 2022-11-01]
Liu, Alan (2012). Where Is Cultural Criticism in the Digital Humanities? In Debates in the Digital Humanities, Gold, Matthew K. (ed.). Minneapolis: University of Minnesota Press.
Luhmann, Jan and Manuel Burghardt (2021). Digital humanities: A discipline in its own right? An analysis of the role and position of digital humanities in the academic landscape. Journal of the Association for Information Science and Technology, 73(2), p. 148-171.
Pelagios Network (2022). Recogito. Available: https://recogito.pelagios.org/help/about [Retrieved 2022-11-01]
Presner, Todd, and David Shepard (2016). Mapping the Geospatial Turn, in A new companion to digital humanities, Susan Schreibman, Ray Siemens, and John Unsworth (ed.). Chichester: John Wiley & Sons, p. 247-259.
Ramsay, Stephen (2012). Developing Things: Notes toward an Epistemology of Building in the Digital Humanities, in Debates in the Digital Humanities, Gold, Matthew K. (ed.). Minneapolis: University of Minnesota Press.
Sinclair, Stéfan and Geoffrey Rockwell (2016). Text Analysis and Visualization, in A new companion to digital humanities, Susan Schreibman, Ray Siemens, and John Unsworth (ed.). Chichester: John Wiley & Sons, p. 274-290.
Sinclair, Stéfan and Geoffrey Rockwell (2022). Voyant Tools. Available: http://voyant-tools.org/ [Retrieved 2022-11-01]
Sinclair, Stéfan, Geoffrey Rockwell and Milena Radzikowska (2022). TAPoR (Text Analysis Portal for Research) 3.0. Available: https://tapor.ca/pages/about_tapor [Retrieved 2022-10-31]
Terras, Melissa and Julianne Nyhan (2016). Father Busa’s Female Punch Card Operatives, in Debates in the Digital Humanities 2016, Matthew K. Gold and Lauren F. Klein (ed.). Minneapolis: University of Minnesota Press.