Digital Humanities: a collaborative field that enhances knowledge production

Final paper for the course "Tools and Methods: Critical Encounters". This publication was produced during my scholarship period at Uppsala University, funded by the Swedish Institute.

In 2006, Clive Humby, a British mathematician, coined the very popular phrase “data is the new oil”. For more than a decade, the phrase has been repeated as a magnet for data enthusiasts and the general public, especially in relation to digital topics. Data plays a major part in the digital humanities field as well: without data, it is impossible to start digital humanities research and projects. This essay presents the writer’s perspective on digital humanities and compares it with mainstream and less common arguments from humanities scholars in general and digital humanities scholars specifically. The essay explores how this collaborative field can enhance knowledge production in the humanities, and how it can add value to knowledge produced either from old data that humanities scholars have already collected or studied, or from completely new data. In the last part, the writer draws a conclusion based on his experience and the knowledge gained during the course. The writer argues that digital humanities is a ubiquitous field that can be found anywhere and anytime: in formal education (university), in semi-formal events (hackathons), and even in informal collaboration (crowdsourcing projects).


Digital humanities: a new field or a branch of humanities? 

Digital humanities is a relatively new field. It is new in comparison with established fields that have a longer history: it emerged at the beginning of the 2000s, following the rise of the digital world (Berry and Fagerjord, 2017). Previously it had been known as humanities computing, in which the humanities saw the computer as a tool to support knowledge production. Later, as Hayles (2012) comments, the field emerged from the low-prestige status of a support service into a genuinely intellectual endeavor with its own professional practices (Berry, 2012, p. 43). The term ‘digital’ gives this new field an active soul, compared to ‘computing’, which feels frigid and passive. More recently, Luhmann and Burghardt (2022) conclude that digital humanities is a discipline with a close connection to computational linguistics and information science.

Back in 2012, Svensson (2012) made the well-noted argument that digital humanities is a humanities project and that the term ‘digital’ is the boundary. He drew this conclusion from four sources: a funding agency, the institutional level, a junior scholar, and a panel, “The History and Future of the DH”, at the Modern Language Association convention on 7 January 2011. Donoghue (2008) and Nussbaum (2010), both cited in Svensson (2012), argue that the humanities are in a precarious situation in terms of funding and recognition, and this is likely to feed a negative sentiment towards digital humanities. Digital humanities is seen as the “mastermind” behind the loss of funding in the humanities fields, since digital humanities has marketability in the workplace (Risam, 2019). In other words, digital humanities can attract donors more easily, because it uses more recent tools and seems “more practical” than the traditional humanities.

According to Gibbons et al. (1994), knowledge in the humanities is produced in Mode 2, that is, in a broader, transdisciplinary social and economic context. Mode 1, by contrast, is a disciplinary, primarily cognitive context and is related to natural science. The nature of digital humanities is the same as that of the humanities: the tendency towards Mode 2 appears in practice, where digital humanities research favors collaboration among scholars. The way digital humanities is entangled with the humanities can be recognized by this nature. Digital humanities favors transdisciplinary approaches: it combines skills and methods from other fields, mainly information technology and the humanities, to build a good base for research and projects. Collaboration is another key word when we talk about digital humanities. In a digital humanities research project, the digital humanist is “an academic matchmaker” who communicates the needs of humanities scholars to information science scholars, and vice versa. A digital humanist is also a generalist rather than a specialist, to use terms that are common in the workplace. Before joining digital humanities they may have been specialists, but afterwards they gradually become generalists. As digital humanists, we inherit knowledge passed on by several established fields. Those of us who come from a humanities background should learn how to code, the information technology part; those from an information science background should learn humanities theory, and so on. This is why the digital humanist is a generalist and an academic matchmaker among the other fields: they help to connect the dots.

According to Knorr-Cetina (1999), the objects of knowledge production are malleable: we can shape the object for study and research purposes. The humanities offer experience of living and working with complexity and supercomplexity, and they may complement and correct science and social science (Parker, 2008). Based on Knorr-Cetina and Parker, we can conclude that the humanities are far more complex than general natural science. This also affects digital humanities: it has turned out to be bigger than we might expect, covering not only the humanities but social science too. For now it may fall between computational linguistics and information science, but it can go beyond that, depending on how the digital evolves in the future. The “laboratory life” story narrated by Latour and Woolgar (1986) also gives us the insight that knowledge production is a continuous process: newly emerging facts affect previously established knowledge, although a new fact needs negotiation and credibility before it can be claimed as valid.


Data and Digital Humanities  

(Digital) data is the core of digital humanities research. Drucker (2021) gives a good framework for how to start humanities research. It has three components (material, processing, and presentation) and five fundamental activities: mediation/remediation, datafication/modeling, processing/analytics, presentation/display, and sustainability/preservation.

Cultural institutions, usually grouped as archives, libraries, and museums (ALM), play a big role in the digital humanities field. They are the cultural gatekeepers that hold the materials. They collect human knowledge in mostly physical collections, from books, magazines, and paintings to DVDs. The arrival of the digital world has pushed them to make their collections digitally available, and digitization has become a prominent term for cultural institutions. By digitizing their collections, cultural institutions give them a digital soul. Researchers can access a collection without damaging it, which matters especially for fragile items such as old manuscripts. This is where the material enters the mediation/remediation activities: digital humanities helps to turn analogue data into digital data, which later goes through the datafication/modeling process. Generating good data can be a long process, however, and even broken data can generate meaning (Pink et al., 2018). The digital data is then given good metadata, so that computational tools and methods can easily process or analyze it, and finally the data can be visualized. After the project ends, the most important question is how sustainable it is. Often no one manages a project after it ends, which is what happened with Trading Consequences.

We can learn about the whole digital humanities framework using Trading Consequences as the example. In Trading Consequences (Hinrichs et al., 2015), we learn that digitization has a tremendous impact on making research easier. It is the first step, mediation/remediation, that must be done before moving forward. Hinrichs et al. (2015) computationally analyzed digitized documents relevant to trade in the 19th century, mainly from the British House of Commons Parliamentary Papers (ProQuest) and Early Canadiana Online (canadiana.org). After the data has been classified, in the datafication/modeling process, the documents need optical character recognition (OCR) so that computational tools can read them, and only then are they ready for text mining. From the perspective of humanities research, here historical research, Hinrichs et al. (2015) mention that combining text mining and information visualization brings new approaches to the field. Although it required an immense workload from environmental historians, computational linguists, and visualization experts, the final result is promising. It is a starting point for delving into the research, since researchers can easily find specific information with the visualization as their aid. The most interesting part of the Trading Consequences project is that the team conducted a workshop to gain feedback from fellow historians. This is an important part of the digital humanities field, where collaboration is “a never ending process”: it values iteration and makes the product more inclusive, built not only for the team’s own sake but with valuable feedback from potential users.
However, as mentioned earlier, sustainability/preservation was not on the Trading Consequences post-project agenda: the project can no longer be accessed through its website, although it is still available on GitHub.
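
The text mining step described above can be illustrated with a minimal sketch. It is not the pipeline used by Hinrichs et al. (2015); it only assumes, for illustration, that OCR has already produced plain text and that we want to count which trade goods are mentioned together with which places. The sample sentences, the two word lists, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical OCR output: invented sentences, not taken from the project corpus.
DOCUMENTS = [
    "Cinchona bark was shipped from Peru to London in 1860.",
    "Timber from Quebec reached Liverpool; more timber followed from Halifax.",
    "Sugar and timber arrived in London.",
]

# Hypothetical hand-made word lists standing in for a real commodity list and gazetteer.
COMMODITIES = {"cinchona", "timber", "sugar"}
PLACES = {"peru", "london", "quebec", "liverpool", "halifax"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def commodity_place_pairs(documents):
    """Count in how many documents a commodity and a place are mentioned together."""
    counts = Counter()
    for doc in documents:
        tokens = set(tokenize(doc))
        for commodity in tokens & COMMODITIES:
            for place in tokens & PLACES:
                counts[(commodity, place)] += 1
    return counts


pair_counts = commodity_place_pairs(DOCUMENTS)
for (commodity, place), n in sorted(pair_counts.items()):
    print(f"{commodity:10s} {place:10s} {n}")
```

A table of pairs like this is the kind of structured output that a visualization could then be built on; real OCR text would also need error correction, which this sketch leaves out.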


Digital humanities and the commons problem 

Digital humanities faces several challenges, from data preparation to final presentation. When starting digitization we sometimes encounter quality issues, either because the materials to be digitized are not in good condition or because the digitization effort itself is struggling, especially for lack of funding. Lack of funding is a common issue in digital humanities, especially in the mediation/remediation activities: many institutions struggle to assure grantors or funding organizations that they can deliver metrics or results after the research or project. This is one reason why digital humanities is expanding faster in the Western world or Global North than in the Global South: researchers in the Global North are one or more steps ahead of those in the Global South, because the mediation/remediation of (digital) data started early there, in various formats.

 

After dealing with digitization technicalities, we encounter data modeling: how to make our data more machine readable so that our computational tools can process it easily. Not every tool can automatically refine data into a more machine-readable form. Some researchers struggle because the script of their language is not included in ISO 15924, a standard of codes for the representation of names of scripts, so they cannot run optical character recognition (OCR). Others struggle to define parts of speech for a language that does not fit the Indo-European structures used as the reference in general linguistics for years.

The classification of data is another challenge. We need standards to help with classification, but Bowker and Star (1999) raise the concern that each standard and each category valorizes some point of view and silences another. Data in digital humanities must cover multiple perspectives that are valued in the humanities as well, so that the final visualization can show the correlations in data that has been structured to be easy to read and meaningful.


Coding da Vinci: an event for hacking cultural data

Europeana (Daley, 2015) claimed that 90% of European heritage has not yet been digitized. This raises further questions: why, and what hinders the effort? Data Europa (2018) summarizes why opening cultural data is important: it gives access to a broader audience, prevents damage to original copies, allows the reunification of collections, and supports research and education. The last benefit is the one that matters for events like hackathons. Without open data, participants cannot collaborate to build new creative ideas.

In the past decade, ‘hackathon’ has become an increasingly popular term. Some people think that hackathons are mostly for technical, information technology related work, and that is not completely wrong. A hackathon has also been dubbed “a place to develop creative ideas” (NDPC, 2019), an event where culture and technology intertwine. This is what happened in Coding da Vinci. Coding da Vinci brought together the cultural sector with creative technology communities to explore the creative potential of digital cultural heritage (Coding da Vinci, 2022). In other words, Coding da Vinci is a matchmaker between cultural institutions and hackathon enthusiasts from different backgrounds: information technologists, user interface (UI) and user experience (UX) researchers, cultural enthusiasts, and the general public. They work together to produce new knowledge from the open cultural data donated by the cultural institutions.

Coding da Vinci sits in the middle of the digital humanities spectrum, between academic institutions and crowdsourcing projects. It is a real trading zone (Svensson, 2012), a meeting place where various people come together to build something new from donated data. The cultural institutions, as the “data owners”, share some of their data with the public, so that the public can play with it. What we do not know is how long it takes to bring the data into a hackathon event such as Coding da Vinci. Making data available online takes a long-term effort: the institutions need to understand why they should open their data collections, what formats are preferable, and what direct or indirect benefits they will gain.

One example project from the Coding da Vinci hackathon is Linked Stage Graph, which was awarded “most useful” at Coding da Vinci Süd 2019. The team built a knowledge graph from open data provided by the National Archive of Baden-Württemberg. Chaudhri et al. (2022) define a knowledge graph as a directed labeled graph in which domain-specific meanings are associated with nodes and edges. The Linked Stage Graph team used 7,000 images and datasets of on-stage theater events (Coding da Vinci, 2019) to find the relations (edges) between the entities (nodes). The project also colorized the images using artificial intelligence-based image coloring, so that the final images look vivid and lively. It visualizes the dataset together with the pictures and connects the nodes, offering a visual way of capturing knowledge. From this one project we can learn that opening cultural data can give the data a creative soul through hackathon events like Coding da Vinci. Such an event must be planned well in advance (this one ran from January to May 2019) and needs a lot of coordination, but the final result is amazing.
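
The definition by Chaudhri et al. (2022) can be made concrete with a small sketch of a directed labeled graph stored as (subject, label, object) triples. This is only an illustration of the data structure, not the implementation of the Linked Stage Graph team; every node name, edge label, and function name below is invented.

```python
from collections import defaultdict

# Hypothetical triples (subject, edge label, object); all names are invented.
TRIPLES = [
    ("photo_001", "depicts", "production_A"),
    ("production_A", "staged_at", "theatre_X"),
    ("production_A", "has_performer", "performer_1"),
    ("photo_002", "depicts", "production_A"),
    ("production_B", "staged_at", "theatre_X"),
]


def build_graph(triples):
    """Index the directed, labeled edges by their source node."""
    graph = defaultdict(list)
    for subject, label, obj in triples:
        graph[subject].append((label, obj))
    return graph


def subjects_of(triples, label, obj):
    """Return the nodes that point to `obj` through an edge called `label`."""
    return sorted(s for s, l, o in triples if l == label and o == obj)


graph = build_graph(TRIPLES)
print(graph["production_A"])
print(subjects_of(TRIPLES, "depicts", "production_A"))
print(subjects_of(TRIPLES, "staged_at", "theatre_X"))
```

The edge labels carry the domain-specific meaning: the same two nodes could be connected by different labels, and a question such as "which photos depict this production?" becomes a simple lookup over the edges.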

Final note: digital humanities in real life 

After joining the five-week course, I can conclude that knowledge production in digital humanities follows three basic models. The first is in the academic institution, where the aim is to show that digital humanities can hold a scholarly position in the old academic world. The second is the hackathon or a similar semi-formal event. It takes place in a limited tent with people from various backgrounds, some of whom still relate to the academic world and some of whom are members of the general public, collaborating to build something from open access data provided by cultural institutions. The third is the online crowdsourcing project. It liberates us from the idea that knowledge production should come from prestigious institutions or academic figures. The knowledge exchanges are similar to the Mode 2 process but more “radical”: nobody asks about your academic background, since anyone is considered smart in their own way, even without a formal education. This is what Wikipedia does, and at the same time it avoids power structures and is democratically developed (Jemielniak, 2014).

 

References

Berry, David M. and Anders Fagerjord (2017). Digital Humanities: Knowledge and Critique in a Digital Age. Cambridge: Polity Press, p. 1–102, 136–150.

Bowker, Geoffrey C. and Susan Leigh Star (1999). Sorting Things Out: Classification and its Consequences. Cambridge: MIT Press, p. 1–32.

Chaudhri, Vinay K., Chaitanya Baru, Naren Chittar, Xin Luna Dong, Michael Genesereth, James Hendler, Aditya Kalyanpur, Douglas B. Lenat, Juan Sequeda, Denny Vrandečić, and Kuansan Wang (2022). Knowledge graphs: Introduction, history, and perspectives. AI Magazine, 43(1), p.17–29. 

Coding da Vinci (2019). Linked Stage Graph | Coding da Vinci. Available: https://codingdavinci.de/de/projects/2019_sued/linked_stage_graph.html  [Retrieved 2022-09-26].

Coding da Vinci (2022). Coding da Vinci. Available: https://codingdavinci.de/en [Retrieved 2022-09-26].

Daley, Beth (ed.) (2015). Transforming the world with culture: Next steps on increasing the use of digital cultural heritage in research, education, tourism and the creative industries. The Hague: Europeana Foundation.

Data Europa (EU) (2018). Cultural institutions and cultural Open Data | data.europa.eu. Available: https://data.europa.eu/en/datastories/cultural-institutions-and-cultural-open-data [Retrieved 2022-09-26].

Drucker, Johanna (2021). The Digital Humanities Coursebook: An Introduction to Digital Methods for Research and Scholarship. United Kingdom: Taylor & Francis Group.

Gibbons, Michael, Camille Limoges, Helga Nowotny, Simon Schwartzman, Peter Scott, and Martin Trow (1994). The New Production of Knowledge: The Dynamics of Science and Research in Contemporary Societies. London: Sage, p. 10–22, 79–94.

Hayles, N. Katherine (2012). How We Think: Transforming Power and Digital Technologies. In: Berry, David M. (ed.), 2012. Understanding Digital Humanities. London: Palgrave Macmillan, p. 42–66.

Hinrichs, Uta, Beatrice Alex, Jim Clifford, Andrew Watson, Aaron Quigley, Ewan Klein, and Colin M. Coates (2015). Trading Consequences: A Case Study of Combining Text Mining and Visualization to Facilitate Document Exploration. Digital Scholarship in the Humanities, 30(1), December 2015, p. i50–i75.

Jemielniak, Dariusz (2014). Common Knowledge? An Ethnography of Wikipedia. Stanford: Stanford University Press, p. 1–9, 29–58.

Knorr-Cetina, Karin (1999). Epistemic Cultures: How the Sciences Make Knowledge. Cambridge: Harvard University Press, p. 1–45. 

Latour, Bruno and Steve Woolgar (1986). Laboratory Life: The Construction of Scientific Facts. Princeton: Princeton University Press, p. 43–90.

Luhmann, Jan and Manuel Burghardt (2022). Digital humanities—A discipline in its own right? An analysis of the role and position of digital humanities in the academic landscape. Journal of the Association for Information Science and Technology, 73(2), p. 148–171.

Northern Dimension Partnership on Culture (NDPC) (2019). Shaping the Future hackathon on creative and cultural industries offers innovative and sustainable solutions. Available: https://ndpculture.org/news/shaping-the-future-hackathon-on-creative-and-cultural-industries-offers-innovative-and-sustainable-solutions/ [Retrieved 2022-09-27].

Parker, Jan (2008). ‘What have the Humanities to Offer 21st-Century Europe?’: Reflections of a Note Taker. Arts and Humanities in Higher Education, 7(1), p. 83–96. 

Pink, Sarah, Minna Ruckenstein, Robert Willim, and Melisa Duque (2018). Broken Data: Conceptualising Data in an Emerging World. Big Data & Society, 5(1), p. 1–13.

Risam, Roopika (2019). New Digital Worlds: Postcolonial Digital Humanities in Theory, Praxis, and Pedagogy. Evanston, Illinois: Northwestern University Press, p. 3–88, 115–144.

Svensson, Patrik (2012). The Digital Humanities as a Humanities Project. Arts and Humanities in Higher Education, 11(1–2), p. 42–60.