05 April, 2023

Helping researchers search and manipulate transcripts

While participating in an industry placement scheme, PhD candidate Evelyn Ansell created a browser-based tool to assist Conversation Analytic researchers

University of Queensland (UQ) researcher Evelyn Ansell participated in a Higher Degree by Research (HDR) Candidate Placement with AARNet, where she developed a working prototype of the Jefferson Transcript Search Tool for Jupyter Notebooks. The project explores how programming can make transcripts easy to search and manipulate without the need to clean them first.

HDR placements provide PhD and MPhil candidates with challenges that exist in an industry environment, helping them develop planning, strategy, communication, leadership and networking skills. AARNet’s goal for this placement was for the candidate to create a digital tool to benefit Conversation Analytic (CA) researchers in the future.

While working through her PhD, Evelyn found there was a lack of digital tools for CA researchers to search and track phenomena within transcripts produced in Word and other word-processing software. Working with AARNet and the Australian Text Analytics Platform (ATAP) team, she developed an early version of a browser-based tool designed to be used by researchers unfamiliar with programming.

Where previously researchers would need to ‘clean’ a document before searching it, by removing formatting, notation and special characters, the new tool does this automatically in the background and displays live search results as the user types. Because the tool strips the notation before matching, it shows all instances of a word or phrase even where the original transcript contains special characters. Traditional search methods return only exact matches of the text as it appears, unless the notation is manually removed first.
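As a rough illustration of the idea, and not the tool's actual implementation (which this article does not describe), the sketch below strips a handful of common Jefferson notation symbols from each line before matching, while returning the original, unmodified line as the result. The function names `strip_notation` and `search`, the choice of symbols, and the two transcript lines are all hypothetical and invented for this example.

```python
import re

# Assumption: a small subset of Jefferson notation, chosen for illustration.
# Pauses and untranscribed stretches: (.), (1.3), ( )
PAUSES = re.compile(r"\(\s*(?:\.|\d+(?:\.\d+)?)?\s*\)")
# In-breaths such as .h or .hhh, when not attached to a preceding word
IN_BREATHS = re.compile(r"(?<!\w)\.h+(?!\w)")
# Single-character markers: overlap brackets, stretched sounds,
# intonation arrows, speech-rate markers, latching, and soft talk
SYMBOLS = re.compile(r"[\[\]:↑↓<>=°]")


def strip_notation(text: str) -> str:
    """Return a lower-cased copy of text with the notation above removed."""
    text = PAUSES.sub(" ", text)
    text = IN_BREATHS.sub(" ", text)
    text = SYMBOLS.sub("", text)
    # Collapse the whitespace left behind by the removed tokens
    return " ".join(text.split()).lower()


def search(lines: list[str], query: str) -> list[tuple[int, str]]:
    """Return (line number, original line) pairs whose cleaned text contains the query."""
    needle = strip_notation(query)
    if not needle:
        return []
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if needle in strip_notation(line)
    ]


# Invented transcript lines, for demonstration only
transcript = [
    "[we:ll] (.) I ↑rea::lly >don't know<=",
    "=.hhh °maybe° (1.3) to↓morrow",
]

for number, line in search(transcript, "really"):
    print(number, line)
```

A plain substring search for "really" would miss the first line, because the transcript spells it "rea::lly"; matching against the cleaned copy finds it while still displaying the line with its notation intact. This sketch also removes every colon, so a real tool would need to treat speaker labels and other meaningful punctuation more carefully.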


Jefferson transcription example, Prof. Michael Haugh, Griffith Corpus of Spoken Australian English 2009 (GCAusE01). Glossary: [ ] – overlapping speech; (.) – a brief interval; underline – emphasis; : (colon) – stretched sound; (1.3) – a pause of 1.3 seconds; ↑ – rise in intonation; ↓ – fall in intonation; > < – quickened speech rate; < > – slowed speech rate; = – no break; .hhh – in-breath; .h – short in-breath; ° ° – softer than surrounding talk; ( ) – speech was not transcribed.

To assist with developing and showcasing the tool, AARNet supported Evelyn with technical and instructor training with The Carpentries, which is made up of communities of Instructors, Trainers, Maintainers, Helpers, and supporters who share a mission to teach foundational computational and data science skills to researchers.

Evelyn recently ran a workshop with ATAP at UQ for Ethnomethodology and CA researchers, where she discussed how the team developed the tool, demonstrated how it works, and highlighted possible applications that could assist their research. The workshop provided invaluable informal peer review, with practitioners offering input on their current practices and on the directions tools like this could take to better help researchers.

Development of the Jefferson Transcript Search Tool is continuing. Future plans include expanding its functionality to search other transcript types and allowing search results to be exported to a file.

The Australian Text Analytics Platform (ATAP) project received investment (https://doi.org/10.47486/PL074) from the Australian Research Data Commons (ARDC). It is led by the University of Queensland, with support from AARNet and the University of Sydney.

To find out more about ATAP, visit the ATAP website or contact the AARNet eResearch team.