Most researchers who conduct interviews will have a tale or two about transcription, the process of turning audio recordings into typed text. There’s no doubting how labour intensive it is. Depending on your typing speed, the quality of your audio and the equipment you’re using, the ratio is often around 6 or 7 to 1: one hour of interview will take six or more hours to transcribe. For me, that slow processing of my participants’ words is the first opportunity to begin analysing what they’ve said within the broader context of the whole study. Because your pace is slow, you begin to become more intimate with the data. On the other hand, it can be backbreaking work … quite literally. Hours spent hunched over a keyboard can place demands on your physical wellbeing in so many different ways, even if you do know all the health and safety advice. So any tools which might ease some of that burden (short of farming out the transcription, as some researchers do) are definitely of interest to me.
One of my first attempts was to use Speech Recognition, a Google Docs ‘add-on’, where I would re-narrate the audio from the interviews directly into a Google doc. I find the accuracy impressively good, with few corrections required. The part which slows you down is listening to the audio, pausing after you’ve heard enough (but not so much that you can’t remember it, which is surprisingly little in my case!), speaking it back into Speech Recognition, then restarting the audio for the next bit. It always seems necessary to do a little ‘rewind’ of the recording which, even though it’s digital, can still be a real pain. The upshot is that it took me about the same amount of time as conventional transcribing. Now if I could listen and speak back in real time, that would make all the difference, but the mental processing to do that is beyond me at the moment; I guess it could be learnt though (if I had the time)?
There are plenty of transcription tools where you upload an audio file and then have a textual interface into which you can type. They provide a few additional features which help with the transcription, especially if you have a foot pedal, similar to the way a stenographer might work. I didn’t see much advantage for an index-finger typist like me. I did try one piece of freeware which had the standard features, but it barely improved my speed.
What I really wanted was a way for the spoken audio to be added directly to a document using a speech-to-text converter, without me having to perform the intermediate narration. Then at the weekend I came across Speech Logger, an online application which does precisely that.
Having recently installed and (partially) mastered Voicemeter for recording my Skype/Hangout interviews, it looked like it would also provide the means to squirt audio directly into Speech Logger. Woohoo! Err, no! I have to confess that the results weren’t exactly spectacular; it wasn’t just that the text would need a few words amending and some punctuation adding, it needed almost a complete rewrite. However, I can’t lay the blame entirely at the feet of Speech Logger; it was a tough ask, given the quality of the audio and the speed at which some of the dialogue took place. It could neither keep up, nor cope with the low-quality audio. I wondered whether the Speech Recognition add-on for Google Docs might fare better. Well, marginally, it transpired; it kept up slightly better, but still struggled with accuracy. The problem as I see it is that people in conversation aren’t speaking with a speech-to-text engine in mind; they’re speaking for another person. As a consequence they include pauses, inflections, colloquialisms, discourse markers and word fillers, which really give the software a hard time. Once I’d slowed down the playback speed to 80% using Audacity, Speech Recognition was good enough to pick up the ‘um’s, but the rest of the text was still far from accurate:
… finish my degree and then thought I should you want to go into teaching and then started to see lots of teachers coming on to Twitter and um and then started to change my public projects to start the top 10 all the celebrities and started to follow teaching dad um and then that’s why…
From previous experience of speaking written text into a Google doc quite successfully (even with a Yorkshire accent!), it seems that there’s a grammar or context checker of some kind automatically correcting some of the words the software failed to pick up correctly. An impressive feat, but unfortunately, conversational dialogue proved to be a bridge too far. As you can see in the example, you can just about get a sense of what was being said, but it would take far too much replaying of the audio and unpicking of the text. For the moment, I’ll stick with the headphones and stiff neck. Unless you know better …