For the past few weeks, I've been dictating emails, blog posts, and texts whenever it makes sense. This is part of my experiment with AI: I want to see whether an AI transcription model understands me better than traditional speech-to-text software, and to try different tools for making my dictation more concise.
Dictation is an unusual way of writing for me. As a typist, I'm used to thinking while typing and thinking at the speed of my fingers. Sometimes I feel like I need to write to clarify what I think. But when I’m dictating, I end up taking long pauses as I try to figure out what to say next – it feels like using an entirely different part of my brain than when I’m typing.
The whole experience makes me think of a scene from a TV show (I think it was Mad Men) where an executive dictates a letter to a secretary. They speak with perfect grammar and without pausing, as if they know exactly what to say and can say it in real time. Of course, it’s TV, so it’s fake! But dictating is a skill that seems completely different from the one I've developed over the past 30 years.
If we shift from a world where most people interact with technology by typing to one where dictation becomes the norm, how will it impact our thinking, writing, and expectations? If most people are writing emails by dictating, what will happen to the way we speak? How will that change the tone and character of our emails? What's going to adapt more: our speech or written communication? Will we learn to dictate in a way that mirrors how we write today, or will email writing evolve to be more like spoken language?
My guess is that we are going to start to expect that our dictation software interprets for us as well as listens to us. Rather than writing down exactly what we say, we're going to expect that our software writes down what we mean.
Maybe that's similar to how dictation used to work, when people would dictate letters to a secretary. If so, we modern professionals will need to relearn a lost set of skills as we start talking to our technology more and more.
I wonder how it would go if you built a little interpretation agent that takes in what you say raw, reformats it with good grammar, and then prints the cleaned-up dictation to whatever app you're using.
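To make that idea a bit more concrete, here is a minimal sketch of what the middle step of such an agent might look like. It is only an illustration under heavy assumptions: the filler list, the `clean_dictation` and `deliver` names, and the rule-based cleanup are all hypothetical, and a real version would presumably hand the raw transcript to a language model rather than a few regular expressions. The "app" is stood in for by any function that accepts text.

```python
import re

# Hypothetical filler list, chosen for illustration only.
FILLERS = ("um", "uh", "er", "you know")

# Matches a filler word or phrase, plus any comma and spaces right after it.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*",
    flags=re.IGNORECASE,
)
# Matches a word repeated immediately after itself, as in "I I" or "the the".
_REPEAT_RE = re.compile(r"\b(\w+)(?:\s+\1\b)+", flags=re.IGNORECASE)


def clean_dictation(raw: str) -> str:
    """Tidy a raw transcript with simple rules (a crude stand-in for a model)."""
    text = _FILLER_RE.sub("", raw)               # drop filler words
    text = _REPEAT_RE.sub(r"\1", text)           # collapse stutters
    text = re.sub(r"\s+", " ", text).strip()     # normalize whitespace
    text = re.sub(r"\s+([,.!?])", r"\1", text)   # no space before punctuation
    if not text:
        return ""
    # Capitalize the first letter of each sentence.
    text = re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )
    # Make sure the text ends with sentence punctuation.
    if text[-1] not in ".!?":
        text += "."
    return text


def deliver(text: str, sink=print) -> None:
    """Send cleaned text to the target app; `sink` is a hypothetical hook."""
    sink(text)


if __name__ == "__main__":
    deliver(clean_dictation("um so I I think we should uh meet on tuesday"))
```

Rules like these only tidy the surface. They can't work out what I meant, and they will happily delete a "you know" that I actually intended, which is exactly the gap between writing down what we say and writing down what we mean.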