The findings are interesting, but the study design is lacking: a single device was used (to be fair, it's a commonly used one), and as far as I can tell a single person recorded the keystrokes and was assessed. I don't think it did a good job of simulating an attacker trying to train a model for someone from audio recorded over a medium like Zoom, given all the realistic variables: audio quality, being on or off mute, connection issues, mic sensitivity, etc. That said, it does expose a theoretical attack vector, and I think that's important to identify and recognize.
The article focuses on threat actors, but I think the more common use might be a coworker or boss deciphering what people are typing/shitposting outside the official meeting. Always mute when you're not talking, folks.
That's just generally polite anyway; I don't wanna hear people typing away during a meeting.
I'd assume this only works with non-normalized stereo audio. Just turn mono audio on and normalize; then you can't really tell which key is pressed, or whether someone is talking at the PC or from across the living room.
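For what it's worth, the "mono + normalize" mitigation described above is roughly this (a minimal numpy sketch; the `mono_and_normalize` function and the toy signal are my own illustration, not anything from the article):

```python
import numpy as np

def mono_and_normalize(stereo: np.ndarray, peak: float = 0.9) -> np.ndarray:
    """Downmix an (n_samples, 2) stereo signal to mono and peak-normalize.

    Averaging the channels discards the inter-channel level differences
    that hint at where a sound came from; peak-normalizing flattens
    overall loudness differences between recordings.
    """
    mono = stereo.mean(axis=1)          # collapse L/R into one channel
    max_abs = np.max(np.abs(mono))
    if max_abs == 0:
        return mono                     # silent clip, nothing to scale
    return mono * (peak / max_abs)      # loudest sample now sits at `peak`

# toy example: a "keystroke" that is much louder in the left channel
left = np.array([0.0, 0.5, 0.2, 0.0])
right = np.array([0.0, 0.1, 0.04, 0.0])
out = mono_and_normalize(np.stack([left, right], axis=1))
```

After this, every clip peaks at the same level regardless of how close the keyboard was to the mic, which is exactly the information the positional argument relies on.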
There was previous (German?) research that was able to do this from just well-recorded sound; HRTF etc. wasn't required. https://www.newscientist.com/article/dn7996-keyboard-sounds-reveal-their-words/ (paywalled, apologies, and it's US research; I couldn't find the German one)