2X Comprehension

2X Comprehension is an ongoing project to develop a better interface for communicating with dementia patients and people with cognitive disabilities by incorporating natural language processing (NLP) into hearing aids.

This idea was born out of watching my mother scream angrily at my grandmother; despite sporting hearing aids with effective gain and an EQ curve fit well to her ear, my grandmother would still struggle to understand, causing my mother greater and greater frustration in her attempts to take care of her. With this in mind, the goal of this project is to determine whether tactics proven to improve communication with people with dementia/Alzheimer's can be consolidated into a hearing device.

The prototype right now consists of a microphone attached to a laptop running a Python NLP script. Incoming audio enters a buffer, which the script manipulates and sends back to the user. The schematic is:

audio/speech → microphone → Python script with NLP → user's earpiece
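
The write-up doesn't include the capture/playback plumbing, so here is a minimal sketch of that loop, assuming the `sounddevice` library and a placeholder `process()` function standing in for the NLP steps described below:

```python
import sounddevice as sd

FS = 16000          # sample rate (Hz)
CHUNK_SECONDS = 3   # length of each buffered phrase

def process(audio):
    # placeholder for the NLP manipulation steps described below
    return audio

while True:
    # record a phrase-sized buffer from the microphone
    buf = sd.rec(int(CHUNK_SECONDS * FS), samplerate=FS, channels=1)
    sd.wait()
    # manipulate the buffer, then play it back to the user's earpiece
    sd.play(process(buf), FS)
    sd.wait()
```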

Please watch this video, which details just the Python natural language processing (NLP) script:

The goal is not only to gain audio in volume, but to increase understanding for the user. The NLP script employs several techniques known to be useful in communicating with dementia patients. Currently, the prototype improves comprehension by:

1) Addressing the person by their first name

It has been found helpful to address Alzheimer's patients by their first name when speaking to them directly (cite). On the first instance of audio, the prototype inserts the patient's first name before the rest of the phrase. For example, the script takes the phrase "did you enjoy the movie?" and turns it into "Shirley, did you enjoy the movie?" as displayed in the image below:
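
The string manipulation itself is simple; a minimal sketch (the speech-to-text engine is assumed to have already produced the transcript, since the write-up doesn't name one):

```python
FIRST_NAME = "Shirley"

def address_by_name(transcript: str, first_phrase: bool) -> str:
    """Prepend the patient's first name to the first phrase of an exchange."""
    if first_phrase and transcript and not transcript.lower().startswith(FIRST_NAME.lower()):
        # lowercase the original first word so the new sentence reads naturally
        return f"{FIRST_NAME}, {transcript[0].lower()}{transcript[1:]}"
    return transcript

print(address_by_name("Did you enjoy the movie?", first_phrase=True))
# -> "Shirley, did you enjoy the movie?"
```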

2) Slowing down fast phrases

I first attempted to take a buffer of speech, slice it between the words, and play the words back with some delay between them, but it turns out to be very difficult to cut up speech at the word boundaries. Humans don't speak sentences by enunciating discrete words with space in between them; when speaking normally, the end of one word blends into the beginning of the next. Take a look at this recording of "did you enjoy the movie?" in Audacity:

It’s apparent that it would be very difficult to analyze the recording and cut it between the five words, so instead I used the wav capability in Python to slow down the phrase without pitching it down. This takes "did you enjoy the movie?" to "ddiidd yoouuu enjoooyyy tthheee moooviieee?" (without the ubiquitous low pitch that normally results from slowing down audio). This isn't perfect, but it will work for now. The goal is to turn the phrase "did you enjoy the movie?" into "did … you … enjoy … the … movie?" — the same words, delivered slowly. More important than slowing down the speech is delivering the phrase at a constant tempo.
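
The write-up doesn't show the stretching code itself; one way to get this "stretched, same pitch" effect with only the standard-library `wave` module is to repeat short chunks of frames — a crude granular time-stretch. The chunk size and repeat factor below are illustrative, not the project's actual values:

```python
import wave

CHUNK_MS = 30   # short enough that repeated chunks don't sound like echoes
REPEATS = 2     # play every chunk twice -> roughly half speed, same pitch

with wave.open("phrase.wav", "rb") as src:
    params = src.getparams()
    frames_per_chunk = int(src.getframerate() * CHUNK_MS / 1000)
    chunks = []
    while True:
        chunk = src.readframes(frames_per_chunk)
        if not chunk:
            break
        chunks.append(chunk)

with wave.open("phrase_slow.wav", "wb") as dst:
    dst.setparams(params)   # same sample rate, so pitch is unchanged
    for chunk in chunks:
        dst.writeframes(chunk * REPEATS)
```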

3) Removing harmful and/or aggressive phrases

Harmful phrases, especially "remember?", "don't you remember?", and "we talked about this already!!!", are aggressive and unhelpful statements for patients with dementia. The prototype takes the phrase "Don't you remember!?" and replaces it with "Shirley, I love you!"

After converting the speech to text, the script searches the string for the word "remember" in combination with the words "don't," "why," and "talked," which together constitute harmful phrases such as "don't you remember!?", "why don't you remember!?", and "remember, we already talked about this!?" Excluding and removing these phrases reduces anxiety for the recipient and trains the speaker to stop using them.
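
A minimal sketch of that search-and-replace step, following the keyword combinations and replacement phrase described above:

```python
TRIGGERS = ("don't", "why", "talked")
REPLACEMENT = "Shirley, I love you!"

def filter_harmful(transcript: str):
    """Replace 'remember'-based phrases that pressure the patient to recall."""
    text = transcript.lower()
    if "remember" in text and any(word in text for word in TRIGGERS):
        return REPLACEMENT
    return transcript

print(filter_harmful("Don't you remember!?"))       # -> "Shirley, I love you!"
print(filter_harmful("Did you enjoy the movie?"))   # -> unchanged
```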

4) Truncating and simplifying long phrases

In time, this item will encapsulate the largest portion of NLP in the device. Currently, the device truncates phrases beyond 7 words and alerts the communicator that they've spoken too long a phrase. This threshold was decided in interacting with my grandmother, Shirley, whose difficulty in understanding phrases grows sharply beyond 7-8 words. In the future, the NLP portion would be able to consolidate long sentences into simple "yes" or "no" statements as well as cleverly manipulate speech into straightforward, succinct statements. Right now, the device simply doesn't deliver statements beyond 7 words to the user and indicates to the communicator, via a blinking red LED, that their phrase was too long.
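
A sketch of the length gate, with the LED stubbed out (how the LED is actually driven isn't specified in the write-up, so the blink function here is a placeholder):

```python
MAX_WORDS = 7

def blink_red_led():
    # placeholder: on the planned Raspberry Pi build this would toggle a GPIO pin
    print("[LED] phrase too long -- please rephrase")

def gate_long_phrases(transcript: str):
    """Drop phrases over the word limit and warn the communicator."""
    if len(transcript.split()) > MAX_WORDS:
        blink_red_led()
        return None          # nothing is delivered to the user
    return transcript

print(gate_long_phrases("Do you want to maybe go see that new movie later tonight?"))
# -> None, and the communicator is warned
```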

There is a multitude of techniques for clear and precise speech (e.g. avoiding noun strings, replacing complex words with simpler synonyms, avoiding expletives, eliminating "wordy" phrases, breaking up compound sentences, avoiding weak noun and pronoun expressions, etc.). Incorporating these into the script/device will constitute the bulk of future work. The end goal is to take a phrase, complex or not, and deliver it clearly and succinctly. An example of "wordiness" reduction would look like this:
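
A minimal sketch of one such reduction, using a small substitution table (the entries are common plain-language pairs chosen for illustration, not the project's actual list):

```python
import re

# illustrative plain-language substitutions; a real table would grow over time
WORDY_PHRASES = {
    "due to the fact that": "because",
    "at this point in time": "now",
    "in the event that": "if",
    "a large number of": "many",
}

def reduce_wordiness(transcript: str) -> str:
    for wordy, simple in WORDY_PHRASES.items():
        transcript = re.sub(wordy, simple, transcript, flags=re.IGNORECASE)
    return transcript

print(reduce_wordiness("Due to the fact that it is raining, we should stay in."))
# -> "because it is raining, we should stay in."
```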

There are several risks/questions that will need to be addressed. First, since the speech processing isn't real time, would the time delay between watching a person's mouth move and hearing the audio disrupt comprehension itself? What would be a good visual interface for the communicator? Is it possible to have a localized (offline) library for NLP, and if not, would it be possible for the device to have an internet connection to access a library online?

Improvements for the future: The next immediate step is to transfer the Python script from my Linux laptop over to a Raspberry Pi Zero. It is clunky to have a prototype with a laptop, and eventually this device will need to live behind a person's ear. Apart from incorporating much more advanced NLP into the Python script for delivering clear, precise, and terse speech, a good interface for stopping the real-time audio path would need to be created. My initial thought is for the device to have a peak detector which, when triggered, uses active noise cancellation to stop the incoming audio from directly entering the user's ear; the incoming audio would instead be sent to the buffer for NLP manipulation.
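
A sketch of the peak-detector half of that idea, assuming numpy and a frame-by-frame audio stream (the threshold is an illustrative value that would need tuning; the noise-cancellation side is hardware and is not sketched here):

```python
import numpy as np

PEAK_THRESHOLD = 0.1   # illustrative; would be tuned to the mic and room

def speech_detected(frame: np.ndarray) -> bool:
    """Return True when a frame's peak amplitude suggests incoming speech."""
    return float(np.max(np.abs(frame))) > PEAK_THRESHOLD

nlp_buffer = []

def route_frame(frame: np.ndarray):
    if speech_detected(frame):
        # divert speech to the NLP buffer instead of the user's ear
        nlp_buffer.append(frame)
    # (hardware side, not shown: noise-cancel the direct acoustic path)
```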