I develop kaldi-active-grammar [0]. The Kaldi engine is state of the art for command and control. Although I don't have the data and resources to train a model the way Microsoft/Nuance/Google do, being an open rather than closed system allows me to train models that are far more personalized than the large, generic commercial ones you are used to. For example, see the video of me using it [1], where I can speak in a relaxed manner without having to over-enunciate and strain my voice.
Gathering the data for such training does take some time, but the results can be huge [2]. Performing the actual training is currently complicated; I am working on making it portable and more turnkey, but it's not ready yet. However, I am running test training for some people. Contact me if you want me to use you as a guinea pig.
[0] https://github.com/daanzu/kaldi-active-grammar
[1] https://youtu.be/Qk1mGbIJx3s
[2] https://github.com/daanzu/kaldi-active-grammar/blob/master/d...
EDIT: Actual demo with coding starts at 18:00: https://youtu.be/YKuRkGkf5HU?t=1076
There are a few good OSS offline speech-recognition libraries, including Mozilla DeepSpeech [3], but their resource footprint is too high. We settled on the currently less mature vosk [4], which is built on Kaldi [5] (a more established speech-recognition toolkit) and includes a number of low-footprint, pretrained models for real-time streaming inference. Research has shown how to deploy efficient speech models on CPUs [6], so we're hoping those gains will translate to faster performance on commodity laptops soon. You can follow this issue [7] for updates on our progress. Contributions are welcome!
[1]: https://github.com/OpenASR/idear/
[2]: https://cmusphinx.github.io/
[3]: https://github.com/mozilla/DeepSpeech
[4]: https://github.com/alphacep/vosk-api
[5]: https://github.com/kaldi-asr/kaldi
[6]: https://ai.facebook.com/blog/a-highly-efficient-real-time-te...
Can you use a touch screen or mouse? I went ~13 years without using a keyboard, and typed with mice (some customised), trackballs, and touch screens, mostly using predictive typing software I wrote. In that time I did a lot of programming, including a whole applied maths PhD.
One of the best mouse setups I came up with (in several variations) was moving the cursor with one hand and clicking with the other. Holding the mouse still while clicking a button accurately is a surprisingly problematic movement. I made a button-less mouse with just a flat top to rest the side of my hand on, plus a bit sticking up to grip. Standalone USB numeric keypads can be remapped to mouse clicks and common keys.
Touch screens can also be very good, if set up right, as all the movement can come from the big muscles and joints of your upper arm and shoulder, and your fingers and wrist don't need to do much. The screen needs to be positioned well, not out in front of you, but down close and angled in a comfortable position to hold your arm for long periods.
Also, I bought a keyboard tray that supported a deep negative angle, which helped me keep a very anatomical (relaxed and natural) position.
Also, figure out that mouse, somehow. Something like the above, plus switch sides frequently.
I've no idea if that could help you, but after a few years, I'm largely in remission.
I know this isn't really what you were asking, but I'm somewhat hopeful you can find relief. Good luck.
In my experience, any services claiming to do deep learning produced far worse results than what we could get with simple approaches, at least when faced with non-grammatical sentences (or rather, sentences with a different grammar than English's). Of course that's because the models are not typically trained with this use case in mind! But the fact that you need a huge amount of data to even slightly alter the expected inputs of the system was, to me, a deal breaker.
For the specific case of programming with voice, Silvius comes to mind. It's built and used by a developer with this same problem. It's a bit wonky having to spell words sometimes with alpha-beta-gamma speech, and it won't work without some customization, but on the other hand it's completely free and open source: https://github.com/dwks/us
Ask HN: I'm a software engineer going blind, how should I prepare? (https://news.ycombinator.com/item?id=22918980)
The best project I've seen for voice coding is Talon Voice, but I doubt anything novel is being done with it and deep learning. I'd suggest trying it out if you haven't. They also have a pretty active slack channel, you might have some luck asking them if they know about anything on the horizon.
So first, I switched my mouse to my non-dominant hand (the left hand for me), as my dominant hand already has many things to deal with. I'm also using a workstation that allows me to mount my displays at eye level while sitting or standing; not hunching over is ergonomics 101. Second, I switched from a standard keyboard to a split keyboard. I tried many -- Goldtouch, Kinesis Advantage2, Kinesis Freestyle -- and ultimately settled on the Ultimate Hacking Keyboard.
I could write many more paragraphs on how I customized it and why it won out, but the most important thing is that it is split and it "felt" best, once I mastered the key placements (the arrows are in different places).
Third, I started learning Vim. Vim is really awesome, but up until recently it didn't have great IDE or other editor support. Now it does, so there's no reason not to use it. I mostly use it for quickly jumping around files and going to line numbers.
Fourth, I'm always looking to optimize non-Vim shortcuts in my editor. For example, expand-region (now standard in VSCode) is one of my favorite plugins.
Fifth, I'm very conscious of using my laptop for long stretches of time. Mousing on the trackpad is much more RSI-inducing than using a nice gaming mouse and the UHK keyboard.
All of this to say that RSI doesn't have to be career ending. If you're doing software work and you have functioning hands and wrists you should definitely look to optimize typing before looking to speech to code. Good luck!
I am not a coder, I am a writer. I wonder why all these AI people are trying to create things that will displace my means of earning a living instead of something that will create applications?
Why can't I tell my Mac: "Computer: take this collection of files and extract all the addresses of people in Indiana."
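For what it's worth, the scripting half of that wish is already a few lines of Python; what's still missing is the natural-language front end. A minimal sketch, assuming a folder of plain-text files and a naive ", IN <ZIP>" heuristic (real address parsing is much messier than this):

```python
import re
from pathlib import Path

# Naive heuristic: a US address line ending in ", IN" plus a ZIP code,
# e.g. "123 Main St, Indianapolis, IN 46204".
INDIANA_LINE = re.compile(r".*,\s*IN\s+\d{5}(?:-\d{4})?\s*$")

def extract_indiana_addresses(folder):
    """Scan every .txt file under `folder` for Indiana-looking address lines."""
    hits = []
    for path in sorted(Path(folder).rglob("*.txt")):
        for line in path.read_text().splitlines():
            if INDIANA_LINE.match(line):
                hits.append(line.strip())
    return hits
```

The hard part an assistant would have to solve is translating the spoken request into that program, not running it.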
I always wanted to learn vimspeak: https://news.ycombinator.com/item?id=5660633
Food for thought for sure. Good luck.
That in combination with switching to a Lisp (Clojure) almost made it feasible for me to code with RSI.
I just became a manager instead because I couldn’t work from home and talking like that in the office was a no-go for me.
If that’s your cup of tea you’d be surprised at how happy upper management is to have someone who’s actually good at technology be willing to engage with them.
1) try to minimize the amount you have to speak by leveraging autocompletion as much as possible. For me, TabNine [1] has been a great help in that regard.
2) try to use snippets as much as possible, both to reduce boilerplate code and because you can simply tab through the various fields. For me it has been a great help that with Sublime it is possible [2], without installing anything, to have all of my snippets inside dragonfly grammars, or even generate them dynamically [10], providing much-needed structural control over what you write. I know this is more primitive (at least for the time being; there are ideas to improve it) than what you are asking for, but for me it has been enough to make C++ enjoyable again! Unfortunately my pull request to integrate this into Caster [3] has fallen behind, but all of the basic functionality, along with various additional utilities, is there if you want to give it a try. Just be aware of this little bug [4], which applies here as well!
3) not directly related to code generation, but if you find yourself spending a lot of time and vocal effort on navigation, consider either adding eye tracking to the mix or utilizing one of the (at least three) projects that provide syntactic navigation capabilities. As the author, and more importantly as a user, of PythonVoiceCodingPlugin [5], I have seen quite a bit of difference since I got it up to speed, because a) even though it is command driven, commands sound natural and smooth; b) though they can get longer, in practice utterances are usually 3 to 5 (maybe 6) words, which makes them long enough that you do not have to speak abruptly but short enough that you do not have to hurry to finish before you run out of breath; and c) I personally need fewer commands compared to using only keyboard shortcuts, so less load on your voice! The other two projects in this area I am aware of are Serenade [6] and VoiceCodeIdea [7], so see if something fits your use case!
4) use noise input where you can to reduce voice strain. Talon [8][9] is by far the way to go in this field, but you might be able to get inferior but decent results with other engines as well. For instance, DNS 15 Home can recognize some 30+ letter-like "sounds" such as "fffp, pppf, tttf, shhh, ssss, shhp, pppt, xxxx, tttp, kkkp"; you just have to make sure that you use 4 or more letters in your grammar (so, for instance, "ffp" will not work). Recognition accuracy is going to degrade if you overload it too much, but it is still good enough to simplify a lot of common tasks.
5) give it a try with a different engine; I was not really that satisfied with WSR either.
6) see if any of the advice from [11] helps, and seek out professional help!
I realize that my post diverges from what you originally asked for, but I feel the points raised here might help you lessen the impact of voice strain for the time being, until more robust solutions like the GPT-3 one mentioned in a comment above are up and running. My apologies if this is completely off topic!
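To make point (2) concrete: the core of tab-through snippet expansion is tiny, which is why it works so well under a voice grammar. A stdlib-only sketch of the idea, not the actual Caster/Sublime integration, using the common $1/${1:default} tab-stop notation:

```python
import re

# A snippet template with numbered tab stops in the usual $1/${1:default} style.
SNIPPET = "for (int ${1:i} = 0; $1 < ${2:n}; ++$1) { $0 }"

# Matches either a braced stop with an optional default, or a bare "$N".
STOP = re.compile(r"\$\{(?P<idx>\d+)(?::(?P<default>[^}]*))?\}|\$(?P<bare>\d+)")

def expand(template, values):
    """Fill tab stops with spoken/typed values; unfilled stops keep defaults."""
    def sub(match):
        index = match.group("idx") or match.group("bare")
        default = match.group("default") or ""
        return values.get(index, default)
    return STOP.sub(sub, template)
```

Dictating one value per tab stop, `expand(SNIPPET, {"1": "row", "2": "count", "0": "work(row);"})` yields `for (int row = 0; row < count; ++row) { work(row); }`; generating such templates dynamically is what gives the structural control mentioned above.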
[1] https://www.tabnine.com/
[2] https://github.com/mpourmpoulis/CasterSublimeSnippetInterfac...
[3] https://github.com/dictation-toolbox/Caster
[4] https://github.com/mpourmpoulis/PythonVoiceCodingPlugin/issu...
[5] https://packagecontrol.io/packages/PythonVoiceCodingPlugin
[6] https://serenade.ai/
[7] https://plugins.jetbrains.com/plugin/10504-voice-code-idea
[8] https://talonvoice.com/
[9] https://noise.talonvoice.com/
[10] https://github.com/mpourmpoulis/CasterSublimeSnippetInterfac...
[11] https://dictation-toolbox.github.io/dictation-toolbox.org/vo...
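And to illustrate the kind of syntactic navigation point (3) is about: instead of saying line numbers, you name a construct and jump to it. This is not PythonVoiceCodingPlugin's implementation, just the underlying idea, sketched with the stdlib `ast` module:

```python
import ast

def find_definition(source, name):
    """Return the 1-based line of the first function/class named `name`, or None."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name == name:
                return node.lineno
    return None
```

A voice command like "go to function bar" then reduces to one parse plus one cursor move, with no spelling and no counting.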
Also, talk to an ergonomics person about it. And it sounds like notebooks are out at this point, unless you have an external keyboard, mouse, and monitor.
Check out this success forum of people who have healed from all kinds of chronic pain symptoms by dealing with stress and changing their mindset:
https://www.tmswiki.org/forum/forums/success-stories-subforu...