I've been experimenting with a concept called Propheciple — a decentralized, cryptic platform for exploring new ideas in the evolving economy. Would love feedback or thoughts on this direction:
https://www.caard.net/profile/propheciple/ce562ce7-a75d-4a0e...
Neat. I've been looking at paid options: Superwhisper, Wispr Flow, Willow Voice, and VoiceInk. I found Superwhisper to be slow. Willow is almost instantaneous. VoiceInk is faster than Superwhisper, using the Whisper Large V3 Turbo model. All require a subscription (at least $12/mo) except VoiceInk, which has a one-time cost.
How does the accuracy of yours compare to Whisper-based models?
It's the same as (or better than) Whisper models, in my experience.
However, Wispr Flow does post-processing, so its output might be more useful, as it removes fluff from your speech.
I think it's possible to implement fast, local post-processing using the Gemma models. So I will give it a shot. If it works, then the output will be as good as the best paid options available.
Needless to say, if you speak very precisely, then my project is all that you need. Its accuracy is almost 100%; I haven't seen a mistake yet (crazy, I know).
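To make the "removes fluff" idea concrete, here is a minimal rule-based sketch in Python. It is not the Gemma-based approach described above and not how Wispr Flow works; it is a plain regex stand-in, and the `FILLERS` list and the `strip_fillers` name are made up for illustration.

```python
import re

# Hypothetical filler list; a real post-processor would need a richer one.
FILLERS = ("um", "uh", "er", "you know")

# Match a filler as a whole word, plus an optional trailing comma and spaces.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*",
    re.IGNORECASE,
)

def strip_fillers(text: str) -> str:
    """Remove filler words and collapse any leftover runs of whitespace."""
    without = _FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", without).strip()

print(strip_fillers("Um, I think, uh, we should ship it"))
```

A fixed word list like this cannot tell a filler from a meaningful use of the same word, which is one reason a language model may do better at this step.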
Thanks for the details. Yes, post-processing will be very useful for correction. Also, when I tried yours, I noticed some issues with punctuation - I was expecting it to insert a period and then add a space before the next sentence.
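The punctuation behavior described above (end each sentence with a period, then a space before the next one) could be sketched like this in Python. It assumes the dictated text arrives as a list of separate segments; `join_segments` is a hypothetical helper, not part of the project being discussed.

```python
TERMINATORS = (".", "!", "?")

def join_segments(segments: list[str]) -> str:
    """Join dictated segments so each ends with punctuation and one space separates them."""
    cleaned = []
    for seg in segments:
        seg = seg.strip()
        if not seg:
            continue  # skip empty or whitespace-only segments
        if not seg.endswith(TERMINATORS):
            seg += "."  # add the missing sentence-ending period
        cleaned.append(seg)
    return " ".join(cleaned)

print(join_segments(["This is the first sentence", "Is this the second?"]))
```

This only handles spacing and terminal periods; capitalization and mid-sentence punctuation are left untouched.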
Any chance of getting “live dictation” or showing the words in a stream rather than all at once?
It is possible. However, given that the transcription was always instantaneous, I didn't see the need for it. I will attempt to implement it for the next version.
Does anyone know of a program like this that runs on Fedora and transcribes in real time?
I created another program like this, which is cross-platform.
Its UX isn't as good as this one's, but it should help unless you find a better option.
https://github.com/aviaryan/voice-writing-electron