Blind and Visually Impaired Community

r/blind

Full History - 2020 - 09 - 05 - ID#imyn55

ocr app called Voice (self.Blind)

submitted by artemisxsky

Hi everyone,

For those of you who don’t know me, I’m Shalin Shah, an undergraduate student studying computer science at UC Berkeley. I am the developer of the application Voice OCR Document Reader on the iOS App Store ($1).

My journey with the blind community started when I was 15, after creating and launching the first version of Voice. At the time, Voice was the only 100% free OCR application built with VoiceOver. However, I was just learning to code at the time, so there were many bugs in the app and it crashed a lot. As time went by, I stopped updating it and sort of lost touch with the community I had worked so hard to build with this product.

Last summer, I decided to rekindle that community by launching a fresh upgrade to the application. I had a lot of fun being able to use the programming I learned in college to create a better version of Voice. The response from this community was overwhelmingly positive and has inspired me to keep going. So I’ve been working hard the past few months to build something better than all the previous versions of Voice combined. I have some exciting news for you guys.

I would like to introduce Voice 5.0! Voice 5.0 was crafted from scratch using a fresh new technical stack that’s built to perform with precision and reliability.

The OCR quality has never been better, even on images with really bad lighting, horrible focus, and messy handwritten text. To put the icing on the cake, our field of view detector is more accurate than ever, giving you feedback whenever it sees a document.

Voice has 180 new text-to-speech reading voices that are simply gorgeous. You won’t find higher quality voices than these on any application on the planet, period. They harness the power of cutting-edge artificial intelligence and contain pitch-perfect intonations for more than 50 languages.

Finally, Voice is entirely hands-free. There is a fresh set of commands that makes it easier than ever to control Voice with just your words! In addition to saying things like “capture” and “read”, you can now say things like “voice pause”, “voice play”, “voice restart”, and “go back” while Voice is reading your document. And when you don’t want to talk to Voice, we’ve included a polished, revolutionary user interface upgrade with a new standard — no more than 4 buttons per screen. Crafted with VoiceOver, you simply cannot find another interface that is this simple to use.

For the next 24 hours, I have made Voice $1.99. After that, it will return to its original price of $4.99. For those who have already purchased, all updates are free. Tell everyone to download this so they can get it at the sale price!

Additionally, to get access to the 180 premium quality reading voices, you will need to subscribe to Voice Pro, which costs $4.99 a month. I wish I could make this free for everyone, but powering the server to constantly improve the AI voices is super costly, especially as a solo developer. I still wanted to provide this functionality for the power users who might really need this, so I’ve built it as a subscription. But you can still get access to standard quality voices in over 30 languages without the subscription.

With that being said, there are still many improvements I will be making to version 5.0 in the next few days.

Here’s what’s on my To-do list so far:

1. Make the scanning speed significantly faster.
2. Adding Offline OCR for the first time ever
3. Add a feature to import PDFs found in Safari into Voice OCR.
4. Make PDF export fully accessible; right now it just exports the raw image.
5. Create an instruction manual and video that is accessible and teaches people how to use Voice, with the full set of voice commands.

If you guys use the application and have any feedback, PLEASE comment below or send it my way at $1. All ideas are super welcome, and I would love to incorporate the community feedback into the app over the next few days.

Here is the link to the app on the iTunes App Store: $1

Lastly, if the product is useful for you, it would mean the world to me if you could write a quick review on the App Store and share this with your friends. It’s the community feedback that really helps me continue to maintain this.

Anyway, feel free to reach out! I will respond to all emails very quickly. Thank you all for your time.

Rethunker 4 points 2y ago

I'm sighted, but I decided to try the app. I've done work with OCR and usability, which will put a certain spin on my comments. Also, I'm going to write in the mode of a software tester, so I focus mostly on usability or functionality bugs, which I list in no particular order. I'll wrap with some positive comments.

I like the concept of a voice-activated app, and I know it's tricky to determine when to allow or disallow the user to interrupt the voice, but I would suggest having some means to skip past or speed up the introductory announcements. The series of introductory announcements is functionally equivalent to a series of unskippable modal dialogs. I wanted to start capturing images within 5 to 10 seconds. Thus I was already a little perturbed before I even had a chance to test functionality.

The introduction seems somewhat redundant. I'd be surprised if someone spent $2 on an app but didn't already know what the app was meant to do. So perhaps just one or two voiced lines would be sufficient before the user can start capturing images and trying some features.

Accuracy is okay, but not as good as VoiceDream Scanner, especially with uneven lighting. If you're doing some kind of preprocessing on the image, then your binarization step or thresholding step or whatever doesn't seem to adapt as well as it might to changing brightness and contrast across a page of text. As a computer science major you may have already taken one or two classes on image processing / computer vision, but if not I think it would be well worth your time. Existing image processing libraries like OpenCV are useful, but application-specific work often requires tweaking their implementations of standard algorithms. (Perhaps you're already well on your way here.)
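To make the point about thresholding that adapts across a page concrete, here is a minimal sketch in plain Python. It is not how Voice or any other app mentioned here works; the function names, the window size, and the offset are assumptions chosen for illustration. It compares a single global threshold with a local-mean threshold (computed with an integral image) on a synthetic page whose brightness rises from left to right.

```python
def make_gradient_page(width=40, height=20):
    """Synthetic grayscale page (hypothetical test data): the background
    brightens from left to right, with a dark vertical stroke every 8 columns."""
    return [
        [(60 + 4 * x) - (50 if x % 8 == 4 else 0) for x in range(width)]
        for _ in range(height)
    ]


def global_threshold(gray, level=128):
    """Label a pixel as ink (0) if it is darker than one fixed level, else 1."""
    return [[0 if p < level else 1 for p in row] for row in gray]


def adaptive_threshold(gray, window=15, offset=10):
    """Label a pixel as ink (0) if it is darker than the mean of its
    surrounding window minus `offset`, else background (1)."""
    h = len(gray)
    w = len(gray[0]) if h else 0

    # Integral image: integral[y][x] holds the sum of gray[0..y-1][0..x-1].
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    r = window // 2
    out = []
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        out_row = []
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            count = (y1 - y0) * (x1 - x0)
            # Compare pixel * count against the window sum to stay in integers.
            is_ink = gray[y][x] * count < total - offset * count
            out_row.append(0 if is_ink else 1)
        out.append(out_row)
    return out
```

On this synthetic page the fixed level treats the whole dim left side as ink, while the local-mean version follows the brightness gradient. Libraries such as OpenCV provide their own adaptive thresholding (for example `cv2.adaptiveThreshold`), and as noted above, real documents usually need tuning beyond the defaults.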

Button functionality is sometimes hard to guess. In video / image capture mode, there is a right arrow ">" button in the lower right corner that I wouldn't have guessed meant "process the previously captured image." I was expecting it to capture and process an image from the active video stream.

The "Extracting Text" notification gives no sense of how long processing may take. There's no voice announcement, so a blind user or sighted user could easily get the idea that the app has hung.

Just now, when I said "capture" and then "read," the results were from a previous image. I was attempting to read the letters on the keys of my QWERTY keyboard.

I'm not sure what you mean by "Adding offline OCR for the first time ever." Do you mean the first time for your app? Or the first time for any app? Offline OCR, including CNN-based OCR, has existed for decades.

Although an instruction manual would be good, your app would be competing against other apps that don't require manuals. I would suggest walkthroughs instead. I haven't read a manual for any phone app in the past decade.

Out of curiosity, what is the value of having so many voices for a monthly subscription? Is that just so that someone can change up the voice every now and then? The add-on cost is a goodly fraction of the cost of a variety of streaming services (e.g. Netflix with their voiceover support). If you have marketing data suggesting $5/month is worth it then no worries, but the price seems a bit steep to me.

In the filmstrip view at bottom, when I touched an image it disappeared. I don't understand why.

The default seems to be that the text from two processed images will be combined into one text. That could be useful for text spanning multiple pages, but if someone first read a medicine bottle, and then a minute later a food label or two or three, and then other items, etc., it could be very confusing. My suggestion--and again, these are just suggestions--would be to default to a mode in which you don't save successive texts, and simply read the most recent capture. Then once the user becomes accustomed to that you could introduce stringing texts together after making multiple captures.

In general, once I got past the unskippable intro announcements and tried "capture" and "read" it wasn't clear what I should do next. Buttons on screen didn't have accompanying captions to indicate their purpose (for sighted users), though I guessed right about half the time. In short, it wasn't quite clear to me how a user would explore the app and gradually learn more.

Finally, there may be times when a blind user may not want to use voice command. (This is something I've been told in conversations about other voice command apps.) If voice command isn't used, then gestures would be a handy, silent alternative input method. I know that could add a pile of extra work, but I think you could get feedback on the desirability of gestures from your users fairly quickly. If gesture control is already built into the app, then I haven't discovered it yet, though I'd be willing to test.

"... no more than 4 buttons per screen. Crafted with VoiceOver, you simply cannot find another interface that is this simple to use." Perhaps: "you simply cannot find another OCR interface that is this simple to use." There are certainly simpler interfaces!

I think you could make further cuts to the number of buttons per screen and improve usability. Different buttons appear depending on the current operation mode--a fundamental problem of modes--and I found it difficult at times to determine what mode the app was in, and what would happen when I pressed certain buttons. (Do you have a state diagram for the app? When video is showing and there's a previously captured image, it's not clear to me what state the app is in, which is why clicking the ">" did something unexpected.)
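As a rough illustration of what such a diagram might look like in code, here is a small sketch of an explicit transition table. The mode names and the transitions are hypothetical and are not taken from the app; only the spoken commands ("capture", "read", "voice pause", "voice play", "voice restart", "go back") come from the original post.

```python
from enum import Enum, auto


class Mode(Enum):
    """Hypothetical app modes, invented for this sketch."""
    PREVIEW = auto()    # live video, nothing captured yet
    CAPTURED = auto()   # an image has been captured and can be read
    READING = auto()    # recognized text is being spoken
    PAUSED = auto()     # speech is paused


# Every (mode, command) pair that is allowed, and where it leads.
TRANSITIONS = {
    (Mode.PREVIEW, "capture"): Mode.CAPTURED,
    (Mode.CAPTURED, "capture"): Mode.CAPTURED,   # retake replaces the image
    (Mode.CAPTURED, "read"): Mode.READING,
    (Mode.CAPTURED, "go back"): Mode.PREVIEW,
    (Mode.READING, "voice pause"): Mode.PAUSED,
    (Mode.READING, "voice restart"): Mode.READING,
    (Mode.READING, "go back"): Mode.PREVIEW,
    (Mode.PAUSED, "voice play"): Mode.READING,
    (Mode.PAUSED, "voice restart"): Mode.READING,
    (Mode.PAUSED, "go back"): Mode.PREVIEW,
}


def step(mode, command):
    """Return (new_mode, accepted). Unknown commands leave the mode unchanged."""
    new_mode = TRANSITIONS.get((mode, command))
    if new_mode is None:
        return mode, False
    return new_mode, True


def allowed_commands(mode):
    """List the commands that are valid in the given mode, e.g. to announce them."""
    return sorted(cmd for (m, cmd) in TRANSITIONS if m is mode)
```

With a table like this, a command that is not valid in the current mode is rejected instead of acting on stale state, and `allowed_commands` gives the app something to announce or display so the user knows what can be done next.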

Again, I like the concept of voice control for an OCR app. It's harder to conduct testing nowadays when you can't sit next to one tester after the next and observe their behavior (unless you've tried that on Zoom?), so I admire your work to push out an app now.

As a user it's also good to see that you're taking into consideration the number of buttons per screen and other implementation details related to usability. I for one would be pleased with something that had less functionality but faster processing. If you identify what users do with the app 80% to 90% of the time--which perhaps you've already done--I'd recommend stripping away almost everything else.

Good luck with your app and with the fall semester!

retrolental_morose 4 points 2y ago

Wow, extensive and positive comments.
As a blind person using my phone in a professional capacity, discretion and speed are the most important factors in the majority of my OCR needs, which are quick identification of a letter's addressee, the name on a door, an item of food, the expiry date on medicine, etc.
I use VoiceDream scanner and SeeingAI a lot, neither of which have a subscription model, and would need some serious persuasion to change.

As an aside, Voice Control and VoiceOver (i.e. speech output) are integral parts of the OS now. I don't know anyone who would want to switch off VoiceOver to read something scanned, so by inference having a separate voice control system in your app would also subvert the norms regular users of Voice Control have come to expect.

Rethunker 1 points 2y ago

From having surveyed and interviewed a goodly number of folks in the blind community, I have to say you’d be a great person to have in a test group! I hope you and the app creator can connect.

For that matter, I’d like to connect, too! I’m always up for meeting folks and chatting about what they want. Speaking only for myself, I have to say that overcoming sighted bias requires daily effort, even at the risk of having some of my ideas knocked down a peg or two.

The app creator is off to a good start, especially as a college student, and in my small way I hope I can help accelerate development efforts for people who take the work seriously.
