Blind and Visually Impaired Community

r/blind

Full History - 2017 - 06 - 09 - ID#6ga38k

Help with site accessibility... (self.Blind)

submitted by l7synth

Hi everyone, I have a site that I'm building that hosts audio books and other narrated works produced by a text to speech engine I've developed. I'd like to make it accessible to people with vision impairment, but I'm not sure where to start.

I opted for restricting access to the library to members who accept an age policy checkbox because there are adult content options. A link to access the library is then sent to the provided email address. What are the various options people use to access accounts in this manner? Are there more immediate steps that I could take to make this process easier for different degrees of vision loss?

As a veteran of the software industry, I'm a bit embarrassed by my ignorance of this solution space. Any feedback would be greatly appreciated.

I have a couple of general audience titles that are free to download as well:

http://www.l7synth.com/sherlock-holmes

http://www.l7synth.com/made-with-cc

http://www.l7synth.com/weekly-review

Any feedback about the quality of the vocals compared to other products on the market would also be appreciated.

Thanks!

-LHM

rkingett 1 points 6y ago

Hi there. Personally, I would have no reason to visit this site and sign up, because there are LibriVox, Audible.com, Libro.fm, and many, many more places where I can hear these for free: YouTube, our government library for the blind, plus the screen reader I have together with Project Gutenberg. Still, I think this site would be good for some newly blind people, so below are links to get you started on web accessibility.

http://webaim.org/articles/nvda/

https://webaccess.berkeley.edu/resources/tips/web-accessibility

l7synth [OP] 1 points 6y ago

Audible is definitely a good pick from a selection perspective, but LibriVox is really difficult for me to listen to. I'm sure it's a "me" thing, but the way some people narrate irks me just enough to be distracting. Gutenberg offers a lot of screen-reader recordings too, but they are usually the very robotic '90s tech.

And I agree that my site has no significant value at the moment, which is why membership is free (and might remain so.) I only launched it a few weeks ago. But I intend to separate my site from the other free options in two key ways. The first is to deliver content that isn't available on them, through exclusive deals with authors who are looking to gain exposure but can't afford the costs of modern narration services.

The second is to deliver adult content. While that is available on Audible and to a tiny degree on others, the selection is abysmal. Once I've produced about 100 books, scaling to 100,000 will be trivial, as the amount of training is reduced with every book. I'm already down from a couple of hours to about 30 minutes of modeling to produce lengthy novels. In another 20 books I expect it will take only 5 minutes or less. And at some point the production will be reliable enough that I can just publish wantonly, with full-length books produced in under 10 minutes. I also wouldn't think of charging anything close to $200/yr.

And this is all just another step towards the ultimate goal of a strong AI so I'll keep going until there is value to share.

Thanks for the link to nvda, I'll test it on the site myself to see if there are things I can do to make it easier.

bradley22 1 points 6y ago

Another thing to think about is screen reader speed. Most blind people will have their screen readers set quite fast. For me at least, listening to the books I listened to became quite boring, as I could already listen to these books, if I wished, either as audiobooks or through my screen reader. By the way, when we say that Project Gutenberg has readable books, we mean that there are thousands of books on that site that our screen readers can read, not the audiobooks they have.

bradley22 1 points 6y ago

Hello.

I can confirm that signing up is no problem for a screen reader user or a VI user as all you must do is put your email address in the box, tick the checkbox and click on the sign up button, I think it was called.

I signed up and listened to a bit of Alice in Wonderland. I honestly cannot hear a difference between reading the Gutenberg text with the Hazel voice, for example, and your TTS engine.

From my limited understanding, you want to make the voice self-correct, is that right? So if it comes across a word that it says wrong, it can go into a database of sorts and correct it the next time? If that is the case, I'm sorry to tell you that all screen readers have this function already in the form of a dictionary. You add words to the dictionary like this:

1. Add the word.

2. Tab to the box that will be labeled something like replacement.

3. Type in that replacement. For example, the word might be Foyer and instead of that I would want it to be pronounced foiyay.

4. Tab to the OK button and you're done. Whenever the screen reader comes across that word, it will say it the way you sounded it out in the dictionary instead of the way it was said originally.
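
The dictionary steps above amount to a whole-word substitution applied to the text before it reaches the voice. A minimal sketch of that idea follows; the function name `apply_pronunciation_dictionary` and the dictionary layout are hypothetical illustrations, not the internals of any particular screen reader.

```python
import re

def apply_pronunciation_dictionary(text, dictionary):
    """Replace whole words with phonetic respellings, ignoring case.

    `dictionary` maps a word to the respelling the voice should speak.
    This is an illustrative sketch, not a real screen reader API.
    """
    if not dictionary:
        return text
    # Try longer entries first so a longer entry wins over a shorter prefix.
    words = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    lookup = {word.lower(): respelling for word, respelling in dictionary.items()}
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)

# The example entry from the steps above: "Foyer" spoken as "foiyay".
dictionary = {"foyer": "foiyay"}
spoken = apply_pronunciation_dictionary("Wait in the Foyer.", dictionary)
```

Because the match is anchored on word boundaries, an entry for "foyer" leaves a longer word such as "foyers" untouched; a real dictionary would need its own entry for that.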

I have come across a spelling mistake on your webpage.

On the Library page, within the adult section, there is this sentence: "Keilly discovers and unusual plant with euphoric secretions." It should be "Keilly discovers an unusual plant with euphoric secretions."

I hope this helps.

l7synth [OP] 1 points 6y ago

Yeah, you're on the right track. The problem with the design of existing products is that those improvements are trapped in your screen reader and not automatically updated for everyone. And just as you commented, "bonjour" is not easily corrected phonetically alone. My system employs multiple strategies, from cadence and emphasis to splicing, to recreate a waveform that matches a correct pronunciation.

But the problem is even worse than that - there are words like "read" that could be "reed" or "red" and I'm not aware of any screen reader that can pronounce them correctly in context. So while you won't hear much of a vocal difference between my copy of Alice in Wonderland and dropping Gutenberg's copy in a word document and hitting "play" there will be a huge difference in terms of the correct pronunciation and the gait in dialog.
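
To make the "read" example concrete, here is a toy sketch of context-based respelling. It is not how L7 works (the thread does not describe that); it assumes a single hand-written rule, namely that "read" directly after a past-tense cue word is spoken "red" and otherwise "reed". The cue list `PAST_CUES` and the function `respell_read` are hypothetical.

```python
import re

# Hypothetical cue words: if one comes directly before "read",
# assume the past-tense pronunciation ("red").
PAST_CUES = {"have", "has", "had", "having", "was", "were", "been", "already"}

def respell_read(sentence):
    """Respell "read" as "red" or "reed" using only the previous word.

    A deliberately naive rule for illustration; real systems would use
    part-of-speech tagging or a trained model instead.
    """
    # Split into runs of letters/apostrophes and runs of everything else,
    # so spacing and punctuation are preserved when rejoining.
    tokens = re.findall(r"[A-Za-z']+|[^A-Za-z']+", sentence)
    previous_word = ""
    output = []
    for token in tokens:
        if token[0].isalpha() or token[0] == "'":
            if token.lower() == "read":
                # The respelling is lowercase; fine for speech output.
                token = "red" if previous_word in PAST_CUES else "reed"
            previous_word = token.lower()
        output.append(token)
    return "".join(output)

past = respell_read("I have read it.")
future = respell_read("I will read it.")
```

A one-word lookback gets sentences like "She read it yesterday" wrong, which is exactly why this problem is harder than a dictionary entry.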

And thanks for pointing out the typo, I'm terrible with those and it's one of the problems I intend to automate away :D

bradley22 1 points 6y ago

Ah. I understand a bit more now. Another one would be minute and minute. You know, the time thing and the minute detail. Although NVDA does a good job with it, at least with the synth I'm using, other synths may not.

bradley22 1 points 6y ago

Hello. I listened to the files. They sound exactly the same as Microsoft Hazel to me. I would personally not use this site, as we have sites like Project Gutenberg to download nearly any older book we wish, and we can read them with our screen readers. Having said that, others might use your site and find it great, so I'll give you some links that can help your site along when it comes to accessibility and blindness.

Getting started with ARIA. https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA

Getting started with accessibility. https://webaccess.berkeley.edu/resources/tips/web-accessibility

I hope these two pages help you understand a little more about how accessibility can be added to your site.
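
One basic check those guides describe is that every form field has a label a screen reader can announce. The sketch below uses Python's standard `html.parser` to list inputs lacking both a `<label for="...">` and an `aria-label`/`aria-labelledby` attribute. The sample form and the ids `email` and `age-policy` are made up for illustration and are not taken from the actual site; the check also ignores labels that wrap their input, so treat it as a rough first pass.

```python
from html.parser import HTMLParser

class LabelAudit(HTMLParser):
    """Collect label targets and form inputs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.label_targets = set()  # ids referenced by <label for="...">
        self.inputs = []            # (id, has_aria_name) per visible input

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label" and attrs.get("for"):
            self.label_targets.add(attrs["for"])
        elif tag == "input" and attrs.get("type") not in ("hidden", "submit", "button"):
            has_aria_name = bool(attrs.get("aria-label") or attrs.get("aria-labelledby"))
            self.inputs.append((attrs.get("id"), has_aria_name))

def unlabeled_inputs(html):
    """Return ids (None if missing) of inputs with no label or ARIA name."""
    audit = LabelAudit()
    audit.feed(html)
    audit.close()
    return [
        input_id
        for input_id, has_aria_name in audit.inputs
        if not has_aria_name and input_id not in audit.label_targets
    ]

# Hypothetical sign-up form: the email box is labeled, the checkbox is not.
form = """
<form>
  <label for="email">Email address</label>
  <input type="email" id="email">
  <input type="checkbox" id="age-policy">
  <button type="submit">Sign up</button>
</form>
"""
missing = unlabeled_inputs(form)
```

Here `missing` contains only the checkbox id, which is the field a screen reader would announce without a name.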

fastfinge 1 points 6y ago

Where does one get Microsoft Hazel? Never heard of that voice. Is it one of their SAPI voices, or part of the Microsoft Speech Platform?

bradley22 2 points 6y ago

I believe it's one of the voices in the Microsoft Speech Platform. You have to download it. I'm not sure where you download it from, but I do know it's in a big list of voices for a lot of languages.

fastfinge 1 points 6y ago

Thanks!

l7synth [OP] 1 points 6y ago

Yep, you can get it from Microsoft, however they won't install properly on Windows 7+ without modifying the registry...

This page should get you going: https://superuser.com/questions/590779/how-to-install-more-voices-to-windows-speech

Send me an email to l7synth@gmail.com if you need help with any of the steps. It does require modifying the registry but I can write you a couple scripts that will export and re-import your specific configuration if you'd rather not muck with windows internals.

fastfinge 1 points 6y ago

Thanks! Nah, I'm fine with modifying the registry. I've had to do it enough times for various reasons...

bradley22 1 points 6y ago

Hello. I read in this thread that your voice does not have trouble with non-English words? Could you show me a quick example with the word bonjour? I just tried it with Hazel and the word is messed up quite badly. Same with the word hola. It's said like "oh la" instead of a small "o" sound followed by "la". I hope you can understand what I'm trying to say.

l7synth [OP] 1 points 6y ago

It doesn't support a large number of foreign words but a few common words in French, Spanish and Japanese. And the accent is still British, but bonjour and hola are both pretty decent. I'll try to post a sample mp3 of those abilities this weekend.

bradley22 1 points 6y ago

Thank you!

l7synth [OP] 1 points 6y ago

Hazel is the underlying voice that I'm using for the project, but the magic happens in modules that feed data to the voice engine and how that output is modeled. The design allows me to develop the ability to "speak" without relying on the voice engine that produces the voice.

The site as it stands is just a starting point. Audiobooks are just a way for me to promote the project. A goal of mine is to replace screen readers entirely and allow people to interact with computers vocally. It's a long-term project for sure. So for now I'm designing a system that can encode anything and everything with a high degree of accuracy and comprehension. I suspect that over the next year L7 will host a library comparable in size to Project Gutenberg's, but without all of the garbled mispronunciations that plague many screen readers.

Thanks for the links too. While I didn't implement ARIA tags, I feel like I have a good grasp of making the content screen reader friendly and the interface accessible. The problems I'm concerned with are how to make user accounts and pay-walls friendly, not just the markup. I'm sure some services are easier than others to deal with in terms of account creation, email management and settings/profile data. So I'm looking more for an example of a service that does it really well. But I've added ARIA to my list of todos.

bradley22 1 points 6y ago

Hello. I would strongly advise asking blind, disabled, or even non-disabled people whether they would want to interact with their computer using voice. I know that I would not want this, as I can type a lot faster than I talk because I have a slight stutter. I'd recommend making a survey so that you know you're not wasting your time doing something you think will help when in reality it may not.

Why exactly do you want to replace screen readers with a vocal system?

l7synth [OP] 1 points 6y ago

I think I just used poor word choice, and didn't clarify my intent. The goal isn't just to replace screen readers for the blind, it's to fundamentally change human-computer interaction. It won't replace all typing, and it won't replace all forms of reading for those who can. However there are a lot of people from children to the paralyzed or just people with carpal tunnel syndrome that would benefit from vocal interactions that are as natural as a discussion with a teacher or peer. I can't say for sure if I'll reach that goal, but it is the future when it comes to things like Alexa and Siri and hands down it's the way to go. (We're just not there yet in terms of AI to make it usable.)

When it comes to screen readers: they read text on a screen, whereas my system will be able to describe what's on a screen and independently read the text that matters, regardless of whether the developers who built it designed it correctly. I think that does have value for everyone regardless of vision or ability, if only from a multitasking perspective.

And then of course if I relied on surveys to build this AI people would tell me to destroy it before it destroys humanity :P

bradley22 1 points 6y ago

Ah. I understand. I think you're right. It would have great value to those who have cerebral palsy and other disabilities/conditions. I am glad I'm on the list now as I can see the changes in real time. Sorry if my posts come off as harsh, I don't mean them to be.

fastfinge 1 points 6y ago

I...wouldn't call your voice equal to Alexa, I'm afraid. And there are even better alternatives on the market, like Neospeech VoiceWare. Plus, neural-network-based solutions like DeepVoice and WaveNet are going to hit the market in a few years. By the sound, I suspect you're refining festvox, or one of the other HMM-based TTS engines.

That isn't to say your work is bad! Depending on how responsive it is, it could make for a decent open-source system voice. The best freely available voice currently for screen-readers like $1 to use is espeak. You've got espeak beat by a country mile. However, if you want to go commercial...meh. I can already buy better alternatives. And in a year or two, neural network speech generation is going to blow everything else out of the water once again.

Nighthawk321 2 points 6y ago

Could you go into detail about these new voices?

fastfinge 1 points 6y ago

Nope. Mostly because my understanding of how this stuff works is so general that I can't explain it. But I can give you some links! :-)

Wavenet TTS: https://deepmind.com/blog/wavenet-generative-model-raw-audio/

DeepVoice: http://research.baidu.com/deep-voice-production-quality-text-speech-system-constructed-entirely-deep-neural-networks/

Lyrebird: https://lyrebird.ai/demo

That'll get you started. There are several other systems in development using these new neural network techniques, but those are a pretty good overview of what's coming.

Nighthawk321 1 points 6y ago

Awesome, I'll check it out for sure :).

l7synth [OP] 2 points 6y ago

First and foremost thanks for the feedback! You are the first person to give any :D

Alexa and Siri are better for sure (in fact Siri just got an update that is impressive), but I still think mine competes well with them compared to (m)any of the open source options I've tried. So I chose the term "on par." The synthetic nature of Alexa also shines through in certain uses, for instance if you ask it to play "Shake Shake Shake, Shake your booty." And I suspect that is why they don't use Alexa to report the news.

My system is currently designed to operate in conjunction with and without commercial engine options. I'm still evaluating which engine to work with next, perhaps you could suggest your "top pick" of commercial options?

Audio generation from my system is very fast, full length books are generated in just a few minutes. And I've implemented a lot of NLP algorithms to correct for proper pronunciation of heteronyms, poorly written or formatted corpora, and for speaking words it has yet to be trained on. Most commercial options require separate modules to speak foreign words for instance, while mine does not.

As to the future, I'm definitely keeping my eye on tech like WaveNet. And while I won't be able to compete with that using my current strategy, that's why I designed my system the way I did. This will allow me to focus on building a more robust AI/NLP component in the meantime, which I'll be releasing in the form of a chat bot hopefully by the year's end.

Did you have any problems with comprehension? My main goal with this phase of the project was to produce something that could easily be understood, and easily modified for new content. But I've been listening to this voice for months on end. And I know that I can understand what my toddler says when other adults cannot just from repetitive familiarity. So it's hard for me to tell.

Thanks again for your input!
-LHM

fastfinge 1 points 6y ago

> Did you have any problems with comprehension?

No. But then, I listen to TTS all day, every day, 12 hours a day. I could have listened to that voice at least four times faster without losing anything.

> "top pick" of commercial options?

Neospeech, also sometimes sold under the brand "voicetext", is by far my number one option. It's quick, sounds natural, and is easy to listen to.

Second would be the Vocalizer stuff. It's what Apple uses on iOS and Mac. It's also quick and natural, though not quite as human sounding as Neospeech.

Then I guess would be the $1 TTS. It sounds pretty human. But for whatever reason, it just isn't enjoyable to listen to. It sounds too stiff, somehow. Like a bad human actor, poorly reading lines. The voice is good, but the intonation is just off somehow.

Lastly are all the other natural TTS commercial solutions. Your voice has all of these guys beat, for sure. But they are an option. $1 has been around forever, and they're pretty darn cheap. But they're also state of the art for the year 1999. $1 are good if you happen to need strange British accents (Irish, Scottish, and a bunch I've never heard of). Otherwise, don't bother. $1 are pretty OK voices. They must have a pretty good IOS API, or be cheap, because a lot of developers insist on using them. They're fine for what they are. $1 are overpriced and over marketed. I mean, they're OK. But they're not "the best voices for windows 10" like they claim over and over and over again on their website. And they're not worth the money when so many better options exist.

> compared to (m)any of the open source options I've tried.

Yes. Yours is better than all of the open source options. If you made that voice open source, I'd start using it tomorrow. If you wanted me to pay for it, though, I would be much less interested; I'd rather spend the money on neospeech or vocalizer. I'm sorry if that sounds blunt. I'm not trying to be offensive, just give you an honest idea of how I think you're doing.

l7synth [OP] 2 points 6y ago

First I'd like to say that I didn't take offense at all, honest input is hard to get. I don't have plans on selling the voice itself, but I also don't have plans on open sourcing the platform. The voice is just a small component of the project, but an important one in terms of user acceptance.

And honestly I thought you were crazy to think neospeech was natural after hearing a sample - but then I discovered the date on the sample was from 2011, and after finding one from 2016 my jaw dropped. I had no idea commercial options were that advanced. Unfortunately my jaw is still stuck after looking at their pricing models :D However they will be my top pick for the next voice unless something new shows up.

fastfinge 1 points 6y ago

Yup. Neospeech pricing is nuts. And they only sell B2B, not B2C. So it isn't possible for me to use those voices in my screen reader, even though I'd really, really like to. VoiceDream Reader on iPhone managed to license them, so at least I get to read audiobooks with them. If I recall correctly, even that wound up costing me 20 bucks. Worth it, though.
