The learning curve on macOS isn't that steep. I would argue that having a screen-reader cursor that can easily reach any widget using the same basic controls is more intuitive than having to memorize specific key combinations for every application, and that's assuming the application has keyboard controls at all. NVDA has Object Navigation, which does essentially the same thing, but it is much less intuitive to use than VoiceOver's cursor.
As for OCR, VoiceOver does support it, and even attempts to recognize images. You just have to enable it in VoiceOver Utility -> VoiceOver Recognition, and then press VO+Shift+L to make it attempt to recognize whatever is under the VO cursor. Someone also made an app that can be used to recognize and navigate text in a window, so you can definitely do something about it.
I'm not saying that VoiceOver is perfect, as I find navigating the web with it to be an ordeal compared to using NVDA, but it's not as bad as you claim either.