I tried to get SSML to work with the Microsoft SAPI API previously for a non-game project and didn't get anywhere. While the APIs exist (SPF_PARSE_SSML etc.), it doesn't seem like actual voices with that capability ship with Windows, and I always got an error return. Apparently there are a lot of custom voices around. If you manage to make that work, I would certainly be interested.
Another offline option I had is CMU Flite, but people here seem to like its quality a lot less than the Amazon/Google/Microsoft solutions. I think adding some better voices than the default ones could give a much nicer result, and if I could figure out how to do that, there might be enough variety for game purposes. It also has some SSML support, but I never tested it, since other TTS providers were the higher priority. I would be interested to know how it goes, as it is the only free cross-platform one I have been using.
There is also Festival/Festvox, though I have no experience with it. I believe it's more focused on the research side than on being an end-product TTS engine. I believe you can convert its voices to be Flite compatible.
I also saw Cepstral come up a few times, but no customer has asked for it and it's non-free, so I have not investigated it. I have no idea what their pricing model is, or whether you can get a reasonable licence to embed it into a product rather than a server licence.
Other than those, Amazon, Google and Microsoft each have a cloud TTS service. The pricing didn't seem that bad to me, especially with caching, and the APIs were easy to deal with (I forget which, but some have C/C++ SDKs, and for some I used the HTTP API directly with my own client code). It depends on just how many hours of TTS you are thinking of (practically reading a novel?), and of course on the online requirement.
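To make the caching point concrete, here is a minimal Python sketch, not tied to any particular provider. The `synthesize` callback is a hypothetical stand-in for whatever cloud call you end up using (SDK or raw HTTP), and the `.audio` file extension and cache layout are my own assumptions. The idea is just to key on the voice plus the text, so each distinct line is only paid for once.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_tts(text: str, voice: str, cache_dir: Path,
               synthesize: Callable[[str, str], bytes]) -> Path:
    """Return the path of an audio file for (voice, text).

    `synthesize(text, voice)` is a hypothetical callback wrapping the real
    provider request; it is only invoked on a cache miss.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key on both voice and text so the same line in two voices does not collide.
    key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.audio"
    if not path.exists():
        # Cache miss: this is the only place a billable request would happen.
        path.write_bytes(synthesize(text, voice))
    return path
```

With something like this in front of the provider, repeated lines cost nothing after the first request, so what matters for the bill is the amount of distinct text rather than how often it is spoken.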
Zouflain said:
procedural text generators
How procedural? If you are mixing pre-made parts together, you can get some OK results by having corresponding pre-recorded TTS for each part and joining those together (which you need to do with semi-dynamic voice-acted lines as well).
This is actually something I was thinking of doing in a game context, to get something of hopefully passable standard cheaply. The pricing in a do-once context is really cheap; it is more a question of quality, and whether it is actually better than just having the player read the text in the end.
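As a rough illustration of the joining step, here is a Python sketch using only the standard `wave` module. It assumes each text fragment has already been rendered to its own WAV file, and that all clips share the same channel count, sample width and sample rate; the function name `join_wav_clips` is just something I made up for the example. It does plain end-to-end concatenation with no crossfading or pauses, which you would probably want to add for natural-sounding results.

```python
import wave
from typing import Sequence


def join_wav_clips(clip_paths: Sequence[str], out_path: str) -> None:
    """Concatenate WAV clips that share one audio format into a single file."""
    if not clip_paths:
        raise ValueError("need at least one clip")
    fmt = None
    chunks = []
    for path in clip_paths:
        with wave.open(path, "rb") as clip:
            this_fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                # Mixing formats would produce garbage audio, so refuse early.
                raise ValueError(f"{path} has format {this_fmt}, expected {fmt}")
            chunks.append(clip.readframes(clip.getnframes()))
    with wave.open(out_path, "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        for chunk in chunks:
            out.writeframes(chunk)
```

For example, a generated sentence built from the parts "greeting", "item name" and "closing" would map to three pre-rendered files passed in that order.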