Spacemen TTS #68106

Closed
wants to merge 5 commits into from

Conversation

Contributor

@tralezab tralezab commented Jun 30, 2022

About The Pull Request

Just a funny little experiment, or really a test of how bone-rattling it is to make API calls from DM. This makes Spacemen speak with TTS, using a voice they pick in character preferences.

spessmentts.mp4
  • fix prefix related things with silicons
  • it stack traces a lot on bootup
  • add a signal for post say so i can tts something without other signal hooks ruining things
  • Let people truly pick any seed they want (prefs didn't have a box for entering strings, as far as I know)

Shoutout to NovelAI for their incredibly impressive TTS model!

Why It's Good For The Game

Really mostly for luls, there are several reasons why this can't be permanently in the game.

Changelog

🆑 NovelAI's TTSv2, Armhulen, Special credits to Iamgoofball for initial exploration of the idea, code to reference, etc
add: Spacemen now speak with TTS.
/:cl:

@tgstation-server added the Config Update, Feature, and UI labels Jun 30, 2022
@carshalash
Contributor

Oh god it's ready.

@Mothblocks added the Do Not Merge label Jun 30, 2022
@SkeletalElite
Contributor

Speedmerge

@Miczu555PL

Add this permanently as disabled feature by default.

I wonder how spells would sound..

@SkeletalElite
Contributor

I unironically like this a lot

@mc-oofert
Contributor

Separate languages such as felinid are now pointless because TTS reads them in English regardless of whether you understand them

@optimumtact
Member

Separate languages such as felinid are now pointless

what do you mean by now?

@MMMiracles
Contributor

Why is that TTS so good wtf

@mc-oofert
Contributor

Separate languages such as felinid are now pointless

what do you mean by now?

If you don't know the felinid language, for example, and someone speaks felinid, you won't understand the text, but the TTS voice will say it in English, so you can understand felinid without actually knowing it.

@necromanceranne
Contributor

Is one of those voices Keanu Reeves?

@Jakksergal
Contributor

I really want this to be added but I got a few concerns.

  • Does the API generate the AI text at the client, the server, or on NovelAI's server? If it's created on the NovelAI server, that presents a massive bandwidth usage for a server with 60+ users speaking unless there is an auxiliary script that prevents data being sent to players that couldn't hear it anyway.

  • Does the API support generating voice seeds on the fly? Does it support a variable configurable system to make new voices? If so, will players be able to use those features for their character?

@Bluedino1025
Contributor

The tongue-tied perk gets very confusing. Also, would people talking on comms be heard?

@tralezab
Contributor Author

tralezab commented Jun 30, 2022

Separate languages such as felinid are now pointless because TTS reads them in English regardless of whether you understand them

I'll probably make it only work for GalCom next test.
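
A minimal sketch of that gating, written in TypeScript rather than DM so it stays self-contained. The `Speech` shape, the language identifiers, and `shouldSynthesize` are hypothetical and not taken from the PR; the only idea shown is that synthesis is skipped unless the message is in Galactic Common.

```typescript
// Hypothetical shapes for illustration; none of these names come from the PR.
type Language = "common" | "felinid" | "binary";

interface Speech {
  speaker: string;
  language: Language;
  text: string;
}

// Only Galactic Common is voiced, so the audio can never reveal the
// contents of a language the listener could not read in chat.
function shouldSynthesize(speech: Speech): boolean {
  return speech.language === "common" && speech.text.trim().length > 0;
}
```

A stricter variant could check each listener's known languages instead, but that would need one decision per listener rather than one per message.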

I really want this to be added but I got a few concerns.

Like I said, can't /really/ be added.

  • Does the API generate the AI text at the client, the server, or on NovelAI's server? If it's created on the NovelAI server, that presents a massive bandwidth usage for a server with 60+ users speaking unless there is an auxiliary script that prevents data being sent to players that couldn't hear it anyway.

NovelAI's server, and yes, that's why I've been targeting lowpop. The solution is the same as in the 15.ai PR, i.e. a tgui window playing audio from the webroot CDN. So it's fixable, but the sad part is that you would no longer hear the voices spoken in the environment, which I really like. Since I'm not trying to add it permanently, I'm sticking with environmental sounds because I love them.
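
For the "don't send audio to players that couldn't hear it anyway" part of the question, a rough sketch of the filtering step follows. It is TypeScript for self-containment, not DM; the `Listener` and `Source` types and the default range of 7 tiles are assumptions made for illustration, not values from the PR.

```typescript
// Hypothetical types; positions are tile coordinates, z is the map level.
interface Source { x: number; y: number; z: number; }
interface Listener extends Source { id: string; }

// Chebyshev distance: the larger of the x and y offsets, which is how
// range is commonly counted on a square tile grid.
function inRange(src: Source, l: Listener, range: number): boolean {
  return l.z === src.z &&
    Math.max(Math.abs(l.x - src.x), Math.abs(l.y - src.y)) <= range;
}

// Returns the ids of listeners who should be sent the generated clip.
function recipients(src: Source, all: Listener[], range = 7): string[] {
  return all.filter((l) => inRange(src, l, range)).map((l) => l.id);
}
```

With this shape, one clip is generated per spoken line and only delivery is filtered, so bandwidth scales with nearby listeners rather than with server population.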

  • Does the API support generating voice seeds on the fly? Does it support a variable configurable system to make new voices? If so, will players be able to use those features for their character?

Indeed it does. For the next test I'll probably remove the list of voices to pick from and just let people roll for new ones until they like one.

@AffectedArc07
Member

Oh god its 2013 para all over again

interface/skin.dmf (outdated review thread, resolved)
@@ -50,8 +58,7 @@
     ..()
     if(say_mod && tongue_owner.dna && tongue_owner.dna.species)
         tongue_owner.dna.species.say_mod = say_mod
     if (modifies_speech)
-        RegisterSignal(tongue_owner, COMSIG_MOB_SAY, .proc/handle_speech)
+        RegisterSignal(tongue_owner, COMSIG_MOB_SAY, .proc/handle_speech, override = TRUE)
     tongue_owner.UnregisterSignal(tongue_owner, COMSIG_MOB_SAY)
Contributor

registers signal
immediately unregisters it

not your fault cuz the code was already like this but lol, lmao

Contributor

ah i see it's some weird signal magic nvm

Comment on lines 1 to 27
/datum/preference/choiced/tts_seed
    category = PREFERENCE_CATEGORY_SECONDARY_FEATURES
    savefile_identifier = PREFERENCE_CHARACTER
    savefile_key = "tts_seed"
    priority = PREFERENCE_PRIORITY_SPECIES + 1

/datum/preference/choiced/tts_seed/deserialize(input, datum/preferences/preferences)
    //if you figure out how to enter whatever you want than honestly take it idc seeds support that
    return ..(input, preferences)

/datum/preference/choiced/tts_seed/init_possible_values()
    return GLOB.tts_seeds_prefs

/datum/preference/choiced/tts_seed/apply_to_human(mob/living/carbon/human/target, value)
    var/obj/item/organ/internal/tongue/tts_speaker = target.getorganslot(ORGAN_SLOT_TONGUE)
    if(!tts_speaker)
        log_admin("didn't apply tts seed to tongue")
        return
    tts_speaker.tts_seed = value

/datum/preference/choiced/tts_seed/create_default_value()
    return pick(GLOB.tts_seeds_prefs)

/datum/preference/choiced/tts_seed/is_accessible(datum/preferences/preferences)
    if (!..(preferences))
        return FALSE
    return TRUE
Contributor

you can implement a text box preference this way:

/datum/preference/text
    abstract_type = /datum/preference/text
    var/max_length = 1024

/datum/preference/text/deserialize(input, datum/preferences/preferences)
    return STRIP_HTML_SIMPLE(input, max_length)

/datum/preference/text/create_default_value()
    return ""

/datum/preference/text/is_valid(value)
    return istext(value)

then over in base.tsx for preferences:

export const FeatureTextInput = (
  props: FeatureValueProps<string>
) => {
  return (<TextArea
    height="100px"
    value={props.value}
    onChange={(_, value) => props.handleSetValue(value)}
  />);
};

export const FeatureShortTextInput = (
  props: FeatureValueProps<string>
) => {
  return (<Input
    width="100%"
    value={props.value}
    onChange={(_, value) => props.handleSetValue(value)}
  />);
};
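
For readers less familiar with DM, here is a sketch of what the `deserialize` and `is_valid` steps above amount to for a free-form seed. It assumes `STRIP_HTML_SIMPLE` roughly caps the length and drops angle brackets; that is an assumption about the macro, not a port of it, and `sanitizeSeed` is a hypothetical name.

```typescript
// Assumed behaviour, not the real macro: cap the length, then remove
// angle brackets so the stored value cannot carry HTML tags.
function sanitizeSeed(input: unknown, maxLength = 1024): string | null {
  // Mirrors the istext() validity check: non-strings are rejected.
  if (typeof input !== "string") {
    return null;
  }
  return input.slice(0, maxLength).replace(/[<>]/g, "");
}
```

For example, `sanitizeSeed("<b>kee</b>")` yields `"bkee/b"`, and a non-string input yields `null`.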

Contributor Author

Thank you!

@Farquaar
Contributor

Farquaar commented Jul 1, 2022

felinids are now pointless

Fixed that for ya @mc-oofert

On a more serious note, I looked up NovelAI and it appears to be a paid subscription service. Is there a free version you're using for this PR or something?

@tralezab
Contributor Author

tralezab commented Jul 1, 2022

On a more serious note, I looked up NovelAI and it appears to be a paid subscription service. Is there a free version you're using for this PR or something?

Someone needs a subscription for this to work, yes. Tests I've run the past few days are at my own expense, haha.

@Sylphily

Sylphily commented Jul 1, 2022

yikes, how much?

@tralezab
Contributor Author

tralezab commented Jul 1, 2022

yikes, how much?

don't worry, it's not a yikes. it's not pay per generation or I would have lost everything I owned from those like 15 people just spamming "A" in the bar

@Farquaar
Contributor

Farquaar commented Jul 1, 2022

yikes, how much?

Looks like $10/month if I'm reading their prices right.
image
Not crazy, but $120 a year adds up. If it gets merged I imagine TTS might make for a decent Patreon monthly donation goal.

@tralezab
Contributor Author

tralezab commented Jul 1, 2022

There is also a rate limit, that is reached after a few rounds of nonstop tee tee ess. And half of that package would see no use, aka the "use language models" part. So if this were to see any unirony it needs some special discussions with NovelAI itself
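
One common way to stay under a limit like this is a token bucket in front of the API calls. The sketch below is TypeScript, not code from the PR; the capacity and refill rate are placeholders, since the actual NovelAI limit is not stated here, and the clock is injectable so the behaviour can be checked without waiting.

```typescript
// Token bucket: each request spends one token; tokens refill over time.
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly now: () => number;

  constructor(
    capacity: number, // burst size (placeholder value chosen by the caller)
    refillPerSec: number, // sustained rate (placeholder value)
    now: () => number = () => Date.now() / 1000,
  ) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity;
    this.last = now();
  }

  // Returns true and spends a token if a request may be sent right now.
  tryTake(): boolean {
    const t = this.now();
    const refilled = this.tokens + (t - this.last) * this.refillPerSec;
    this.tokens = Math.min(this.capacity, refilled);
    this.last = t;
    if (this.tokens < 1) {
      return false;
    }
    this.tokens -= 1;
    return true;
  }
}
```

When `tryTake()` returns false, the line would simply be shown as text without audio, which also blunts the "15 people spamming A in the bar" case.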

@tralezab
Contributor Author

tralezab commented Jul 1, 2022

But I'm glad people enjoyed it while it was run

@davethwave
Contributor

It is fun, though a few bugs could be spotted and a few QoL changes will likely be wanted. Currently borg :b chat can be overheard, which might also be the case for drones. I'm unsure whether someone speaking another language can be understood via TTS by those who don't know it, but that is something to check. There should also be a way to mute TTS for people who want to play old school, or however you would put it.

@Bluedino1025
Contributor

One thing is mutes can speak with hands full but i like the idea of saying the gloves have tts

@cacogen
Contributor

cacogen commented Jul 3, 2022

This is really cool, shame it can't be merged

@GeneriedJenelle

There is also a rate limit, that is reached after a few rounds of nonstop tee tee ess. And half of that package would see no use, aka the "use language models" part. So if this were to see any unirony it needs some special discussions with NovelAI itself

Is there any chance there's a free open source tts engine with a similar quality that can be downloaded and run server-end to integrate into SS13? That's the main thing I'm thinking of that could possibly save this.

@TheSmallBlue
Contributor

I didn't get to play with this on personally but I've seen clips of it and im OBSESSED dude I love this SO MUCH, I HAVE TO SEE THIS IN PERSON please please PLEASE for the love god make a way, ANY way for it to be in the game. Make it disabled by default so that admins can turn it on (they'd have to have a NovelAi subscription of course), or make it entirely client based, like each user who wants to use it has to pay a NovelAi subscription then type in their key and seed in the options menu and it works only for them, hell make it a Patreon goal and I'll fucking fund it 100% by myself I want this THAT bad. It makes the game 1000% better in every way

@Mothblocks
Member

We don't get the patreon money :P

@TheSmallBlue
Contributor

TheSmallBlue commented Jul 3, 2022

Meant it in a more "make MSO to pay for it with patreon money if the tier is met" way but I'm willing to pay a monthly tax to the head coder (you, mothblocks) if it means we get this thing of beauty
Hell I'd pay it myself and make the key or whatever public, I'd suck dick even! Just make this a thing PLEASE.

@MMMiracles
Contributor

i'll give an extra dollar if it means i can talk about my minecraft lets play as keanu reeves

@tralezab
Contributor Author

tralezab commented Jul 3, 2022

Is there any chance there's a free open source tts engine

absolutely

with a similar quality

no, sadly as far as i'm aware

@Iamgoofball
Contributor

Iamgoofball commented Jul 3, 2022

Is there any chance there's a free open source tts engine with a similar quality

If we wanted to do this, we'd need to train our own TTS model from scratch with our own training data and our own server to run it on.

It's doable, but only if MSO agrees to host a server for it to sit on and take requests from the game servers.

That's also discounting that we'd need to spend money on GPU time to train said model.

@Mothblocks
Member

Mothblocks commented Jul 3, 2022

@TheSmallBlue Yeah fair enough

I've wanted TTS for years but 99.99% this is going to need to be local. Services like this go offline pretty often in my experience (it having the possibility to go offline at all is a pretty sad loss), and the delay is noticeably huge, even ignoring the pricing (which is in itself subject to the whims of the service). Imagine if, for instance, one day the TTS service goes into a total maintenance mode and is totally offline for weeks. Or if they change their pricing to be per-message rather than whatever it is now, or just raise their prices. We'd have to get rid of it, and people would be significantly more upset by that.
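
If a remote service were used anyway, the outage case could at least degrade to plain text instead of stalling speech. A circuit breaker is one standard pattern for that; the sketch below is TypeScript with hypothetical names and thresholds, and it does not make the pricing or dependency concerns go away.

```typescript
// After `maxFailures` consecutive failed calls, stop calling the service
// for `cooldownSec` seconds and fall back to text-only speech.
class TtsBreaker {
  private failures = 0;
  private openUntil = 0;
  private readonly maxFailures: number;
  private readonly cooldownSec: number;
  private readonly now: () => number;

  constructor(
    maxFailures: number,
    cooldownSec: number,
    now: () => number = () => Date.now() / 1000,
  ) {
    this.maxFailures = maxFailures;
    this.cooldownSec = cooldownSec;
    this.now = now;
  }

  // True when a TTS request should be attempted; false means text only.
  canAttempt(): boolean {
    return this.now() >= this.openUntil;
  }

  recordSuccess(): void {
    this.failures = 0;
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.maxFailures) {
      this.openUntil = this.now() + this.cooldownSec;
      this.failures = 0;
    }
  }
}
```

The caller checks `canAttempt()` before each request and reports the outcome afterwards, so a dead service costs a handful of failed calls per cooldown window rather than one per spoken line.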

@TheSmallBlue
Contributor

Ok so, to sum up, here are the possible ways of having this or something like it in a permanent way:


Option 1:
We agree to add the cost of the NovelAI service onto the server upkeep monthly patreon goal. API requests are done server side.

This would entail somehow convincing MSO to pay more for upkeep, and of course, a higher patreon goal.

  • Pros: TTS is free for every single user. Yay!
  • Cons: We'd be implementing an online service outside of our control into the game. It can go down any second, for any reason. The price of the service is also outside of our control; it could increase or decrease, making the server upkeep costs unstable. The delay between sending messages and them being read would be pretty big. And the problem that makes it clearly unviable: the rate limit. It won't be usable on constantly highpop servers like Sybil, so its use would be limited to Manuel, and even then it'd only last a few rounds before we're rate limited.

A good example of how this would feel to use is the testmerges that happened on manuel already, it'd be like that. Maybe even as often.


Option 2:
We train our own text to speech model and host it on our own servers.

For this to happen, we'd need someone with a very good GPU to train a decent TTS model, then send that model over to MSO, who will need to set up a server with a GPU in it to run said model.

  • Pros: TTS is still free for every single user! There wouldn't be as much delay as option 1, and we'd have complete control over training and generation. We can choose our own voices, our own modifications to said voices, our own systems to modify said voices, hell maybe a way to implement it with BYOND that doesn't include APIs at all! No need to rely on someone else's service, no need to rely on someone else's pricing demands, no rate limits!
  • Cons: Training an AI is very resource intensive. We'd need not only someone with a very high-end GPU to train the AI, but also someone with the AI knowledge to choose the correct dataset and somehow recreate the level of customization NovelAI has, which is no easy task, and all of it basically for free (or maybe a code bounty?).
    We'd also need to convince MSO to buy a good GPU, which isn't cheap, put it in a server, and set up said server for on-the-fly AI synthesis. While AI synthesis isn't as intensive as training, it's still no easy task, and in the context of a server it has to be done SUPER quickly, which means it needs high-end parts. This all means more money, possibly over 1000 bucks' worth of parts, and that means higher upkeep costs, and I mean WAY higher. (I'm not an AI expert or anything like that, so I might be wrong on all this; if any of you want to correct me, feel free to do so.)

Option 3:
We implement the API requests and integrate NovelAI, but client side instead of server side. Users would have to pay for their own NovelAI subscriptions.

For the record I don't know how exactly our API system works, I've only heard it's not that good, so I don't know if we even have the option of sending an API call locally instead of from the server, but if we do...

  • Pros: Low delay; the fewer steps in the middle, the better. The rate limit would possibly not be reached, or it wouldn't be reached as often, since one player sees far fewer people talk than 30 players do. This would make the investment worth it for some (me, I am some). Unlike the other two options, the upkeep costs would stay the same.
  • Cons: Users would have to individually pay for their NovelAi subscriptions, and the servers could go down at any minute.

Option 1 is the most similar to what's currently implemented, except that instead of MSO paying for a subscription, it's a maintainer. It clearly isn't stable, and it's not a good idea to keep using it.

Option 2 is the utopian option. If we manage to do it, it'd be awesome to have, but the actual road to getting there is way too complicated for it to be feasible any time soon.

Option 3 is the compromise, it's the one I see most likely to happen if anything does actually happen. Though again, I don't know enough about how BYOND works to know if it even makes sense.

It would be cool if a design document of sorts about this could be written. I think TTS improves the game to a whole new level and it should be implemented in some way.

Also do note that I am speaking mostly out of my ass so if anything I said is stupid and wrong please say so :)

@optimumtact
Member

it's not going to happen

@TheSmallBlue
Contributor

Oranges if you keep destroying my hopes and dreams I WILL break into tears

@optimumtact
Member

good

@Iamgoofball
Contributor

For this to happen, we'd need someone with a very good GPU to train a decent TTS model, then send that model over to MSO, who will need to set up a server with a GPU in it to run said model.

incorrect, we just pay amazon or google for a GPU on the cloud for training, then we run a model in cpu inference mode(after it's been benchmarked to actually handle our 300+ requests a minute on 3 servers)
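
As a back-of-the-envelope version of that benchmark: by Little's law, the number of requests in flight equals the arrival rate times the time per request. The sketch below (TypeScript, hypothetical function name) reads "300+ requests a minute" as the combined load of all three servers; the synthesis times fed in are placeholders, since no measured CPU inference time is given here.

```typescript
// Little's law: requests in flight = arrival rate x time per request.
// Rounds up, since a fraction of a worker still needs a whole worker.
function workersNeeded(
  requestsPerMinute: number,
  secondsPerRequest: number,
): number {
  const perSecond = requestsPerMinute / 60;
  return Math.ceil(perSecond * secondsPerRequest);
}
```

At 300 requests a minute, a clip that takes 1.5 s of CPU time to synthesize would need 8 parallel workers to keep up, while one that takes 0.5 s would need 3. Peak chatter would be burstier than the average, so a real benchmark would want headroom on top of this.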

@TheSmallBlue
Contributor

incorrect, we just pay amazon or google for a GPU on the cloud for training, then we run a model in cpu inference mode(after it's been benchmarked to actually handle our 300+ requests a minute on 3 servers)

Huh, good point. The main issue then would be to find someone willing to pay for training. I also have no idea what CPU inference mode is nor why it'd need to be benchmarked, but it feels like this might actually be possible? It'd be so sick for us to have our own dynamic text-to-speech synthesizer.

@MrStonedOne
Member

I have a cluster of Single board computers in a kubernetes swarm in my garage. two of them are even atomic pis and pods can access the gpus.

I have like 5 computers with decent gpus too.

@SplinterGP
Member

Honestly, a more basic TTS that is actually doable would also be okay; goofy-ass TTS voices are funny asf, like the Moonbase Alpha one.

Also we have the option of waiting some 5 or 10 years so technology evolves and it becomes easier to do it and byond may be possibly less shit, or we are all dead in these 5 or 10 years

@AffectedArc07
Member

I have a cluster of Single board computers in a kubernetes swarm in my garage. two of them are even atomic pis and pods can access the gpus.

I have like 5 computers with decent gpus too.

RS also lets you throw a quadro RTX 4000 into the server
