Bypass Google's 5000 characters limit. #12

Closed
mysticaltech opened this issue Jan 29, 2020 · 37 comments

Comments

@mysticaltech

Hello, let's say I want to read this article: https://blog.cloudflare.com/empowering-your-privacy/. I can't do it in one shot; I have to select text multiple times, which is a hassle! Why the 5000-character limit? If it could be removed or made manually configurable, that would be awesome!

@pgmichael
Owner

Hi @mysticaltech ,
The 5000-character limit is imposed by the Cloud Text-to-Speech API. You can find information about the usage limits in their official documentation.

To avoid this, we would need to split the selection into multiple queries. However, I've yet to find a way to seamlessly stitch the audio together.

Cheers!

@mysticaltech
Author

@pgmichael Thanks for the explanation. Couldn't we just put the audio clips into some sort of a playlist? No need to stitch them.

@pgmichael
Owner

pgmichael commented Feb 20, 2020

Seems doable, I’ll see what I can do!

@pgmichael changed the title from "5000 characters limits sucks as most articles are more" to "Bypass the Google's 5000 characters limit." on Mar 1, 2020
@pgmichael changed the title from "Bypass the Google's 5000 characters limit." to "Bypass Google's 5000 characters limit." on Mar 1, 2020
@superluig164

A suggestion: Don't split the queries in the middle of sentences, that way there's no need to seamlessly stitch the audio. The pause between sentences will be enough to mask it.
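The suggestion above can be sketched as follows. This is a minimal illustration in Python, not the extension's actual code (which is JavaScript): `split_into_chunks`, `_split_long`, and `MAX_CHARS` are hypothetical names, the sentence splitter is a naive regex, and the limit is assumed to be counted in characters as the thread describes it. If the API counts bytes instead, `len()` would need to measure the encoded text.

```python
import re

MAX_CHARS = 5000  # limit quoted in this thread; assumed to be counted in characters


def split_into_chunks(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    # Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        for piece in _split_long(sentence, max_chars):
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def _split_long(sentence: str, max_chars: int) -> list[str]:
    """Fallback: break an oversized sentence at word boundaries."""
    if len(sentence) <= max_chars:
        return [sentence]
    pieces: list[str] = []
    current = ""
    for word in sentence.split():
        # A single word longer than the limit is hard-cut as a last resort.
        while len(word) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces
```

Because every chunk ends at a sentence boundary where possible, the natural pause between sentences is what hides the seam between two separately synthesized clips.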

@simsim314

simsim314 commented Feb 2, 2021

I'm ready to pay a nominal amount ($10) to whoever removes the 5K limitation. It's extremely annoying, and it can be done with superluig164's suggestion or in any other way. As far as I'm concerned, just split on the last word that fits under 5K chars; only don't force me to click again and again on the same buttons when I want to listen. I'm also willing to pay $5 to whoever makes two right-click menu items instead of the extra bar menu. I can open a contract on Upwork if you want.

@superluig164

If I wasn't in school, I'd take the time to PR it myself. I agree, it's super annoying. This extension could certainly use so many improvements, but for me the compromises are all worth it for WaveNet.

@simsim314

simsim314 commented Feb 2, 2021

Personally, I don't need any improvement except for the 5K limitation and the annoying right click -> choose menu -> click, repeated about 10 times, to listen to a single chapter of anything. I know the Google stuff took this from a complete success downloaded 100 million times to whatever it is now. If I went to the Google Cloud Platform, registered there, created an API key, and went through all that trouble to get WaveNet in my browser, I hope not to have to click a million times to listen to it. I think it's simple enough; I just wanted to add motivation for the developers if this is something like a 10-minute job. If not, then please tell me how much work it is.

@superluig164

Actually, upon hunting around, it looks like there is an active PR to implement the feature. Unfortunately, it seems @pgmichael has not been active for a while, at least since March, and so nobody is going to be able to merge it, despite it having no conflicts with the main branch.

@mysticaltech
Author

Yes, it's a shame, as that's a good project! I tried quickly to find @pgmichael's email, but to no avail. Maybe we can use @kevininspace's version with the PR.

@simsim314

simsim314 commented Feb 2, 2021

Michael's email is on his GitHub account page. I've reached him, but he's not available for this task. He also suggested using the Microsoft Edge read aloud tool, which has several voices as well, but that solution is very suboptimal in the sense that I don't want to switch browsers for this feature and do want to use it in Chrome. Yet for anyone who wants to test other WaveNet voices, there seems to be no other alternative currently.

Let me see if I can do something about it. If not - I can still post a job with this request and open another branch.

@mysticaltech
Author

mysticaltech commented Feb 2, 2021

Ok, thanks @simsim314. What if we create a wavenet-for-chrome GitHub team and transfer the project to it? Would you accept, @pgmichael? That way anyone can come in and submit PRs, and a bunch of us can approve them. This way it becomes a full-blown open-source project!

Of course, we would also control the Google Chrome extension release, so if it's in your personal name it would be good to remove it for now so that we can push it back later on from a team's account.

@simsim314

@mysticaltech There is a license agreement for these things, and this one is MIT. That means we should acknowledge @pgmichael as the main contributor to this project, but we can modify it and continue developing it as we see fit. We can also post another extension to the Chrome store with our modifications. You can fork and continue developing the project without any additional permissions or anything.

We don't need to replace the name or ask permission for the extension either - we can use the code as we wish as long as we attribute the parts pgmichael wrote to him (check the MIT license agreement - you can even sell the extension to clients). We can download it and open a new main branch as well, and publish to the Chrome store as wavenet2 for chrome or wavenet for chrome community edition or whatever. It's all in accordance with the MIT license.

@mysticaltech
Author

Oh yes, of course, I know the MIT license very well, but it would be cleaner to just transfer the project to a team that he would be part of and probably remain the main contributor if he wants to.

@simsim314

@mysticaltech - I spoke to Michael in email correspondence and proposed to hire him to work on my requests with payment. He said "Unfortunately, I cannot take any additional workload at this time". I think it's a legitimate answer (happened to me several times with my open source projects).

If you want, I can forward my correspondence with pgmichael to a private email, with his permission. But in general we don't need to wait for his answer here; as I mentioned, it's the MIT license, and he has the full right to abandon his project at any stage and never return to it if he so wishes. I would also like him to continue working on it - this is why I first sent him the proposal.

Maybe your suggestion is different from mine, as you want to build a team. This project could even be profitable if you train your own WaveNet and rent out TTS services for a price just like Google did, but through Chrome directly and your own server. Several languages and voices, as well as different speeds, could be installed in the addon. But I went too far - I just want to remove the 5K limitation first; I can probably do it myself or with an assistant for a very low price. Anyway, if you're into hiring a team to develop this addon, count me in to train new WaveNet voices. Training nets is my profession :)

@mysticaltech
Author

Ah, good to know @simsim314, I deduce you're a data scientist or ML engineer. That is cool... :) The idea of creating a team was just to try to make the most of this project, as I find that limiting such a cool project to one owner is kind of a shame; thankfully, it was very generous of him to choose an MIT license. But I have no time for this either... Maybe we could run with @kevininspace's branch, as he's the one with the PR that proposes to remove that limitation? If you manage to do it, please let me know.

@pgmichael
Owner

Hi guys,

Awesome to see this much interest in this project!

I like @mysticaltech's idea, and I'd be willing to do that. I could set this up on Wednesday for you guys to get started.

It would also be possible to have a GitHub Action that updates the extension in the Chrome store. Right now a lot of people are forking this repo and adding a single feature, but that doesn't benefit the ~7.5k users who currently have the extension installed. This would be a nice workaround for that.

Let me know how you'd like to proceed @simsim314. You're totally free to go ahead and do your own thing also if you'd like. As intended by the MIT license :)

Cheers!

@mysticaltech
Author

@pgmichael that is awesome! Sounds like a nice plan... I say let's do this!!

@simsim314

simsim314 commented Feb 2, 2021

@pgmichael I thought you were just not interested in this project, or too busy for it. I completely support whatever you do to improve it, and I'm willing to contribute a nominal amount to the effort - hopefully more people can join my contribution, maybe on Patreon or something. I just want to read my texts in a normal human-like voice natively in Chrome, without any major limitations or restrictions. I understand that computational power (and development time) is not a free ride, and I'm willing to pay a reasonable amount for this capability, to the developers as well as to Google or anyone who provides the service. Many thanks! I had no intention of finding workarounds based on the MIT license - I just thought it was a very low priority for you now; this happens all the time with hobby / free time / just-for-fun projects.

@mysticaltech
Author

mysticaltech commented Feb 2, 2021

Hello @simsim314, money is always good, up to @pgmichael, but I guess now the next step for him as project lead would be to create a team with more admins able to merge PRs into the project. I would love to assist him.

So don't hesitate to add me to the team @pgmichael, and thank you!

@simsim314

I can devote some time to assist with simple merges from PRs. I know basic js and probably would be able to compile the project and fix some small issues in merges etc. But js is not my primary nor secondary language and I have around 100h of experience with it, maybe less. @mysticaltech @pgmichael

@simsim314

simsim314 commented Feb 2, 2021

I've managed to download, compile, and install @kevininspace's PR in my Chrome. It does allow downloading more than 5K chars as separate mp3 files, yet it doesn't read more than 5K chars aloud. I find this a bit problematic, as sometimes I want to stop reading and I don't want to open several separate mp3 files. So this PR is definitely in the right direction, but some more work is needed. I expect some kind of event triggering a new request for the speech. For example, assuming we have chunks of 5K (1, 2, 3, 4, 5, etc.), start by requesting 1+2, then when 1 has finished reading, automatically read 2 and request 3. That way not many extra characters would be used even if I've chosen a million characters to read (and a button to stop reading is necessary as well). Otherwise, with @kevininspace's feature there is no way to stop creating more and more mp3 files other than closing Chrome, and I guess most people use read aloud directly.
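The request-ahead scheme described in this comment (request chunks 1 and 2, play 1, then play 2 while requesting 3) might look roughly like the sketch below. It is a simplified, synchronous Python illustration and not the code from any PR in this thread: `read_aloud`, `synthesize`, `play`, and `should_stop` are hypothetical names, with `synthesize` standing in for the text-to-speech request and `should_stop` for a stop button. A real extension would issue the requests asynchronously so that fetching overlaps with playback.

```python
from collections import deque
from typing import Callable, Iterable


def read_aloud(
    chunks: Iterable[str],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
    should_stop: Callable[[], bool] = lambda: False,
    lookahead: int = 1,
) -> int:
    """Play chunks in order, keeping at most `lookahead` clips pre-fetched.

    Returns the number of clips actually played.
    """
    pending = iter(chunks)
    ready: deque = deque()  # synthesized clips waiting to be played
    played = 0

    def fetch_next() -> None:
        text = next(pending, None)
        if text is not None:
            ready.append(synthesize(text))

    # Request the first clip plus `lookahead` extra ones (chunks 1 and 2 by default).
    for _ in range(lookahead + 1):
        fetch_next()

    while ready and not should_stop():
        clip = ready.popleft()
        play(clip)      # blocks until the clip has finished in this sketch
        played += 1
        fetch_next()    # one clip finished: request the next unrequested chunk
    return played
```

Since only a bounded number of clips is requested ahead of the one being played, stopping early means the remaining chunks are never sent to the API, which addresses the concern about paying for characters that are never heard.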

@mysticaltech
Author

Thanks for trying @simsim314! If it was up to me, I would extract the logic for mp3 merging and play/stop from https://github.com/ken107/read-aloud; it's actually what I use with Amazon Polly, and it works great!

@simsim314

@mysticaltech You can actually use the same addon with a Google API key, and you will get all the WaveNet voices as well. The only advantage of wavenet-for-chrome, apart from the catchy name, is the ability to download mp3 files. I also like how minimalistic it is; I think that if the 5K limit were solved, the addon would be pretty useful.

@mysticaltech
Author

mysticaltech commented Feb 3, 2021

Yes exactly @simsim314, wavenet-for-chrome is minimalistic and google wavenet focused. That does not mean that it cannot "rip" the good functionalities away from read-aloud, in fact, I think it should.
[Image: Screen Shot 2021-02-03 at 10 20 24 AM]

@pgmichael
Owner

I followed @simsim314's text-chunking idea and got something to work.

See PR #38

If someone could check out the branch and test it locally, that would be nice! If there aren't any issues, I'll merge and publish it in the coming days :)

@superluig164

Oh my f*cking GOD YES!

@simsim314

I will be able to test it tomorrow. @pgmichael Many thanks!

@simsim314

simsim314 commented Feb 4, 2021

I've added my review. A minor issue: when reading text, in order to stop it, I need to select some text to see wavenet-for-chrome and only after selecting text I can stop the reading.

I can send you $10 by PayPal, or donate it to a cause of your choice (you can contact me by email).

Many thanks!
@pgmichael

@superluig164

Note, since Chrome added the media controls button, you can actually use that to stop reading. It's not the most elegant solution, but it's better than selecting other text to make it stop.

@kevininspace

Hey everyone. Just getting to these now. Great to see the interest.
FYI, my workaround was to split at full sentences with the total being less than 5K; I have a separate script which uses ffmpeg to merge the files, rename and delete the "download000.mp3" ones, so it's fairly hands-off.
I was just about to look into adding the merging of the mp3s into the code, and now see that this has taken a different direction. Great!
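The separate merge script mentioned here is not shown in the thread. As a rough idea of what such a step could look like, the Python sketch below builds an input list for ffmpeg's concat demuxer and the matching command line. The helper names `concat_list` and `merge_command` and the file names are hypothetical; only the ffmpeg options (`-f concat`, `-safe 0`, `-i`, `-c copy`) are real.

```python
from pathlib import Path


def concat_list(paths: list[Path]) -> str:
    """Build the text of an ffmpeg concat-demuxer list file."""
    lines = []
    for path in paths:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = str(path).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def merge_command(list_file: Path, output: Path) -> list[str]:
    """ffmpeg invocation that joins the listed MP3s without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # accept arbitrary paths in the list file
        "-i", str(list_file),
        "-c", "copy",     # stream copy, no re-encode
        str(output),
    ]
```

To use it, one would write the result of `concat_list(...)` to a text file and pass `merge_command(...)` to `subprocess.run(..., check=True)`; the per-chunk files can be deleted once the merged file exists.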

@pgmichael
Owner

I'm closing this issue as the new version has been submitted to the chrome store for review.

It should be available in the coming days (depending on how long the review process takes by Google).

@mysticaltech
Author

Awesome work @pgmichael and @simsim314! Something that is very, very important to me personally is playback speed. Thankfully, read-aloud has all the logic (MIT). It probably happens at the generation level, not at playback?! Not sure.

@simsim314

simsim314 commented Feb 5, 2021

You have playback speed. Check out the screen where you add the API key; it has several voices and a speed setting as well. @mysticaltech
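On the generation-versus-playback question: the Cloud Text-to-Speech `text:synthesize` request accepts a `speakingRate` field inside `audioConfig`, so speed can be set when the audio is generated. Whether the extension uses this field or adjusts playback instead is not confirmed anywhere in this thread. The Python sketch below only builds such a request body; `synthesize_request` is a hypothetical helper, and the voice shown is just an example.

```python
import json


def synthesize_request(text: str, speaking_rate: float = 1.0) -> str:
    """JSON body for a Cloud Text-to-Speech `text:synthesize` call."""
    body = {
        "input": {"text": text},
        # Example voice; any available WaveNet voice name could be used here.
        "voice": {"languageCode": "en-US", "name": "en-US-Wavenet-D"},
        # speakingRate is applied when the audio is generated (1.0 = normal speed).
        "audioConfig": {"audioEncoding": "MP3", "speakingRate": speaking_rate},
    }
    return json.dumps(body)
```

A body like this would be sent once per text chunk, so the same rate applies consistently across all clips of a long article.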

@mysticaltech
Author

Oh, missed that, awesome!

@mysticaltech
Author

mysticaltech commented Feb 5, 2021

Now, if any of you has android, when you visit a page and order google assistant to "read it", it does so while highlighting the text in the webpage itself. I know this is next level, but would love to see it one day implemented here! Keep up the good work folks!

@simsim314

@mysticaltech For that I think there is a separate issue already.

@kevininspace I think Michael fixed the mp3 as well, I've tried to download large texts and they come up as mp3 pretty well now.

@mysticaltech
Author

@simsim314 Indeed, done!


5 participants