About the parameter settings of Captcha Synthesizer #7

Closed
ziqiangchen opened this issue Jan 2, 2019 · 22 comments

Comments

@ziqiangchen

Hello! I have read your paper and have a question about the parameter settings of the Captcha Synthesizer. I don't see the parameter settings implemented in the code. So I assume you paste the captcha characters onto a white image, and that parameters such as the rotation angle or the color change are learned by the generator network. Is that correct? Or did you implement the parameter settings by some other method? Thanks.

@SongyiGao

I have the same confusion. I would like to know how to use the parameters to control the data generator. For example, what if I want images with a noisy background? How does the discriminator generate images with noise? Is it because the real data have labels (noisy background, occluding lines, distortion)?

@yeguixin
Owner

yeguixin commented Jan 3, 2019

I used an image generator that pastes each character onto a white image. Here every character is itself a small image. All parameters, such as the rotation angle and the occluding lines, are trained at this step.
We also have another generator, which is part of the GAN; it modifies the generated captcha at the pixel level to make sure that the generated captchas are similar to the real ones.
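
The pasting step described above might look roughly like the following. This is only a minimal sketch, not the repository's code: images are plain nested lists of grayscale values, the toy `BAR` glyph stands in for a rendered character image, and the names `paste`, `synthesize` and `max_shift` are hypothetical. Rotation and occluding lines are left out to keep it short.

```python
import random

WHITE, BLACK = 255, 0

def blank_canvas(width, height):
    """Return a height x width grayscale image filled with white."""
    return [[WHITE] * width for _ in range(height)]

def paste(canvas, glyph, left, top):
    """Copy the non-white pixels of `glyph` onto `canvas`, clipping at the edges."""
    for dy, row in enumerate(glyph):
        for dx, value in enumerate(row):
            y, x = top + dy, left + dx
            if value != WHITE and 0 <= y < len(canvas) and 0 <= x < len(canvas[0]):
                canvas[y][x] = value

def synthesize(glyphs, width=60, height=20, max_shift=3, rng=None):
    """Paste glyphs left to right, each with a random vertical shift."""
    rng = rng or random.Random()
    canvas = blank_canvas(width, height)
    left = 2
    for glyph in glyphs:
        top = 2 + rng.randint(0, max_shift)  # assumed tunable parameter
        paste(canvas, glyph, left, top)
        left += len(glyph[0]) + 1            # fixed 1-pixel gap in this sketch
    return canvas

# Toy 5x5 glyph (a vertical bar) standing in for a character image.
BAR = [[WHITE, WHITE, BLACK, WHITE, WHITE] for _ in range(5)]

captcha = synthesize([BAR, BAR, BAR], rng=random.Random(0))
```

In a real setup an imaging library would handle rendering and rotation; the point here is only that each character is a separate small image placed on a white background under a few numeric parameters.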

@yeguixin yeguixin closed this as completed Jan 3, 2019
@yeguixin yeguixin reopened this Jan 3, 2019
@yeguixin
Owner

yeguixin commented Jan 3, 2019

Sorry, I reopened this issue.

@awsssix

awsssix commented Jan 4, 2019

@yeguixin You use only 500 real captchas. Would the Captcha Image Generator generate 500 captchas, or more?

@scut-salmon

scut-salmon commented Jan 7, 2019

If each character is a small image, how do you handle the distance between them? Do you assume that the distances between all characters in a real image are equal?
The most confusing thing is that you train all the parameters. Do you mean they are trained by a neural network, or do you just set these parameters manually?

Looking forward to your kind reply, thanks!

@yeguixin
Owner

yeguixin commented Jan 10, 2019

@awsssix We can synthesize as many captchas as we want, because we generate them with a traditional captcha generator. The 500 real captchas and the synthetic captchas are used to train the GANs. Here is a traditional captcha generator, which may help in understanding how captchas are generated.

@scut-salmon The distance between two characters is random within a certain range, such as [0, 20] pixels. Our synthesizer can automatically tune the boundary of the range. Note that some parameters, such as the background and the number of occluding lines, are fixed according to the captcha scheme. In our initial experiments we found that manually setting the synthetic parameters can also perform well.
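
As an illustration of the random spacing, here is a small sketch under stated assumptions: the function name `character_offsets` and the `gap_range` argument are hypothetical, and the gap is drawn uniformly, which the reply does not specify. The upper bound of `gap_range` is the kind of boundary that could be tuned automatically or set by hand.

```python
import random

def character_offsets(char_widths, gap_range=(0, 20), start=0, rng=None):
    """Return the left x-coordinate of each character.

    The gap after each character is drawn uniformly from `gap_range`
    (in pixels, both ends inclusive).
    """
    rng = rng or random.Random()
    low, high = gap_range
    offsets = []
    x = start
    for width in char_widths:
        offsets.append(x)
        x += width + rng.randint(low, high)
    return offsets

# Four characters, each assumed to be 18 pixels wide.
offsets = character_offsets([18, 18, 18, 18], rng=random.Random(42))
```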

@awsssix

awsssix commented Jan 11, 2019

@yeguixin Thank you very much!
In the article, you use a grid search method to search for the optimal parameters.
I am still confused about MADS.
Q1: Before using the traditional captcha generator, are these parameters initialized first?
Q2: Can you give us a more detailed introduction, or some links?
Q3: How does MADS determine whether these parameters are optimal? (If by comparison with real captchas, what exactly is compared?)
Best wishes!

@yeguixin
Owner

yeguixin commented Jan 12, 2019

The parameters are initialized first. After that, the captcha generator synthesizes captchas, and the GAN generator then tries to refine the synthetic captchas. Finally, the discriminator tries to distinguish the generated captchas from the real ones.
If the discriminator successfully identifies the synthetic captchas (a threshold determines its discriminating ability), the parameter values are tuned. The values increase iteratively because we set the initial values relatively small.
To determine the optimal parameters, we use several training tricks, such as tuning the parameters every 10 iterations. If the discriminator successfully identifies the synthetic captchas most of the time, the parameters are tuned.
In fact, the discriminator decides real or fake by comparing image patches between the real captchas and the synthetic captchas. For this you can refer to PatchGAN.

Hope that the above will be helpful.
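
The tuning loop described in this reply could be sketched as follows. It is a simplified stand-in rather than the actual training code: `detect_rate` replaces a real GAN training round, and the names `tune_parameter`, `step`, `upper` and `threshold` and their default values are assumptions chosen for illustration.

```python
def tune_parameter(detect_rate, initial=0.0, step=1.0, upper=20.0,
                   check_every=10, threshold=0.5, max_iters=200):
    """Increase one synthesizer parameter while the discriminator keeps winning.

    detect_rate(value) returns the fraction of synthetic captchas that the
    discriminator flags as fake for this parameter value (hypothetical hook).
    """
    value = initial
    wins = []
    for iteration in range(1, max_iters + 1):
        wins.append(detect_rate(value) > threshold)
        if iteration % check_every == 0:
            # Tune only if the discriminator won in most of the recent rounds.
            if sum(wins) > len(wins) / 2 and value < upper:
                value = min(value + step, upper)
            else:
                break
            wins = []
    return value

# Toy discriminator: it is fooled once the parameter reaches 8.
def toy_detect_rate(value):
    return 0.9 if value < 8 else 0.2

best = tune_parameter(toy_detect_rate)
```

Starting from a small initial value and only ever increasing it matches the description above; a real implementation would tune several parameters and retrain the GAN between checks.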

@scut-salmon

Excuse me, could you please provide some samples of real captchas and the corresponding synthetic captchas produced by the captcha generator (without security features)?

@yeguixin
Owner

@scut-salmon I plan to publish a runnable version.

@Ru7z

Ru7z commented Jan 17, 2019

This repo should be very helpful to you. @scut-salmon @awsssix

@jiyonghe

I used 1500+ real captchas whose backgrounds contain noise. After the model was trained, the recognition accuracy on new captchas is zero.

@dhitaj

dhitaj commented Mar 5, 2019

@yeguixin When is the runnable version scheduled for release?

@Times125

Times125 commented Jun 4, 2019

When is the runnable version scheduled for release? @yeguixin

@S0mbre S0mbre mentioned this issue Jun 11, 2019
@thograce

In the first step, the captcha generator generates color captchas from the input characters. Can I assume that if all characters have the same color, it is a captcha without security features, and that characters with different colors count as a security feature? @yeguixin

@yeguixin
Owner

In general, a single color can easily be removed using image preprocessing methods. Different character colors increase the difficulty of the preprocessing. In our paper, we just summarized and categorized six kinds of security features for better description.
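
One way to read the point about single colors: if every character shares one color, a simple color filter separates the text from everything else, whereas multicolored characters have no single target to filter on. The sketch below assumes that reading; `keep_color`, `tolerance` and the toy pixel values are hypothetical.

```python
def keep_color(image, target, tolerance=30):
    """Return a binary mask: 1 where a pixel is within `tolerance`
    of `target` on every RGB channel, else 0."""
    return [
        [1 if all(abs(c - t) <= tolerance for c, t in zip(pixel, target)) else 0
         for pixel in row]
        for row in image
    ]

WHITE, BLUE, RED = (255, 255, 255), (0, 0, 200), (200, 0, 0)

# One row of a toy captcha: blue text pixels crossed by a red occluding line.
row = [WHITE, BLUE, RED, BLUE, WHITE]
mask = keep_color([row], target=BLUE)
```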

@thograce

> In general, a single color can easily be removed using image preprocessing methods. Different character colors increase the difficulty of the preprocessing. In our paper, we just summarized and categorized six kinds of security features for better description.

So if I want to generate synthetic captchas with different colors using GANs, I can input some single-color captchas and set the different colors (included in the fifth security feature) as a training parameter. The characters in my real captchas all have different colors. Am I right?

@yeguixin
Owner

When generating captchas, you can randomly set a different color for each character by changing its RGB value.
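
For example, per-character colors could be drawn like this. It is a minimal sketch: the function name and the `low`/`high` bounds are assumptions, with `high` kept below 255 so that no character comes out nearly white on a white background.

```python
import random

def random_char_colors(text, low=0, high=180, rng=None):
    """Pair every character of `text` with an independent random RGB color."""
    rng = rng or random.Random()
    return [(ch, tuple(rng.randint(low, high) for _ in range(3))) for ch in text]

colored = random_char_colors("A7kP", rng=random.Random(1))
```

Each `(character, (r, g, b))` pair can then be passed to whatever routine renders the character image.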

@thograce

> When generating captchas, you can randomly set a different color for each character by changing its RGB value.

Do you mean setting the color parameters in the traditional generator or in the GANs? According to your previous reply, multiple colors should be considered a security feature, while the traditional generator should produce relatively clean (single-color) captchas for the generator in the GANs. Thank you for your reply.

@yeguixin
Owner

Yes, the parameters of the security features are set in the traditional captcha image generator. Once trained, the traditional captcha image generator can generate captchas with and without security features for a targeted captcha scheme.

@thograce

thograce commented Nov 20, 2019

Thank you very much for your reply and it has helped me a lot. But I may have to bother you. How many data pairs did you use to train the pix2pix model in the preprocessing step? Did you combine two pictures into one?

@yeguixin
Owner

We use 20K pairs of synthesized captcha images to train the preprocessing model. The data format strictly follows that of the Pix2Pix model: we first resize each captcha image to 256×256 pixels and then combine the two images into one, as in the following example.

[Image: 0_fake_B]
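
The resize-and-combine step might be sketched as below. This is an illustration only: images are nested lists instead of real image files, the resize is a plain nearest-neighbour lookup, and the left/right order of the two halves is an assumption, since the reply does not say which image goes on which side.

```python
def resize_nearest(image, width=256, height=256):
    """Nearest-neighbour resize of a 2D list of pixels."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]

def make_pair(image_a, image_b, size=256):
    """Resize both images to size x size and join them side by side
    (image_a on the left, image_b on the right)."""
    a = resize_nearest(image_a, size, size)
    b = resize_nearest(image_b, size, size)
    return [row_a + row_b for row_a, row_b in zip(a, b)]

# Toy inputs: a 2x4 captcha with security features and its clean counterpart.
noisy = [[0, 255, 0, 255], [255, 0, 255, 0]]
clean = [[255, 255, 255, 255], [255, 255, 255, 255]]
pair = make_pair(noisy, clean)
```

The result is a single 512×256 image per training pair, which is the side-by-side layout that paired Pix2Pix datasets use.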
