base repository: daviddarnes/alembic
base: main
head repository: scrapingbypass/scrapingbypass.github.io
compare: main
54 changes: 25 additions & 29 deletions _config.yml
@@ -35,19 +35,17 @@ plugins:
   - jemoji

 # 3. Gem settings
-paginate: 2 # jekyll-paginate > items per page
+paginate: 10 # jekyll-paginate > items per page
 paginate_path: blog/page:num # jekyll-paginate > blog page
 jekyll-mentions: https://twitter.com # jekyll-mentions > service used when @replying
 twitter:
-  username: DavidDarnes # jekyll-seo-tag > Owners twitter username
-author: DavidDarnes # jekyll-seo-tag > default author
+  username: scrapingbypass # jekyll-seo-tag > Owners twitter username
+author: ScrapingBypass # jekyll-seo-tag > default author
 social: # jekyll-seo-tag > social overrides
-  name: David Darnes # jekyll-seo-tag > real name
+  name: ScrapingBypass # jekyll-seo-tag > real name
   links: # jekyll-seo-tag > social aliases (sameAs)
-    - https://twitter.com/DavidDarnes
-    - https://www.facebook.com/daviddarnes
-    - https://www.linkedin.com/in/daviddarnes
-    - https://github.com/daviddarnes
+    - https://twitter.com/scrapingbypass
+    - https://github.com/scrapingbypass
 # markdown: CommonMark # Markdown parse settings, CommonMark performs slightly better an others in build time
 # commonmark:
 #   options: ["SMART", "FOOTNOTES"]
@@ -56,7 +54,7 @@ social: # jekyll-seo-tag > social overrides
 # 4. Jekyll settings
 sass:
   style: compressed # Style compression
-permalink: pretty # Permalink style (/YYYY/MM/DD/page-name/)
+permalink: /:title/ # Permalink style (/page-name)
 excerpt_separator: <!-- more --> # Marks end of excerpt in posts
 timezone: Europe/London # Timezone for blog posts and alike

@@ -67,7 +65,7 @@ collections:
     output: true
     description: "My thoughts and ideas" # The post list page content
     feature_text: |
-      Welcome to the blog
+      ScrapingBypass Blog!
     feature_image: "https://picsum.photos/2560/600?image=866"

 # 6. Jekyll collections settings
@@ -93,16 +91,16 @@ defaults:
 # 7. Site settings
 encoding: utf-8 # Make sure the encoding is right
 lang: en-GB # Set the site language
-title: "Alembic" # Site name or title, also used in jekyll-seo-tag
+title: "ScrapingBypass" # Site name or title, also used in jekyll-seo-tag
 logo: "/assets/logos/logo.svg" # Site logo, also used in jekyll-seo-tag
-description: "Alembic is a starting point for Jekyll projects. Rather than starting from scratch, this boilerplate is designed to get the ball rolling immediately" # Site description and default description, also used in jekyll-seo-tag
-url: "https://alembic.darn.es" # Site url, also used in jekyll-seo-tag
-baseurl: ""
-repo: "https://github.com/daviddarnes/alembic"
-email: "me@daviddarnes.com"
+description: "ScrapingBypass API helps users web scraping bypass Cloudflare 5 seconds delay, Captcha anti-robot verification!" # Site description and default description, also used in jekyll-seo-tag
+url: "https://scrapingbypass.github.io" # Site url, also used in jekyll-seo-tag
+baseurl: "/"
+repo: "https://github.com/scrapingbypass/scrapingbypass.github.io"
+email: ""
 # disqus: "alembic-1" # Blog post comments, uncomment the option and set the site ID from your Disqus account
 # date_format: "%-d %B %Y" # Blog post date formatting using placeholder formatting
-# google_analytics: ""
+google_analytics: "G-1RLNLXHV2L"
 # google_analytics_anonymize_ip: ""
 # service_worker: false # Will turn off the service worker if set to false
 # short_name: "Al" # The web application short name, defaults to the site title
@@ -121,28 +119,26 @@ favicons: # Favicons are also used in the manifest file. Syntax is 'size: path'

 # 9. Site navigation
 navigation_header:
-  - title: Home
-    url: /
-  - title: Elements
-    url: /elements/
+  - title: ScrapingBypass
+    url: https://www.scrapingbypass.com?utm_source=scrapingbypassgithubio&utm_medium=referral&utm_content=navbrand
+  - title: Pricing
+    url: https://www.scrapingbypass.com/pricing.html?utm_source=scrapingbypassgithubio&utm_medium=referral&utm_content=navprice
   - title: Blog
     url: /blog/
   - title: Categories
     url: /categories/
   - title: Search
     url: /search/
-  - title: Fork Alembic
-    url: https://github.com/daviddarnes/alembic


 navigation_footer:
-  - title: Created by David Darnes
-    url: https://darn.es
+  - title: Written by ScrapingBypass
+    url: https://www.scrapingbypass.com?utm_source=scrapingbypassgithubio&utm_medium=referral&utm_content=footer

 social_links: # Appears in sidebar. Set the urls then uncomment and comment out as desired
-  Twitter: https://twitter.com/DavidDarnes
-  LinkedIn: https://www.linkedin.com/in/daviddarnes
-  GitHub: https://github.com/daviddarnes
-  link: https://darn.es
+  Twitter: https://twitter.com/scrapingbypass
+  GitHub: https://github.com/scrapingbypass
+  link: https://www.scrapingbypass.com
   RSS: /feed.xml

 sharing_links: # Appear at the bottom of single blog posts, add as desired. The value can be 'true' or the desired button colour
15 changes: 0 additions & 15 deletions _posts/2016-08-27-example-post-one.md

This file was deleted.

18 changes: 0 additions & 18 deletions _posts/2016-08-28-example-post-two.md

This file was deleted.

18 changes: 0 additions & 18 deletions _posts/2016-08-29-example-post-three.md

This file was deleted.

29 changes: 29 additions & 0 deletions _posts/2023-07-18-cloudflare.md
@@ -0,0 +1,29 @@
---
title: Cloudflare
categories:
  - Cloudflare
excerpt: |
  Cloudflare is a cloud-based security and web performance solution that provides a range of tools and services to help website owners improve their online presence. The platform offers a suite of services that includes content delivery, DDoS protection, SSL encryption, and more.
feature_text: |
  ## Cloudflare
  Cloud-based security and web performance solution that provides a range of tools and services to help website owners improve their online presence.
feature_image: "https://picsum.photos/2560/600?image=872"
---

Cloudflare: An Introduction to the Cloud-Based Security and Web Performance Solution

Cloudflare is a cloud-based security and web performance solution that provides a range of tools and services to help website owners improve their online presence. The platform offers a suite of services that includes content delivery, DDoS protection, SSL encryption, and more.

Cloudflare was founded in 2009 and has since grown to become one of the largest providers of cloud-based web performance and security solutions. The company's mission is to make the Internet faster and more secure for everyone, regardless of their location or the size of their business.

One of the primary benefits of Cloudflare is its content delivery network (CDN). The CDN caches a website's content on servers located around the world and serves each visitor from a nearby location, which reduces latency and improves page load times. Faster pages tend to improve user experience and can help search engine rankings.

Cloudflare also provides DDoS protection services, which can help to prevent websites from being overwhelmed by traffic from malicious sources. The platform uses a range of techniques to identify and block DDoS attacks, including rate limiting, bot detection, and IP blocking. This can help to ensure that websites remain online and accessible even during times of high traffic or attack.
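
To make the idea of rate limiting concrete, here is a minimal sketch of a sliding-window limiter in Python. It is only an illustration of the general technique, not Cloudflare's implementation; the class name, the limit of 3 requests per 10 seconds, and the sample client address are all assumptions chosen for the example.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        # Client identifier -> timestamps of that client's recent requests.
        self._hits = defaultdict(deque)

    def allow(self, client: str, now: float) -> bool:
        hits = self._hits[client]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # Over the limit: reject (or challenge) this request.
        hits.append(now)
        return True


# Hypothetical client sending requests at t = 0, 1, 2, 3 and 12 seconds.
limiter = SlidingWindowLimiter(limit=3, window=10.0)
decisions = [limiter.allow("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0, 12.0)]
print(decisions)  # [True, True, True, False, True]
```

The fourth request is rejected because three requests already fall inside the 10-second window; by the fifth request the earlier ones have expired, so it is allowed again.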

Another key feature of Cloudflare is SSL/TLS encryption. The platform provides free SSL/TLS certificates to website owners, which encrypt traffic between visitors and the site and help protect user data. This is particularly important for e-commerce websites and other sites that collect sensitive information from users.

In addition to these core services, Cloudflare also provides a range of other tools and services that can help website owners to improve their online presence. These include web application firewall (WAF) services, which can help to protect websites from common web-based attacks, as well as analytics and performance optimization tools that can help to identify and address issues with website performance and user experience.

Overall, Cloudflare is a powerful and comprehensive platform that can help website owners to improve their online presence and protect their websites from a range of threats. Whether you are a small business owner or a large enterprise, Cloudflare can provide the tools and services you need to succeed online.

<!-- more -->
38 changes: 38 additions & 0 deletions _posts/2023-07-20-captcha-solver-extension.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
---
title: Captcha Solver Extension - Pros and Cons
categories:
  - Captcha
excerpt: |
  Captcha is a security mechanism used on websites to prevent automated bots from spamming or hacking into systems. While it serves a vital purpose in online security, it can also be frustrating for users, especially when they have to solve multiple captchas. To make the process more manageable, some developers have created captcha solver plugins. In this article, we will discuss the pros and cons of captcha solver plugins.
feature_text: |
  ## Captcha Solver Extension
  Captcha is a security mechanism used on websites to prevent automated bots from spamming or hacking into systems. While it serves a vital purpose in online security, it can also be frustrating for users, especially when they have to solve multiple captchas. To make the process more manageable, some developers have created captcha solver plugins. In this article, we will discuss the pros and cons of captcha solver plugins.
feature_image: "https://picsum.photos/2560/600?image=872"
---

Captcha is a security mechanism used on websites to prevent automated bots from spamming or hacking into systems. While it serves a vital purpose in online security, it can also be frustrating for users, especially when they have to solve multiple captchas. To make the process more manageable, some developers have created captcha solver plugins. In this article, we will discuss the pros and cons of captcha solver plugins.

## Pros
### Saves Time
The most significant advantage of a captcha solver plugin is that it saves time. The plugin solves captchas for you, so you can focus on other tasks that require your attention.

### Easy to Use
Captcha solver plugins are easy to install and use. You don't need technical expertise to use them. Once you install the plugin, it automatically solves the captchas for you.

### Increases Productivity
Since captcha solver extensions save time, they also increase productivity. You can complete tasks faster, which means you can take on more work or have more free time to do other things.

## Cons
### Security Risks
Using captcha solver plugins can pose security risks. Some plugins may collect your personal information, including your IP address, browser history, and login credentials. This information can be used for malicious purposes, such as identity theft or hacking.

### Violates Ethical Principles
Using captcha solver plugins goes against ethical principles. Captchas are designed to ensure that a user is a human and not a bot. Bypassing the captcha defeats the purpose of the security mechanism and promotes unethical behavior.

### Limited Effectiveness
Not all captcha solver plugins are effective. Some plugins may fail to solve complex captchas or may not work with certain websites. In such cases, you may still have to solve captchas manually.

## Conclusion
Captcha solver extensions offer a convenient way to solve captchas quickly, but they come with real drawbacks: they can pose security risks, violate ethical principles, and may not always be effective. Weigh the pros and cons carefully before deciding to use one. If you do choose to use a captcha solver plugin, research it thoroughly, use it responsibly, and respect online security measures to protect yourself and others.
83 changes: 83 additions & 0 deletions _posts/2023-07-21-python-selenium-bypass-cloudflare.md
@@ -0,0 +1,83 @@
---
title: How does Python Selenium bypass Cloudflare to crawl web pages?
categories:
  - Cloudflare
  - Python
  - Selenium
excerpt: |
  How does Python Selenium bypass Cloudflare to crawl web pages?
feature_text: |
  ## How does Python Selenium bypass Cloudflare to crawl web pages?
feature_image: "https://picsum.photos/2560/600?image=872"
---



Like many websites, Cloudflare checks whether a visit is driven by a Selenium bot. The detection mainly looks for telltale JavaScript variables, such as names containing "selenium" or "webdriver", or document variables containing "$cdc_" or "$wdc_".
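
As a rough illustration of this kind of check, the sketch below scans a list of property names for those markers. Real detection scripts run as JavaScript inside the page; this Python version only demonstrates the matching logic, and the function name and the sample property names are assumptions made for the example.

```python
# Substrings that commonly give away Selenium or chromedriver automation.
AUTOMATION_MARKERS = ("selenium", "webdriver", "$cdc_", "$wdc_")


def find_automation_markers(property_names):
    """Return the names that contain any known automation marker."""
    flagged = []
    for name in property_names:
        lowered = name.lower()
        if any(marker in lowered for marker in AUTOMATION_MARKERS):
            flagged.append(name)
    return flagged


# Hypothetical names, as a script might collect them from `window` and `document`.
sample = ["location", "$cdc_asdjflasutopfhvcZLmcfl_", "__webdriver_evaluate", "title"]
print(find_automation_markers(sample))
# ['$cdc_asdjflasutopfhvcZLmcfl_', '__webdriver_evaluate']
```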

The detection mechanism may differ from driver to driver; the following solutions mainly target chromedriver.

1. Use undetected-chromedriver. This is a convenient package that can be installed directly with pip. Initialize the driver as shown below; after that, it works just like regular Selenium.

```python
import undetected_chromedriver as uc

# uc.Chrome() stands in for selenium.webdriver.Chrome().
driver = uc.Chrome()
driver.get('https://nowsecure.nl')
```

2. Directly modify the chromedriver executable. You can change the `key` variable to any string that does not contain "cdc".

```javascript
/**
 * Returns the global object cache for the page.
 * @param {Document=} opt_doc The document whose cache to retrieve. Defaults to
 *     the current document.
 * @return {!Cache} The page's object cache.
 */
function getPageCache(opt_doc, opt_w3c) {
  var doc = opt_doc || document;
  var w3c = opt_w3c || false;
  // |key| is a long random string, unlikely to conflict with anything else.
  var key = '$cdc_asdjflasutopfhvcZLmcfl_';
  if (w3c) {
    if (!(key in doc))
      doc[key] = new CacheWithUUID();
    return doc[key];
  } else {
    if (!(key in doc))
      doc[key] = new Cache();
    return doc[key];
  }
}
```

In essence, there is little difference between these two methods: undetected-chromedriver applies a patch when it starts chromedriver, and that patch performs the same key modification.

```python
import io
import re

# Method excerpt: `logger` and `self.gen_random_cdc()` are defined elsewhere
# in undetected-chromedriver.
def patch_exe(self):
    """
    Patches the ChromeDriver binary
    :return: False on failure, binary name on success
    """
    logger.info("patching driver executable %s" % self.executable_path)
    linect = 0
    replacement = self.gen_random_cdc()  # the cdc name is changed here
    with io.open(self.executable_path, "r+b") as fh:
        for line in iter(lambda: fh.readline(), b""):
            if b"cdc_" in line:
                # Step back to the start of the line and overwrite it in place.
                fh.seek(-len(line), 1)
                newline = re.sub(b"cdc_.{22}", replacement, line)
                fh.write(newline)
                linect += 1
```
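
The same idea can be tried safely on an in-memory byte string instead of a real binary. The following is a minimal sketch, not the package's own code: `random_key` and `patch_bytes` are hypothetical helpers, and the sample bytes simply reuse the key shown in the JavaScript above. The replacement has the same length as the matched text (26 bytes), so the overall size of the data does not change.

```python
import random
import re
import string

# "cdc_" followed by 22 more bytes: 26 bytes in total, as in the excerpt above.
CDC_PATTERN = re.compile(rb"cdc_.{22}")


def random_key(length: int = 26) -> bytes:
    """Build a random replacement with the same length as the matched key."""
    # Leave out the letter "c" so the result can never contain "cdc".
    alphabet = string.ascii_lowercase.replace("c", "")
    return "".join(random.choices(alphabet, k=length)).encode()


def patch_bytes(data: bytes) -> tuple[bytes, int]:
    """Replace every cdc_ key in `data`; return the patched bytes and match count."""
    replacement = random_key()
    return CDC_PATTERN.subn(replacement, data)


sample = b"var key = '$cdc_asdjflasutopfhvcZLmcfl_';"
patched, count = patch_bytes(sample)
print(count, len(patched) == len(sample), b"cdc_" in patched)  # 1 True False
```

Keeping the replacement the same length matters when patching an executable in place, because a longer or shorter string would shift every byte that follows it.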

3. Use the **ScrapingBypass** API. It lets you [bypass Cloudflare](https://www.scrapingbypass.com) bot verification; even if you need to send 100,000 requests, you don't have to worry about being identified as a scraper.
The ScrapingBypass API gets past anti-bot checks such as Cloudflare verification, CAPTCHA verification, WAF, and CC protection. It provides an HTTP API and a proxy, covering the interface address, request parameters, and response handling, and it lets you set browser fingerprint features such as the Referer, browser User-Agent, and headless status.