robotsparse

A Python package that makes parsing robots files faster and simpler.

Usage

Basic usage, such as getting robots contents:

import robotsparse

# NOTE: With `find_url=True`, the URL is redirected to the default robots location.
robots = robotsparse.getRobots("https://github.com/", find_url=True)
print(list(robots)) # output: ['user-agents']

The user-agents key will contain each user-agent found in the robots file contents along with information associated with them.
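As a rough illustration of how such a result might be consumed, the sketch below looks up the rules for one user-agent and falls back to the wildcard entry. The nesting under "user-agents", the "allow"/"disallow" keys, the sample values, and the helper name `rules_for` are all assumptions made for this example; check the actual output of `getRobots` before relying on this shape.

```python
# Hypothetical sample data: the exact structure under "user-agents" is an
# assumption, not confirmed output from robotsparse.getRobots.
sample_robots = {
    "user-agents": {
        "*": {"allow": ["/public/"], "disallow": ["/private/"]},
        "examplebot": {"allow": [], "disallow": ["/"]},
    }
}


def rules_for(robots, agent):
    """Return the rules for `agent`, falling back to the "*" entry if absent."""
    agents = robots.get("user-agents", {})
    return agents.get(agent, agents.get("*", {}))


print(rules_for(sample_robots, "examplebot"))  # rules listed for examplebot
print(rules_for(sample_robots, "otherbot"))    # falls back to the "*" rules
```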

Alternatively, we can load the robots contents into an object, which allows faster access:

import robotsparse

# This function returns an object (an instance of a class), not a dict.
robots = robotsparse.getRobotsObject("https://duckduckgo.com/", find_url=True)
assert isinstance(robots, object)
print(robots.allow) # Prints allowed locations
print(robots.disallow) # Prints disallowed locations
print(robots.crawl_delay) # Prints found crawl-delays
print(robots.robots) # This output is equivalent to the above example

Additional Features

When parsing robots files, it can also be useful to parse sitemap files:

import robotsparse
sitemap = robotsparse.getSitemap("https://pypi.org/", find_url=True)

In the code above, the sitemap variable holds a list of entries shaped like this:

[{"url": "", "lastModified": ""}]
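As a minimal sketch of working with that shape, the snippet below picks the most recently modified entries from a hand-written sample list. The sample URLs, the ISO-style date strings, and the helper name `latest_entries` are assumptions for illustration; the real `lastModified` format returned by `getSitemap` may differ.

```python
# Hypothetical sample data mirroring the documented shape; real entries would
# come from robotsparse.getSitemap(...).
sample_sitemap = [
    {"url": "https://example.com/a", "lastModified": "2023-01-05"},
    {"url": "https://example.com/b", "lastModified": "2023-03-10"},
    {"url": "https://example.com/c", "lastModified": ""},
]


def latest_entries(entries, limit=2):
    """Return up to `limit` entries with a non-empty lastModified, newest first."""
    dated = [e for e in entries if e.get("lastModified")]
    # Assumes ISO-style YYYY-MM-DD strings, which sort chronologically as text.
    return sorted(dated, key=lambda e: e["lastModified"], reverse=True)[:limit]


for entry in latest_entries(sample_sitemap):
    print(entry["url"], entry["lastModified"])
```

Entries with an empty lastModified are skipped here rather than guessed at; whether that is the right choice depends on how the results are used.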
