Skip to content
Spider for url on website
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
README.md
Spider-1.0.6.zip
Spider.zip

README.md

1.The need for python version:

Python-3.5.2 or upper

2.The need for support library:

import re
import requests
import sys
import queue
import threading
import getopt
import sqlite3
import random
import urllib.parse
import time

3.The need for database:

Sqlite3-3.11.0 or upper

4.Other requirements:

Need to build folder named by "url"
Need to build text file named by "url.txt" to store sites with not any empty row

5.Example:

Spider for url  -version-1.0.3 --Written by WJY
	
	-h   Print the information of help
	-f   specified site file was crawling
	-k   specified keyword in url.Multiple key separated by ',' 
	-n  specified keyword not in url.Multiple key separated by ','
	-t   specified the count of threads
	-l   specified by crawling site depth
	-r   specified the frequency of crawling
	-m 	 specified the mode of store 'text' or 'db'
	
	Exampe:python3 Spider.py -f url.txt

6.Notice for default value:

hreads		=>	"1"
keyword		=>	""
site file	=>	"./url.txt"
level		=>	"2"
rate		=>	"100"
You can’t perform that action at this time.