# AbuseCH Data Scraper

## URLhaus Blocklist

> URLhaus is a project from abuse.ch with the goal of sharing malicious URLs that are being used for malware distribution.

Reference: https://urlhaus.abuse.ch/

In [1]:
from pprint import pprint
from abusech.urlhaus import UrlHaus
urlhaus = UrlHaus()

### URLhaus Blocklist - Full Data

> The URLhaus database dump is a simple CSV feed that contains all malware URLs that are currently known to URLhaus.

Reference: https://urlhaus.abuse.ch/api/#retrieve

In [2]:
db_dump = urlhaus.get_data_dump()
pprint(db_dump[0:2])

[UrlHaus(id=257851, date_added=datetime.datetime(2019, 11, 25, 5, 53, 5), url='https://cdn.discordapp.com/attachments/644255276371017731/644257339766997001/discordprogram.exe', url_status='offline', threat='malware_download', tags='None', urlhaus_link='https://urlhaus.abuse.ch/url/257851/', reporter='JayTHL'),
 UrlHaus(id=257850, date_added=datetime.datetime(2019, 11, 25, 5, 53, 4), url='http://cdn.discordapp.com/attachments/576715262728863745/610135174239354893/b4bd25322c09eef0.exe', url_status='offline', threat='malware_download', tags='None', urlhaus_link='https://urlhaus.abuse.ch/url/257850/', reporter='JayTHL')]
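
Once the dump is loaded, a quick summary helps to see what it contains. The sketch below is a minimal example, not part of the library: `UrlRecord` is a hypothetical stand-in that mirrors the fields printed above, and `summarize` is an illustrative helper. With the real objects, the same attribute names should work as long as they match the printed representation.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UrlRecord:
    """Hypothetical stand-in mirroring the fields shown in the output above."""
    id: int
    date_added: datetime
    url: str
    url_status: str
    threat: str
    tags: str
    urlhaus_link: str
    reporter: str


def summarize(records):
    """Count records per url_status and per reporter."""
    by_status = Counter(r.url_status for r in records)
    by_reporter = Counter(r.reporter for r in records)
    return by_status, by_reporter


# Two records copied from the printed output above.
sample = [
    UrlRecord(257851, datetime(2019, 11, 25, 5, 53, 5),
              "https://cdn.discordapp.com/attachments/644255276371017731/644257339766997001/discordprogram.exe",
              "offline", "malware_download", "None",
              "https://urlhaus.abuse.ch/url/257851/", "JayTHL"),
    UrlRecord(257850, datetime(2019, 11, 25, 5, 53, 4),
              "http://cdn.discordapp.com/attachments/576715262728863745/610135174239354893/b4bd25322c09eef0.exe",
              "offline", "malware_download", "None",
              "https://urlhaus.abuse.ch/url/257850/", "JayTHL"),
]

by_status, by_reporter = summarize(sample)
print(by_status.most_common())
print(by_reporter.most_common())
```

Passing `db_dump` instead of `sample` should give the same kind of breakdown for the full feed, assuming the returned objects expose `url_status` and `reporter` as attributes.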


### URLhaus Blocklist - Recent Data Only

> URLhaus database dump (CSV) containing recent additions (URLs) only (past 30 days).

Reference: https://urlhaus.abuse.ch/api/#retrieve

In [3]:
recent_dump = urlhaus.get_recent_urls()
pprint(recent_dump[0:2])

[UrlHaus(id=257851, date_added=datetime.datetime(2019, 11, 25, 5, 53, 5), url='https://cdn.discordapp.com/attachments/644255276371017731/644257339766997001/discordprogram.exe', url_status='offline', threat='malware_download', tags='None', urlhaus_link='https://urlhaus.abuse.ch/url/257851/', reporter='JayTHL'),
 UrlHaus(id=257850, date_added=datetime.datetime(2019, 11, 25, 5, 53, 4), url='http://cdn.discordapp.com/attachments/576715262728863745/610135174239354893/b4bd25322c09eef0.exe', url_status='offline', threat='malware_download', tags='None', urlhaus_link='https://urlhaus.abuse.ch/url/257850/', reporter='JayTHL')]
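
If a narrower window than the feed's 30 days is needed, the records can be filtered locally on `date_added`. This is a minimal sketch under stated assumptions: `Record` is a hypothetical stand-in with only the fields used here, `added_within` is an illustrative helper, and the second sample row (id 100, with placeholder URLs) is invented purely to show an entry being excluded.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical stand-in with only the fields this example needs.
Record = namedtuple("Record", ["id", "date_added", "url"])


def added_within(records, days, now):
    """Return the records whose date_added is no older than `days` before `now`."""
    cutoff = now - timedelta(days=days)
    return [r for r in records if r.date_added >= cutoff]


sample = [
    Record(257851, datetime(2019, 11, 25, 5, 53, 5), "https://example.invalid/a.exe"),
    Record(100, datetime(2019, 9, 1, 0, 0, 0), "http://example.invalid/old.exe"),  # made-up old entry
]

# A fixed reference time keeps the example reproducible.
now = datetime(2019, 11, 25, 6, 0, 0)
last_week = added_within(sample, days=7, now=now)
print([r.id for r in last_week])
```

Passing `now` explicitly, rather than calling `datetime.now()` inside the helper, makes the filter easy to test and avoids surprises around time zones.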


### URLhaus Blocklist - "Online" Data Only

> URLhaus database dump (CSV) containing only online (active) malware URLs.

Reference: https://urlhaus.abuse.ch/api/#retrieve

In [4]:
online_dump = urlhaus.get_online_urls()
pprint(online_dump[0:2])

[UrlHaus(id=257796, date_added=datetime.datetime(2019, 11, 24, 18, 21, 15), url='http://192.210.180.163/razor/r4z0r.mips', url_status='online', threat='malware_download', tags='elf', urlhaus_link='https://urlhaus.abuse.ch/url/257796/', reporter='zbetcheckin'),
 UrlHaus(id=257753, date_added=datetime.datetime(2019, 11, 24, 8, 26, 16), url='http://uloab.com/putty.exe', url_status='online', threat='malware_download', tags='exe', urlhaus_link='https://urlhaus.abuse.ch/url/257753/', reporter='abuse_ch')]
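
A common next step with the online feed is reducing the URLs to a list of hosts, for example to feed a DNS or firewall blocklist. The sketch below uses only the standard library; `unique_hosts` is an illustrative helper, not part of the `abusech` package, and the two URLs are taken from the output above.

```python
from urllib.parse import urlsplit


def unique_hosts(urls):
    """Return the sorted, de-duplicated hostnames (or IPs) found in `urls`."""
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname  # lowercased host, without port or credentials
        if host:
            hosts.add(host)
    return sorted(hosts)


urls = [
    "http://192.210.180.163/razor/r4z0r.mips",
    "http://uloab.com/putty.exe",
]
hosts = unique_hosts(urls)
print(hosts)
```

With the real data this could be called as `unique_hosts(r.url for r in online_dump)`, assuming each record exposes a `url` attribute as the printed output suggests. Note that blocking a whole host is coarser than blocking a single URL, which matters for shared hosting and CDNs.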


### URLhaus Blocklist - Payloads

> URLhaus regularly checks the content served by malicious URLs that are known to URLhaus. This CSV contains all payloads collected by URLhaus, identified by a hash (MD5 / SHA256 hash). Please consider that not all payloads are malicious. As a matter of fact, a URL can e.g. serve any content once it has been cleaned up.

Reference: https://urlhaus.abuse.ch/api/#clamav
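
Because payloads are identified by hash, a typical use is checking whether some local content matches a known payload. The following is a minimal sketch using `hashlib`; `sha256_of` and `is_known_payload` are illustrative helpers, and the `known` set is built from dummy bytes rather than real feed data.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the lowercase hex SHA256 digest of `data`."""
    return hashlib.sha256(data).hexdigest()


def is_known_payload(data: bytes, known_sha256) -> bool:
    """Check whether the SHA256 of `data` appears in a set of known hashes."""
    return sha256_of(data) in known_sha256


# Dummy content standing in for a payload; not real feed data.
known = {sha256_of(b"example payload bytes")}

print(is_known_payload(b"example payload bytes", known))
print(is_known_payload(b"something else", known))
```

With the payload list from the feed, the set could be built as `{p.sha256 for p in payloads}`, assuming each entry exposes a `sha256` attribute. As the quoted note says, a match does not by itself prove the content is malicious.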

In [5]:
payloads = urlhaus.get_payloads()
pprint(payloads[0:5])

[Payload(timestamp=datetime.datetime(2019, 11, 25, 6, 9, 58), url='http://druzim.freewww.biz/DEDKE.exe', type='exe', md5='df072a08eff7f92a600369ae3889b856', sha256='ec8875337f89bdfea1bc5768b8c9bc68547710f59f016846d8a33b0ac60bd35f', signature=None),
 Payload(timestamp=datetime.datetime(2019, 11, 25, 6, 8, 56), url='http://www.chalesmontanha.com/newsletter/En/Client/Customer-Invoice-EY-0944105/', type='doc', md5='3aa2c722f03f45e6176e245662737109', sha256='055ee1bf3f0aa40ce77b66a4e65f5a247a09747b5fca2bee708e65a506afee7b', signature=None),
 Payload(timestamp=datetime.datetime(2019, 11, 25, 6, 8, 18), url='http://graphee.cafe24.com/dh/downfile/DooMHelper.exe', type='exe', md5='2c188069eaf6c9a2d2972eedf3e65020', sha256='0edff846240b6c4a7c6aeb77010713e547290c0df6d66af0bc3e71ec74495b67', signature=None),
 Payload(timestamp=datetime.datetime(2019, 11, 25, 6, 8, 8), url='http://d1.paopaoche.net/x1/djfs.exe', type='exe', md5='450f1abd18d2e8a8972aee9be0efdc0c', sha256='3719181f075a739652486b2f4c27